# Tabular Playground Series - Oct 2021

#### Oct 01, 2021 to Oct 31, 2021

#### https://www.kaggle.com/c/tabular-playground-series-oct-2021/

#### _**Predicting the biological response of molecules given various chemical properties**_

Notebook Author:

| Name  | Pradip Kumar Das  |
| ------------: | :------------ |
| **Profile:**  | [LinkedIn](https://www.linkedin.com/in/daspradipkumar/ "LinkedIn") \| [GitHub](https://github.com/PradipKumarDas "GitHub") \| [Kaggle](https://www.kaggle.com/pradipkumardas "Kaggle")  |
| **Contact:**  | pradipkumardas@hotmail.com (Email)  |
| **Location:**  | Bengaluru, India  |

**Sections:**

* Dependencies
* Exploratory Data Analysis (EDA) & Preprocessing
* Modeling & Evaluation
* Submission

## Dependencies

In [1]:
# Loads required packages

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import roc_auc_score

from xgboost import XGBClassifier
import xgboost as xgb

from hyperopt import hp, fmin, tpe, Trials, STATUS_OK

## Exploratory Data Analysis (EDA) & Preprocessing

In [2]:
# Loads train dataset
train = pd.read_csv("../input/tabular-playground-series-oct-2021/train.csv")

In [3]:
# Checks how the train data set looks
with pd.option_context('display.max_rows', 10, 'display.max_columns', None): 
    display(train)

Unnamed: 0,id,f0,f1,f2,...,f283,f284,target
0,0,0.205979,0.410993,0.176775,...,0,0,1
1,1,0.181004,0.473119,0.011734,...,0,0,1
2,2,0.182583,0.307431,0.325950,...,0,0,1
3,3,0.180240,0.494592,0.008367,...,0,0,1
4,4,0.177172,0.495513,0.014263,...,1,0,1
...,...,...,...,...,...,...,...,...
999995,999995,0.204312,0.344754,0.262267,...,0,0,1
999996,999996,0.182004,0.564019,0.242564,...,0,1,0
999997,999997,0.250304,0.491553,0.098547,...,0,0,0
999998,999998,0.203572,0.534923,0.180118,...,0,0,1


In [7]:
# Drops ID column as it is not required
train.drop(["id"], axis=1, inplace=True)

In [4]:
# Checks for data types used in the data set
train.dtypes.unique()

array([dtype('int64'), dtype('float64')], dtype=object)

In [5]:
# Counts the total number of missing values across all columns ('0' means no row has a missing value)
sum(train.isna().sum())

0
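The sum above counts missing cells rather than rows; here both are zero, so the distinction does not matter. A minimal sketch of the difference on a small made-up frame (the `demo` frame and its values are illustrative assumptions, not competition data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not competition data: row 1 has two missing cells.
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, np.nan, 6.0],
    "c": [7.0, 8.0, np.nan],
})

# Total number of missing cells across the whole frame.
missing_cells = int(demo.isna().sum().sum())

# Number of rows that contain at least one missing cell.
missing_rows = int(demo.isna().any(axis=1).sum())

print(missing_cells, missing_rows)
```

The two counts agree only when each affected row has exactly one missing cell.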

In [6]:
def reduce_mem_usage(df):
    """Downcasts each column to a narrower data type to reduce memory usage.

    Integer columns become int8/int16/int32 where their value range allows,
    float columns become float32, and object columns become 'category'.
    Prints the memory usage before and after, and returns the dataframe.
    """
    start_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))

    for col in df.columns:
        col_type = df[col].dtype

        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                # Picks the narrowest signed integer type that covers [c_min, c_max]
                for int_type in (np.int8, np.int16, np.int32, np.int64):
                    if c_min >= np.iinfo(int_type).min and c_max <= np.iinfo(int_type).max:
                        df[col] = df[col].astype(int_type)
                        break
            else:
                # float16 is not considered; float32 is the narrowest float type used
                if c_min >= np.finfo(np.float32).min and c_max <= np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))

    return df

In [8]:
# Compresses the training data, because its original size causes the Kaggle kernel to reset
train = reduce_mem_usage(train)

Memory usage of dataframe is 2182.01 MB
Memory usage after optimization is: 959.40 MB
Decreased by 56.0%
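The saving comes from storing floats in 4 bytes instead of 8 and the 0/1 integer columns in 1 byte instead of 8. The sketch below, which uses synthetic arrays rather than the competition data, illustrates the trade-off: the integer downcast is exact, while the float downcast introduces a small rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for one continuous and one binary feature column.
cont64 = rng.random(1000)                 # float64 values in [0, 1)
flag64 = rng.integers(0, 2, size=1000)    # int64 values in {0, 1}

cont32 = cont64.astype(np.float32)
flag8 = flag64.astype(np.int8)

# Bytes per value before and after the downcast.
print(cont64.itemsize, "->", cont32.itemsize)
print(flag64.itemsize, "->", flag8.itemsize)

# Integer downcast is exact; float downcast rounds to about 7 significant digits.
exact = bool(np.array_equal(flag64, flag8))
max_err = float(np.max(np.abs(cont64 - cont32.astype(np.float64))))
print(exact, max_err)
```

For features in the range shown in the data preview, an error of this size is far below the six decimal places displayed.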


In [9]:
# Shows the column data types after data compression
train.dtypes

f0        float32
f1        float32
f2        float32
f3        float32
f4        float32
           ...   
f281         int8
f282         int8
f283         int8
f284         int8
target       int8
Length: 286, dtype: object

In [10]:
# Checks distribution of categorical target variable
train.target.value_counts()

1    500485
0    499515
Name: target, dtype: int64

**Because `target` is almost evenly balanced (500,485 ones vs. 499,515 zeros), it can serve directly as the stratification label for stratified K-fold cross-validation.**
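Stratifying on the target keeps the class proportions of every validation fold close to those of the full data set. A small sketch with synthetic labels (the `X_demo` and `y_demo` arrays are made up for illustration) shows the effect:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic binary labels, roughly balanced like the competition target.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=1000)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

overall_rate = float(y_demo.mean())
fold_rates = []
for _, idx_val in StratifiedKFold(n_splits=5).split(X_demo, y_demo):
    # Share of positive labels inside this validation fold.
    fold_rates.append(float(y_demo[idx_val].mean()))

print(round(overall_rate, 3), [round(r, 3) for r in fold_rates])
```

Each fold's positive rate should differ from the overall rate only by the rounding needed to split the classes into five parts.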

## Modeling & Evaluation

In [11]:
# Separates predictor variables from target

y = train.target
train.drop(["target"], axis=1, inplace=True)

In [12]:
# Creates the stratified K-fold splitter used for cross-validation
sk_fold = StratifiedKFold(n_splits=5)

In [13]:
# Performs cross validation on XGB Classifier

cv_generator = sk_fold.split(train, y)

model = XGBClassifier(
    n_estimators=100,
    objective='binary:logistic', 
    eval_metric='auc',
    tree_method='gpu_hist'
)

cv_scores = cross_val_score(model, train, y, scoring='roc_auc', cv=cv_generator, n_jobs=-1, verbose=10)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:   35.6s
[Parallel(n_jobs=-1)]: Done   3 out of   5 | elapsed:  1.0min remaining:   41.3s
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  1.3min remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  1.3min finished
[CV]  ................................................................
[CV] .................................... , score=0.851, total=  33.3s
[CV]  ................................................................
[CV] .................................... , score=0.849, total=  26.4s
[CV]  ................................................................
[CV] .................................... , score=0.853, total=  33.3s
[CV]  ................................................................
[CV] .................................... , score=0.853, total=  26.2s
[CV]  ................................................................
[CV] .................................... , score=0.849, total=  13.1s

In [14]:
print("ROC AUC score of XGBoost (with default parameters) Model:", cv_scores.mean())

ROC AUC score of XGBoost (with default parameters) Model: 0.8510695329013235

In [15]:
del cv_scores, model, cv_generator

**Automated Hyperparameter Tuning with Hyperopt**

In [22]:
# Instead of running cross-validation during hyperparameter tuning,
# the tuning uses a single fixed train/validation split to save a significant amount of time.
# The following snippet extracts that stratified split (the first fold).

cv_generator = sk_fold.split(train, y)

# Takes only the first fold as the fixed train/validation split
idx_train, idx_val = next(cv_generator)

y_val = y.iloc[idx_val]
dtrain = xgb.DMatrix(data=train.iloc[idx_train], label=y.iloc[idx_train])
dval = xgb.DMatrix(data=train.iloc[idx_val], label=y.iloc[idx_val])

In [18]:
# Sets up a search space for XGBoost hyperparameters
space = {
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.3),
    'max_depth': hp.quniform("max_depth", 2, 6, 1),
    'min_child_weight' : hp.quniform('min_child_weight', 1, 8, 1),
    'reg_alpha' : hp.uniform('reg_alpha', 1e-8, 100),
    'reg_lambda' : hp.uniform('reg_lambda', 1e-8, 100),
    'gamma': hp.uniform ('gamma', 0.0, 1.0),
    'subsample': hp.uniform("subsample", 0.1, 1.0),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.1, 1.0)
}
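Hyperopt documents `hp.quniform(label, low, high, q)` as returning a value like `round(uniform(low, high) / q) * q`, which is a float even when `q` is 1. That is why `max_depth` has to be cast to `int` before it is passed to XGBoost. A rough NumPy imitation of that formula follows; the helper `quniform_like` is hypothetical and is not part of Hyperopt:

```python
import numpy as np

def quniform_like(rng, low, high, q):
    """Hypothetical helper imitating round(uniform(low, high) / q) * q."""
    return float(np.round(rng.uniform(low, high) / q) * q)

rng = np.random.default_rng(0)
samples = [quniform_like(rng, 2, 6, 1) for _ in range(1000)]

# Values are whole numbers between 2 and 6, but their type is float.
print(sorted(set(samples)), type(samples[0]).__name__)

# Casting makes a sample usable where an integer is required.
max_depth = int(samples[0])
print(max_depth)
```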

In [28]:
def trial_loss(space):
    """
    Objective function that Hyperopt calls with one set of trial hyperparameters;
    it trains a model and scores its predictions on the validation set.
    
    Parameters:
    ----------
    space: A set of trial hyperparameters
    
    Returns a dictionary with the loss (negative ROC AUC) and status for Hyperopt to minimize.
    """
    
    # Converts parameter value to int as required by XGBoost
    space["max_depth"] = int(space["max_depth"])
    space["objective"] = "binary:logistic"
    space["eval_metric"] = "auc"
    space["tree_method"] = "gpu_hist"
    
    model = xgb.train(
        space, 
        dtrain, 
        num_boost_round=2000, 
        evals=[(dtrain, 'train'), (dval, 'eval')],
        early_stopping_rounds=50, verbose_eval=False)
    
    predictions = model.predict(dval)
    
    roc_auc = roc_auc_score(y_val, predictions)
    
    del predictions, model, space
    
    return {"loss": -roc_auc, "status": STATUS_OK}
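The objective returns the negative ROC AUC because `fmin` minimizes its objective. ROC AUC itself can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half. A small sketch of that pairwise definition, using made-up labels and scores; `pairwise_auc` is an illustrative helper, not a library function, and it is only practical for small inputs:

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compares every positive score with every negative score (quadratic memory).
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Made-up example: 3 positives and 3 negatives.
y_demo = [1, 1, 1, 0, 0, 0]
s_demo = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = pairwise_auc(y_demo, s_demo)
loss = -auc  # a minimizer that lowers this loss is raising the AUC
print(auc, loss)
```

In this toy case 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9.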

In [29]:
# Starts hyperparameter tuning
trials = Trials()
best_trial = fmin(fn=trial_loss, space=space, algo=tpe.suggest, max_evals=50, trials=trials)

100%|██████████| 50/50 [45:43<00:00, 54.87s/trial, best loss: -0.8563927977799835]


In [30]:
# Views the best hyperparameters
best_trial

{'colsample_bytree': 0.5509440040918419,
 'gamma': 0.2749290513553893,
 'learning_rate': 0.055610269598680366,
 'max_depth': 4.0,
 'min_child_weight': 8.0,
 'reg_alpha': 97.43879743549955,
 'reg_lambda': 75.11623552980602,
 'subsample': 0.844084987947726}

In [31]:
del dtrain, dval, y_val, cv_generator

## Submission

In [32]:
# Loads test data set
test = pd.read_csv("../input/tabular-playground-series-oct-2021/test.csv")

# Removes ID column as it is not required for prediction
test.drop(["id"], axis=1, inplace=True)

# Loads submission data set that acts just as a template for submission
submission = pd.read_csv("../input/tabular-playground-series-oct-2021/sample_submission.csv")

**Prepares final XGBoost model with optimized parameters**

In [34]:
# Adds other important parameters
best_trial["max_depth"] = int(best_trial["max_depth"])
best_trial["objective"] = "binary:logistic"
best_trial["eval_metric"] = "auc"
best_trial["tree_method"] = "gpu_hist"

In [43]:
# Trains a model on each cross-validation fold and stores
# the test predictions from each fold

test_predictions = []

cv_generator = sk_fold.split(train, y)

dtest = xgb.DMatrix(data=test)

for fold, (idx_train, idx_val) in enumerate(cv_generator):
    print("fold", fold)

    dtrain = xgb.DMatrix(data=train.iloc[idx_train], label=y.iloc[idx_train])
    dval = xgb.DMatrix(data=train.iloc[idx_val], label=y.iloc[idx_val])
    
    model = xgb.train(
        best_trial, 
        dtrain, 
        num_boost_round=2000, 
        evals=[(dtrain, 'train'), (dval, 'eval')],
        early_stopping_rounds=50, verbose_eval=200)
    
    predictions = model.predict(dtest)
    
    test_predictions.append(predictions)
    
    del predictions, model, dval, dtrain

fold 0
[0]	train-auc:0.59823	eval-auc:0.59433
[200]	train-auc:0.84836	eval-auc:0.84682
[400]	train-auc:0.85485	eval-auc:0.85184
[600]	train-auc:0.85810	eval-auc:0.85391
[800]	train-auc:0.86015	eval-auc:0.85494
[1000]	train-auc:0.86164	eval-auc:0.85555
[1200]	train-auc:0.86279	eval-auc:0.85591
[1400]	train-auc:0.86377	eval-auc:0.85612
[1600]	train-auc:0.86462	eval-auc:0.85628
[1800]	train-auc:0.86539	eval-auc:0.85634
[1999]	train-auc:0.86610	eval-auc:0.85639
fold 1
[0]	train-auc:0.59664	eval-auc:0.59901
[200]	train-auc:0.84758	eval-auc:0.84919
[400]	train-auc:0.85414	eval-auc:0.85405
[600]	train-auc:0.85741	eval-auc:0.85602
[800]	train-auc:0.85949	eval-auc:0.85688
[1000]	train-auc:0.86100	eval-auc:0.85732
[1200]	train-auc:0.86217	eval-auc:0.85753
[1400]	train-auc:0.86315	eval-auc:0.85759
[1437]	train-auc:0.86331	eval-auc:0.85761
fold 2
[0]	train-auc:0.59790	eval-auc:0.59533
[200]	train-auc:0.84776	eval-auc:0.84795
[400]	train-auc:0.85424	eval-auc:0.85328
[600]	train-auc:0.85740	eval-auc
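With `early_stopping_rounds=50`, training halts once the metric on the last entry of `evals` (here `eval-auc`) has not improved for 50 consecutive rounds; otherwise it runs all 2,000 rounds, as fold 0 does in the log above, while fold 1's log ends at round 1437. The stopping rule can be sketched in plain Python as follows. The function and the toy score sequence are illustrative assumptions and do not reproduce XGBoost's internals:

```python
def rounds_until_stop(scores, patience):
    """Return (last_round_run, best_round) for a metric that is maximized.

    Training stops once `patience` consecutive rounds pass without a new best.
    """
    best_score = float("-inf")
    best_round = -1
    for rnd, score in enumerate(scores):
        if score > best_score:
            best_score, best_round = score, rnd
        elif rnd - best_round >= patience:
            return rnd, best_round
    # No early stop: every round was run.
    return len(scores) - 1, best_round

# Toy validation AUC curve: improves, peaks at round 3, then drifts down.
toy_scores = [0.60, 0.70, 0.75, 0.78, 0.77, 0.77, 0.76, 0.76, 0.75]
last_round, best_round = rounds_until_stop(toy_scores, patience=3)
print(last_round, best_round)
```

With a patience of 3, the toy run stops three rounds after its best round instead of consuming the whole sequence.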

In [54]:
test_predictions

[array([0.7308241 , 0.25364307, 0.9059377 , ..., 0.2811955 , 0.5688376 ,
        0.39456698], dtype=float32),
 array([0.75234586, 0.26519367, 0.88730127, ..., 0.27117756, 0.51428294,
        0.46941733], dtype=float32),
 array([0.72856605, 0.25492337, 0.9054919 , ..., 0.33932757, 0.47629607,
        0.41199732], dtype=float32),
 array([0.73929745, 0.2237292 , 0.9052295 , ..., 0.28358278, 0.5110287 ,
        0.37440345], dtype=float32),
 array([0.7509796 , 0.28724867, 0.8912888 , ..., 0.30526415, 0.4860767 ,
        0.38827506], dtype=float32)]

In [48]:
del dtest, cv_generator, test, train

In [55]:
# Averages the predictions stored from each cross-validation fold
# and sets the target column to those averaged predictions
submission["target"] = np.mean(np.column_stack(test_predictions), axis=1)

# Checks the submission data before saving
submission

Unnamed: 0,id,target
0,1000000,0.740403
1,1000001,0.256948
2,1000002,0.899050
3,1000003,0.850196
4,1000004,0.262870
...,...,...
499995,1499995,0.971875
499996,1499996,0.816845
499997,1499997,0.296109
499998,1499998,0.511304
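Averaging the per-fold predictions row by row is a simple ensemble of the five fold models. Stacking the arrays as columns and averaging along `axis=1` gives the same result as averaging the list along `axis=0`, as this sketch with made-up prediction arrays shows:

```python
import numpy as np

# Made-up predictions from three hypothetical fold models for four test rows.
fold_preds = [
    np.array([0.70, 0.20, 0.90, 0.40], dtype=np.float32),
    np.array([0.80, 0.30, 0.80, 0.50], dtype=np.float32),
    np.array([0.60, 0.10, 1.00, 0.30], dtype=np.float32),
]

# Shape (4, 3): one row per test example, one column per fold model.
stacked = np.column_stack(fold_preds)
avg_by_columns = stacked.mean(axis=1)

# Same result without the explicit stacking step.
avg_by_list = np.mean(fold_preds, axis=0)

print(stacked.shape)
print(avg_by_columns)
```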


In [56]:
# Saves test predictions
submission.to_csv("./submission.csv", index=False)