# Modeling

With the data preprocessed, we can start building the model.

I start by importing the libraries and printing the versions I used.

In [1]:
# Imports
%matplotlib inline
import pickle
import gc
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Library versions
print("VERSIONS:")
print("Xgboost: ", xgb.__version__)
print("Pandas: ", pd.__version__)
print("Matplotlib: ", matplotlib.__version__)


VERSIONS:
Xgboost:  1.2.1
Pandas:  1.1.4
Matplotlib:  3.3.3


## Read processed data

We saved the processed data earlier, so now it is time to load it.

In [2]:
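df = pd.read_pickle("data-pre/df.pkl")
# Drop every 'target*' column whose name does not end in a digit
# (this includes 'target' itself), plus the row identifier.
drop_columns = [c for c in df if c.startswith('target') and not c[-1].isdigit()]
drop_columns += ['ID']
# Everything else is used as a model feature
features = df.columns.difference(drop_columns)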
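
To make the effect of this filter concrete, here is a minimal sketch on an empty toy data frame. The column names are hypothetical and chosen only for illustration; the real columns come from the preprocessing step.

```python
import pandas as pd

# Hypothetical column names, used only to illustrate the filter.
toy = pd.DataFrame(columns=['ID', 'date_block_num', 'shop_id',
                            'target', 'target_mean', 'target_lag_1'])

# Same rule as above: drop 'target*' columns that do not end in a digit, plus 'ID'.
drop_columns = [c for c in toy if c.startswith('target') and not c[-1].isdigit()]
drop_columns += ['ID']

# Index.difference returns the remaining column names in sorted order.
features = toy.columns.difference(drop_columns)
print(list(features))
```

In this toy case `target` and `target_mean` are dropped, while `target_lag_1` survives because its name ends in a digit.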

Next, I split the loaded data frame into train, validation and test sets.

In [3]:
# Train
f0 = df.date_block_num < 34
# Test
f1 = df.date_block_num == 34

train, val = train_test_split(df[f0], test_size=0.2, stratify=df[f0]['target'])
# Copy so a prediction column can be added later without a SettingWithCopyWarning
test = df[f1].copy()

Train = xgb.DMatrix(train[features], train['target'])
Val = xgb.DMatrix(val[features], val['target'])
Test = xgb.DMatrix(test[features])
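
The split logic can be checked on a small example. The sketch below uses a hypothetical toy frame with the same `date_block_num` and `target` column names; the `random_state` argument is an addition for reproducibility and is not part of the cell above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 20 rows in months 32-33 (15 zeros, 5 ones) and 2 rows in month 34.
toy = pd.DataFrame({
    'date_block_num': [32] * 10 + [33] * 10 + [34] * 2,
    'target': [0, 0, 0, 1] * 5 + [0, 0],
})

is_train = toy.date_block_num < 34   # months with a known target
is_test = toy.date_block_num == 34   # month to predict

# Stratifying keeps the share of each target value equal in both parts.
train, val = train_test_split(
    toy[is_train], test_size=0.2,
    stratify=toy[is_train]['target'], random_state=0,
)
test = toy[is_test].copy()

print(len(train), len(val), len(test))
print(val['target'].value_counts().to_dict())
```

With 25% ones in the toy data, the four validation rows contain exactly one row with `target == 1`.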

Delete the data frame to free memory

In [4]:
del df
gc.collect()

47

## XGBoost Training

Now it's time to define the model parameters and start training.

In [5]:
%%time

xgb_params = {
    'eval_metric': 'rmse',
    'lambda': 0.171,
    'gamma': 0.124,
    'booster': 'gbtree',
    'alpha': 0.170,
    'objective': 'reg:squarederror',
    'colsample_bytree': 0.715,
    'subsample': 0.874,
    'min_child_weight': 26,
    'eta': 0.148,
    'max_depth': 6,
    'tree_method': 'gpu_hist',
}


model = xgb.train(xgb_params, Train, 1500, [(Train, 'Train'), (Val, 'Val')], early_stopping_rounds=10, verbose_eval=1)

[0]	Train-rmse:1.16781	Val-rmse:1.16873
Multiple eval metrics have been passed: 'Val-rmse' will be used for early stopping.

Will train until Val-rmse hasn't improved in 10 rounds.
[1]	Train-rmse:1.11190	Val-rmse:1.11333
[2]	Train-rmse:1.06704	Val-rmse:1.06897
[3]	Train-rmse:1.03323	Val-rmse:1.03550
[4]	Train-rmse:1.00706	Val-rmse:1.00967
[5]	Train-rmse:0.98754	Val-rmse:0.99031
[6]	Train-rmse:0.97079	Val-rmse:0.97380
[7]	Train-rmse:0.95905	Val-rmse:0.96227
[8]	Train-rmse:0.94810	Val-rmse:0.95150
[9]	Train-rmse:0.93880	Val-rmse:0.94225
[10]	Train-rmse:0.93167	Val-rmse:0.93516
[11]	Train-rmse:0.92610	Val-rmse:0.92974
[12]	Train-rmse:0.92198	Val-rmse:0.92574
[13]	Train-rmse:0.91825	Val-rmse:0.92208
[14]	Train-rmse:0.91500	Val-rmse:0.91899
[15]	Train-rmse:0.91214	Val-rmse:0.91613
[16]	Train-rmse:0.91021	Val-rmse:0.91438
[17]	Train-rmse:0.90809	Val-rmse:0.91243
[18]	Train-rmse:0.90609	Val-rmse:0.91058
[19]	Train-rmse:0.90455	Val-rmse:0.90915
[20]	Train-rmse:0.90304	Val-rmse:0.90771
...

[195]	Train-rmse:0.84323	Val-rmse:0.85742
[196]	Train-rmse:0.84303	Val-rmse:0.85726
[197]	Train-rmse:0.84295	Val-rmse:0.85721
[198]	Train-rmse:0.84291	Val-rmse:0.85719
[199]	Train-rmse:0.84238	Val-rmse:0.85661
[200]	Train-rmse:0.84220	Val-rmse:0.85644
[201]	Train-rmse:0.84201	Val-rmse:0.85631
[202]	Train-rmse:0.84186	Val-rmse:0.85622
[203]	Train-rmse:0.84176	Val-rmse:0.85619
[204]	Train-rmse:0.84167	Val-rmse:0.85615
[205]	Train-rmse:0.84156	Val-rmse:0.85606
[206]	Train-rmse:0.84148	Val-rmse:0.85604
[207]	Train-rmse:0.84137	Val-rmse:0.85596
[208]	Train-rmse:0.84084	Val-rmse:0.85540
[209]	Train-rmse:0.84078	Val-rmse:0.85537
[210]	Train-rmse:0.84065	Val-rmse:0.85531
[211]	Train-rmse:0.84040	Val-rmse:0.85511
[212]	Train-rmse:0.84033	Val-rmse:0.85507
[213]	Train-rmse:0.84020	Val-rmse:0.85498
[214]	Train-rmse:0.84009	Val-rmse:0.85490
[215]	Train-rmse:0.84005	Val-rmse:0.85490
[216]	Train-rmse:0.83999	Val-rmse:0.85487
[217]	Train-rmse:0.83987	Val-rmse:0.85479
...

[391]	Train-rmse:0.81909	Val-rmse:0.83993
[392]	Train-rmse:0.81900	Val-rmse:0.83984
[393]	Train-rmse:0.81894	Val-rmse:0.83982
[394]	Train-rmse:0.81887	Val-rmse:0.83982
[395]	Train-rmse:0.81883	Val-rmse:0.83982
[396]	Train-rmse:0.81881	Val-rmse:0.83981
[397]	Train-rmse:0.81876	Val-rmse:0.83980
[398]	Train-rmse:0.81868	Val-rmse:0.83972
[399]	Train-rmse:0.81861	Val-rmse:0.83969
[400]	Train-rmse:0.81856	Val-rmse:0.83967
[401]	Train-rmse:0.81850	Val-rmse:0.83964
[402]	Train-rmse:0.81817	Val-rmse:0.83938
[403]	Train-rmse:0.81810	Val-rmse:0.83937
[404]	Train-rmse:0.81803	Val-rmse:0.83933
[405]	Train-rmse:0.81792	Val-rmse:0.83923
[406]	Train-rmse:0.81780	Val-rmse:0.83915
[407]	Train-rmse:0.81737	Val-rmse:0.83874
[408]	Train-rmse:0.81719	Val-rmse:0.83859
[409]	Train-rmse:0.81715	Val-rmse:0.83858
[410]	Train-rmse:0.81706	Val-rmse:0.83854
[411]	Train-rmse:0.81703	Val-rmse:0.83854
[412]	Train-rmse:0.81699	Val-rmse:0.83853
[413]	Train-rmse:0.81695	Val-rmse:0.83851
...

[587]	Train-rmse:0.80578	Val-rmse:0.83228
[588]	Train-rmse:0.80572	Val-rmse:0.83226
[589]	Train-rmse:0.80566	Val-rmse:0.83223
[590]	Train-rmse:0.80563	Val-rmse:0.83222
[591]	Train-rmse:0.80560	Val-rmse:0.83222
[592]	Train-rmse:0.80554	Val-rmse:0.83216
[593]	Train-rmse:0.80550	Val-rmse:0.83215
[594]	Train-rmse:0.80546	Val-rmse:0.83215
[595]	Train-rmse:0.80543	Val-rmse:0.83213
[596]	Train-rmse:0.80540	Val-rmse:0.83213
[597]	Train-rmse:0.80537	Val-rmse:0.83213
[598]	Train-rmse:0.80534	Val-rmse:0.83210
[599]	Train-rmse:0.80532	Val-rmse:0.83210
[600]	Train-rmse:0.80526	Val-rmse:0.83209
[601]	Train-rmse:0.80523	Val-rmse:0.83210
[602]	Train-rmse:0.80486	Val-rmse:0.83179
[603]	Train-rmse:0.80483	Val-rmse:0.83178
[604]	Train-rmse:0.80477	Val-rmse:0.83174
[605]	Train-rmse:0.80473	Val-rmse:0.83171
[606]	Train-rmse:0.80471	Val-rmse:0.83170
[607]	Train-rmse:0.80465	Val-rmse:0.83166
[608]	Train-rmse:0.80462	Val-rmse:0.83166
[609]	Train-rmse:0.80455	Val-rmse:0.83162
...

[783]	Train-rmse:0.79439	Val-rmse:0.82591
[784]	Train-rmse:0.79437	Val-rmse:0.82592
[785]	Train-rmse:0.79433	Val-rmse:0.82589
[786]	Train-rmse:0.79431	Val-rmse:0.82590
[787]	Train-rmse:0.79421	Val-rmse:0.82583
[788]	Train-rmse:0.79416	Val-rmse:0.82581
[789]	Train-rmse:0.79412	Val-rmse:0.82578
[790]	Train-rmse:0.79410	Val-rmse:0.82578
[791]	Train-rmse:0.79394	Val-rmse:0.82563
[792]	Train-rmse:0.79391	Val-rmse:0.82562
[793]	Train-rmse:0.79387	Val-rmse:0.82561
[794]	Train-rmse:0.79381	Val-rmse:0.82556
[795]	Train-rmse:0.79375	Val-rmse:0.82554
[796]	Train-rmse:0.79373	Val-rmse:0.82555
[797]	Train-rmse:0.79367	Val-rmse:0.82551
[798]	Train-rmse:0.79363	Val-rmse:0.82551
[799]	Train-rmse:0.79344	Val-rmse:0.82535
[800]	Train-rmse:0.79338	Val-rmse:0.82531
[801]	Train-rmse:0.79335	Val-rmse:0.82531
[802]	Train-rmse:0.79332	Val-rmse:0.82530
[803]	Train-rmse:0.79331	Val-rmse:0.82530
[804]	Train-rmse:0.79327	Val-rmse:0.82531
[805]	Train-rmse:0.79326	Val-rmse:0.82531
...

[979]	Train-rmse:0.78439	Val-rmse:0.82042
[980]	Train-rmse:0.78436	Val-rmse:0.82039
[981]	Train-rmse:0.78434	Val-rmse:0.82038
[982]	Train-rmse:0.78429	Val-rmse:0.82037
[983]	Train-rmse:0.78429	Val-rmse:0.82037
[984]	Train-rmse:0.78427	Val-rmse:0.82037
[985]	Train-rmse:0.78424	Val-rmse:0.82035
[986]	Train-rmse:0.78422	Val-rmse:0.82035
[987]	Train-rmse:0.78420	Val-rmse:0.82035
[988]	Train-rmse:0.78416	Val-rmse:0.82034
[989]	Train-rmse:0.78414	Val-rmse:0.82034
[990]	Train-rmse:0.78412	Val-rmse:0.82035
[991]	Train-rmse:0.78408	Val-rmse:0.82033
[992]	Train-rmse:0.78405	Val-rmse:0.82033
[993]	Train-rmse:0.78402	Val-rmse:0.82032
[994]	Train-rmse:0.78397	Val-rmse:0.82030
[995]	Train-rmse:0.78392	Val-rmse:0.82029
[996]	Train-rmse:0.78390	Val-rmse:0.82028
[997]	Train-rmse:0.78388	Val-rmse:0.82027
[998]	Train-rmse:0.78386	Val-rmse:0.82027
[999]	Train-rmse:0.78378	Val-rmse:0.82021
[1000]	Train-rmse:0.78375	Val-rmse:0.82020
[1001]	Train-rmse:0.78373	Val-rmse:0.82018
...

[1171]	Train-rmse:0.77675	Val-rmse:0.81667
[1172]	Train-rmse:0.77673	Val-rmse:0.81667
[1173]	Train-rmse:0.77670	Val-rmse:0.81667
[1174]	Train-rmse:0.77669	Val-rmse:0.81667
[1175]	Train-rmse:0.77665	Val-rmse:0.81666
[1176]	Train-rmse:0.77658	Val-rmse:0.81662
[1177]	Train-rmse:0.77653	Val-rmse:0.81661
[1178]	Train-rmse:0.77650	Val-rmse:0.81661
[1179]	Train-rmse:0.77648	Val-rmse:0.81661
[1180]	Train-rmse:0.77646	Val-rmse:0.81662
[1181]	Train-rmse:0.77643	Val-rmse:0.81661
[1182]	Train-rmse:0.77640	Val-rmse:0.81661
[1183]	Train-rmse:0.77638	Val-rmse:0.81661
[1184]	Train-rmse:0.77636	Val-rmse:0.81662
[1185]	Train-rmse:0.77634	Val-rmse:0.81661
[1186]	Train-rmse:0.77632	Val-rmse:0.81661
[1187]	Train-rmse:0.77626	Val-rmse:0.81654
[1188]	Train-rmse:0.77621	Val-rmse:0.81652
[1189]	Train-rmse:0.77617	Val-rmse:0.81649
[1190]	Train-rmse:0.77615	Val-rmse:0.81650
[1191]	Train-rmse:0.77610	Val-rmse:0.81649
[1192]	Train-rmse:0.77608	Val-rmse:0.81649
[1193]	Train-rmse:0.77605	Val-rmse:0.81648
...

[1362]	Train-rmse:0.76925	Val-rmse:0.81302
[1363]	Train-rmse:0.76923	Val-rmse:0.81301
[1364]	Train-rmse:0.76920	Val-rmse:0.81303
[1365]	Train-rmse:0.76919	Val-rmse:0.81302
[1366]	Train-rmse:0.76914	Val-rmse:0.81299
[1367]	Train-rmse:0.76912	Val-rmse:0.81298
[1368]	Train-rmse:0.76910	Val-rmse:0.81298
[1369]	Train-rmse:0.76908	Val-rmse:0.81299
[1370]	Train-rmse:0.76905	Val-rmse:0.81298
[1371]	Train-rmse:0.76902	Val-rmse:0.81298
[1372]	Train-rmse:0.76901	Val-rmse:0.81298
[1373]	Train-rmse:0.76898	Val-rmse:0.81297
[1374]	Train-rmse:0.76890	Val-rmse:0.81293
[1375]	Train-rmse:0.76889	Val-rmse:0.81292
[1376]	Train-rmse:0.76886	Val-rmse:0.81292
[1377]	Train-rmse:0.76885	Val-rmse:0.81292
[1378]	Train-rmse:0.76882	Val-rmse:0.81292
[1379]	Train-rmse:0.76881	Val-rmse:0.81291
[1380]	Train-rmse:0.76879	Val-rmse:0.81290
[1381]	Train-rmse:0.76878	Val-rmse:0.81290
[1382]	Train-rmse:0.76877	Val-rmse:0.81290
[1383]	Train-rmse:0.76875	Val-rmse:0.81290
[1384]	Train-rmse:0.76871	Val-rmse:0.81286
...
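
The log message "Will train until Val-rmse hasn't improved in 10 rounds" describes early stopping. As a rough illustration of the idea only (the function name `early_stopping_sketch` is hypothetical and this is a simplified sketch, not XGBoost's actual implementation), the rule can be written in plain Python:

```python
def early_stopping_sketch(val_scores, patience):
    """Return (best_round, stop_round) for a list of validation errors.

    Training stops at the first round that is `patience` or more rounds
    past the best one without a strict improvement (lower is better).
    """
    best_score = float('inf')
    best_round = 0
    for round_idx, score in enumerate(val_scores):
        if score < best_score:
            # Strict improvement: remember this round as the best so far.
            best_score = score
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop here.
            return best_round, round_idx
    # Never triggered: training ran through every round.
    return best_round, len(val_scores) - 1


# Made-up scores: the best value is at round 2, then it gets worse.
print(early_stopping_sketch([1.0, 0.9, 0.8, 0.81, 0.82, 0.83], patience=3))
```

On these made-up scores the sketch reports round 2 as the best and stops at round 5.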

## Predict test data

Once the model has been trained, it's time to predict.

In [6]:
test['item_cnt_month'] = model.predict(Test).clip(0, 20)
test[['ID', 'item_cnt_month']].sort_values('ID')

| | ID | item_cnt_month |
|---|---|---|
| 10913804 | 0 | 0.522799 |
| 10913805 | 1 | 0.570254 |
| 10913806 | 2 | 0.813800 |
| 10913807 | 3 | 0.397277 |
| 10913808 | 4 | 2.123299 |
| ... | ... | ... |
| 11127999 | 214195 | 0.131681 |
| 11128000 | 214196 | 0.046467 |
| 11128001 | 214197 | 0.054717 |
| 11128002 | 214198 | 0.033526 |
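
The `.clip(0, 20)` call in the prediction cell bounds every prediction to the range [0, 20]. A minimal sketch with made-up values (not real model output) shows what it does to out-of-range numbers:

```python
import numpy as np

# Made-up raw predictions: one below 0, one in range, one above 20.
raw = np.array([-0.3, 0.5, 25.0])

# Values below 0 become 0, values above 20 become 20, the rest are unchanged.
clipped = raw.clip(0, 20)
print(clipped.tolist())
```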


## Save submission and model

Finally, I save the predictions to a CSV file and pickle the model.

In [7]:
test[['ID', 'item_cnt_month']].sort_values('ID').to_csv('submissions/submission.csv', index=False)
# Use a context manager so the file is closed once the model is written
with open('xgb.pickle', 'wb') as f:
    pickle.dump(model, f)
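
To confirm that a pickled object can be restored later, a round trip can be sketched as follows. A plain dictionary stands in for the trained model here, and the temporary path is hypothetical; with the real model you would load `xgb.pickle` in the same way.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (hypothetical contents, illustration only).
stand_in_model = {'name': 'toy-model', 'max_depth': 6}

# Write to a temporary location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'model.pickle')
with open(path, 'wb') as f:
    pickle.dump(stand_in_model, f)

# Load it back and check that nothing was lost.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == stand_in_model)
```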