# Week 4 - Models and Experimentation

## Step 1: Training a model

For this demo, we follow an [adapted demo](https://www.datacamp.com/tutorial/xgboost-in-python): we train an XGBoost model, then run some experiments and tune its hyperparameters.


If you run this notebook locally, create a virtual environment as follows:
- Use Python 3.10 or earlier.
- Create the virtual environment with `venv`:

`python -m venv NAME`

- Avoid Anaconda, Poetry, or similar package management platforms.
- Install packages with pip:

`python -m pip install <package-name>`

- Once you are done working in the virtual environment, deactivate it with `deactivate`.
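
The version constraint in the list above can be checked from Python before installing anything. This is a minimal sketch, assuming the constraint means that Python 3.10 is the newest supported release; `is_supported_python` and `MAX_SUPPORTED` are hypothetical names, not part of the course material.

```python
import sys

# Assumption: the course supports Python up to and including 3.10.
MAX_SUPPORTED = (3, 10)


def is_supported_python(version=None):
    """Return True if the given (major, minor, ...) version is at most MAX_SUPPORTED."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= MAX_SUPPORTED


if not is_supported_python():
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Warning: Python {current} is newer than 3.10")
```

Passing an explicit tuple, for example `is_supported_python((3, 11))`, makes the check easy to try without switching interpreters.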

### Install packages

# Experimentation Overview  

For this experiment, we tune hyperparameters of the XGBoost algorithm and track the runs with Weights & Biases (W&B).

Hyperparameters chosen for tuning:
1. Learning rate: [0.01, 0.03, 0.1, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.3, 1.4, 1.5]
2. Max depth: [1, 2, 3, 4, 5, 6]


- W&B result link: https://api.wandb.ai/links/zero-monster/1s6p85rk
- Best parameters: learning rate = 0.01, max depth = 6

- Best performance: mean test RMSE from 5-fold cross-validation = 533.30
- Original best performance (default parameters, 5-fold cross-validation): 549.10
- Improvement in performance: about 2.9%
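
The improvement figure follows from the two cross-validated RMSE values. The snippet below is a small worked calculation; the two numbers are copied from the cross-validation outputs further down in this notebook, and `relative_improvement` is a hypothetical helper name.

```python
# Values copied from the cross-validation outputs in this notebook.
baseline_rmse = 549.1039652582465  # default parameters
tuned_rmse = 533.2951820146366     # learning_rate=0.01, max_depth=6


def relative_improvement(baseline, tuned):
    """Fractional reduction of an error metric relative to the baseline."""
    return (baseline - tuned) / baseline


improvement = relative_improvement(baseline_rmse, tuned_rmse)
print(f"Relative RMSE improvement: {improvement:.1%}")
```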

In [2]:
!pip install wandb -qU

In [2]:
import xgboost as xgb
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


### Import data

We will use the Diamonds dataset, loaded from Seaborn. It is also available on [Kaggle](https://www.kaggle.com/datasets/shivam2503/diamonds).

Follow the link to read about the features. Our goal is to predict the price of a diamond.

In [3]:
diamonds = sns.load_dataset('diamonds')
diamonds.head()

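|   | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |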


In [5]:
diamonds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype   
---  ------   --------------  -----   
 0   carat    53940 non-null  float64 
 1   cut      53940 non-null  category
 2   color    53940 non-null  category
 3   clarity  53940 non-null  category
 4   depth    53940 non-null  float64 
 5   table    53940 non-null  float64 
 6   price    53940 non-null  int64   
 7   x        53940 non-null  float64 
 8   y        53940 non-null  float64 
 9   z        53940 non-null  float64 
dtypes: category(3), float64(6), int64(1)
memory usage: 3.0 MB


In [6]:
diamonds.shape

(53940, 10)

In [4]:
X,y = diamonds.drop('price', axis=1), diamonds[['price']]

# Cast cut, color and clarity to the pandas category dtype so that XGBoost can treat them as categorical features.

X['cut'] = X['cut'].astype('category')
X['color'] = X['color'].astype('category')
X['clarity'] = X['clarity'].astype('category')

### Split the data and train a model

In [5]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DMatrix
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

In [6]:
# Define hyperparameters
params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}

n = 100
model = xgb.train(
   params=params,
   dtrain=dtrain,
   num_boost_round=n,
)


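Note: XGBoost 2.0 and later deprecate `gpu_hist` and emit a warning at this point. The suggested replacement is, e.g., `tree_method = "hist", device = "cuda"`.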



In [7]:
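# Evaluate on the test set with root mean squared error (RMSE)

predictions = model.predict(dtest)
# np.sqrt of the MSE works across scikit-learn versions; newer releases no longer accept `squared=False`
rmse = np.sqrt(mean_squared_error(y_test, predictions))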
print(f"RMSE: {rmse}")

RMSE: 532.8838153117543
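
For reference, RMSE is the square root of the mean squared difference between targets and predictions. The following is a plain-Python sketch of that definition, not the scikit-learn implementation; the function name `rmse_from_scratch` is hypothetical.

```python
import math


def rmse_from_scratch(y_true, y_pred):
    """Root mean squared error of two equally long sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy prices, not taken from the dataset: every prediction is off by 3.
print(rmse_from_scratch([100, 200, 300], [103, 197, 303]))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.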






### Incorporate validation

In [11]:
params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}
n = 100

# Evaluation sets to monitor during training; the test split doubles as the validation set here
evals = [(dtrain, "train"), (dtest, "validation")]

In [12]:
evals = [(dtrain, "train"), (dtest, "validation")]

model = xgb.train(
   params=params,
   dtrain=dtrain,
   num_boost_round=n,
   evals=evals,
   verbose_eval=10,
)

[0]	train-rmse:2859.49097	validation-rmse:2851.62630
[10]	train-rmse:550.99470	validation-rmse:571.16640
[20]	train-rmse:491.51435	validation-rmse:544.08058
[30]	train-rmse:464.38845	validation-rmse:537.01895
[40]	train-rmse:445.99106	validation-rmse:533.85127
[50]	train-rmse:430.36010	validation-rmse:532.90320
[60]	train-rmse:418.87898	validation-rmse:533.04629
[70]	train-rmse:409.66247	validation-rmse:533.58046
[80]	train-rmse:397.34048	validation-rmse:534.31963
[90]	train-rmse:389.94294	validation-rmse:532.61946
[99]	train-rmse:377.70831	validation-rmse:532.88383
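
The log lines printed by `xgb.train` can be parsed to compare training and validation error per round. This is a hedged sketch that assumes the exact line format shown above; `parse_eval_log` and `LOG_LINE` are hypothetical names.

```python
import re

# Matches lines such as "[20]\ttrain-rmse:491.51435\tvalidation-rmse:544.08058".
LOG_LINE = re.compile(r"\[(\d+)\]\s+train-rmse:([\d.]+)\s+validation-rmse:([\d.]+)")


def parse_eval_log(text):
    """Return a list of (round, train_rmse, validation_rmse) tuples."""
    return [
        (int(rnd), float(train), float(valid))
        for rnd, train, valid in LOG_LINE.findall(text)
    ]


# Three lines copied from the log above.
sample_log = (
    "[0]\ttrain-rmse:2859.49097\tvalidation-rmse:2851.62630\n"
    "[50]\ttrain-rmse:430.36010\tvalidation-rmse:532.90320\n"
    "[99]\ttrain-rmse:377.70831\tvalidation-rmse:532.88383\n"
)

for rnd, train, valid in parse_eval_log(sample_log):
    print(f"round {rnd:>3}: validation - train = {valid - train:8.2f}")
```

In the log above, the gap between validation and training RMSE grows from about 103 at round 50 to about 155 at round 99 while validation RMSE stays near 533, which is the usual sign that extra rounds mostly fit the training data.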


In [13]:
# Incorporate early stopping
n = 10000


model = xgb.train(
   params=params,
   dtrain=dtrain,
   num_boost_round=n,
   evals=evals,
   verbose_eval=50,
   # Activate early stopping
   early_stopping_rounds=50
)

[0]	train-rmse:2859.49097	validation-rmse:2851.62630
[50]	train-rmse:430.36010	validation-rmse:532.90320
[100]	train-rmse:377.56825	validation-rmse:532.79980
[103]	train-rmse:375.44970	validation-rmse:532.50220
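
The idea behind `early_stopping_rounds` can be illustrated without XGBoost: stop once the validation metric has not improved for a given number of rounds. The function below is a simplified stand-in under that assumption, not XGBoost's actual implementation; `early_stopping_round` and `patience` are hypothetical names.

```python
def early_stopping_round(validation_scores, patience):
    """Return (best_round, last_round_run) for a metric that should be minimised.

    Training stops at the first round that is `patience` rounds past the best one.
    """
    best_score = float("inf")
    best_round = -1
    for round_idx, score in enumerate(validation_scores):
        if score < best_score:
            best_score = score
            best_round = round_idx
        elif round_idx - best_round >= patience:
            return best_round, round_idx
    return best_round, len(validation_scores) - 1


# Toy validation curve, not taken from the notebook.
scores = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 2.0]
print(early_stopping_round(scores, patience=3))
```

A small `patience` can stop before a later improvement is reached (the final score in the toy curve is never seen), which is why the notebook allows 50 rounds without improvement.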


In [14]:
# Cross-validation

params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}
n = 1000

results = xgb.cv(
   params, dtrain,
   num_boost_round=n,
   nfold=5,
   early_stopping_rounds=20
)






In [15]:
results.head()

|   | train-rmse-mean | train-rmse-std | test-rmse-mean | test-rmse-std |
|---|---|---|---|---|
| 0 | 2861.153015 | 8.266765 | 2861.773555 | 36.937516 |
| 1 | 2081.378004 | 5.534608 | 2084.973481 | 32.064109 |
| 2 | 1545.361682 | 3.287745 | 1553.681211 | 31.059209 |
| 3 | 1182.364236 | 3.585787 | 1192.464771 | 26.157805 |
| 4 | 941.828819 | 2.971779 | 958.467497 | 23.613538 |


In [16]:
best_rmse = results['test-rmse-mean'].min()

best_rmse

549.1039652582465
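
The `nfold=5` argument means the training data is split into five parts, and each part serves once as the held-out test fold. The sketch below shows only the index bookkeeping of such a split, assuming contiguous folds without shuffling; `xgb.cv` may shuffle or assign folds differently, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train_idx, test_idx in k_fold_indices(10, 5):
    print(f"test fold: {test_idx}, training samples: {len(train_idx)}")
```

Each sample appears in exactly one test fold, so the mean of the per-fold test RMSE values uses every training row once for evaluation.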

## Start W&B


- Log in to your W&B profile using the code below.
- Alternatively, you can set environment variables. Several environment variables change the behavior of W&B logging; the most important are:
    - `WANDB_API_KEY`: your API key, found in the "Settings" section under your profile
    - `WANDB_BASE_URL`: the URL of the W&B server

- Find your API token under "Profile" -> "Settings" in the W&B app.
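
Before starting a run, it can help to check which of the environment variables listed above are set. This is a minimal sketch using only the standard library; it assumes that only `WANDB_API_KEY` is strictly needed and that `WANDB_BASE_URL` can be left unset when using the hosted service. `missing_wandb_settings` is a hypothetical helper, not part of the `wandb` package.

```python
import os

REQUIRED_VARS = ("WANDB_API_KEY",)   # assumption: needed for non-interactive login
OPTIONAL_VARS = ("WANDB_BASE_URL",)  # assumption: only needed for a self-hosted server


def missing_wandb_settings(env=None):
    """Return the names of required W&B variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_wandb_settings()
if missing:
    print("Not set:", ", ".join(missing))
for name in OPTIONAL_VARS:
    print(f"{name} is {'set' if os.environ.get(name) else 'not set'}")
```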



In [17]:
# Log in to your W&B account
import wandb

wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mbhavin-rathava-bwin[0m ([33mzero-monster[0m). Use [1m`wandb login --relogin`[0m to force relogin


True

In [18]:
# TO DO
# Start experiment tracking with W&B
# Do at least 5 experiments with various hyperparameters
# Choose any method for hyperparameter tuning: grid search, random search, bayesian search
# Describe your findings and what you see
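
A grid search simply evaluates every combination of the candidate values and keeps the best one. The sketch below shows that loop for the learning-rate and max-depth grids from the overview. The `toy_objective` function is a made-up stand-in for "train the model and return validation RMSE", so the winning combination here says nothing about the real model; `grid_search` is a hypothetical helper.

```python
from itertools import product

# Candidate values from the Experimentation Overview.
LEARNING_RATES = [0.01, 0.03, 0.1, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.3, 1.4, 1.5]
MAX_DEPTHS = [1, 2, 3, 4, 5, 6]


def grid_search(evaluate, learning_rates, max_depths):
    """Return (best_score, learning_rate, max_depth) for the lowest score."""
    best = None
    for learning_rate, max_depth in product(learning_rates, max_depths):
        score = evaluate(learning_rate, max_depth)
        if best is None or score < best[0]:
            best = (score, learning_rate, max_depth)
    return best


def toy_objective(learning_rate, max_depth):
    """Made-up score with its minimum at learning_rate=0.1, max_depth=4."""
    return (learning_rate - 0.1) ** 2 + (max_depth - 4) ** 2


n_combinations = len(LEARNING_RATES) * len(MAX_DEPTHS)
print(f"{n_combinations} combinations, best: {grid_search(toy_objective, LEARNING_RATES, MAX_DEPTHS)}")
```

In the real experiment, `evaluate` would train XGBoost with the given parameters and return the validation RMSE, and each combination could be logged to W&B.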

In [19]:
# Track the learning-rate sweep as a single W&B run
wandb.init(
    # set the wandb project where this run will be logged
    project="Practicum Experimentation Week 4",
    config={
        "Logging": "RMSE ERROR",
        "MODEL TYPE": "XGBOOST",
        "Hyper Parameter": "Learning Rate",
    },
)

# n keeps the value set in an earlier cell (1000 in the cross-validation cell)
for learning_ in [0.01, 0.03, 0.1, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.3, 1.4, 1.5]:
    params = {"objective": "reg:squarederror", "tree_method": "gpu_hist", "learning_rate": learning_}
    model = xgb.train(
        params=params,
        dtrain=dtrain,
        num_boost_round=n,
        evals=evals,
    )

    # RMSE on the held-out split for this learning rate
    predictions = model.predict(dtest)
    rmse = np.sqrt(mean_squared_error(y_test, predictions))

    wandb.log({"Learning Rate": learning_, "RMSE_Loss": rmse})

[0]	train-rmse:3951.92462	validation-rmse:3949.02748
[1]	train-rmse:3914.29025	validation-rmse:3911.21998
[2]	train-rmse:3877.10275	validation-rmse:3873.86672
[3]	train-rmse:3840.30732	validation-rmse:3836.90210
[4]	train-rmse:3803.89438	validation-rmse:3800.32256
[5]	train-rmse:3767.86467	validation-rmse:3764.12914
[6]	train-rmse:3732.20828	validation-rmse:3728.30718
[7]	train-rmse:3696.92823	validation-rmse:3692.85775
[8]	train-rmse:3662.02397	validation-rmse:3657.78952
[9]	train-rmse:3627.46912	validation-rmse:3623.08587
[10]	train-rmse:3593.28635	validation-rmse:3588.73717
[11]	train-rmse:3559.44161	validation-rmse:3554.71475
[12]	train-rmse:3525.97874	validation-rmse:3521.06781
[13]	train-rmse:3492.86570	validation-rmse:3487.77529
[14]	train-rmse:3460.11145	validation-rmse:3454.83778
[15]	train-rmse:3427.65383	validation-rmse:3422.21992
[16]	train-rmse:3395.56677	validation-rmse:3389.96690
[17]	train-rmse:3363.75667	validation-rmse:3358.08757
[18]	train-rmse:3332.31965	validation-rmse:3326.47775
[19]	train-rmse:3301.17180	validation-rmse:3295.26749
[20]	train-rmse:3270.41375	validation-rmse:3264.35438
[21]	train-rmse:3239.92519	validation-rmse:3233.88165
[22]	train-rmse:3209.73054	validation-rmse:3203.59201
[23]	train-rmse:3179.80215	validation-rmse:3173.60675
[24]	train-rmse:3150.18892	validation-rmse:3144.01912
[25]	train-rmse:3120.92669	validation-rmse:3114.64985
[26]	train-rmse:3091.93693	validation-rmse:3085.54884
[27]	train-rmse:3063.14791	validation-rmse:3056.74740
[28]	train-rmse:3034.75572	validation-rmse:3028.31985
[29]	train-rmse:3006.57676	validation-rmse:3000.08035
[30]	train-rmse:2978.78205	validation-rmse:2972.18011
[31]	train-rmse:2951.11499	validation-rmse:2944.49426
[32]	train-rmse:2923.86171	validation-rmse:2917.20940

[... output truncated ...]

[17]	train-rmse:2399.75751	validation-rmse:2392.73783
[18]	train-rmse:2334.92635	validation-rmse:2328.14500
[19]	train-rmse:2272.34291	validation-rmse:2265.94534
[20]	train-rmse:2211.58871	validation-rmse:2205.37752
[21]	train-rmse:2152.81578	validation-rmse:2146.73568
[22]	train-rmse:2095.88970	validation-rmse:2089.98593
[23]	train-rmse:2040.79954	validation-rmse:2034.90674
[24]	train-rmse:1987.38858	validation-rmse:1981.52309
[25]	train-rmse:1935.58414	validation-rmse:1929.56333
[26]	train-rmse:1885.33063	validation-rmse:1879.42155
[27]	train-rmse:1836.75235	validation-rmse:1830.87821
[28]	train-rmse:1790.06353	validation-rmse:1784.05102
[29]	train-rmse:1744.38750	validation-rmse:1738.34204
[30]	train-rmse:1700.40420	validation-rmse:1694.41990
[31]	train-rmse:1657.88109	validation-rmse:1652.06219
[32]	train-rmse:1616.42909	validation-rmse:1610.79635
[33]	train-rmse:1576.28755	validation-rmse:1570.73065
[34]	train-rmse:1537.64826	validation-rmse:1532.03750

[... output truncated ...]

[18]	train-rmse:821.45477	validation-rmse:819.79389
[19]	train-rmse:780.15752	validation-rmse:780.41317
[20]	train-rmse:744.73249	validation-rmse:746.78021
[21]	train-rmse:712.48331	validation-rmse:716.67222
[22]	train-rmse:685.73700	validation-rmse:691.37735
[23]	train-rmse:662.00527	validation-rmse:669.30421
[24]	train-rmse:641.58231	validation-rmse:651.29228
[25]	train-rmse:622.49311	validation-rmse:634.41151
[26]	train-rmse:607.23098	validation-rmse:620.24614
[27]	train-rmse:593.31149	validation-rmse:608.92380
[28]	train-rmse:581.00149	validation-rmse:598.52109
[29]	train-rmse:571.35028	validation-rmse:590.04102
[30]	train-rmse:562.18272	validation-rmse:582.61892
[31]	train-rmse:554.44091	validation-rmse:576.08962
[32]	train-rmse:547.95331	validation-rmse:570.85828
[33]	train-rmse:542.14337	validation-rmse:566.15872
[34]	train-rmse:536.73555	validation-rmse:562.66761
[35]	train-rmse:532.00657	validation-rmse:559.28528
[36]	train-rmse:527.87215	validation-rmse:556.43122

[... output truncated ...]

[11]	train-rmse:539.01439	validation-rmse:565.17923
[12]	train-rmse:530.62463	validation-rmse:559.39162
[13]	train-rmse:524.33704	validation-rmse:556.15341
[14]	train-rmse:517.59062	validation-rmse:552.42195
[15]	train-rmse:512.35072	validation-rmse:551.05408
[16]	train-rmse:507.32448	validation-rmse:548.97517
[17]	train-rmse:503.91155	validation-rmse:547.50042
[18]	train-rmse:498.93293	validation-rmse:545.37947
[19]	train-rmse:494.44195	validation-rmse:544.57762
[20]	train-rmse:491.51435	validation-rmse:544.08058
[21]	train-rmse:488.80720	validation-rmse:542.67309
[22]	train-rmse:486.98827	validation-rmse:542.42037
[23]	train-rmse:483.96415	validation-rmse:540.62613
[24]	train-rmse:480.64905	validation-rmse:539.88073
[25]	train-rmse:476.63674	validation-rmse:538.82752
[26]	train-rmse:475.11921	validation-rmse:538.82649
[27]	train-rmse:472.90550	validation-rmse:539.95905
[28]	train-rmse:472.32210	validation-rmse:539.67457
[29]	train-rmse:467.54833	validation-rmse:537.77308

[... output truncated ...]

[18]	train-rmse:474.21977	validation-rmse:553.33251
[19]	train-rmse:470.67214	validation-rmse:553.39302
[20]	train-rmse:469.24745	validation-rmse:553.22352
[21]	train-rmse:465.83821	validation-rmse:552.53810
[22]	train-rmse:461.18133	validation-rmse:551.23318
[23]	train-rmse:460.67115	validation-rmse:550.88557
[24]	train-rmse:460.02830	validation-rmse:551.11663
[25]	train-rmse:454.08959	validation-rmse:550.84460
[26]	train-rmse:447.32282	validation-rmse:552.83134
[27]	train-rmse:444.61330	validation-rmse:552.14450
[28]	train-rmse:442.51691	validation-rmse:552.56948
[29]	train-rmse:441.38407	validation-rmse:551.65275
[30]	train-rmse:438.58600	validation-rmse:551.91702
[31]	train-rmse:434.92079	validation-rmse:551.47387
[32]	train-rmse:432.27885	validation-rmse:550.66614
[33]	train-rmse:429.29787	validation-rmse:548.69378
[34]	train-rmse:425.12014	validation-rmse:548.20638
[35]	train-rmse:424.96257	validation-rmse:548.12692
[36]	train-rmse:422.53461	validation-rmse:547.46791

[... output truncated ...]

[18]	train-rmse:475.21748	validation-rmse:563.02526
[19]	train-rmse:472.51135	validation-rmse:565.53016
[20]	train-rmse:467.16946	validation-rmse:566.57755
[21]	train-rmse:464.00109	validation-rmse:565.71009
[22]	train-rmse:459.78066	validation-rmse:565.60474
[23]	train-rmse:458.51100	validation-rmse:566.11718
[24]	train-rmse:454.41844	validation-rmse:564.75797
[25]	train-rmse:450.19513	validation-rmse:563.41454
[26]	train-rmse:448.50823	validation-rmse:564.19414
[27]	train-rmse:444.11984	validation-rmse:565.14565
[28]	train-rmse:443.49091	validation-rmse:565.41003
[29]	train-rmse:440.61302	validation-rmse:566.35483
[30]	train-rmse:440.33833	validation-rmse:566.18684
[31]	train-rmse:437.87702	validation-rmse:566.44464
[32]	train-rmse:432.38913	validation-rmse:566.53641
[33]	train-rmse:427.87687	validation-rmse:566.69057
[34]	train-rmse:424.43045	validation-rmse:568.59618
[35]	train-rmse:421.40864	validation-rmse:568.26136
[36]	train-rmse:417.64990	validation-rmse:568.89673

[... output truncated ...]

[17]	train-rmse:472.22038	validation-rmse:569.20167
[18]	train-rmse:465.92816	validation-rmse:570.17823
[19]	train-rmse:463.90589	validation-rmse:569.63129
[20]	train-rmse:456.61571	validation-rmse:568.85839
[21]	train-rmse:451.76917	validation-rmse:568.72865
[22]	train-rmse:451.14536	validation-rmse:568.27990
[23]	train-rmse:448.14518	validation-rmse:570.06965
[24]	train-rmse:445.15069	validation-rmse:568.94243
[25]	train-rmse:442.25511	validation-rmse:568.92495
[26]	train-rmse:437.43040	validation-rmse:570.01368
[27]	train-rmse:435.26816	validation-rmse:569.69806
[28]	train-rmse:431.83608	validation-rmse:571.14470
[29]	train-rmse:430.97903	validation-rmse:570.91507
[30]	train-rmse:427.56423	validation-rmse:571.09930
[31]	train-rmse:425.63855	validation-rmse:570.42743
[32]	train-rmse:422.63721	validation-rmse:570.73681
[33]	train-rmse:418.97623	validation-rmse:571.72596
[34]	train-rmse:417.39598	validation-rmse:571.06783
[35]	train-rmse:414.59674	validation-rmse:571.09569

[... output truncated ...]

[11]	train-rmse:511.36754	validation-rmse:585.74208
[12]	train-rmse:507.69413	validation-rmse:583.02132
[13]	train-rmse:505.71074	validation-rmse:584.44584
[14]	train-rmse:498.78890	validation-rmse:581.12804
[15]	train-rmse:489.00503	validation-rmse:581.64056
[16]	train-rmse:486.58108	validation-rmse:580.96200
[17]	train-rmse:480.90715	validation-rmse:581.81284
[18]	train-rmse:478.88702	validation-rmse:580.90869
[19]	train-rmse:472.29281	validation-rmse:580.82671
[20]	train-rmse:466.89488	validation-rmse:581.05452
[21]	train-rmse:463.90118	validation-rmse:580.49305
[22]	train-rmse:461.44625	validation-rmse:580.59466
[23]	train-rmse:456.37408	validation-rmse:584.67473
[24]	train-rmse:453.14719	validation-rmse:583.87567
[25]	train-rmse:452.28333	validation-rmse:583.72124
[26]	train-rmse:445.62984	validation-rmse:586.93188
[27]	train-rmse:443.82245	validation-rmse:586.55203
[28]	train-rmse:441.62998	validation-rmse:586.39219
[29]	train-rmse:439.52881	validation-rmse:585.56268

[... output truncated ...]

[14]	train-rmse:489.40182	validation-rmse:610.42336
[15]	train-rmse:487.85750	validation-rmse:609.68240
[16]	train-rmse:482.51808	validation-rmse:611.62881
[17]	train-rmse:476.24925	validation-rmse:613.94495
[18]	train-rmse:471.94255	validation-rmse:613.22640
[19]	train-rmse:467.06731	validation-rmse:612.96862
[20]	train-rmse:464.75644	validation-rmse:612.97498
[21]	train-rmse:461.90143	validation-rmse:614.77383
[22]	train-rmse:461.12294	validation-rmse:614.26619
[23]	train-rmse:458.20176	validation-rmse:613.28877
[24]	train-rmse:453.05996	validation-rmse:616.73213
[25]	train-rmse:447.02175	validation-rmse:621.08328
[26]	train-rmse:444.33702	validation-rmse:621.88112
[27]	train-rmse:439.21726	validation-rmse:621.09490
[28]	train-rmse:435.75939	validation-rmse:620.60131
[29]	train-rmse:429.53268	validation-rmse:620.46144
[30]	train-rmse:426.28300	validation-rmse:621.71103
[31]	train-rmse:424.12325	validation-rmse:622.78660
[32]	train-rmse:421.94421	validation-rmse:623.73254

[... output truncated ...]

[15]	train-rmse:481.88289	validation-rmse:616.17644
[16]	train-rmse:474.15490	validation-rmse:617.57460
[17]	train-rmse:467.80670	validation-rmse:616.29988
[18]	train-rmse:462.64034	validation-rmse:617.64919
[19]	train-rmse:460.42357	validation-rmse:617.90774
[20]	train-rmse:459.41657	validation-rmse:617.34662
[21]	train-rmse:455.14380	validation-rmse:619.24698
[22]	train-rmse:451.47170	validation-rmse:619.32644
[23]	train-rmse:447.18370	validation-rmse:623.97567
[24]	train-rmse:443.66658	validation-rmse:622.12417
[25]	train-rmse:437.74016	validation-rmse:620.29732
[26]	train-rmse:436.24065	validation-rmse:619.33118
[27]	train-rmse:431.94956	validation-rmse:620.94879
[28]	train-rmse:429.12783	validation-rmse:619.19715
[29]	train-rmse:424.24780	validation-rmse:619.29201
[30]	train-rmse:421.53975	validation-rmse:620.99924
[31]	train-rmse:420.84347	validation-rmse:620.66397
[32]	train-rmse:419.47690	validation-rmse:620.48880
[33]	train-rmse:413.48655	validation-rmse:623.29945

[... output truncated ...]

[16]	train-rmse:503.40618	validation-rmse:662.89781
[17]	train-rmse:496.31334	validation-rmse:662.80607
[18]	train-rmse:492.82120	validation-rmse:661.62221
[19]	train-rmse:487.48301	validation-rmse:657.24311
[20]	train-rmse:483.95493	validation-rmse:655.89871
[21]	train-rmse:478.37307	validation-rmse:658.95532
[22]	train-rmse:474.42818	validation-rmse:660.37208
[23]	train-rmse:471.11415	validation-rmse:659.82646
[24]	train-rmse:465.03408	validation-rmse:660.25113
[25]	train-rmse:461.22660	validation-rmse:660.80892
[26]	train-rmse:459.24004	validation-rmse:660.05447
[27]	train-rmse:456.21221	validation-rmse:661.65559
[28]	train-rmse:450.54696	validation-rmse:663.94693
[29]	train-rmse:447.36929	validation-rmse:661.16116
[30]	train-rmse:445.29586	validation-rmse:660.34846
[31]	train-rmse:443.13211	validation-rmse:659.74763
[32]	train-rmse:436.61550	validation-rmse:670.95230
[33]	train-rmse:432.99205	validation-rmse:669.14765
[34]	train-rmse:430.96021	validation-rmse:668.36556

[... output truncated ...]

[11]	train-rmse:566.50393	validation-rmse:679.20302
[12]	train-rmse:556.06512	validation-rmse:670.20825
[13]	train-rmse:546.71842	validation-rmse:677.68449
[14]	train-rmse:538.46169	validation-rmse:679.28433
[15]	train-rmse:531.90119	validation-rmse:678.57106
[16]	train-rmse:525.29424	validation-rmse:682.97083
[17]	train-rmse:517.19804	validation-rmse:676.30612
[18]	train-rmse:511.40343	validation-rmse:679.04866
[19]	train-rmse:505.70847	validation-rmse:682.75039
[20]	train-rmse:501.69680	validation-rmse:689.25073
[21]	train-rmse:498.25361	validation-rmse:691.20092
[22]	train-rmse:495.42390	validation-rmse:690.38327
[23]	train-rmse:489.82999	validation-rmse:690.15406
[24]	train-rmse:483.84209	validation-rmse:691.58852
[25]	train-rmse:477.28526	validation-rmse:686.32912
[26]	train-rmse:472.80075	validation-rmse:688.05870
[27]	train-rmse:468.01156	validation-rmse:687.85882
[28]	train-rmse:463.39743	validation-rmse:687.15365
[29]	train-rmse:461.22769	validation-rmse:688.95691

[... output truncated ...]

[13]	train-rmse:553.81830	validation-rmse:709.28330
[14]	train-rmse:550.02216	validation-rmse:712.79736
[15]	train-rmse:542.89987	validation-rmse:717.05506
[16]	train-rmse:535.44238	validation-rmse:721.61541
[17]	train-rmse:530.02217	validation-rmse:723.70083
[18]	train-rmse:525.65447	validation-rmse:729.32842
[19]	train-rmse:524.01731	validation-rmse:728.59179
[20]	train-rmse:518.09794	validation-rmse:729.72151
[21]	train-rmse:511.75337	validation-rmse:731.01028
[22]	train-rmse:507.14502	validation-rmse:737.69910
[23]	train-rmse:501.85109	validation-rmse:737.84057
[24]	train-rmse:494.16009	validation-rmse:740.88080
[25]	train-rmse:488.95273	validation-rmse:743.10089
[26]	train-rmse:483.69956	validation-rmse:739.80747
[27]	train-rmse:479.09845	validation-rmse:739.09008
[28]	train-rmse:475.90155	validation-rmse:743.66176
[29]	train-rmse:470.37327	validation-rmse:742.65028
[30]	train-rmse:466.76824	validation-rmse:745.06672
[31]	train-rmse:463.43831	validation-rmse:745.52624

[... output truncated ...]

[15]	train-rmse:549.03911	validation-rmse:722.22679
[16]	train-rmse:540.86606	validation-rmse:722.01172
[17]	train-rmse:534.17797	validation-rmse:722.25443
[18]	train-rmse:528.03425	validation-rmse:722.81126
[19]	train-rmse:522.12570	validation-rmse:733.14451
[20]	train-rmse:517.62842	validation-rmse:734.34435
[21]	train-rmse:512.99695	validation-rmse:738.49378
[22]	train-rmse:507.14087	validation-rmse:738.29062
[23]	train-rmse:501.95816	validation-rmse:737.26503
[24]	train-rmse:497.07403	validation-rmse:738.70110
[25]	train-rmse:492.55389	validation-rmse:740.06291
[26]	train-rmse:489.04221	validation-rmse:740.90728
[27]	train-rmse:486.16750	validation-rmse:744.17010
[28]	train-rmse:481.91837	validation-rmse:742.24585
[29]	train-rmse:477.68831	validation-rmse:742.83934
[30]	train-rmse:472.44020	validation-rmse:743.84049
[31]	train-rmse:471.27908	validation-rmse:744.24822
[32]	train-rmse:467.88311	validation-rmse:747.56274
[33]	train-rmse:465.82314	validation-rmse:745.97170

[... output truncated ...]



In [8]:
# The Learning Rate vs. RMSE_Loss chart in W&B shows validation RMSE rising steadily as the learning rate increases.
# In the logs above, training RMSE keeps falling at the same time, which suggests overfitting at higher learning rates.

bestLR = 0.01
bestDepth = 6

params = {"objective": "reg:squarederror", "tree_method": "gpu_hist", "learning_rate" : bestLR, "max_depth" : bestDepth}

n = 1000

results = xgb.cv(
   params, dtrain,
   num_boost_round=n,
   nfold=5,
   early_stopping_rounds=20
)

print(results.head())

print(f"Best RMSE : {results['test-rmse-mean'].min()}")
# https://wandb.ai/zero-monster/Practicum%20Experimentation%20Week%204/reports/RMSE_Loss-24-04-17-18-43-53---Vmlldzo3NTkwMzkw





   train-rmse-mean  train-rmse-std  test-rmse-mean  test-rmse-std
0      3951.951893       10.524378     3951.843391      42.362551
1      3914.374514       10.448655     3914.268714      42.156624
2      3877.186551       10.370155     3877.079497      41.947674
3      3840.389114       10.296209     3840.289917      41.740940
4      3803.986712       10.234358     3803.909143      41.537761
Best RMSE : 533.2951820146366
