This chapter will teach you how to make your XGBoost models as performant as possible. You'll learn about the variety of parameters that can be adjusted to alter the behavior of XGBoost and how to tune them efficiently so that you can supercharge the performance of your models.

# 1- Why tune your model?


video

# 2- When is tuning your model a bad idea?


<p>Now that you&apos;ve seen the effect that tuning has on the overall performance of your XGBoost model, let&apos;s turn the question on its head and see if you can figure out when tuning your model might not be the best idea. <strong>Given that model tuning can be time-intensive and complicated, which of the following scenarios would NOT call for careful tuning of your model</strong>?</p>


- You have lots of examples from some dataset and very many features at your disposal.

- You are very short on time before you must push an initial model to production and have little data to train your model on. (answer)

- You have access to a multi-core (64 cores) server with lots of memory (200GB RAM) and no time constraints.

- You must squeeze every last bit of performance out of your XGBoost model.



# 3- Tuning the number of boosting rounds

<p>Let&apos;s start with parameter tuning by seeing how the number of boosting rounds (number of trees you build) impacts the out-of-sample performance of your XGBoost model. You&apos;ll use <code>xgb.cv()</code> inside a <code>for</code> loop and build one model per <code>num_boost_round</code> parameter.</p>
<p>Here, you&apos;ll continue working with the Ames housing dataset. The features are available in the array <code>X</code>, and the target vector is contained in <code>y</code>.</p>

<ul>
<li>Create a <code>DMatrix</code> called <code>housing_dmatrix</code> from <code>X</code> and <code>y</code>.</li>
<li>Create a parameter dictionary called <code>params</code>, passing in the appropriate <code>&quot;objective&quot;</code> (<code>&quot;reg:linear&quot;</code>) and <code>&quot;max_depth&quot;</code> (set it to <code>3</code>).</li>
<li>Iterate over <code>num_rounds</code> inside a <code>for</code> loop and perform 3-fold cross-validation. In each iteration of the loop, pass in the current number of boosting rounds (<code>curr_num_rounds</code>) to <code>xgb.cv()</code> as the argument to <code>num_boost_round</code>. </li>
<li>Append the final boosting round RMSE for each cross-validated XGBoost model to the <code>final_rmse_per_round</code> list.</li>
<li><code>num_rounds</code> and <code>final_rmse_per_round</code> have been zipped and converted into a DataFrame so you can easily see how the model performs with each number of boosting rounds.</li>
</ul>
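To see why the number of boosting rounds matters, here is a minimal sketch of gradient boosting on a toy problem. It is not XGBoost: the decision stumps, the `y = x**2` data, and the helper names `fit_stump` and `holdout_rmse_per_round` are all assumptions made for illustration. Because trees are added one after another, a 15-round run passes through the 5-round and 10-round models on the way, so one trace of per-round hold-out RMSE covers all three settings.

```python
import math


def fit_stump(xs, residuals):
    """Find the single split `x <= threshold` that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def holdout_rmse_per_round(train, holdout, num_rounds, eta=0.3):
    """Boost stumps on `train` and record the hold-out RMSE after every round."""
    train_x, train_y = train
    hold_x, hold_y = holdout
    train_pred = [0.0] * len(train_y)
    hold_pred = [0.0] * len(hold_y)
    trace = []
    for _ in range(num_rounds):
        # Each new stump is fit to what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(train_y, train_pred)]
        threshold, left_mean, right_mean = fit_stump(train_x, residuals)

        def step(x):
            return eta * (left_mean if x <= threshold else right_mean)

        train_pred = [p + step(x) for x, p in zip(train_x, train_pred)]
        hold_pred = [p + step(x) for x, p in zip(hold_x, hold_pred)]
        trace.append(rmse(hold_y, hold_pred))
    return trace


# Toy data (assumption): y = x**2, with a few unseen points held out.
train = ([0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 4, 9, 16, 25, 36, 49])
holdout = ([0.5, 2.5, 4.5, 6.5], [0.25, 6.25, 20.25, 42.25])

trace = holdout_rmse_per_round(train, holdout, num_rounds=15)
for n in (5, 10, 15):
    print(n, round(trace[n - 1], 3))
```

`xgb.cv()` likewise returns one row of metrics per boosting round, so the DataFrame from a single long run can be inspected row by row.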

In [4]:
import pandas as pd
import xgboost as xgb
import numpy as np
housing_data = pd.read_csv("datasets/ames_housing_trimmed_processed.csv")
X=housing_data.drop('SalePrice', axis=1)
y=housing_data['SalePrice']

In [5]:
# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary for each tree: params
# Note: "reg:linear" is deprecated in newer XGBoost releases in favor of "reg:squarederror"
params = {"objective":"reg:linear", "max_depth":3}

# Create list of number of boosting rounds
num_rounds = [5, 10, 15]

# Empty list to store final round rmse per XGBoost model
final_rmse_per_round = []

# Iterate over num_rounds and build one model per num_boost_round parameter
for curr_num_rounds in num_rounds:

    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        num_boost_round=curr_num_rounds, metrics="rmse",
                        as_pandas=True, seed=123)

    # Append final round RMSE
    final_rmse_per_round.append(cv_results["test-rmse-mean"].iloc[-1])

# Print the resultant DataFrame
num_rounds_rmses = list(zip(num_rounds, final_rmse_per_round))
print(pd.DataFrame(num_rounds_rmses,columns=["num_boosting_rounds","rmse"]))


   num_boosting_rounds          rmse
0                    5  50903.300781
1                   10  34774.192708
2                   15  32895.097005


# 4- Automated boosting round selection using early_stopping

<p>Now, instead of attempting to cherry pick the best possible number of boosting rounds, you can very easily have XGBoost automatically select the number of boosting rounds for you within <code>xgb.cv()</code>. This is done using a technique called <strong>early stopping</strong>. </p>
<p><strong>Early stopping</strong> works by testing the XGBoost model after every boosting round against a hold-out dataset and halting the creation of additional boosting rounds (thereby finishing training of the model early) if the hold-out metric (<code>&quot;rmse&quot;</code> in our case) does not improve for a given number of rounds. Here you will use the <code>early_stopping_rounds</code> parameter in <code>xgb.cv()</code> with a large possible number of boosting rounds (50). Bear in mind that if the hold-out metric keeps improving until <code>num_boost_round</code> is reached, then early stopping does not occur.</p>
<p>Here, the <code>DMatrix</code> and parameter dictionary have been created for you. Your task is to use cross-validation with early stopping. Go for it!</p>

<ul>
<li>Perform 3-fold cross-validation with early stopping and <code>&quot;rmse&quot;</code> as your metric. Use <code>10</code> early stopping rounds and <code>50</code> boosting rounds. Specify a <code>seed</code> of <code>123</code> and make sure the output is a <code>pandas</code> DataFrame. Remember to specify the other parameters such as <code>dtrain</code>, <code>params</code>, and <code>metrics</code>.</li>
<li>Print <code>cv_results</code>.</li>
</ul>
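The stopping rule itself is simple enough to sketch in plain Python. The function below is an illustration of the idea, not XGBoost's implementation; the name `early_stopping_round`, the `patience` argument (playing the role of `early_stopping_rounds`), and the metric values are all made up for the example.

```python
def early_stopping_round(metric_per_round, patience):
    """Return the 0-based index of the best round under an early-stopping rule.

    Lower metric values are better. Training stops once `patience` rounds
    have passed since the last improvement.
    """
    best_round = 0
    best_value = metric_per_round[0]
    for current_round, value in enumerate(metric_per_round):
        if value < best_value:
            best_value = value
            best_round = current_round
        elif current_round - best_round >= patience:
            break  # no improvement for `patience` rounds: stop early
    return best_round


# Hypothetical hold-out RMSE values, one per boosting round.
holdout_rmse = [10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 6.0]

# With a patience of 3, training stops before the late improvement is seen.
print(early_stopping_round(holdout_rmse, patience=3))

# With a patience of 5, every round runs and the last round is the best.
print(early_stopping_round(holdout_rmse, patience=5))
```

The two calls show the trade-off: a small patience saves training time but can stop before a late improvement, while a large patience behaves more like running all the rounds.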

In [6]:
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary for each tree: params
params = {"objective":"reg:linear", "max_depth":4}

# Perform cross-validation with early stopping: cv_results
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3, num_boost_round=50, early_stopping_rounds=10, metrics="rmse", as_pandas=True, seed=123)

# Print cv_results
print(cv_results)

    train-rmse-mean  train-rmse-std  test-rmse-mean  test-rmse-std
0     141871.630208      403.632409   142640.651042     705.571916
1     103057.028646       73.769561   104907.666667     111.114933
2      75975.963541      253.734987    79262.059895     563.766991
3      57420.529948      521.653556    61620.135417    1087.690754
4      44552.955729      544.169200    50437.561198    1846.448222
5      35763.949219      681.798925    43035.660156    2034.469858
6      29861.464844      769.572234    38600.881510    2169.800969
7      25994.673828      756.522016    36071.817708    2109.795430
8      23306.833984      759.237670    34383.184896    1934.546688
9      21459.768880      745.624404    33509.139974    1887.375633
10     20148.720703      749.612103    32916.806641    1850.893136
11     19215.382813      641.387376    32197.833984    1734.458659
12     18627.388021      716.257152    31770.852865    1802.155484
13     17960.695963      557.043993    31482.782552    1779.12

# 5- Overview of XGBoost's hyperparameters


video

# 6- Tuning eta


<p>It&apos;s time to practice tuning other XGBoost hyperparameters in earnest and observing their effect on model performance! You&apos;ll begin by tuning the <code>&quot;eta&quot;</code>, also known as the learning rate.</p>
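<p>The learning rate in XGBoost is a parameter that can range between <code>0</code> and <code>1</code>. It scales the contribution of each new tree: lower values of <code>&quot;eta&quot;</code> shrink the weights added in each boosting round more strongly, which makes boosting more conservative (stronger regularization) but typically requires more boosting rounds to reach the same fit.</p>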

<ul>
<li>Create a list called <code>eta_vals</code> to store the following <code>&quot;eta&quot;</code> values: <code>0.001</code>, <code>0.01</code>, and <code>0.1</code>.</li>
<li>Iterate over your <code>eta_vals</code> list using a <code>for</code> loop.</li>
<li>In each iteration of the <code>for</code> loop, set the <code>&quot;eta&quot;</code> key of <code>params</code> to be equal to <code>curr_val</code>. Then, perform 3-fold cross-validation with early stopping (<code>5</code> rounds), <code>10</code> boosting rounds, a metric of <code>&quot;rmse&quot;</code>, and a <code>seed</code> of <code>123</code>. Ensure the output is a DataFrame.</li>
<li>Append the final round RMSE to the <code>best_rmse</code> list.</li>
</ul>
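A back-of-the-envelope sketch helps when reading the results of this exercise. Each tree's correction is multiplied by `eta` before it is added to the model. Under the idealized assumption that every tree would otherwise fit the current residual perfectly (real trees do not), the residual shrinks by a factor of `1 - eta` per round. The helper name `remaining_fraction` is hypothetical.

```python
def remaining_fraction(eta, num_rounds):
    """Fraction of the initial residual left if each round removes `eta` of what remains."""
    residual = 1.0
    for _ in range(num_rounds):
        residual -= eta * residual  # the new tree's correction is scaled by eta
    return residual


# With only 10 boosting rounds, a very small eta barely moves the predictions.
for eta in (0.001, 0.01, 0.1):
    print(eta, round(remaining_fraction(eta, 10), 4))
```

This is why a small learning rate is usually paired with a much larger number of boosting rounds than the 10 used here.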

In [7]:
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary for each tree (boosting round)
params = {"objective":"reg:linear", "max_depth":3}

# Create list of eta values and empty list to store final round rmse per xgboost model
eta_vals = [0.001, 0.01, 0.1]
best_rmse = []

# Systematically vary the eta 
for curr_val in eta_vals:

    params["eta"] = curr_val
    
    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        early_stopping_rounds=5, num_boost_round=10,
                        metrics="rmse", as_pandas=True, seed=123)

    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results["test-rmse-mean"].iloc[-1])

# Print the resultant DataFrame
print(pd.DataFrame(list(zip(eta_vals, best_rmse)), columns=["eta","best_rmse"]))

     eta      best_rmse
0  0.001  195736.406250
1  0.010  179932.177083
2  0.100   79759.414063


# 7- Tuning max_depth


<p>In this exercise, your job is to tune <code>max_depth</code>, which is the parameter that dictates the maximum depth that each tree in a boosting round can grow to. Smaller values will lead to shallower trees, and larger values to deeper trees.</p>

<ul>
<li>Create a list called <code>max_depths</code> to store the following <code>&quot;max_depth&quot;</code> values: <code>2</code>, <code>5</code>, <code>10</code>, and <code>20</code>.</li>
<li>Iterate over your <code>max_depths</code> list using a <code>for</code> loop.</li>
<li>Systematically vary <code>&quot;max_depth&quot;</code> in each iteration of the <code>for</code> loop and perform 2-fold cross-validation with early stopping (<code>5</code> rounds), <code>10</code> boosting rounds, a metric of <code>&quot;rmse&quot;</code>, and a <code>seed</code> of <code>123</code>. Ensure the output is a DataFrame.</li>
</ul>
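To get a feel for how quickly tree capacity grows with depth, note that a binary tree limited to `max_depth` levels of splits can have at most `2 ** max_depth` leaves. The short sketch below (the helper name `max_leaves` is hypothetical) prints that upper bound for the depths used in this exercise.

```python
def max_leaves(max_depth):
    """Upper bound on the number of leaves in a binary tree with `max_depth` levels of splits."""
    return 2 ** max_depth


for depth in (2, 5, 10, 20):
    print(depth, max_leaves(depth))
```

Once this bound exceeds the number of training rows, a single tree has enough capacity to isolate individual examples, which is one reason very deep trees tend to overfit.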

In [9]:
# Create your housing DMatrix
housing_dmatrix = xgb.DMatrix(data=X,label=y)

# Create the parameter dictionary
params = {"objective":"reg:linear"}

# Create list of max_depth values
max_depths = [2,5,10,20]
best_rmse = []

# Systematically vary the max_depth
for curr_val in max_depths:

    params["max_depth"] = curr_val
    
    # Perform cross-validation
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2,
                        early_stopping_rounds=5, num_boost_round=10,
                        metrics="rmse", as_pandas=True, seed=123)

    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results["test-rmse-mean"].iloc[-1])

# Print the resultant DataFrame
print(pd.DataFrame(list(zip(max_depths, best_rmse)),columns=["max_depth","best_rmse"]))

   max_depth     best_rmse
0          2  37957.468750
1          5  35596.601562
2         10  36065.548828
3         20  36739.578125


# 8- Tuning colsample_bytree


<p>Now, it&apos;s time to tune <code>&quot;colsample_bytree&quot;</code>. If you&apos;ve ever worked with scikit-learn&apos;s <code>RandomForestClassifier</code> or <code>RandomForestRegressor</code>, you&apos;ve seen a closely related parameter called <code>max_features</code>. Both parameters limit the features that a tree may split on, but at different granularity: <code>max_features</code> in <code>sklearn</code> draws a fresh subset of features at every split, whereas <code>colsample_bytree</code> in <code>xgboost</code> draws one subset of features per tree. In <code>xgboost</code>, <code>colsample_bytree</code> must be specified as a float greater than 0 and at most 1.</p>

<ul>
<li>Create a list called <code>colsample_bytree_vals</code> to store the values <code>0.1</code>, <code>0.5</code>, <code>0.8</code>, and <code>1</code>.</li>
<li>Systematically vary <code>&quot;colsample_bytree&quot;</code> and perform cross-validation, exactly as you did with <code>max_depth</code> and <code>eta</code> previously.</li>
</ul>
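The idea of per-tree column subsampling can be sketched in a few lines. This is an illustration only: the feature names are placeholders, the helper `sample_features_for_tree` is hypothetical, and the rounding rule used to turn the fraction into a feature count is an assumption rather than a statement about XGBoost's internals.

```python
import random


def sample_features_for_tree(feature_names, colsample_bytree, rng):
    """Pick the subset of features that one tree is allowed to split on."""
    # Assumed rounding rule: keep at least one feature.
    n_keep = max(1, int(colsample_bytree * len(feature_names)))
    return sorted(rng.sample(feature_names, n_keep))


features = [f"f{i}" for i in range(10)]  # placeholder feature names
rng = random.Random(123)

# Each tree in the ensemble draws its own subset of columns.
for tree_index in range(3):
    print(tree_index, sample_features_for_tree(features, 0.5, rng))
```

With a value of `1`, every tree sees every feature; with a very small value, each tree is built from so few features that individual trees become weak, which matches the intuition that extreme settings can hurt.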

In [10]:
# Create your housing DMatrix
housing_dmatrix = xgb.DMatrix(data=X,label=y)

# Create the parameter dictionary
params={"objective":"reg:linear","max_depth":3}

# Create list of hyperparameter values: colsample_bytree_vals
colsample_bytree_vals = [0.1, 0.5, 0.8, 1]
best_rmse = []

# Systematically vary the hyperparameter value 
for curr_val in colsample_bytree_vals:

    params['colsample_bytree'] = curr_val
    
    # Perform cross-validation
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2,
                 num_boost_round=10, early_stopping_rounds=5,
                 metrics="rmse", as_pandas=True, seed=123)
    
    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results["test-rmse-mean"].iloc[-1])

# Print the resultant DataFrame
print(pd.DataFrame(list(zip(colsample_bytree_vals, best_rmse)), 
                   columns=["colsample_bytree","best_rmse"]))

   colsample_bytree     best_rmse
0               0.1  48193.453125
1               0.5  36013.541015
2               0.8  35932.962891
3               1.0  35836.042969


# 9- Review of grid search and random search


video

# 10- Grid search with XGBoost


<p>Now that you&apos;ve learned how to tune parameters individually with XGBoost, let&apos;s take your parameter tuning to the next level by using scikit-learn&apos;s grid search and randomized search capabilities with internal cross-validation, via the <code>GridSearchCV</code> and <code>RandomizedSearchCV</code> classes. You will use these to search over a collection of possible parameter values across multiple parameters simultaneously: <code>GridSearchCV</code> tries every combination exhaustively, while <code>RandomizedSearchCV</code> tries a random sample of them. Let&apos;s get to work, starting with <code>GridSearchCV</code>!</p>

<ul>
<li>Create a parameter grid called <code>gbm_param_grid</code> that contains a list of <code>&quot;colsample_bytree&quot;</code> values (<code>0.3</code>, <code>0.7</code>), a list with a single value for <code>&quot;n_estimators&quot;</code> (<code>50</code>), and a list of 2 <code>&quot;max_depth&quot;</code> (<code>2</code>, <code>5</code>) values.</li>
<li>Instantiate an <code>XGBRegressor</code> object called <code>gbm</code>.</li>
<li>Create a <code>GridSearchCV</code> object called <code>grid_mse</code>, passing in: the parameter grid to <code>param_grid</code>, the <code>XGBRegressor</code> to <code>estimator</code>, <code>&quot;neg_mean_squared_error&quot;</code> to <code>scoring</code>, and <code>4</code> to <code>cv</code>. Also specify <code>verbose=1</code> so you can better understand the output.</li>
<li>Fit the <code>GridSearchCV</code> object to <code>X</code> and <code>y</code>.</li>
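<li>Print the best parameter values and lowest RMSE, using the <code>.best_params_</code> and <code>.best_score_</code> attributes, respectively, of <code>grid_mse</code>. Because <code>.best_score_</code> holds the negated mean squared error, take the square root of its absolute value to get the RMSE.</li>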
</ul>
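It can help to count what a grid search will actually do before running it. The sketch below enumerates the grid from this exercise with `itertools.product`, which mirrors the idea of an exhaustive search without calling scikit-learn. The helper `rmse_from_neg_mse` is a hypothetical name showing how a `"neg_mean_squared_error"` score maps back to an RMSE.

```python
import math
from itertools import product

# Same grid as in this exercise.
gbm_param_grid = {
    "colsample_bytree": [0.3, 0.7],
    "n_estimators": [50],
    "max_depth": [2, 5],
}

# Every combination of values is one candidate model.
names = sorted(gbm_param_grid)
candidates = [
    dict(zip(names, values))
    for values in product(*(gbm_param_grid[name] for name in names))
]

cv = 4  # each candidate is fit once per cross-validation fold
total_fits = len(candidates) * cv

for candidate in candidates:
    print(candidate)
print(len(candidates), "candidates,", total_fits, "fits")


def rmse_from_neg_mse(neg_mse):
    """Convert a "neg_mean_squared_error" score back into an RMSE."""
    return math.sqrt(abs(neg_mse))
```

The number of candidates is the product of the list lengths, so adding one more value to any parameter multiplies the total work.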

In [12]:
from sklearn.model_selection import GridSearchCV

# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
    'colsample_bytree': [0.3, 0.7],
    'n_estimators': [50],
    'max_depth': [2, 5]
}

# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor()

# Perform grid search: grid_mse
grid_mse = GridSearchCV(param_grid=gbm_param_grid, estimator=gbm, scoring="neg_mean_squared_error", cv=4, verbose=1)


# Fit grid_mse to the data
grid_mse.fit(X,y)

# Print the best parameters and lowest RMSE
print("Best parameters found: ", grid_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(grid_mse.best_score_)))

Fitting 4 folds for each of 4 candidates, totalling 16 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  16 out of  16 | elapsed:    2.2s finished


Best parameters found:  {'colsample_bytree': 0.7, 'max_depth': 5, 'n_estimators': 50}
Lowest RMSE found:  29916.562522854438


# 11- Random search with XGBoost


<p>Often, <code>GridSearchCV</code> can be really time-consuming, so in practice, you may want to use <code>RandomizedSearchCV</code> instead, as you will do in this exercise. The good news is you only have to make a few modifications to your <code>GridSearchCV</code> code to do <code>RandomizedSearchCV</code>. The key difference is you have to specify a <code>param_distributions</code> parameter instead of a <code>param_grid</code> parameter.</p>

<ul>
<li>Create a parameter grid called <code>gbm_param_grid</code> that contains a list with a single value for <code>&apos;n_estimators&apos;</code> (<code>25</code>), and the <code>&apos;max_depth&apos;</code> values from <code>2</code> to <code>11</code> inclusive - use <code>range(2, 12)</code> for this.</li>
<li>Create a <code>RandomizedSearchCV</code> object called <code>randomized_mse</code>, passing in: the parameter grid to <code>param_distributions</code>, the <code>XGBRegressor</code> to <code>estimator</code>, <code>&quot;neg_mean_squared_error&quot;</code> to <code>scoring</code>, <code>5</code> to <code>n_iter</code>, and <code>4</code> to <code>cv</code>. Also specify <code>verbose=1</code> so you can better understand the output.</li>
<li>Fit the <code>RandomizedSearchCV</code> object to <code>X</code> and <code>y</code>.</li>
</ul>
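The difference from grid search can be sketched without scikit-learn: instead of fitting every combination, a random search draws `n_iter` of them. The snippet below is a simplified illustration using the standard library, with an arbitrary seed chosen for the example. It samples without replacement, which is a reasonable model when every parameter is given as a finite list or range.

```python
import random
from itertools import product

# Same search space as in this exercise.
param_distributions = {
    "n_estimators": [25],
    "max_depth": range(2, 12),
}

# The full grid that an exhaustive search would have to cover.
names = sorted(param_distributions)
all_candidates = [
    dict(zip(names, values))
    for values in product(*(param_distributions[name] for name in names))
]

n_iter, cv = 5, 4
rng = random.Random(0)  # arbitrary seed, chosen only to make the sketch repeatable
sampled = rng.sample(all_candidates, n_iter)

for candidate in sampled:
    print(candidate)
print(len(all_candidates), "possible,", len(sampled), "tried,", len(sampled) * cv, "fits")
```

Only half of the possible depths are ever tried here, so the best setting found depends on which candidates were drawn. In `RandomizedSearchCV`, passing `random_state` makes that draw repeatable.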

In [14]:
from sklearn.model_selection import RandomizedSearchCV

# Create the parameter grid: gbm_param_grid 
gbm_param_grid = {
    'n_estimators': [25],
    'max_depth': range(2,12)
}

# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor(n_estimators=10)

# Perform random search: randomized_mse
randomized_mse = RandomizedSearchCV(param_distributions=gbm_param_grid,
                                    estimator=gbm, scoring="neg_mean_squared_error",
                                    n_iter=5, cv=4, verbose=1)


# Fit randomized_mse to the data
randomized_mse.fit(X,y)

# Print the best parameters and lowest RMSE
print("Best parameters found: ", randomized_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(randomized_mse.best_score_)))

Fitting 4 folds for each of 5 candidates, totalling 20 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

Best parameters found:  {'n_estimators': 25, 'max_depth': 5}
Lowest RMSE found:  36636.35808132903


[Parallel(n_jobs=1)]: Done  20 out of  20 | elapsed:    2.5s finished


# 12- Limits of grid search and random search


video

# 13- When should you use grid search and random search?


Now that you've seen some of the drawbacks of grid search and random search, which of the following most accurately describes why neither random search nor grid search is an ideal hyperparameter tuning strategy in every scenario?

