# Gradient Boosting and XGBoost

XGBoost is a library that implements gradient boosting and achieves state-of-the-art results on a variety of datasets.

Like `RandomForestRegressor`, XGBoost is an ensemble method: both combine the predictions of several models to produce a more accurate prediction. A random forest averages trees that are trained independently, whereas gradient boosting adds models one after another.

## What is Gradient Boosting?

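Gradient boosting is an iterative process that adds models to an ensemble one at a time, each chosen to reduce the loss of the ensemble built so far.
The ensemble starts as a single, simple model (for example, one that always predicts a constant), so its first predictions are usually very inaccurate. Each cycle then proceeds as follows: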

1. Use the current ensemble to generate predictions for each observation in the data. A prediction is the sum of the predictions of all the models in the ensemble.
2. Use these predictions to calculate a loss function (mean squared error, for example).
3. Use the loss function to fit a new model. Its parameters are chosen so that adding it to the ensemble reduces the loss (this is where the gradient descent comes in).
4. Add the new model, with its fitted parameters, to the ensemble.
5. Repeat from step 1.
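
The cycle above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how XGBoost is implemented: it uses squared-error loss (so the negative gradient is simply the residual), one-split "stumps" on a single feature as the component models, and made-up toy data. The helper names `fit_stump`, `stump_predict`, `mse`, and `boost` are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump to (xs, targets) by least squares."""
    best = None
    # Skip the largest x so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    # Start the ensemble with a naive constant model: the mean of the targets.
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    losses = [mse(ys, preds)]
    for _ in range(n_rounds):
        # Steps 1-2: the current predictions determine the loss. For squared
        # error, its negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        # Step 3: fit a new model to the residuals.
        stump = fit_stump(xs, residuals)
        # Step 4: add the new model's scaled predictions to the ensemble.
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
        losses.append(mse(ys, preds))
    return base, stumps, losses


# Toy data, made up for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 2.1, 2.9, 6.0, 6.8, 7.1, 8.2]

base, stumps, losses = boost(xs, ys)
print("initial training MSE:", losses[0])
print("final training MSE:  ", losses[-1])
```

The training loss recorded in `losses` should shrink from round to round, which is the behaviour the steps describe. Real libraries use deeper trees, many features, and regularization on top of this basic loop.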

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the data
data = pd.read_csv('../data/melb_data.csv')

# Select subset of predictors
cols_to_use = ['Rooms', 'Distance', 'Landsize', 'BuildingArea', 'YearBuilt']
X = data[cols_to_use]

# Select target
y = data.Price

# Separate data into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X, y)

## XGBoost

XGBoost stands for eXtreme Gradient Boosting. It implements gradient boosting with several additional features that focus on performance and speed.

In [3]:
from xgboost import XGBRegressor

my_model = XGBRegressor()
my_model.fit(X_train, y_train)

We can also make predictions and evaluate the model:

In [4]:
from sklearn.metrics import mean_absolute_error

predictions = my_model.predict(X_valid)

# scikit-learn metrics take the true values first, then the predictions
print("MAE: ", mean_absolute_error(y_valid, predictions))

MAE:  234508.1677374816


## Parameter Tuning

XGBoost has some parameters that can dramatically impact the model performance.

### `n_estimators`

`n_estimators` sets how many times the model goes through the cycle described above: generate predictions, calculate the loss, and add a new model. It therefore equals the number of models in the ensemble.

- Typical values range from 100 to 1000.
- Too low a value can cause underfitting.
- Too high a value can cause overfitting.

In [5]:
my_model = XGBRegressor(n_estimators=500)
my_model.fit(X_train, y_train)

### `early_stopping_rounds`

`early_stopping_rounds` offers a way to automatically find a good value for `n_estimators`: training stops when the validation score stops improving.
This lets us set a high value for `n_estimators` and rely on early stopping to find the point at which to stop.

- Because of randomness in model training, one or two rounds can occur by chance where the performance does not improve. `early_stopping_rounds=5` is a reasonable baseline.
- With that setting, training stops after 5 consecutive rounds without improvement.
- Early stopping needs data set aside for validation, which is passed through the `eval_set` parameter.
- If the model will later be fit to all the data, set `n_estimators` to the value that was optimal under early stopping.
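
The stopping rule can be sketched without any library. The snippet below is only an illustration of the "no improvement for N rounds" logic, not XGBoost's own code; the function name `best_round_with_early_stopping` and the validation losses are hypothetical.

```python
def best_round_with_early_stopping(val_losses, patience=5):
    """Return (best_round, stopped_at) as 0-based round indices.

    Training is considered stopped once `patience` consecutive rounds
    have passed without a new best validation loss.
    """
    best_loss = float("inf")
    best_round = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_round = loss, i
        elif i - best_round >= patience:
            return best_round, i
    # Ran out of rounds before the patience was exhausted.
    return best_round, len(val_losses) - 1


# Hypothetical validation losses, one per boosting round (made-up numbers).
val_losses = [10.0, 8.0, 7.0, 6.5, 6.6, 6.4, 6.45, 6.5, 6.6, 6.7, 6.8, 6.9]

best_round, stopped_at = best_round_with_early_stopping(val_losses, patience=5)
print(best_round, stopped_at)  # 5 10

# Rounds are 0-based, so a refit on all the data would use this many models.
n_estimators_for_refit = best_round + 1
print(n_estimators_for_refit)  # 6
```

Note that round 4 is worse than round 3 but training continues, because round 5 sets a new best before the patience of 5 runs out.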

In [None]:
# Passing `early_stopping_rounds` to `fit` raises a UserWarning: it is deprecated for better
# compatibility with scikit-learn. Use `early_stopping_rounds` in the constructor or `set_params` instead.

my_model = XGBRegressor(n_estimators=500,
                        early_stopping_rounds=5)

my_model.fit(X_train, y_train,
             eval_set=[(X_valid, y_valid)],
             verbose=False)

### `learning_rate`

Instead of simply summing the predictions of the component models, each model's predictions are multiplied by a small number, the learning rate, before being added together (this is how Kaggle defines it). It plays the role of the step size in gradient descent.

- A lower learning rate means each additional iteration helps us less, so we can set a higher value for `n_estimators` without overfitting.
- In general, a lower learning rate with a higher number of estimators yields more accurate XGBoost models, though it also takes longer to train.
- The default `learning_rate` depends on the XGBoost version: older releases of the scikit-learn wrapper used 0.1, while recent releases fall back to the core library default of 0.3.
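
To see why a lower learning rate calls for more estimators, consider an idealized case (an assumption made purely for illustration): every new model fits the current residual perfectly. Each round then removes a fraction `learning_rate` of the remaining error, so after `k` rounds a fraction `(1 - learning_rate) ** k` is left. The helper `rounds_needed` is hypothetical.

```python
def rounds_needed(learning_rate, tolerance=0.01):
    """Count rounds until the remaining error fraction is <= tolerance.

    Assumes each new model fits the current residual perfectly, so every
    round shrinks the remaining error by a factor of (1 - learning_rate).
    """
    remaining = 1.0
    rounds = 0
    while remaining > tolerance:
        remaining *= 1 - learning_rate
        rounds += 1
    return rounds


for lr in (0.3, 0.1, 0.05):
    print(lr, rounds_needed(lr))
# 0.3 13
# 0.1 44
# 0.05 90
```

Real models never fit the residual perfectly, so actual round counts differ, but the trend is the same: halving the learning rate roughly doubles the number of rounds needed to reach the same training error.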

In [None]:
my_model = XGBRegressor(n_estimators=1000,
                        learning_rate=0.05,
                        early_stopping_rounds=5)

my_model.fit(X_train, y_train,
             eval_set=[(X_valid, y_valid)],
             verbose=False)

### `n_jobs`

`n_jobs` is a fitting **time** optimization parameter; it does not change the quality of the model. It splits the work across multiple threads so that fitting runs faster.

- It only really helps with bigger datasets (time optimization doesn't make a big difference on smaller datasets).
- It is typically set equal to the number of processing cores on your machine.
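
One way to avoid hard-coding the core count is to ask the standard library for it. This is a small sketch, not a recommendation from the XGBoost documentation; the `params` dictionary is a hypothetical way of collecting keyword arguments that could then be passed as `XGBRegressor(**params)`.

```python
import os

# os.cpu_count() reports logical cores (which may exceed the number of
# physical cores) and can return None if the count cannot be determined.
n_jobs = os.cpu_count() or 1

# Keyword arguments that could be passed on, e.g. XGBRegressor(**params).
params = {"n_estimators": 1000, "learning_rate": 0.05, "n_jobs": n_jobs}
print(params)
```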

In [9]:
my_model = XGBRegressor(n_estimators=1000,
                        learning_rate=0.05,
                        n_jobs=4,
                        early_stopping_rounds=5)

my_model.fit(X_train, y_train,
             eval_set=[(X_valid, y_valid)],
             verbose=False)