## Gradient boosting with XGBoost (extreme gradient boosting)
In this example, you'll work with the XGBoost library. XGBoost stands for extreme gradient boosting, which is an implementation of gradient boosting with several additional features focused on performance and speed. (Scikit-learn has another version of gradient boosting, but XGBoost has some technical advantages.)


In [16]:
from xgboost import XGBRegressor
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the data
data = pd.read_csv('./melb_data.csv')

# Select subset of predictors
cols_to_use = ['Rooms', 'Distance', 'Landsize', 'BuildingArea', 'YearBuilt']
X = data[cols_to_use]

# Select target
y = data.Price

# Separate data into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X, y)

my_model = XGBRegressor()
my_model.fit(X_train, y_train)
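A fitted model is usually judged by its mean absolute error (MAE) on the validation set. The sketch below shows the metric in plain Python on made-up prices (not values from `melb_data.csv`); the helper name `mean_abs_error` is hypothetical. In practice, `sklearn.metrics.mean_absolute_error` computes the same quantity when given `y_valid` and `my_model.predict(X_valid)`.

```python
def mean_abs_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up house prices, for illustration only.
actual_prices = [500_000, 750_000, 1_200_000]
predicted_prices = [520_000, 700_000, 1_260_000]

mae = mean_abs_error(actual_prices, predicted_prices)
print(f"Mean absolute error: {mae:.2f}")
```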

### XGBoost has a few parameters

- `n_estimators`: specifies how many times to go through the modeling cycle described above. It is equal to the number of models included in the ensemble.
  - Too low a value causes underfitting, which leads to inaccurate predictions on both training data and test data.
  - Too high a value causes overfitting, which leads to accurate predictions on training data but inaccurate predictions on test data (which is what we care about).
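To make the link between `n_estimators` and the size of the ensemble concrete, here is a toy boosting loop in plain Python. It is a sketch only: it uses one-split "stumps" on a single made-up feature and plain squared error, which is far simpler than what XGBoost does internally. The names `fit_stump`, `boost`, and `mse` are hypothetical helpers. It shows only the training side (error falls as rounds are added) and does not demonstrate overfitting on held-out data.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    # Skip the largest x: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_estimators, learning_rate):
    """Run n_estimators rounds; return the stumps and the training predictions."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # Each round fits one new model to what the ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return stumps, predictions


def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 3.0, 3.5, 6.0, 6.5, 9.0, 9.5]

training_error = {}
for n in (1, 5, 50):
    stumps, predictions = boost(xs, ys, n_estimators=n, learning_rate=0.3)
    training_error[n] = mse(ys, predictions)
    print(n, len(stumps), round(training_error[n], 4))
```

The ensemble always contains exactly `n_estimators` stumps, and the training error shrinks as that number grows. Whether the extra rounds also help on unseen data is what the validation set has to answer.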

In [17]:
my_model = XGBRegressor(n_estimators=500)
my_model.fit(X_train, y_train)

- `early_stopping_rounds`

`early_stopping_rounds` offers a way to automatically find a good value for `n_estimators`. Early stopping causes the model to stop iterating when the validation score stops improving, even if it has not reached the hard stop set by `n_estimators`. It's smart to set a high value for `n_estimators` and then use `early_stopping_rounds` to find the optimal time to stop iterating.

Since random chance sometimes causes a single round where the validation score doesn't improve, you need to specify how many consecutive rounds without improvement to allow before stopping. Setting `early_stopping_rounds=5` is a reasonable choice: training stops after 5 consecutive rounds in which the validation score fails to improve.

When using `early_stopping_rounds`, you also need to set aside some data for calculating the validation scores. This is done by setting the `eval_set` parameter.

We can modify the example above to include early stopping:

In [18]:
# early_stopping_rounds is passed to the constructor here, as recent XGBoost
# releases expect; older releases accepted it as an argument to fit() instead.
my_model = XGBRegressor(n_estimators=500, early_stopping_rounds=5)

# verbose=False suppresses the per-round evaluation output during training.
my_model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
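The stopping rule described above can be sketched in plain Python. This is an illustration of the idea, not XGBoost's own code: the function name `find_stopping_round` and the list of validation errors are made up, and lower scores are assumed to be better (as with MAE).

```python
def find_stopping_round(valid_scores, early_stopping_rounds):
    """Return (round where training stops, round with the best score).

    Rounds are numbered from 0 and lower scores are better.
    """
    best_score = float("inf")
    best_round = 0
    for current_round, score in enumerate(valid_scores):
        if score < best_score:
            best_score = score
            best_round = current_round
        elif current_round - best_round >= early_stopping_rounds:
            # Too many consecutive rounds without a new best score.
            return current_round, best_round
    # Ran out of rounds before the patience was exhausted.
    return len(valid_scores) - 1, best_round


# Made-up validation errors, one per boosting round.
scores = [10.0, 8.0, 7.0, 7.2, 6.9, 7.1, 7.3, 7.0, 7.4, 7.5, 6.0]
stopped_at, best = find_stopping_round(scores, early_stopping_rounds=5)
print(f"stopped at round {stopped_at}, best round was {best}")
```

Note the single bad round at index 3 does not stop training, because round 4 improves again. The final score of 6.0 is never reached: patience runs out first, which is the trade-off a small `early_stopping_rounds` makes.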




In [19]:
# learning_rate controls the step-size shrinkage applied to each new tree, which helps
# prevent overfitting. A smaller learning rate may generalize better, but it needs more
# trees and therefore more training time.
my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05, early_stopping_rounds=5)
my_model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
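For reference, the shrinkage applied by `learning_rate` can be written as an additive update. The notation is assumed here rather than taken from the text: $\hat{y}_i^{(t)}$ is the prediction for row $i$ after $t$ trees, $f_t$ is the tree added in round $t$, and $\eta$ is the learning rate.

$$
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i), \qquad 0 < \eta \le 1
$$

With $\eta = 0.05$, each tree contributes only 5% of its fitted correction, which is why a small learning rate is usually paired with a large `n_estimators` and early stopping.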



- `n_jobs`: specifies the number of parallel threads to use during training. Setting it to a value greater than 1 can speed up training by letting multiple threads work on different parts of the problem simultaneously.

In [20]:
# early_stopping_rounds is set on the constructor, as recent XGBoost releases expect.
my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05, n_jobs=4, early_stopping_rounds=5)
my_model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
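Rather than hard-coding `n_jobs=4`, one possible starting point is to derive the thread count from the machine itself. The helper below is a hypothetical sketch using only the standard library; the `reserve` argument is an assumption added here for leaving some cores free for other work.

```python
import os


def pick_n_jobs(reserve=0):
    """Return a thread count: detected cores minus `reserve`, never below 1."""
    detected = os.cpu_count() or 1  # cpu_count() returns None if it cannot tell
    return max(1, detected - reserve)


n_jobs = pick_n_jobs()
print(f"Using {n_jobs} threads")
```

The result can then be passed as `XGBRegressor(..., n_jobs=n_jobs)`. On small datasets the difference is often negligible, so this matters mainly when training time becomes noticeable.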




To optimize the model's performance and achieve the lowest mean absolute error (MAE) on the validation set, you can try adjusting the following hyperparameters:

- `learning_rate`: Decreasing the learning rate can help improve the model's accuracy, but may also increase training time. You can try values between 0.01 and 0.1, and lower it if the model is overfitting.
- `n_estimators`: Increasing the number of trees (estimators) can help improve the model's accuracy, but may also increase training time. You can try values between 100 and 1000, and increase it if the model is underfitting.
- `max_depth`: Increasing the maximum depth of each tree can allow the model to capture more complex patterns, but may also increase the risk of overfitting. You can try values between 3 and 10, and increase it if the model is underfitting.
- `subsample` and `colsample_bytree`: Decreasing the fraction of the training instances and features used to train each tree can help reduce overfitting, but may also decrease the model's accuracy. You can try values between 0.5 and 1.0 for `subsample`, and between 0.5 and 0.9 for `colsample_bytree`.
- `reg_alpha` and `reg_lambda`: Adding regularization terms to the loss function can help reduce overfitting, but may also decrease the model's accuracy. You can try values between 0.01 and 10 for both `reg_alpha` and `reg_lambda`.

Keep in mind that hyperparameter tuning is an iterative process and may require experimenting with different combinations of hyperparameters to achieve the best performance. You can use techniques such as grid search or randomized search to automate this process.
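As a rough illustration of randomized search, the sketch below draws parameter combinations from the ranges listed above using only the standard library. Several things are assumptions made for this example: the names `SEARCH_SPACE`, `sample_params`, and `random_search`, the choice to sample `learning_rate` and the regularization terms on a log scale, and the convention that `score_fn` returns a value where lower is better (such as validation MAE). The scoring function used at the bottom is a stand-in, not a real model evaluation.

```python
import math
import random

# Ranges copied from the list above; the sampling scheme per parameter is an assumption.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 0.01, 0.1),
    "n_estimators": ("int", 100, 1000),
    "max_depth": ("int", 3, 10),
    "subsample": ("uniform", 0.5, 1.0),
    "colsample_bytree": ("uniform", 0.5, 0.9),
    "reg_alpha": ("log_uniform", 0.01, 10),
    "reg_lambda": ("log_uniform", 0.01, 10),
}


def sample_params(rng):
    """Draw one random combination of hyperparameters from SEARCH_SPACE."""
    params = {}
    for name, (kind, low, high) in SEARCH_SPACE.items():
        if kind == "int":
            params[name] = rng.randint(low, high)  # inclusive on both ends
        elif kind == "uniform":
            params[name] = rng.uniform(low, high)
        else:  # log_uniform: uniform in log space, so small values get sampled too
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return params


def random_search(score_fn, n_trials, seed=0):
    """Try n_trials random combinations; return the best params and their score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = score_fn(params)  # lower is better
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in scoring function so the sketch runs on its own; it is NOT a real evaluation.
def placeholder_score(params):
    return abs(params["max_depth"] - 6) + abs(params["learning_rate"] - 0.05)


best_params, best_score = random_search(placeholder_score, n_trials=20)
print(best_params, best_score)
```

With the models in this section, `score_fn` could instead fit `XGBRegressor(**params)` on the training split and return the MAE on the validation split. scikit-learn's `GridSearchCV` and `RandomizedSearchCV` offer the same idea with cross-validation built in.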