In [None]:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, random_state=123)

# Fit a 10-tree classifier and predict on the held-out set
xg_cl = xgb.XGBClassifier(objective='binary:logistic',
                          n_estimators=10, seed=123)
xg_cl.fit(X_train, y_train)
preds = xg_cl.predict(X_test)

# Fraction of test predictions that match the true labels
accuracy = float(np.sum(preds == y_test)) / y_test.shape[0]

print("accuracy: %f" % (accuracy))

accuracy: 0.78333


## **Boosting**
- It's not a specific machine learning algorithm
- Concept that can be applied to a set of machine learning models
- Ensemble meta-algorithm used to convert many weak models into a strong model
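
To make the idea concrete, here is a minimal sketch of boosting for squared-error regression, using one-split "stumps" as the weak models. It is a toy illustration in plain Python, not XGBoost's actual algorithm; the helper names `fit_stump` and `boost` and the data are made up for this example.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on xs that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    # The weak model: predict the mean residual on each side of the split
    return lambda x: lmean if x <= t else rmean


def boost(xs, ys, n_rounds=10, learning_rate=0.5):
    """Add stumps one at a time, each fit to the current residuals."""
    stumps = []
    preds = [0.0] * len(xs)
    history = []  # training MSE after each round
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return stumps, history


# Hypothetical toy data
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 1.2, 3.8, 4.1, 4.0, 7.9, 8.2]
stumps, mse_history = boost(xs, ys)
print(mse_history[0], mse_history[-1])
```

Each stump alone is a poor model, but the training error shrinks as more of them are combined.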

**Cross-validation in XGBoost example**

In [None]:
import xgboost as xgb

churn_dmatrix = xgb.DMatrix(data=X, label=y)

params = {"objective": 'binary:logistic', 'max_depth': 4}

cv_results = xgb.cv(dtrain=churn_dmatrix, params=params, nfold=4, num_boost_round=10,
                    metrics='error', as_pandas=True)

print("Accuracy: %f" %((1-cv_results["test-error-mean"]).iloc[-1]))
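
As a rough sketch of what k-fold cross-validation does with `nfold=4`, the plain-Python snippet below splits sample indices into folds and averages per-fold error rates. The helper `kfold_indices` and the error values are hypothetical, and unlike a real run this sketch does not shuffle the data.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(kfold_indices(n_samples=10, n_folds=4))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)

# Hypothetical per-fold error rates, averaged like a "test-error-mean" column
fold_errors = [0.25, 0.20, 0.30, 0.25]
mean_error = sum(fold_errors) / len(fold_errors)
accuracy = 1 - mean_error
print("Accuracy: %f" % accuracy)
```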

**When to use XGBoost**
- You have a large number of training samples
  - Greater than 1000 training samples and less than 100 features
  - The number of features < number of training samples
- You have a mixture of categorical and numeric features
  - Or just numeric features

**When not to use**
- Image recognition
- Computer vision
- Natural Language Processing and understanding problems
- When the number of training samples is significantly smaller than the number of features

**Common loss functions and XGBoost**
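- Loss function names in xgboost:
  - reg:linear - use for regression problems (deprecated in newer XGBoost releases in favor of reg:squarederror)
  - reg:logistic - logistic regression; like binary:logistic, it outputs a probability rather than a hard decision
  - binary:logistic - use for binary classification when you want a probability; threshold it (for example at 0.5) to get a decision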
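
The difference between a probability and a decision can be sketched in a few lines. This is a plain-Python illustration of the logistic transform, with made-up raw scores and an assumed 0.5 cutoff; it does not call XGBoost.

```python
import math


def sigmoid(margin):
    """Map a raw model score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


margins = [-2.0, 0.0, 1.5]                 # hypothetical raw scores
probs = [sigmoid(m) for m in margins]      # probability of the positive class
labels = [int(p > 0.5) for p in probs]     # hard decision at an assumed 0.5 cutoff

print(probs)
print(labels)
```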

**Base learners and why we need them**
- XGBoost involves creating a meta-model that is composed of many individual models that combine to give a final prediction
- Two kinds of base learners: tree and linear

**Tree as base learners**

In [None]:
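import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=123)

# Trees are the default base learner (booster="gbtree").
# "reg:squarederror" replaces the deprecated "reg:linear" objective name.
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=10,
                          seed=123)

xg_reg.fit(X_train, y_train)
preds = xg_reg.predict(X_test)

# Root mean squared error on the held-out set
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % (rmse))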


**Linear base learners**

In [None]:
X_train, X_test, y_train, y_test= train_test_split(X, y, test_size=0.2,
                                                   random_state=123)

DM_train = xgb.DMatrix(data=X_train,label=y_train)
DM_test =  xgb.DMatrix(data=X_test,label=y_test)

params = {"booster":"gblinear","objective":"reg:squarederror"}

xg_reg = xgb.train(params = params, dtrain=DM_train, num_boost_round=10)
preds = xg_reg.predict(DM_test)

rmse = np.sqrt(mean_squared_error(y_test,preds))
print("RMSE: %f" % (rmse))


**Visualizing individual XGBoost trees**

In [None]:
import matplotlib.pyplot as plt

# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary: params
params = {"objective":"reg:squarederror", "max_depth":2}

# Train the model: xg_reg
xg_reg = xgb.train(params=params, dtrain=housing_dmatrix, num_boost_round=10)

# Plot the first tree
xgb.plot_tree(xg_reg, num_trees=0)
plt.show()

# Plot the fifth tree
xgb.plot_tree(xg_reg, num_trees=4)
plt.show()

# Plot the last tree sideways
xgb.plot_tree(xg_reg, num_trees=9, rankdir='LR')
plt.show()

**Visualizing feature importances**

In [None]:
# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary: params
params = {"objective":"reg:squarederror", "max_depth":4}

# Train the model: xg_reg
xg_reg = xgb.train(params=params, dtrain=housing_dmatrix, num_boost_round=10)

# Plot the feature importances
xgb.plot_importance(xg_reg)
plt.show()

## **Tuning in XGBoost**

**Tuned model example**

In [None]:
housing_dmatrix = xgb.DMatrix(data=X,label=y)

tuned_params = {"objective":"reg:squarederror",'colsample_bytree': 0.3,'learning_rate': 0.1, 'max_depth': 5}
tuned_cv_results_rmse = xgb.cv(dtrain=housing_dmatrix,
                               params=tuned_params, nfold=4, num_boost_round=200, metrics="rmse",
                               as_pandas=True, seed=123)

# Take the last row as a scalar so it can be formatted with %f
print("Tuned rmse: %f" % tuned_cv_results_rmse["test-rmse-mean"].iloc[-1])

**Automated boosting round selection using early_stopping**

In [None]:
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X,label=y)

# Create the parameter dictionary for each tree: params
params = {"objective":"reg:squarederror", "max_depth":4}

# Perform cross-validation with early stopping: cv_results
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                    num_boost_round=50, early_stopping_rounds=10,
                    metrics="rmse", as_pandas=True, seed=123)

# Print cv_results
print(cv_results)
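
The logic behind `early_stopping_rounds` can be sketched in plain Python: keep track of the best round so far, and stop once the metric has failed to improve for the given number of consecutive rounds. The function name and the RMSE values below are hypothetical; this illustrates the idea only and is not XGBoost's implementation.

```python
def best_round_with_early_stopping(scores, early_stopping_rounds):
    """Return (best_index, stopped_index) for a metric where lower is better."""
    best_idx = 0
    for i, score in enumerate(scores):
        if score < scores[best_idx]:
            best_idx = i  # new best round
        elif i - best_idx >= early_stopping_rounds:
            return best_idx, i  # no improvement for too long: stop here
    return best_idx, len(scores) - 1  # ran through every round


# Hypothetical held-out RMSE after each boosting round
rmse_per_round = [10.0, 8.0, 7.0, 6.5, 6.6, 6.7, 6.55, 6.8]
best_idx, stopped_idx = best_round_with_early_stopping(rmse_per_round, 3)
print(best_idx, stopped_idx)
```

In this toy run the best score is at index 3, and training stops at index 6 after three rounds without improvement.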

## **Tree Tunable Parameters in XGBoost**

XGBoost is a popular gradient boosting library that is widely used for machine learning tasks. It provides a variety of tunable parameters to help optimize model performance. Here are some of the most commonly used tree-related parameters in XGBoost:

1. **Learning Rate (eta):**
   - Controls the contribution of each tree to the final prediction.
   - Lower values make the model more robust but require more trees for the same performance.

   ```python
   learning_rate = 0.1
   ```

2. **Number of Trees (n_estimators):**
   - The number of boosting rounds (trees) to be run.

   ```python
   n_estimators = 100
   ```

3. **Maximum Depth of a Tree (max_depth):**
   - The maximum depth of each tree in the ensemble.
   - Higher values can lead to overfitting.

   ```python
   max_depth = 3
   ```

4. **Minimum Child Weight (min_child_weight):**
   - Minimum sum of instance weight (hessian) needed in a child.
   - It can be used to control over-fitting; higher values make the algorithm more conservative.

   ```python
   min_child_weight = 1
   ```

5. **Gamma (min_split_loss):**
   - Minimum loss reduction required to make a further partition on a leaf node.
   - Larger values make the algorithm more conservative, since a split is kept only if it reduces the loss by at least this amount.

   ```python
   gamma = 0
   ```

6. **Subsample:**
   - The fraction of samples used for fitting the individual trees.
   - It helps prevent overfitting.

   ```python
   subsample = 1.0
   ```

7. **Column Subsampling by Tree (colsample_bytree):**
   - The fraction of features that are randomly sampled for building each tree.
   - It helps prevent overfitting.

   ```python
   colsample_bytree = 1.0
   ```

8. **Column Subsampling by Level (colsample_bylevel):**
   - Similar to `colsample_bytree`, but features are resampled at each new depth level of a tree rather than once per tree.

   ```python
   colsample_bylevel = 1.0
   ```

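9. **L1 Regularization Term on Weights (alpha):**
   - L1 regularization term on weights (analogous to Lasso regression).

   ```python
   reg_alpha = 0  # scikit-learn API name; the key is "alpha" in a native params dict
   ```

10. **L2 Regularization Term on Weights (lambda):**
    - L2 regularization term on weights (analogous to Ridge regression).

    ```python
    reg_lambda = 1  # "lambda" is a reserved word in Python; the key is "lambda" in a native params dict
    ```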

These parameters provide a good starting point for tuning an XGBoost model. However, the optimal values depend on the specific dataset and problem, so it's common to perform a more thorough hyperparameter search using techniques like grid search or randomized search.
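
To see how `lambda`, `gamma`, and `min_child_weight` interact, the sketch below evaluates the leaf-weight and split-gain formulas from the XGBoost paper's second-order objective. The function names and the gradient/hessian sums are hypothetical, `alpha` is left out for simplicity, and the real library has many more details.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf value: larger reg_lambda shrinks it toward zero."""
    return -grad_sum / (hess_sum + reg_lambda)


def leaf_score(grad_sum, hess_sum, reg_lambda=1.0):
    """Quality term of a leaf, used when comparing split candidates."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_gain(gl, hl, gr, hr, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left/right children, minus gamma."""
    parent = leaf_score(gl + gr, hl + hr, reg_lambda)
    children = leaf_score(gl, hl, reg_lambda) + leaf_score(gr, hr, reg_lambda)
    return 0.5 * (children - parent) - gamma


def split_allowed(gl, hl, gr, hr, reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Reject splits whose children are too light or whose gain is not positive."""
    if hl < min_child_weight or hr < min_child_weight:
        return False
    return split_gain(gl, hl, gr, hr, reg_lambda, gamma) > 0


# Hypothetical gradient and hessian sums for the two children
gl, hl, gr, hr = -4.0, 2.0, 6.0, 3.0
print(split_gain(gl, hl, gr, hr))
print(split_allowed(gl, hl, gr, hr))                        # default settings
print(split_allowed(gl, hl, gr, hr, gamma=7.0))             # gamma too high
print(split_allowed(gl, hl, gr, hr, min_child_weight=2.5))  # left child too light
```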

## **Linear Tunable Parameters in XGBoost**

XGBoost also supports linear models in addition to tree-based models. With the `gblinear` booster, each boosting round updates the weights of a regularized linear model instead of adding a tree. Here are some of the commonly used linear tunable parameters in XGBoost:

1. **Linear Learning Rate (eta):**
   - Similar to the tree-based version, it controls the step size shrinkage used to prevent overfitting.

   ```python
   eta = 0.1
   ```

2. **L1 Regularization on Weights (alpha):**
   - L1 regularization term on the linear weights (similar to Lasso regularization).
   - It helps prevent overfitting by encouraging sparsity in the linear model.

   ```python
   alpha = 0
   ```

3. **L2 Regularization on Weights (lambda):**
   - L2 regularization term on the linear weights (similar to Ridge regularization).
   - It helps prevent overfitting by penalizing large coefficients.

   ```python
   reg_lambda = 1  # "lambda" is a reserved word in Python; the key is "lambda" in a native params dict
   ```

4. **Column Subsampling by Tree (colsample_bytree):**
   - The fraction of features that are randomly sampled for building each tree.
   - This is listed among the tree-booster parameters in the XGBoost documentation, so expect it to have no effect when `booster='gblinear'`.

   ```python
   colsample_bytree = 1.0
   ```

5. **Column Subsampling by Level (colsample_bylevel):**
   - Similar to `colsample_bytree`, but applies to each depth level of a tree.
   - It is also a tree-booster parameter and should not be expected to affect a linear booster.

   ```python
   colsample_bylevel = 1.0
   ```

6. **Subsample:**
   - The fraction of samples used for fitting the individual trees.
   - Like the two column-sampling parameters above, it belongs to the tree booster rather than to `gblinear`.

   ```python
   subsample = 1.0
   ```

These linear parameters are typically used when you choose the 'gblinear' booster type in XGBoost. You can set the booster type using the 'booster' parameter:

```python
booster = 'gblinear'
```

As always, the optimal values for these parameters depend on the specific dataset and problem, and tuning may be necessary to achieve the best performance.
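
The different effects of L1 and L2 penalties on a linear weight can be illustrated with a one-dimensional toy problem. The function below returns the minimizer of `0.5*(w - z)**2 + alpha*abs(w) + 0.5*reg_lambda*w**2`; it is a textbook sketch with a hypothetical name, not the update rule that `gblinear` actually runs.

```python
def regularized_coefficient(z, alpha=0.0, reg_lambda=0.0):
    """Minimizer of 0.5*(w - z)**2 + alpha*abs(w) + 0.5*reg_lambda*w**2."""
    shrunk = max(abs(z) - alpha, 0.0)   # L1: subtract alpha, clip at zero
    sign = 1.0 if z > 0 else -1.0
    return sign * shrunk / (1.0 + reg_lambda)   # L2: scale down proportionally


# z plays the role of the unregularized coefficient (hypothetical values)
for z in [0.3, 2.0, -2.0]:
    print(z,
          regularized_coefficient(z, alpha=0.5),       # L1 only
          regularized_coefficient(z, reg_lambda=1.0))  # L2 only
```

The L1 penalty sets small coefficients exactly to zero, which is where the sparsity comes from, while the L2 penalty only scales coefficients down.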

**Tuning eta**

In [None]:
import pandas as pd

# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary for each tree (boosting round)
params = {"objective":"reg:squarederror", "max_depth":3}

# Create list of eta values and empty list to store final round rmse per xgboost model
eta_vals = [0.001, 0.01, 0.1]
best_rmse = []

# Systematically vary the eta
for curr_val in eta_vals:

    params["eta"] = curr_val

    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        num_boost_round=10, early_stopping_rounds=5,
                        metrics="rmse", as_pandas=True, seed=123)

    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results["test-rmse-mean"].tail().values[-1])

# Print the resultant DataFrame
print(pd.DataFrame(list(zip(eta_vals, best_rmse)), columns=["eta","best_rmse"]))
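
With only 10 boosting rounds, very small `eta` values leave the model far from the target. A simplified way to see this, assuming every round fits the current residual perfectly and then adds only an `eta` fraction of that fit, is that the residual shrinks by a factor of `(1 - eta)` per round. The function name below is made up for this sketch.

```python
def remaining_residual_fraction(eta, n_rounds):
    """Fraction of the initial residual left after n_rounds idealized rounds."""
    return (1.0 - eta) ** n_rounds


eta_vals = [0.001, 0.01, 0.1]
fractions = [remaining_residual_fraction(eta, 10) for eta in eta_vals]
for eta, frac in zip(eta_vals, fractions):
    print(eta, frac)
```

Under this idealization, the smallest `eta` has barely moved after 10 rounds, which is why a low learning rate is usually paired with many more boosting rounds.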

**Grid Search with XGBoost**

In [None]:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
    'colsample_bytree': [0.3, 0.7],
    'n_estimators': [50],
    'max_depth': [2, 5]
}

# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor()

# Perform grid search: grid_mse
grid_mse = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid,
                        scoring='neg_mean_squared_error', cv=4, verbose=1)
grid_mse.fit(X, y)

# Print the best parameters and lowest RMSE
print("Best parameters found: ", grid_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(grid_mse.best_score_)))
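
To get a feel for how much work this grid implies, the sketch below enumerates the same parameter grid with `itertools.product` and counts the cross-validation fits. It only mirrors the bookkeeping; no models are trained.

```python
from itertools import product

gbm_param_grid = {
    'colsample_bytree': [0.3, 0.7],
    'n_estimators': [50],
    'max_depth': [2, 5],
}

# Every combination of one value per parameter is a candidate model
names = list(gbm_param_grid)
candidates = [dict(zip(names, values))
              for values in product(*gbm_param_grid.values())]

n_folds = 4  # matches cv=4 in the grid search above
n_fits = len(candidates) * n_folds

for candidate in candidates:
    print(candidate)
print("candidates:", len(candidates), "cross-validation fits:", n_fits)
```

The grid grows multiplicatively with every added value, which is the usual reason to switch to randomized search for larger grids.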