# N Burning XGBoost FAQs Answered to Use the Library Like a Pro
## Master the nitty-gritty about XGBoost
![](images/unsplash.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://unsplash.com/@haithemfrd_off?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Haithem Ferdi</a>
        on 
        <a href='https://unsplash.com/s/photos/boost?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Unsplash.</a> All images are by the author unless specified otherwise.
    </strong>
</figcaption>

## Setup

In [1]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb
from matplotlib import rcParams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, cross_validate, train_test_split
from sklearn.preprocessing import OneHotEncoder

rcParams["font.size"] = 15

In [2]:
iris = sns.load_dataset("iris").dropna()
penguins = sns.load_dataset("penguins").dropna()

In [3]:
i_input, i_target = iris.drop("species", axis=1), iris[["species"]]
p_input, p_target = penguins.drop("body_mass_g", axis=1), penguins[["body_mass_g"]]
p_input = pd.get_dummies(p_input)

In [4]:
X_train_i, X_test_i, y_train_i, y_test_i = train_test_split(
    i_input, i_target, test_size=0.2, random_state=1121218
)


X_train_p, X_test_p, y_train_p, y_test_p = train_test_split(
    p_input, p_target, test_size=0.2, random_state=1121218
)

## Motivation

## 1. Which API should I choose - Scikit-learn or the core learning API?

Even though this question has been answered by many, I will just state my answer because most of the other questions depend on this one.

XGBoost in Python has two APIs: the Scikit-learn compatible API (estimators follow the familiar `fit/predict` pattern) and the core, XGBoost-native API (a global `train` function whose objective can be changed to switch between regression and classification).

Most of the Python community, including Kagglers and myself, uses the Scikit-learn API.

Using the Sklearn API lets you freely integrate XGBoost estimators into your familiar workflow. The benefits include (but are not limited to) passing core XGB algorithms into [Sklearn pipelines](https://towardsdatascience.com/how-to-use-sklearn-pipelines-for-ridiculously-neat-code-a61ab66ca90d?source=your_stories_page-------------------------------------), using a more efficient cross-validation workflow, and avoiding the hassle of learning a new API.

We will also see some nuances in XGBoost functionality that tip the balance toward the Sklearn API even further.

## 2. How Do I Completely Control the Randomness in XGBoost?

> From here on, references to XGBoost algorithms mainly mean the Sklearn-compatible `XGBRegressor` and `XGBClassifier` (or similar) estimators.

The estimators have the `random_state` parameter similar to Sklearn estimators (the alternative `seed` parameter has been deprecated but still works). However, running XGBoost with default parameters will yield identical results even with different seeds. 

The reason for this behavior is that XGBoost introduces randomness only when `subsample` or one of the `colsample_by*` parameters is set below its default of 1. As the names suggest, these parameters rely on [random sampling](https://towardsdatascience.com/why-bootstrap-sampling-is-the-badass-tool-of-probabilistic-thinking-5d8c7343fb67?source=your_stories_page-------------------------------------) of rows and columns to combat overfitting.

Therefore, `random_state` only matters when you use or tune these hyperparameters; set it to get the same results across runs for the same seed.

When using XGBoost with other Sklearn transformers or estimators that have their own `random_state`, pass a seed both to XGBoost and to those classes for reproducibility.

## 3. What are objectives in XGBoost and how to specify them for different tasks?

Regression and classification each come in several variants, which differ in the loss function they minimize and the target distribution they can work with.

For example, regression can be performed with loss functions such as RMSE (Root Mean Squared Error), RMSLE (Root Mean Squared Log Error) or Huber error. Sklearn implements different regressors for each of these, but in XGBoost they are all packed into the `XGBRegressor` estimator.

You can switch between loss functions and supported distributions with the `objective` parameter. It accepts special code strings provided by XGBoost. The most common ones are:

- `reg:squarederror`
- `reg:squaredlogerror`
- `reg:gamma`
- `reg:tweedie`

Similarly, classification objectives change based on their underlying loss function. These objectives start either with `binary:*` or `multi:*` prefixes depending on the target cardinality. 

There are many other objective types and I will leave it to you to explore the rest and find out the details using this documentation [link](https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters).

> Note that specifying the correct objective gets rid of that unbelievably annoying warning you get when fitting XGB classifiers.

## 4. Which XGBoost booster should I use - gblinear, gbtree, dart?

> XGBoost has 3 types of gradient boosted learners - these are gradient boosted (GB) linear functions, GB trees and DART trees. You can switch the learners using the `booster` parameter.

If you ask Kagglers, they will choose boosted trees over linear functions any day (as do I). The reason is that trees can capture non-linear, much more complex relationships that linear functions cannot.

So, the only question is which tree booster you should pass to the `booster` parameter: `gbtree` or `dart`?

I won't bother you with the full differences here. What you should know is that XGBoost uses an ensemble of decision tree-based models when used with the `gbtree` booster. DART is a proposed improvement (yet to be validated) that randomly drops a subset of the decision trees during training to prevent overfitting.

In the few small experiments I ran with default parameters for `gbtree` and `dart`, I got slightly better scores with `dart` when I set `rate_drop` between 0.1 and 0.3.

For more details, I refer you to [this page](https://xgboost.readthedocs.io/en/latest/tutorials/dart.html) of the XGB documentation, which covers the nuances and additional hyperparameters.

## 5. Which tree method should I use in XGBoost?

There are 5 types of algorithms that control tree construction. You should pass `hist` to `tree_method` if you are doing distributed training.

For other scenarios, the default is `auto`, which uses `exact` for small-to-medium datasets and switches to `approx` for large datasets.

## 6. What is a boosting round in XGBoost?

As we said, XGBoost is an ensemble of gradient boosted decision trees. Each tree in the XGBoost ensemble is called a base or weak learner. A weak learner is any algorithm that performs slightly better than random guessing.

By combining the predictions of multiple weak learners, XGBoost yields a final prediction (skipping a lot of details here). Each time we fit a tree to the data, it is called a single boosting round.

So, to specify the number of trees to be built, pass an integer to `num_boost_round` of the Learning API or to `n_estimators` of the Sklearn API.

Typically, too few trees lead to underfitting and too many lead to overfitting. You will normally tune this parameter with hyperparameter optimization.

## 7. What is `early_stopping_rounds` in XGBoost?

From one boosting round to the next, each new tree builds upon the predictions of the trees before it.

Quite often, even before we have used all the trees specified in `n_estimators` or `num_boost_round`, we may observe that the *performance* on a validation set plateaus at a particular level or starts deteriorating systematically after some round.

In this case, there is no point in waiting for all the trees to be trained, because we already know the performance won't improve any further or will only go downhill from this point.

So, XGBoost conveniently provides the `early_stopping_rounds` parameter, which stops training if the score has not improved for that many rounds.

For example, setting `early_stopping_rounds` to 10 means training stops once the performance has not improved for 10 consecutive rounds. So, early stopping can kick in well before we reach the hard limit set by `n_estimators`.

So, to get good results (in single training cases), set a large value for `n_estimators` (over 1000) and choose a desired number of rounds for early stopping.