# Ray Tune - An end-to-end example of using XGBoost with Ray Tune and Ray Serve

© 2019-2022, Anyscale. All Rights Reserved

This example illustrates how you can use Ray Libraries for an end-to-end example.

![](images/xgboost_tune_serve.png)

1. Use XGBoost to train a baseline model, using default hyperparameters
2. Use XGBoost to train another model, using "guessed" hyperparemeters
3. Use Tune to HPO and train the best XGBoost model
4. Use ASHAscheduler to use early-stopping
5. Save the best trial model
6. Fetch the best saved model
7. Run some predictions
8. Create deployment and deploy the best trained model to Ray Serve
9. Send request for inference


<img src="https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/logo-m/xgboost.png" width="40%" height="20%" aligh="center">

XGBoost is currently one of the most popular machine learning algorithms for regression and classification. It performs very well on a large selection of tasks, and is the key to success in many Kaggle competitions.

Derived mainly from [documentaton](https://docs.ray.io/en/latest/tune/tutorials/tune-xgboost.html), this tutorial will give you a quick introduction to XGBoost, show you how to train an XGBoost model, and then guide you on how to optimize XGBoost parameters using Ray Tune to get the best performance. In particular, we will cover the following:

 * What is XGBoost
 * Training a simple XGBoost classifier
 * XGBoost Hyperparameters
 * Tuning the configuration parameters
 * Early stopping
 * Conclusion
 * Further References

### What is XGBoost
XGBoost is an acronym for eXtreme Gradient Boosting. Internally, XGBoost uses decision trees. Instead of training just one large decision tree, XGBoost and other related algorithms train many small decision trees. The intuition behind this is that even though single decision trees can be inaccurate and suffer from high variance, combining the output of a large number of these weak learners can actually lead to a strong learner, resulting in better predictions and less variance.

<img src="https://docs.ray.io/en/latest/_images/tune-xgboost-ensemble.svg" width="70%" height="50%"> 

A single decision tree (left) might be able to get to an accuracy of 70% for a binary classification task. By combining the output of several small decision trees, an ensemble learner (right) might end up with a higher accuracy of 90%.¶

Boosting algorithms start with a single small decision tree and evaluate how well it predicts the given examples. When building the next tree, those samples that have been misclassified before have a higher chance of being used to generate the tree. This is useful because it avoids overfitting to samples that can be easily classified and instead tries to come up with models that are able to classify hard examples, too. Please [see here](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205) for a more thorough introduction to bagging and boosting algorithms.

There are many boosting algorithms. In their core, they are all very similar. XGBoost uses second-level derivatives to find splits that maximize the **gain** (the inverse of the **loss**) - hence the name. In practice, there really is no drawback in using XGBoost over other boosting algorithms - in fact, it usually shows the best performance.

### Training a simple XGBoost classifier

Let’s first see how a simple XGBoost classifier can be trained. We’ll use the `breast_cancer` dataset included in the sklearn dataset collection. This is a `binary classification` dataset. Given 30 different input features, our task is to learn to identify subjects with breast cancer and those without.

In [1]:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
PIMA_INDIAN_DATA_FILE=os.path.join(os.getcwd(), "data/pima-indians-diabetes.data.csv")

In [2]:
import sklearn.metrics
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np
from numpy import loadtxt

  from pandas import MultiIndex, Int64Index


### Load the data

In [3]:
# Utility function to load the data
def get_data():
    dataset = loadtxt(PIMA_INDIAN_DATA_FILE, delimiter=",")
    # split data into X and y
    X = dataset[:, 0:8]
    y = dataset[:, 8]
    # Split into train and test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=7)
    
    return X_train, X_test, y_train, y_test

### Step 1: Train the baseline model

Let's define our standard or regular XGBoost trainer (function). It takes in XGBoost configuration parameters.

In [4]:
def train_model(config):
    # Load dataset

    train_x, test_x, train_y, test_y = get_data()
    
    # Build input DMatrices for XGBoost
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set  = xgb.DMatrix(test_x, label=test_y)
    
    # Train the classifier
    results = {}
    bst = xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        evals_result=results,
        verbose_eval=True)
    return results

Define our basic minimal and default configurations for XGBoost

In [5]:
configs = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "error"]
}

### Train the basic model

In [6]:
results = train_model(config=configs)
accuracy = 1. - results["eval"]["error"][-1]
print(f"Accuracy: {accuracy:.4f}")

[0]	eval-logloss:0.60491	eval-error:0.28347
[1]	eval-logloss:0.55934	eval-error:0.25984
[2]	eval-logloss:0.53068	eval-error:0.25591
[3]	eval-logloss:0.51795	eval-error:0.24803
[4]	eval-logloss:0.51153	eval-error:0.24409
[5]	eval-logloss:0.50935	eval-error:0.24803
[6]	eval-logloss:0.50818	eval-error:0.25591
[7]	eval-logloss:0.51097	eval-error:0.24803
[8]	eval-logloss:0.51760	eval-error:0.25591
[9]	eval-logloss:0.51912	eval-error:0.24409
Accuracy: 0.7559


As you can see, the code is quite simple. First, the dataset is loaded and split into a test and train set. The XGBoost model is trained with `xgb.train()`. XGBoost automatically evaluates metrics we specified on the test set. In our case it calculates the `logloss` and the prediction error, which is the percentage of misclassified examples. To calculate the accuracy, we just have to subtract the error from 1.0. In this simple example, most runs result in an accuracy of about 0.75.

What if you want further accuracy, or want to use XGBoost's additional parameters?

### XGBoost Hyperparameters

Even with the default settings, XGBoost was able to get to a good accuracy on the breast cancer dataset. However, as in many machine learning algorithms, there are many knobs to tune which might lead to even better performance. Let’s explore some of them below.

#### Maximum tree depth
Remember that XGBoost internally uses many decision tree models to come up with predictions. When training a decision tree, we need to tell the algorithm how large the tree may get. The parameter for this is called the tree depth.

<img src="https://docs.ray.io/en/latest/_images/tune-xgboost-depth.svg" width="30%" height="10%">

In this image, the left tree has a depth of 2, and the right tree a depth of 3. Note that with each level, 2^(𝑑−1) splits are added, where d is the depth of the tree.¶

Tree depth is a property that concerns the model complexity. If you only allow short trees, the models are likely not very precise - they underfit the data. If you allow very large trees, the single models are likely to overfit to the data. In practice, a number between 2 and 6 is often a good starting point for this parameter.

XGBoost’s default value is 3.

#### Minimum child weight
When a decision tree creates new leaves, it splits up the remaining data at one node into two groups. If there are only few samples in one of these groups, it often doesn’t make sense to split it further. One of the reasons for this is that the model is harder to train when we have fewer samples.

<img src="https://docs.ray.io/en/latest/_images/tune-xgboost-weight.svg" width="20%" height="10%">

In this example, we start with 100 examples. At the first node, they are split into 4 and 96 samples, respectively. In the next step, our model might find that it doesn’t make sense to split the 4 examples more. It thus only continues to add leaves on the right side.

The parameter used by the model to decide if it makes sense to split a node is called the minimum child weight. In the case of linear regression, this is just the absolute number of nodes requried in each child. In other objectives, this value is determined using the weights of the examples, hence the name.

The larger the value, the more constrained the trees are and the less deep they will be. This parameter thus also affects the model complexity. Values can range between 0 and infinity and are dependent on the sample size. For our ca. 500 examples in the breast cancer dataset, values between 0 and 10 should be sensible.

XGBoost’s default value is 1.

#### Subsample size
Each decision tree we add is trained on a subsample of the total training dataset. The probabilities for the samples are weighted according to the XGBoost algorithm, but we can decide on which fraction of the samples we want to train each decision tree on.

Setting this value to 0.7 would mean that we randomly sample 70% of the training dataset before each training iteration.

XGBoost’s default value is 1.

#### Learning rate / Eta
Remember that XGBoost sequentially trains many decision trees, and that later trees are more likely trained on data that has been misclassified by prior trees. In effect this means that earlier trees make decisions for easy samples (i.e. those samples that can easily be classified) and later trees make decisions for harder samples. It is then sensible to assume that the later trees are less accurate than earlier trees.

To address this fact, XGBoost uses a parameter called Eta, which is sometimes called the learning rate. Don’t confuse this with learning rates from gradient descent!

Typical values for this parameter are between `0.01 and 0.3`.

XGBoost’s default value is 0.3.

#### Number of boost rounds
Lastly, we can decide on how many boosting rounds we perform, which means how many decision trees we ultimately train. When we do heavy subsampling or use small learning rate, it might make sense to increase the number of boosting rounds.

XGBoost’s default value is 10.

### Putting it together

Let’s see how this looks like in code! We just need to adjust our config dict.

### Step 2: Use some guessed hyperparameters 

In [7]:
config = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "error"],
    "max_depth": 2,
    "min_child_weight": 0,
    "subsample": 0.8,
    "eta": 0.2
}

In [8]:
results = train_model(config)
accuracy = 1. - results["eval"]["error"][-1]
print(f"Accuracy: {accuracy:.4f}")

[0]	eval-logloss:0.64008	eval-error:0.24803
[1]	eval-logloss:0.60108	eval-error:0.23622
[2]	eval-logloss:0.57741	eval-error:0.23622
[3]	eval-logloss:0.55778	eval-error:0.24803
[4]	eval-logloss:0.54114	eval-error:0.25984
[5]	eval-logloss:0.52809	eval-error:0.26772
[6]	eval-logloss:0.52078	eval-error:0.26378
[7]	eval-logloss:0.51649	eval-error:0.25984
[8]	eval-logloss:0.51054	eval-error:0.25197
[9]	eval-logloss:0.50366	eval-error:0.24409
Accuracy: 0.7559


**Note**: The accuracy is slightly lower than the default parameters used above because we randomly chose the parameters.

What if we want to get the best combination of all the parameters? This is where tuning hyperparameters helps.

### Step 3: Tuning the configuration parameters for HPO
XGBoosts default parameters already lead to a good accuracy, and even our guesses in the last section should result in accuracies well above 90%. However, our guesses were just that: guesses. Often we do not know what combination of parameters would actually lead to the best results on a machine learning task.

Unfortunately, there are infinitely many combinations of hyperparameters we could try out. Should we combine `max_depth=3` with `subsample=0.8` or with `subsample=0.9?` What about the other parameters?

This is where hyperparameter tuning comes into play. By using tuning libraries such as Ray Tune, we can try out combinations of hyperparameters. Using sophisticated search strategies, these parameters can be selected so that they are likely to lead to good results (avoiding an expensive exhaustive search). Also, trials that do not perform well can be preemptively stopped to reduce waste of computing resources. 

Lastly, Ray Tune also takes care of training these runs in parallel, greatly increasing search speed.

Let’s start with a basic example on how to use Tune for this. We just need to make a few changes to our code-block:

In [9]:
from ray import tune

Add tune report to our XGBoost training function:

In [10]:
def train_tuned_model(config, checkpoint_dir=None):
    
    # Load dataset 
    train_x, test_x, train_y, test_y = get_data()
    
    # Build input DMatrices for XGBoost
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set  = xgb.DMatrix(test_x, label=test_y)
        
    # Train the classifier
    results = {}
    xgb.train(
         config,
         train_set,
         evals=[(test_set, "eval")],
         evals_result=results,
         verbose_eval=False)
    
    # Return prediction accuracy
    accuracy = 1. - results["eval"]["error"][-1]
    tune.report(mean_accuracy=accuracy, done=True)

### Define our Hyperparameter Search Space

In [11]:
config = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "error"],
    "max_depth": tune.randint(1, 9),
    "min_child_weight": tune.choice([1, 2, 3]),
    "subsample": tune.uniform(0.5, 1.0),
    "eta": tune.loguniform(1e-4, 1e-1)
}

### Use Ray Tune parallelize our Hyperparameters tuning

This will automatically launch a Ray cluster on your laptop and schedule tasks. The `num_samples=10` option we pass to `tune.run()` means that we sample 10 different hyperparameter configurations from this search space, run across 10 CPUs

In [12]:
analysis = tune.run(train_tuned_model,
         resources_per_trial={"cpu": 10},
         config=config,
         mode="min",
         verbose=1,
         num_samples=10)

2022-03-16 16:46:43,374	INFO tune.py:639 -- Total run time: 29.64 seconds (28.73 seconds for the tuning loop).


In [13]:
print("Best Hyperparameter config: ", analysis.get_best_config(metric="mean_accuracy", mode="min"))

Best Hyperparamter config:  {'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'max_depth': 7, 'min_child_weight': 3, 'subsample': 0.9197123063487549, 'eta': 0.010124005890316683}


### Step 4: Early ASHAScheduler for early stopping

Currently, in our example above, Tune samples 10 different hyperparameter configurations and trains a full XGBoost on all of them. In our small example, training is very fast. However, if training were done on a large dataset, it would
take much longer and a significant amount of computer resources would be spent on trials that would eventually show a bad performance, e.g., a low accuracy. It would be good if we could identify these trials early and stop them, so we don’t waste any resources.

This is where Tune’s Schedulers shine. A Tune `TrialScheduler` is responsible for starting and stopping trials. Tune implements a number of different schedulers, each described in the Tune documentation. For our example, we will use the `AsyncHyperBandScheduler` or `ASHAScheduler`.

The basic idea of this scheduler is simple. We sample a number of hyperparameter configurations. Each of these configurations is trained for a specific number of iterations. After these iterations, only the best performing hyperparameters are retained. These are selected according to some loss metric, usually an evaluation loss. This cycle is repeated until we end up with the best configuration.

The `ASHAScheduler` needs to know three things:

 * Which metric should be used to identify badly performing trials?

 * Should this metric be maximized or minimized?

 * How many iterations does each trial train for?

There are more parameters, which are explained in the [documentation](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers).

Lastly, we have to report the loss metric to Tune. We do this with a Callback that XGBoost accepts and calls after each evaluation round. Ray Tune comes with [two XGBoost callbacks](https://docs.ray.io/en/latest/tune/api_docs/integration.html#tune-integration-xgboost) we can use for this. The `TuneReportCallback` just reports the evaluation metrics back to Tune. The `TuneReportCheckpointCallback` also saves checkpoints after each evaluation round. We will just use the latter in this example so that we can retrieve the saved model later.

These parameters from the `eval_metrics` configuration setting are then automatically reported to Tune via the callback. Here, the raw error will be reported, not the accuracy. To display the best reached accuracy, we will inverse it later.

We will also load the best checkpointed model so that we can use it for predictions. The best model is selected with respect to the `metric` and `mode` parameters we pass to `tune.run()`.

In [14]:
from ray.tune.schedulers import ASHAScheduler
from ray.tune.integration.xgboost import TuneReportCheckpointCallback

Let's modify our training function and add our callbacks

In [15]:
def train_tuned_asha_model(config: dict):
    # This is a simple training function to be passed into Tune
    # Load dataset
    
    # Load dataset 
    train_x, test_x, train_y, test_y = get_data()
    
    # Build input matrices for XGBoost
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set = xgb.DMatrix(test_x, label=test_y)
    # Train the classifier, using the Tune callback
    xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        verbose_eval=False,
        callbacks=[TuneReportCheckpointCallback(filename="model.xgb")])

Write a helper function for loading callbacks and returning the best model with best configuration
after tuning

In [16]:
def get_best_model_checkpoint(analysis):
    best_bst = xgb.Booster()
    best_model_path = os.path.join(analysis.best_checkpoint, "model.xgb")
    best_bst.load_model(best_model_path)
    accuracy = 1. - analysis.best_result["eval-error"]
    print(f"Best model parameters: {analysis.best_config}")
    print(f"Best model total accuracy: {accuracy:.4f}")
    print(f"checkpoint best model path: {best_model_path}")
    return best_bst

Wrapper around our trainer to do actual tuning:
 * define search space
 * define our ASHAScheduler
 * run `tune.run(...)`
 * return the ExperimentAnalysis object from `tune.run()`

In [17]:
def tune_xgboost():
    search_space = {
        # You can mix constants with search space objects.
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": tune.randint(1, 9),
        "min_child_weight": tune.choice([1, 2, 3]),
        "subsample": tune.uniform(0.5, 1.0),
        "eta": tune.loguniform(1e-4, 1e-1)
    }
    # This will enable aggressive early stopping of bad trials.
    scheduler = ASHAScheduler(
        max_t=10,  # 10 training iterations
        grace_period=1,
        reduction_factor=2)

    analysis = tune.run(
        train_tuned_asha_model,   # our training function
        metric="eval-logloss", # eval metric
        mode="min",            # mode 
        # You can add "gpu": 0.1 to allocate GPUs
        resources_per_trial={"cpu": 1},
        config=search_space,
        num_samples=10,
        verbose=1,
        scheduler=scheduler)

    return analysis

Let's tune with our `ASHAScheduler`

In [18]:
analysis = tune_xgboost()

2022-03-16 16:47:39,853	INFO tune.py:639 -- Total run time: 3.60 seconds (3.46 seconds for the tuning loop).


In [19]:
print("Best Hyperparamter config: ", analysis.get_best_config(metric="eval-logloss", mode="min"))

Best Hyperparamter config:  {'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'max_depth': 4, 'min_child_weight': 3, 'subsample': 0.545816348116087, 'eta': 0.04609844502524226}


As you can see, most trials have been stopped only after a few iterations. Only the two most promising trials were run for the full 10 iterations.

You can also ensure that all available resources are being used as the scheduler terminates trials, freeing them up. This can be done through the `ResourceChangingScheduler`. An example of this can be found here: [xgboost_dynamic_resources_example](https://docs.ray.io/en/latest/tune/examples/xgboost_dynamic_resources_example.html).



In [20]:
best_bst = get_best_model_checkpoint(analysis)

Best model parameters: {'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'max_depth': 4, 'min_child_weight': 3, 'subsample': 0.545816348116087, 'eta': 0.04609844502524226}
Best model total accuracy: 0.7677
checkpoint best model path: /Users/jules/ray_results/train_tuned_asha_model_2022-03-16_16-47-36/train_tuned_asha_model_74ec5_00008_8_eta=0.046098,max_depth=4,min_child_weight=3,subsample=0.54582_2022-03-16_16-47-37/checkpoint_000005/model.xgb


### Step 5: Persist the best model 

In [21]:
best_bst.save_model("best_model.json")

### Get some test data for scoring

In [22]:
# Split into train and test set
train_x, test_x, train_y, test_y = get_data()
test_set = xgb.DMatrix(test_x, label=test_y)

### Step 6: Load the best persisted model

In [23]:
bst_model = xgb.Booster()
bst_model.load_model("best_model.json")

### Step 7: Test some predictions

In [24]:
pred = bst_model.predict(test_set)[:-1]
predictions = [round(value) for value in pred]
predictions[:25]

[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0]

### Step 8: Create Deployment for Ray Serve

In [25]:
from fastapi import FastAPI, Request
import ray
from ray import serve

In [26]:
@serve.deployment(num_replicas=2, route_prefix="/regressor")
class XGBPimaIndianModel:
    def __init__(self):
        # Load the best saved model 
        self.bst_model = xgb.Booster()
        self.bst_model.load_model("best_model.json")
        print(type(self.bst_model))
        print("Best saved model loaded")
        
    async def __call__(self, starlette_request:Request):
        payload = await starlette_request.json()
        pred = xgb.DMatrix([np.array(list(payload.values()), dtype=np.float64)])
        prediction = round(np.float64(self.bst_model.predict(pred)[0]))
        
        return {"result": prediction}

### Step 9: Let's Deploy the model to Ray Serve

In [27]:
serve.start()
XGBPimaIndianModel.deploy()

[2m[36m(ServeController pid=65556)[0m 2022-03-16 16:49:01,251	INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
[2m[36m(ServeController pid=65556)[0m 2022-03-16 16:49:01,355	INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:hiAFWu:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-03-16 16:49:02,321	INFO api.py:521 -- Started Serve instance in namespace '8f9e83df-2c81-4ce2-96b9-125f25062128'.
2022-03-16 16:49:02,327	INFO api.py:262 -- Updating deployment 'XGBPimaIndianModel'. component=serve deployment=XGBPimaIndianModel
[2m[36m(HTTPProxyActor pid=65732)[0m INFO:     Started server process [65732]
[2m[36m(ServeController pid=65556)[0m 2022-03-16 16:49:02,422	INFO deployment_state.py:920 -- Adding 2 replicas to deployment 'XGBPimaIndianModel'. component=serve deployment=XGBPimaIndianModel
[2m[36m(XGBPimaIndianModel pid=65734)[0m   from pandas import Mul

[2m[36m(XGBPimaIndianModel pid=65734)[0m <class 'xgboost.core.Booster'>
[2m[36m(XGBPimaIndianModel pid=65734)[0m Best saved model loaded
[2m[36m(XGBPimaIndianModel pid=65735)[0m <class 'xgboost.core.Booster'>
[2m[36m(XGBPimaIndianModel pid=65735)[0m Best saved model loaded


### Step 10: Score the model

In [28]:
sample_request_inputs = [
        {"Pregnancies": 6,
         "Glucose": 148,
         "BloodPressure": 72,
         "SkinThickness": 35,
         "Insulin": 0,
         "BMI": np.float64(33.6),
         "DiabetesPedigree": np.float64(0.625),
         "Age": 50,
         },
        {"Pregnancies": 10,
         "Glucose": 168,
         "BloodPressure": 74,
         "SkinThickness": 0,
         "Insulin": 0,
         "BMI": 38.0,
         "DiabetesPedigree": 0.537,
         "Age": 34,
         },
        {"Pregnancies": 10,
         "Glucose": 39,
         "BloodPressure": 80,
         "SkinThickness": 0,
         "Insulin": 0,
         "BMI": 27.1,
         "DiabetesPedigree": 1.441,
         "Age": 57,
         },
        {"Pregnancies": 1,
         "Glucose": 103,
         "BloodPressure": 30,
         "SkinThickness": 38,
         "Insulin": 83,
         "BMI": 43.3,
         "DiabetesPedigree": 0.183,
         "Age": 33,
         }
    ]

In [29]:
import requests
for sri in sample_request_inputs: 
    response = requests.get("http://localhost:8000/regressor", json=sri).json()
    print(response)

{'result': 1}
{'result': 1}
{'result': 0}
{'result': 0}


In [30]:
ray.shutdown()

### Conclusion
You should now have a basic understanding on how to train XGBoost models and on how to tune the hyperparameters to yield the best results. In our simple example, Tuning the parameters didn’t make a huge difference for the accuracy. But in larger applications, intelligent hyperparameter tuning can make the difference between a model that doesn’t seem to learn at all, and a model that outperforms all the other ones.

Also, once you tuned and persisted the models, you can easily deploy the best model to Ray Serve and score.



### Further References

1. [XGBoost Hyperparameter Tuning - A Visual Guide](https://kevinvecmanis.io/machine%20learning/hyperparameter%20tuning/dataviz/python/2019/05/11/XGBoost-Tuning-Visual-Guide.html)

2. [Notes on XGBoost Parameter Tuning](https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html)

3. [Doing XGBoost Hyperparameter Tuning the smart way](https://towardsdatascience.com/doing-xgboost-hyper-parameter-tuning-the-smart-way-part-1-of-2-f6d255a45dde)
4. [Three ways to speed up XGBoost model training](https://www.anyscale.com/blog/three-ways-to-speed-up-xgboost-model-training)