# Students
- Ghaith Sarahnour
- Roiseux Thomas

# Introduction
## Goal
The goal of this project is to implement several machine learning algorithms to predict any risk of heart disease.
## Dataset
### Dataset description
We will use [this dataset](http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.data) to train our models. It contains 462 observations and 10 variables. The variables are the following:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from typing import Tuple

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    RandomForestClassifier,
    BaggingClassifier,
    AdaBoostClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    f1_score,
    roc_auc_score,
    roc_curve,
)

with open(r"SAheartinfo.txt") as f:
    print(f.read())

NUMBER_TRIALS = 20

We are now going to visualize the dataset and its first lines.

In [None]:
df = pd.read_csv(r"SAheart.txt", sep=",", header=0, index_col=0)
df.index.name = "ID"

print("Head of dataframe")
display(df.head())

print("Description of dataframe")
display(df.describe())

print("Info on dataframe")
df.info()  # prints its summary directly and returns None, so display() is not needed

Now, we are going to visualize the number of heart attacks in our dataset and their proportion among the population.

In [None]:
print(
    # The mean of a 0/1 column is the proportion of positive cases.
    f"Number of heart-attack cases among the total cases: {df['chd'].sum()} / {df['chd'].count()} ({df['chd'].mean():.1%})."
)
plt.figure(figsize=(10, 5))
plt.title("Distribution of heart-attack cases")

# Use the dataframe length instead of a hard-coded row count.
plt.barh(["Heart attack", "No heart attack"], [df["chd"].sum(), len(df) - df["chd"].sum()])
plt.xlabel("Number of cases")

plt.show()

### Dataset analysis
We now want to proceed to a more in-depth analysis of the dataset. We will first look at the distribution of the variables and then at the correlation between them.

In [None]:
df["famhist"] = df["famhist"].apply(lambda x: 1 if x == "Present" else 0)
corr = df.corr()
# Display the styled table; a Styler that is neither assigned nor displayed is discarded.
display(corr.style.background_gradient(cmap="coolwarm"))

In [None]:
sns.heatmap(corr, annot=True, cmap="coolwarm")

From this preliminary analysis, we notice a strong correlation between `obesity` and `adiposity`, as both of them are related to the amount of fat in a human body.
Since they carry largely redundant information, we can consider dropping one of them. Let's drop `adiposity` for now.
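
To make the "highly correlated" criterion concrete, here is a minimal sketch of the Pearson coefficient, which is what `df.corr()` computes by default for each pair of columns. The helper name `pearson` and the toy values are assumptions for illustration only; they are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values (made up): two body-fat style measurements that rise together.
fat_a = [20.0, 25.0, 30.0, 35.0]
fat_b = [22.0, 26.5, 31.0, 36.5]
print(round(pearson(fat_a, fat_b), 3))
```

A coefficient close to 1 or -1 means one column is almost a linear function of the other, which is the usual argument for keeping only one of the two.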

In [None]:
df.drop(columns=["adiposity"], inplace=True)

# Classification models
We are going to use six classification models:
- Classification Decision Tree.
- Bagging.
- Random Forest.
- AdaBoost.
- Gradient Boosting.
- Stacking.

All these models will be trained on the dataset we just cleaned. We will then compare their performances.
They will be provided by the `scikit-learn` library.

For each model, we will try several hyperparameters and choose the best one.
Then we will compare the performances of all the models, using the best parameters we found.

## Classification Decision Tree
### Model
A decision tree is a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. It is a non-parametric supervised learning method used for classification and regression.
### Hyperparameters
#### Split
The `splitter` hyperparameter is the strategy used to choose the split at each node. It can be `best` or `random`. The `best` strategy chooses the best split among all candidates, while the `random` strategy draws random candidate splits and keeps the best of them.
#### Leaf
The `min_samples_leaf` hyperparameter is the minimum number of samples required to be at a leaf node. Raising it limits overfitting.
#### Deviance
The `criterion` hyperparameter is the impurity function (`gini`, `entropy` or `log_loss`) used to measure the quality of a split. A split is scored by the impurity of the parent node minus the weighted sum of the impurities of its child nodes.
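
The quality of a split can be sketched in a few lines. The example below uses the Gini impurity and weights each child by its share of the parent's samples; the helper names `gini` and `impurity_decrease` are assumptions made for this illustration, not scikit-learn functions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


parent = [0, 0, 1, 1]
print(impurity_decrease(parent, [0, 0], [1, 1]))  # perfect split: 0.5
print(impurity_decrease(parent, [0, 1], [0, 1]))  # uninformative split: 0.0
```

The tree keeps the candidate split with the largest decrease, so the perfect split wins here.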

### Basic model
We are first going to build a tree classifier with default parameters, to have a baseline to compare our other models with.

In [None]:
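def build_and_run_decision_tree(
    **kwargs,
) -> Tuple[DecisionTreeClassifier, dict[str, float]]:
    """Fit a decision tree NUMBER_TRIALS times; return the last fit and the mean test scores.

    The split uses a fixed random_state, so every trial sees the same train/test
    partition and only the estimator's own randomness varies between trials.
    roc_auc is computed from hard labels, which makes it equal to balanced accuracy.
    """
    tree = DecisionTreeClassifier(**kwargs)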
    acc, prec, f1, roc = [], [], [], []
    for _ in range(NUMBER_TRIALS):
        X_train, X_test, Y_train, Y_test = train_test_split(
            df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
        )
        tree.fit(X_train, Y_train)
        Y_pred = tree.predict(X_test)
        acc.append(accuracy_score(Y_test, Y_pred))
        prec.append(precision_score(Y_test, Y_pred))
        f1.append(f1_score(Y_test, Y_pred))
        roc.append(roc_auc_score(Y_test, Y_pred))
    return tree, {
        "accuracy": np.mean(acc),
        "precision": np.mean(prec),
        "f1": np.mean(f1),
        "roc_auc": np.mean(roc),
    }

In [None]:
tree, power = build_and_run_decision_tree()

power_df = pd.DataFrame(power, index=["Basic tree"])
power_df.index.name = "Model"

display(power_df)

This baseline tree classifier scores poorly. Let's try to improve it.
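
One caveat when reading the `roc_auc` column: the helper calls `roc_auc_score(Y_test, Y_pred)` with hard class labels, not probabilities. The sketch below (pure Python, with made-up scores and hypothetical helper names `auc` and `balanced_accuracy`) illustrates that an AUC computed from 0/1 predictions collapses to the balanced accuracy, while the same formula applied to probabilities uses the full ranking.

```python
def auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tpr = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1) / y_true.count(1)
    tnr = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0) / y_true.count(0)
    return (tpr + tnr) / 2


y_true = [0, 0, 0, 1, 1, 1]
proba = [0.1, 0.4, 0.6, 0.45, 0.7, 0.9]          # made-up positive-class scores
labels = [1 if p >= 0.5 else 0 for p in proba]   # hard predictions at threshold 0.5

print(round(auc(y_true, proba), 3))               # ranking-based AUC
print(round(auc(y_true, labels), 3))              # AUC from hard labels
print(round(balanced_accuracy(y_true, labels), 3))
```

With scikit-learn, passing `predict_proba(X_test)[:, 1]` to `roc_auc_score` would give the ranking-based value.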
### Hyperparameters tuning

In [None]:
tree2, power = build_and_run_decision_tree(min_samples_split=10, min_samples_leaf=6)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Tree with min_samples_split=10, min_samples_leaf=6"]),
    ]
)

display(power_df)

In [None]:
_, X_test, _, Y_test = train_test_split(
    df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
)

plt.figure(figsize=(10, 5))
plt.title("ROC curve for the trees")
fpr, tpr, _ = roc_curve(Y_test, tree.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Default")
fpr, tpr, _ = roc_curve(Y_test, tree2.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Tuned")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

The tuned tree scores higher than the baseline.
Among the values we tried by hand, `min_samples_split=10` and `min_samples_leaf=6` gave the best results.
The ROC curve confirms that this model is better than the default tree.
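
The manual search can also be automated. The following is a minimal sketch using scikit-learn's `GridSearchCV` on synthetic data from `make_classification`; the synthetic data and the candidate values in `param_grid` are assumptions for illustration and are not the values or data used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart-disease features (assumption: not the real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate values are illustrative only.
param_grid = {
    "min_samples_split": [2, 10, 20],
    "min_samples_leaf": [1, 6, 12],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # scored from predicted probabilities
    cv=5,               # 5-fold cross-validation on the data passed to fit
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because each candidate is scored by cross-validation, the choice does not depend on a single train/test split.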

We are now going to try another classification model, the bagging classifier.

## Bagging classifier
We are going to try the same approach as before, with the bagging classifier.
### Model
A bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions to form a final prediction.
### Hyperparameters
#### Base estimator
The `estimator` hyperparameter (named `base_estimator` before scikit-learn 1.2) is the base estimator to fit on random subsets of the dataset. It can be `None` or a classifier. If `None`, then the base estimator is a decision tree.
#### `n_estimators`
The `n_estimators` hyperparameter is the number of base estimators in the ensemble.
#### `max_samples`
The `max_samples` hyperparameter is the number of samples (if an integer) or the fraction of samples (if a float) to draw from `X` to train each base estimator.
#### `max_features`
The `max_features` hyperparameter is the number of features (if an integer) or the fraction of features (if a float) to draw from `X` to train each base estimator.
#### `bootstrap`
The `bootstrap` hyperparameter is whether samples are drawn with replacement. If `False`, sampling without replacement is performed.
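
As a rough illustration of how `max_samples` and `bootstrap` interact, the sketch below draws the row indices for one base estimator and then aggregates three predictions by majority vote. The helper names are hypothetical, and the vote is the simplest possible aggregation: scikit-learn's `BaggingClassifier` averages predicted probabilities when the base estimator provides them.

```python
import random
from collections import Counter


def bootstrap_indices(n_rows, max_samples, bootstrap=True, seed=None):
    """Row indices for one base estimator, drawn with or without replacement."""
    rng = random.Random(seed)
    if bootstrap:
        return [rng.randrange(n_rows) for _ in range(max_samples)]
    return rng.sample(range(n_rows), max_samples)


def majority_vote(predictions):
    """Most frequent class among the base estimators' predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]


idx = bootstrap_indices(n_rows=10, max_samples=6, bootstrap=True, seed=0)
print(idx)                       # duplicates are possible with replacement
print(majority_vote([1, 0, 1]))  # three base estimators vote
```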

### Basic model

In [None]:
def build_and_run_bagging(
    **kwargs,
) -> Tuple[BaggingClassifier, dict[str, float]]:
    tree = BaggingClassifier(**kwargs)
    acc, prec, f1, roc = [], [], [], []
    for _ in range(NUMBER_TRIALS):
        X_train, X_test, Y_train, Y_test = train_test_split(
            df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
        )
        tree.fit(X_train, Y_train)
        Y_pred = tree.predict(X_test)
        acc.append(accuracy_score(Y_test, Y_pred))
        prec.append(precision_score(Y_test, Y_pred))
        f1.append(f1_score(Y_test, Y_pred))
        roc.append(roc_auc_score(Y_test, Y_pred))
    return tree, {
        "accuracy": np.mean(acc),
        "precision": np.mean(prec),
        "f1": np.mean(f1),
        "roc_auc": np.mean(roc),
    }

In [None]:
bagging, power = build_and_run_bagging()

power_df = pd.concat([power_df, pd.DataFrame(power, index=["Basic Bagging"])])

display(power_df)

This model is not very good. Let's try to improve it.

### Hyperparameters tuning
We will first try to use the previous good tree classifier as a base estimator.

In [None]:
bagging2, power = build_and_run_bagging(
    estimator=DecisionTreeClassifier(min_samples_split=10, min_samples_leaf=6)
)

power_df = pd.concat(
    [power_df, pd.DataFrame(power, index=["Bagging with efficient tree"])]
)

display(power_df)

This model is better than the basic bagging classifier.
It is nearly as powerful as the tuned tree classifier.
Let's tune the other hyperparameters to improve it further.

In [None]:
bagging3, power = build_and_run_bagging(
    estimator=DecisionTreeClassifier(min_samples_split=10, min_samples_leaf=6),
    max_samples=75,
)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Bagging with efficient tree, 75 max samples"]),
    ]
)

display(power_df)

Fixing `max_samples` to $75$ seems to be a good idea, as accuracy and precision seem to have lower variance.
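
The helper functions return only the mean of each metric, so checking a statement about variance requires the per-trial spread as well. A minimal sketch, assuming the per-trial scores are kept in a list; the function name `summarize` and the numbers are placeholders, not results from this notebook.

```python
import statistics


def summarize(scores):
    """Mean and sample standard deviation of one metric across trials."""
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder accuracies for two settings (made-up numbers, not results).
stable = [0.70, 0.71, 0.69, 0.70]
noisy = [0.60, 0.80, 0.65, 0.75]
for name, scores in [("stable", stable), ("noisy", noisy)]:
    mean, std = summarize(scores)
    print(f"{name}: mean={mean:.3f} std={std:.3f}")
```

Two settings can share the same mean while differing widely in spread, which the mean-only table cannot show.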

In [None]:
plt.figure(figsize=(10, 5))
plt.title("ROC curve for the Bagging classifiers")
fpr, tpr, _ = roc_curve(Y_test, bagging.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Default")
fpr, tpr, _ = roc_curve(Y_test, bagging2.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Efficient classifier")
fpr, tpr, _ = roc_curve(Y_test, bagging3.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Tuned and efficient classifier")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

After tuning the hyperparameters, we get a model that is better than the basic bagging classifier.
Let's now try the next model, the random forest classifier.

## Random forest classifier
### Model
A random forest classifier is an ensemble meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
### Hyperparameters
#### `n_estimators`
The `n_estimators` hyperparameter is the number of trees in the forest.
#### `max_depth`
The `max_depth` hyperparameter is the maximum depth of the tree.
#### `min_samples_split`
The `min_samples_split` hyperparameter is the minimum number of samples required to split an internal node.
#### `min_samples_leaf`
The `min_samples_leaf` hyperparameter is the minimum number of samples required to be at a leaf node.
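
The two sample-count constraints can be read as a simple test applied to every candidate split. The sketch below is a hypothetical helper (`split_allowed` is not a scikit-learn function) that mirrors the integer form of `min_samples_split` and `min_samples_leaf`.

```python
def split_allowed(n_left, n_right, min_samples_split=2, min_samples_leaf=1):
    """Return True if a node may be split into children of the given sizes."""
    n_node = n_left + n_right
    if n_node < min_samples_split:
        return False  # the node is too small to be split at all
    # Both children must keep at least min_samples_leaf samples.
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


# With min_samples_split=10 and min_samples_leaf=6:
print(split_allowed(6, 6, min_samples_split=10, min_samples_leaf=6))  # True
print(split_allowed(8, 4, min_samples_split=10, min_samples_leaf=6))  # False: right child too small
print(split_allowed(5, 4, min_samples_split=10, min_samples_leaf=6))  # False: node has only 9 samples
```

Larger values of either parameter stop the trees earlier, which gives shallower trees.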
### Basic model
Let's first fit a random forest classifier with default parameters.

In [None]:
def build_and_run_rf(
    **kwargs,
) -> Tuple[RandomForestClassifier, dict[str, float]]:
    tree = RandomForestClassifier(**kwargs)
    acc, prec, f1, roc = [], [], [], []
    for _ in range(NUMBER_TRIALS):
        X_train, X_test, Y_train, Y_test = train_test_split(
            df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
        )
        tree.fit(X_train, Y_train)
        Y_pred = tree.predict(X_test)
        acc.append(accuracy_score(Y_test, Y_pred))
        prec.append(precision_score(Y_test, Y_pred))
        f1.append(f1_score(Y_test, Y_pred))
        roc.append(roc_auc_score(Y_test, Y_pred))
    return tree, {
        "accuracy": np.mean(acc),
        "precision": np.mean(prec),
        "f1": np.mean(f1),
        "roc_auc": np.mean(roc),
    }

In [None]:
rf, power = build_and_run_rf()

power_df = pd.concat([power_df, pd.DataFrame(power, index=["Basic Random Forest"])])

display(power_df)

### Hyperparameters tuning
This model seems to be better than the previous ones.
Let's use the tuned hyperparameters of the previous tree classifier to see if we can improve it.

In [None]:
rf2, power = build_and_run_rf(
    min_samples_split=10, min_samples_leaf=6, n_estimators=100, max_samples=75
)

power_df = pd.concat([power_df, pd.DataFrame(power, index=["Tuned Random Forest"])])

display(power_df)

These hyperparameters seem to work best for this model among the ones we tried.

In [None]:
plt.figure(figsize=(10, 5))
plt.title("ROC curve for the random forests")
fpr, tpr, _ = roc_curve(Y_test, rf.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Default")
fpr, tpr, _ = roc_curve(Y_test, rf2.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Tuned")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

## AdaBoost classifier
### Model
An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

### Hyperparameters
#### `estimator`
The `estimator` hyperparameter (named `base_estimator` before scikit-learn 1.2) is the base estimator from which the boosted ensemble is built. It must be a classifier.
By default, it is a decision tree of depth $1$ (a decision stump).
#### `n_estimators`
The `n_estimators` hyperparameter is the maximum number of estimators at which boosting is terminated.
By default, it is $50$.
#### `learning_rate`
The `learning_rate` hyperparameter shrinks the contribution of each classifier by `learning_rate`. There is a trade-off between `learning_rate` and `n_estimators`.
By default, it is $1$.
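
To see what "adjusting the weights" and "shrinking the contribution" mean in practice, here is a minimal sketch of one reweighting round of discrete AdaBoost for a binary problem. The function name `adaboost_round` is an assumption, the sketch assumes the weighted error is strictly between 0 and 1, and it is a simplification rather than scikit-learn's implementation.

```python
import math


def adaboost_round(weights, misclassified, learning_rate=1.0):
    """One discrete AdaBoost reweighting step for a binary problem.

    weights: current sample weights (assumed to sum to 1)
    misclassified: booleans, True where the weak learner was wrong
    Returns the learner weight alpha and the renormalised sample weights.
    """
    err = sum(w for w, m in zip(weights, misclassified) if m)  # weighted error
    alpha = learning_rate * math.log((1.0 - err) / err)
    # Only the misclassified samples are up-weighted.
    updated = [w * math.exp(alpha) if m else w for w, m in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


weights = [0.2] * 5
missed = [True, False, False, False, False]
alpha, new_weights = adaboost_round(weights, missed)
print(round(alpha, 3), [round(w, 3) for w in new_weights])

alpha_half, slower = adaboost_round(weights, missed, learning_rate=0.5)
print(round(alpha_half, 3), [round(w, 3) for w in slower])
```

With `learning_rate=1` the single misclassified sample ends up with half of the total weight; with `learning_rate=0.5` it gets one third, so each round changes the weights less and more estimators are needed for the same effect.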

### Basic model

In [None]:
def build_and_run_adaboost(
    **kwargs,
) -> Tuple[AdaBoostClassifier, dict[str, float]]:
    tree = AdaBoostClassifier(**kwargs)
    acc, prec, f1, roc = [], [], [], []
    for _ in range(NUMBER_TRIALS):
        X_train, X_test, Y_train, Y_test = train_test_split(
            df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
        )
        tree.fit(X_train, Y_train)
        Y_pred = tree.predict(X_test)
        acc.append(accuracy_score(Y_test, Y_pred))
        prec.append(precision_score(Y_test, Y_pred))
        f1.append(f1_score(Y_test, Y_pred))
        roc.append(roc_auc_score(Y_test, Y_pred))
    return tree, {
        "accuracy": np.mean(acc),
        "precision": np.mean(prec),
        "f1": np.mean(f1),
        "roc_auc": np.mean(roc),
    }

In [None]:
ada_class, power = build_and_run_adaboost()

power_df = pd.concat(
    [power_df, pd.DataFrame(power, index=["Basic Adaboost classifier"])]
)

display(power_df)

The basic AdaBoost classifier is already better than the tuned random forest, but it does not outperform the tuned decision tree.
Let's adjust its hyperparameters to see if we can improve it.

### Hyperparameters tuning
Let's try to tune the hyperparameters to see if we can improve the model.
First, we can use the tuned hyperparameters of the previous decision tree.

In [None]:
# Use a new name so that `ada_class` keeps the default model plotted as "Default" below.
ada_class_tree, power = build_and_run_adaboost(
    estimator=DecisionTreeClassifier(min_samples_split=10, min_samples_leaf=6)
)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Adaboost classifier with Tuned Decision Tree"]),
    ]
)

display(power_df)

This seems to be worse than before.
Let's try to adjust the learning rate.

In [None]:
ada_class2, power = build_and_run_adaboost(learning_rate=0.5)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Adaboost classifier with learning rate 0.5"]),
    ]
)

display(power_df)

In [None]:
ada_class3, power = build_and_run_adaboost(learning_rate=2)

power_df = pd.concat(
    [power_df, pd.DataFrame(power, index=["Adaboost classifier with learning rate 2"])]
)

display(power_df)

The previous results show that a lower learning rate seems to provide better results. Let's try to lower it even more.

In [None]:
ada_class4, power = build_and_run_adaboost(learning_rate=0.25)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Adaboost classifier with learning rate 0.25"]),
    ]
)

ada_class5, power = build_and_run_adaboost(learning_rate=0.1)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Adaboost classifier with learning rate 0.1"]),
    ]
)

display(power_df)

Depending on the chosen metric, the best setting for `AdaBoostClassifier` is either `learning_rate=0.1` or `learning_rate=0.25`.

In [None]:
plt.figure(figsize=(10, 5))
plt.title("ROC curve for the AdaBoost classifier")
fpr, tpr, _ = roc_curve(Y_test, ada_class.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Default")
fpr, tpr, _ = roc_curve(Y_test, ada_class4.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="AdaBoost with learning rate 0.25")
fpr, tpr, _ = roc_curve(Y_test, ada_class5.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="AdaBoost with learning rate 0.1")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

The ROC curve shows that the best model is the one with `learning_rate=0.1`.

## Gradient Boosting classifier
### Model
A gradient boosting classifier is an ensemble meta-estimator that builds decision trees sequentially: each new tree is fitted to the negative gradient of the loss function evaluated at the current predictions, so that it corrects the errors of the trees built before it.

### Hyperparameters
#### `loss`
The `loss` hyperparameter is the loss function to be optimized. It must be `log_loss` (named `deviance` before scikit-learn 1.1) or `exponential`.
By default, it is `log_loss`.
#### `learning_rate`
The `learning_rate` hyperparameter shrinks the contribution of each classifier by `learning_rate`. There is a trade-off between `learning_rate` and `n_estimators`.
By default, it is $0.1$.
#### `n_estimators`
The `n_estimators` hyperparameter is the number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
By default, it is $100$.
#### `subsample`
The `subsample` hyperparameter is the fraction of samples to be used for fitting the individual base learners. If smaller than $1.0$, this results in Stochastic Gradient Boosting. `subsample` interacts with the `n_estimators` hyperparameter. Choosing `subsample < 1.0` leads to a reduction of variance and an increase in bias.
By default, it is $1.0$.
#### `criterion`
The `criterion` hyperparameter is the function to measure the quality of a split. It must be `friedman_mse` or `squared_error` (recent scikit-learn releases no longer accept `mse` and `mae`).
By default, it is `friedman_mse`.
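
The trade-off between `learning_rate` and `n_estimators` can be illustrated with a deliberately simplified model: squared loss, a single shared residual, and a constant base learner that fits the residual exactly. This is a toy sketch under those assumptions (the function `remaining_residual` is hypothetical), not how `GradientBoostingClassifier` fits regression trees to the gradient of the log loss.

```python
def remaining_residual(learning_rate, n_estimators, initial_residual=1.0):
    """Residual left after boosting with a constant base learner and squared loss.

    Each stage fits the current residual exactly and adds learning_rate times it,
    so the residual shrinks by a factor (1 - learning_rate) per stage.
    """
    residual = initial_residual
    for _ in range(n_estimators):
        stage_prediction = residual               # best constant fit to the residual
        residual -= learning_rate * stage_prediction
    return residual


for lr in (0.01, 0.1, 0.2):
    print(lr, remaining_residual(lr, n_estimators=100))
```

In this toy setting, 100 stages at `learning_rate=0.01` still leave more than a third of the initial residual, whereas 100 stages at `0.1` leave almost none. This is why a smaller learning rate generally needs more estimators.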

### Basic model

In [None]:
def build_and_run_gradient_boosting(
    **kwargs,
) -> Tuple[GradientBoostingClassifier, dict[str, float]]:
    tree = GradientBoostingClassifier(**kwargs)
    acc, prec, f1, roc = [], [], [], []
    for _ in range(NUMBER_TRIALS):
        X_train, X_test, Y_train, Y_test = train_test_split(
            df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
        )
        tree.fit(X_train, Y_train)
        Y_pred = tree.predict(X_test)
        acc.append(accuracy_score(Y_test, Y_pred))
        prec.append(precision_score(Y_test, Y_pred))
        f1.append(f1_score(Y_test, Y_pred))
        roc.append(roc_auc_score(Y_test, Y_pred))
    return tree, {
        "accuracy": np.mean(acc),
        "precision": np.mean(prec),
        "f1": np.mean(f1),
        "roc_auc": np.mean(roc),
    }

In [None]:
gb, power = build_and_run_gradient_boosting()

power_df = pd.concat(
    [power_df, pd.DataFrame(power, index=["Basic Gradient Boosting classifier"])]
)

display(power_df)

This model performs reasonably well, but we can try to improve it by tuning the hyperparameters.
### Hyperparameters tuning
Let's try to tune the hyperparameters to see if we can improve the model.
Let's first try $150$ estimators.

In [None]:
gb2, power = build_and_run_gradient_boosting(n_estimators=150)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Gradient Boosting classifier with 150 estimators"]),
    ]
)

display(power_df)

It seems that increasing this number leads to worse results.
Let's try 50 estimators.

In [None]:
gb3, power = build_and_run_gradient_boosting(n_estimators=50)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(power, index=["Gradient Boosting classifier with 50 estimators"]),
    ]
)

display(power_df)

Reducing the number of estimators also leads to worse results. Let's move to another parameter and set `learning_rate` to $0.01$.

In [None]:
gb4, power = build_and_run_gradient_boosting(learning_rate=0.01)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(
            power, index=["Gradient Boosting classifier with learning rate 0.01"]
        ),
    ]
)

display(power_df)

This reduces most of the scores. Let's try to increase it to $0.2$.

In [None]:
gb5, power = build_and_run_gradient_boosting(learning_rate=0.2)

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(
            power, index=["Gradient Boosting classifier with learning rate 0.2"]
        ),
    ]
)

display(power_df)

This also reduces the scores.
It seems that the basic model is the best one.

In [None]:
plt.figure(figsize=(10, 5))
plt.title("ROC curve for the Gradient Boosting classifier")
fpr, tpr, _ = roc_curve(Y_test, gb.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Default")
fpr, tpr, _ = roc_curve(Y_test, gb3.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Gradient Boosting with 50 estimators")
fpr, tpr, _ = roc_curve(Y_test, gb4.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Gradient Boosting with learning rate 0.01")
fpr, tpr, _ = roc_curve(Y_test, gb5.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="Gradient Boosting with learning rate 0.2")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

This curve shows that using `learning_rate=0.2` is not a good idea.
However, the other settings seem to give roughly the same results, although `learning_rate=0.01` seems to do well on this curve.

## Stacking classifier
### Model
A stacking classifier is an ensemble meta-estimator that combines several base estimators by training a final estimator on their predictions, rather than simply averaging them.
### Hyperparameters
#### `estimators`
The `estimators` hyperparameter is a list of estimators to be fitted on the data. Each estimator must have a `fit` method. The final estimator is fitted on the concatenated results of the predictions of the estimators in `estimators`.
#### `final_estimator`
The `final_estimator` hyperparameter is an estimator which is used to combine the base estimators. It must have a `fit` method. If `None`, then a `LogisticRegression` classifier is used.
#### `cv`
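The `cv` hyperparameter determines the cross-validation splitting strategy used to produce the predictions on which the final estimator is trained. It can be `None`, an integer number of folds, a cross-validation splitter, or an iterable yielding pairs of train, test splits. If `None`, 5-fold cross-validation is used, stratified for classifiers.
#### `stack_method`
The `stack_method` hyperparameter is the method called on each base estimator to produce the inputs of the final estimator. It can be `auto`, `predict_proba`, `decision_function` or `predict`.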
#### `n_jobs`
The `n_jobs` hyperparameter is the number of jobs to run in parallel. It must be `None` or an integer. If `None`, then `1` is used.
Here, we won't use it, as we are exploring the model, not trying to optimize its running time.
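
To show how these hyperparameters fit together, here is a minimal sketch of a stacking classifier on synthetic data. The data from `make_classification`, the small `n_estimators` values and the explicit `LogisticRegression` are assumptions chosen to keep the example fast; they are not the settings used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart-disease features (assumption: not the real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # same as the default when None
    cv=5,                                  # out-of-fold predictions train the final estimator
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# One positive-class probability per base estimator is fed to the final estimator.
meta_features = stack.transform(X_test)
print(meta_features.shape)
print(round(stack.score(X_test, y_test), 3))
```

The final estimator only sees the base estimators' predictions, so it learns how much to trust each of them.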

### Basic model

In [None]:
def build_and_run_stacking(
    **kwargs,
) -> Tuple[StackingClassifier, dict[str, float]]:
    tree = StackingClassifier(**kwargs)
    acc, prec, f1, roc = [], [], [], []
    for _ in range(NUMBER_TRIALS):
        X_train, X_test, Y_train, Y_test = train_test_split(
            df.drop(columns=["chd"]), df["chd"], test_size=0.2, random_state=42
        )
        tree.fit(X_train, Y_train)
        Y_pred = tree.predict(X_test)
        acc.append(accuracy_score(Y_test, Y_pred))
        prec.append(precision_score(Y_test, Y_pred))
        f1.append(f1_score(Y_test, Y_pred))
        roc.append(roc_auc_score(Y_test, Y_pred))
    return tree, {
        "accuracy": np.mean(acc),
        "precision": np.mean(prec),
        "f1": np.mean(f1),
        "roc_auc": np.mean(roc),
    }

In [None]:
stacking, power = build_and_run_stacking(estimators=[("rf", rf), ("gb", gb)])

power_df = pd.concat(
    [
        power_df,
        pd.DataFrame(
            power,
            index=["Stacking classifier with random forest and gradient boosting"],
        ),
    ]
)

display(power_df)

# Comparison of all classifiers
Using the `power_df` data frame, we can now compare all the models we just trained.

In [None]:
power_df.index.name = "Model"
power_df.columns = ["Accuracy", "Precision", "F1 score", "ROC AUC"]


power_df

Overall, the best model is the AdaBoost classifier, using `learning_rate=0.1`.