## AdaBoost Regression

Boosting builds multiple models sequentially. Each new model focuses on the observations that earlier models predicted poorly. For regression, the models' predictions are then combined into a single weighted prediction; scikit-learn's AdaBoostRegressor, which we use here, takes a weighted median in which models with lower error get more weight. Boosting is most commonly used with decision trees, but the approach can be applied to any supervised learning algorithm.

Boosting is a form of ensemble learning, in which several models are created and their predictions combined. The underlying assumption is that combining several weak models can produce one strong, accurate model.
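To make the sequential reweighting concrete, here is a simplified from-scratch sketch of AdaBoost.R2 (Drucker, 1997), the variant scikit-learn's documentation names as the basis of AdaBoostRegressor. It is an illustration, not scikit-learn's exact code: the helper names `adaboost_r2_fit` and `adaboost_r2_predict` are hypothetical, it assumes NumPy arrays as input, uses only the linear loss, and demonstrates on a synthetic step function rather than the cancer data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def adaboost_r2_fit(X, y, n_estimators=50, learning_rate=1.0, max_depth=3, seed=0):
    """Simplified AdaBoost.R2 with a linear loss. X and y are NumPy arrays."""
    rng = np.random.default_rng(seed)
    n = len(y)
    weights = np.full(n, 1.0 / n)  # start with equal sample weights
    models, alphas = [], []
    for _ in range(n_estimators):
        # Resample the training set according to the current weights,
        # so points earlier models struggled with are drawn more often
        idx = rng.choice(n, size=n, replace=True, p=weights)
        model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        model.fit(X[idx], y[idx])

        errors = np.abs(model.predict(X) - y)
        if errors.max() == 0:  # perfect fit: keep it and stop
            models.append(model)
            alphas.append(1.0)
            break
        loss = errors / errors.max()  # linear loss scaled to [0, 1]
        avg_loss = np.sum(weights * loss)
        if avg_loss >= 0.5:  # model too weak: stop boosting
            break

        beta = avg_loss / (1.0 - avg_loss)  # small beta means a confident model
        models.append(model)
        alphas.append(learning_rate * np.log(1.0 / beta))
        # Shrink the weights of well-predicted (low-loss) points the most
        weights = weights * beta ** (learning_rate * (1.0 - loss))
        weights = weights / weights.sum()
    return models, np.array(alphas)


def adaboost_r2_predict(models, alphas, X):
    """Combine the models with a weighted median of their predictions."""
    if not models:
        raise ValueError("no models were fitted")
    preds = np.column_stack([m.predict(X) for m in models])  # (n_samples, n_models)
    order = np.argsort(preds, axis=1)
    sorted_preds = np.take_along_axis(preds, order, axis=1)
    cum_weights = np.cumsum(alphas[order], axis=1)
    # First position where the cumulative weight reaches half of the total
    median_idx = np.argmax(cum_weights >= 0.5 * cum_weights[:, -1:], axis=1)
    return sorted_preds[np.arange(preds.shape[0]), median_idx]


# Usage on a small synthetic step function (not the cancer data)
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 200).reshape(-1, 1)
y_demo = np.where(X_demo.ravel() > 5, 3.0, 0.0) + rng.normal(0, 0.1, 200)
models, alphas = adaboost_r2_fit(X_demo, y_demo, n_estimators=20, max_depth=2)
pred = adaboost_r2_predict(models, alphas, X_demo)
print(len(models), np.mean((pred - y_demo) ** 2))
```

The weighted median, rather than a mean, makes the combined prediction robust to a few trees that fit the reweighted samples badly.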

For our purposes, we will use AdaBoost to improve the performance of a decision tree. We will use the cancer dataset from the pydataset library. Our goal is to predict a patient's weight loss from several independent variables. The steps are as follows:
- Data Preparation
- Regression decision tree baseline model
- Hyperparameter tuning of AdaBoost Regression model
- AdaBoost regression model development
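Since the goal is to boost a decision tree, it helps to know which tree AdaBoostRegressor boosts when none is specified: scikit-learn defaults to `DecisionTreeRegressor(max_depth=3)`. The notebook below keeps that default. The sketch shows how to check it and how one could pass a different base tree instead; it uses synthetic data from `make_regression` as a stand-in for the cancer data (an assumption), and the depth-2 choice is only illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption; not the cancer dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=6, n_informative=6,
                                 noise=10.0, random_state=1)

# Default base learner: a regression tree limited to depth 3
default_ada = AdaBoostRegressor(n_estimators=10, random_state=1).fit(X_demo, y_demo)
print(default_ada.estimators_[0].max_depth)  # 3

# Boosting a depth-2 tree instead. The base learner is passed positionally because
# its keyword is `estimator` in scikit-learn >= 1.2 and `base_estimator` before that.
depth2_ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=2), n_estimators=10,
                               random_state=1).fit(X_demo, y_demo)
print(depth2_ada.estimators_[0].max_depth)  # 2
```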

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
from sklearn.metrics import mean_squared_error
from pydataset import data
import numpy as np
import pandas as pd
```

### Data Preparation

```python
# Load the cancer dataset and drop rows with missing values
df = data("cancer").dropna()
```

```python
# Independent variables
X = df[['time', 'sex', 'ph.karno', 'pat.karno', 'status', 'meal.cal']]
# Target: the patient's weight loss
y = df['wt.loss']
```

### Baseline Regression Tree Model

The purpose of the baseline model is to compare its performance with that of our AdaBoost model. To build it, we first initialize a KFold cross-validation object, which helps stabilize the results. Next, we use a for loop to create several trees that differ in their maximum depth. Depth is the number of successive splits a tree may make between the root and a leaf, that is, how far it can go in dividing the data into increasingly homogeneous groups. Greater depth often increases the risk of overfitting.
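To see the overfitting effect described above, the sketch below compares each tree's training error with its cross-validated error as `max_depth` grows. It uses synthetic data from `make_regression` as a stand-in for the cancer data (an assumption), so the numbers are only illustrative; the point is that training error keeps falling with depth while cross-validated error does not follow it down.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption; not the cancer dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=6, n_informative=6,
                                 noise=20.0, random_state=1)
cv = KFold(n_splits=10, shuffle=True, random_state=1)

train_mse, cv_mse = {}, {}
for depth in [1, 2, 4, 8, 12]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=1)
    # In-sample error: fit and evaluate on the same data
    train_mse[depth] = mean_squared_error(y_demo, model.fit(X_demo, y_demo).predict(X_demo))
    # Out-of-sample error: flip the sign of scikit-learn's negated MSE
    cv_mse[depth] = -cross_val_score(model, X_demo, y_demo,
                                     scoring='neg_mean_squared_error', cv=cv).mean()
    print(depth, round(train_mse[depth], 1), round(cv_mse[depth], 1))
```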

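```python
# 10-fold cross-validation with shuffling; the fixed seed makes the splits reproducible
crossvalidation = KFold(n_splits = 10, shuffle = True, random_state = 1)

for depth in range(1, 11):
    tree_regressor = tree.DecisionTreeRegressor(max_depth = depth,
                                               random_state = 1)
    # Stop once a tree fitted on the full data cannot reach the requested depth:
    # larger max_depth values would produce the same tree
    if tree_regressor.fit(X, y).tree_.max_depth < depth:
        break
    # cross_val_score returns the negative MSE of each fold; average over the 10 folds
    score = np.mean(cross_val_score(tree_regressor, X, y,
                                   scoring = 'neg_mean_squared_error',
                                   cv = crossvalidation, n_jobs = 1))
    print(depth, score)
```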

```
1 -193.55304528235052
2 -176.27520747356175
3 -209.2846723461564
4 -218.80238479654003
5 -222.4393459885871
6 -249.95330609042858
7 -286.76842138165705
8 -294.0290706405905
9 -287.39016236497804
10 -318.9378775573167
```

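Because scikit-learn reports negative mean squared error, the score closest to zero is the best. A tree with a depth of 2 has the lowest error (an MSE of about 176). We can now move on to the hyperparameters of the AdaBoost algorithm.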

### Hyperparameter Tuning

For hyperparameter tuning, we start by instantiating the AdaBoostRegressor() class. Then we create our search grid. The grid covers two hyperparameters: the number of estimators and the learning rate. The number of estimators sets how many models (trees) are built in sequence, and the learning rate is a weight applied to each tree at each boosting iteration, so a higher learning rate increases each tree's contribution to the overall result. The grid also includes random_state, but this only fixes the random seed for reproducibility, so it takes a single value.
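The two hyperparameters interact: a smaller learning rate makes each tree matter less, so more trees are typically needed. One way to see this is `staged_predict`, which yields the ensemble's predictions after each added tree. The sketch tracks held-out error for a few learning rates on synthetic data from `make_regression` (an assumption, not the cancer data), so it shows the method rather than results for this dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption; not the cancer dataset)
X_demo, y_demo = make_regression(n_samples=300, n_features=6, n_informative=6,
                                 noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)

curves = {}
for lr in [0.01, 0.1, 1.0]:
    model = AdaBoostRegressor(n_estimators=200, learning_rate=lr, random_state=1)
    model.fit(X_tr, y_tr)
    # staged_predict yields predictions after 1, 2, ... trees (fewer if boosting stops early)
    curves[lr] = [mean_squared_error(y_te, p) for p in model.staged_predict(X_te)]
    best_k = int(np.argmin(curves[lr])) + 1
    print(f"learning_rate={lr}: {len(curves[lr])} trees, "
          f"lowest test MSE {min(curves[lr]):.1f} after {best_k} trees")
```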

After making the grid, we pass everything to GridSearchCV. Inside this function we set the estimator (our AdaBoostRegressor instance, ada), the parameter grid we just made, the scoring metric (negative mean squared error), the cross-validation object we created for the baseline, and n_jobs, which sets how many CPU cores run the fits in parallel (1 means no parallelism).

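```python
ada = AdaBoostRegressor()
# Candidate values for each hyperparameter; random_state only fixes the seed
search_grid = {
    'n_estimators': [500, 1000, 2000],
    'learning_rate': [0.001, 0.01, 0.1],
    'random_state': [1]
}
# Score every combination with the same 10-fold splits used for the baseline
search = GridSearchCV(estimator = ada, param_grid = search_grid,
                      scoring = 'neg_mean_squared_error',
                      n_jobs = 1, cv = crossvalidation)
```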
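This grid is larger than it looks, which is why n_jobs matters. It holds 3 × 3 × 1 = 9 combinations, and each is fitted on all 10 folds. The sketch uses scikit-learn's `ParameterGrid` to count the fits and an upper bound on the number of trees grown (AdaBoost can stop early, and the final refit on the full data adds one more 500-tree model on top of this).

```python
from sklearn.model_selection import ParameterGrid

search_grid = {
    'n_estimators': [500, 1000, 2000],
    'learning_rate': [0.001, 0.01, 0.1],
    'random_state': [1]
}
n_splits = 10  # matches KFold(n_splits = 10) used for the baseline

n_candidates = len(ParameterGrid(search_grid))
n_fits = n_candidates * n_splits
# Upper bound on trees grown during cross-validation (boosting may stop early)
max_trees = sum(p['n_estimators'] for p in ParameterGrid(search_grid)) * n_splits
print(n_candidates, n_fits, max_trees)  # 9 90 105000
```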

```python
print(search_grid)
```

```
{'n_estimators': [500, 1000, 2000], 'learning_rate': [0.001, 0.01, 0.1], 'random_state': [1]}
```

```python
# In a notebook, this displays the configured GridSearchCV object
search
```

```python
search.fit(X, y)
print(search.best_params_)  # best hyperparameter combination
print(search.best_score_)   # its mean cross-validated score (negative MSE)
```

```
{'learning_rate': 0.01, 'n_estimators': 500, 'random_state': 1}
-165.10439582389532
```
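`best_params_` shows only the winner. To compare every combination, `cv_results_` can be loaded into a pandas DataFrame. The sketch runs a deliberately small grid on synthetic data from `make_regression` (an assumption, not the cancer data) so that it finishes quickly; with the notebook's fitted `search` object, the same DataFrame lines apply directly.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data and a small grid (assumptions, for speed)
X_demo, y_demo = make_regression(n_samples=200, n_features=6, n_informative=6,
                                 noise=10.0, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)
small_grid = {'n_estimators': [50, 100], 'learning_rate': [0.01, 0.1], 'random_state': [1]}
search_demo = GridSearchCV(AdaBoostRegressor(), small_grid,
                           scoring='neg_mean_squared_error', cv=cv).fit(X_demo, y_demo)

results = pd.DataFrame(search_demo.cv_results_)
results['mean_mse'] = -results['mean_test_score']  # flip sign back to plain MSE
columns = ['param_learning_rate', 'param_n_estimators', 'mean_mse',
           'std_test_score', 'rank_test_score']
print(results[columns].sort_values('rank_test_score'))
```

The `std_test_score` column shows how much the score varies across folds, which helps judge whether a small gap between two combinations is meaningful.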


The best combination of hyperparameters is a learning rate of 0.01 and 500 estimators. This combination yields a mean squared error of about 165, which is a little lower than the 176 of our best single decision tree.
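To put the two errors on the scale of the target, the square root of each MSE gives the root mean squared error, in the same units as wt.loss. A quick calculation with the scores reported above:

```python
import math

# Scores copied from the outputs above (negative MSE)
tree_score = -176.27520747356175  # best single tree (depth 2)
ada_score = -165.10439582389532   # tuned AdaBoost

tree_rmse = math.sqrt(-tree_score)
ada_rmse = math.sqrt(-ada_score)
improvement = 1 - ada_score / tree_score  # relative reduction in MSE
print(round(tree_rmse, 2), round(ada_rmse, 2), round(100 * improvement, 1))  # 13.28 12.85 6.3
```

So boosting lowers the MSE by roughly 6%, or the RMSE by a bit over 0.4 units.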

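### AdaBoost Regression Model Development

Finally, we set up an AdaBoost model with the selected hyperparameters and confirm its cross-validated error. Because it reuses the same KFold splits, its score matches the grid search's best score exactly.

```python
ada2 = AdaBoostRegressor(n_estimators = 500, learning_rate = 0.01,
                         random_state = 1)
score = np.mean(cross_val_score(ada2, X, y, scoring = 'neg_mean_squared_error',
                               cv = crossvalidation, n_jobs = 1))
print(score)
```

```
-165.10439582389532
```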
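The scores above come from cross-validation on the same data that was used to choose the hyperparameters, so they may be slightly optimistic. The notebook imports train_test_split but never uses it; one way to use it is to hold out a test set that plays no part in model selection and compare both models on it. The sketch below uses synthetic data from `make_regression` as a stand-in for X and y (an assumption), so its numbers say nothing about the cancer data. In practice the split would come first, and the grid search would run on the training portion only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for X and y (assumption; not the cancer dataset)
X_demo, y_demo = make_regression(n_samples=300, n_features=6, n_informative=6,
                                 noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo,
                                                    test_size=0.25, random_state=1)

models = {
    'tree (depth 2)': DecisionTreeRegressor(max_depth=2, random_state=1),
    'AdaBoost': AdaBoostRegressor(n_estimators=500, learning_rate=0.01, random_state=1),
}
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # fit on the training portion only
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(name, round(test_mse[name], 1))
```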
