In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns
from sklearn.model_selection import train_test_split

## Define the AdaBoost classifier
Here we'll revisit the Indian Liver Patient dataset. Our task is to predict whether a patient suffers from a liver disease using 10 features, including albumin, age, and gender. This time, however, we'll train an AdaBoost ensemble to perform the classification task. In addition, because this dataset is imbalanced, we'll use the ROC AUC score as the metric instead of accuracy.
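To illustrate why accuracy can mislead on imbalanced data, here is a minimal sketch on synthetic labels. The 9:1 class ratio and the variable names are assumptions made for illustration only; they are not taken from the liver dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic labels: 90 negatives and 10 positives (an assumed 9:1 imbalance)
y_true = np.array([0] * 90 + [1] * 10)

# A trivial "model" that always predicts the majority class
y_pred_majority = np.zeros_like(y_true)

# Its scores are constant, so they carry no ranking information
y_score_constant = np.full(len(y_true), 0.5)

acc = accuracy_score(y_true, y_pred_majority)   # 0.9, looks good
auc = roc_auc_score(y_true, y_score_constant)   # 0.5, no better than chance

print('Accuracy: {:.2f}'.format(acc))
print('ROC AUC:  {:.2f}'.format(auc))
```

The majority-class predictor reaches 90% accuracy without learning anything, while its ROC AUC stays at the chance level of 0.5.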

In [2]:
df = pd.read_csv('indian_liver_patient.csv')
df['Gender'] = df.Gender.replace({'Male':0, 'Female':1})

In [3]:
df['Albumin_and_Globulin_Ratio'] = df['Albumin_and_Globulin_Ratio'].fillna(0)

In [4]:
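# NOTE: Gender is used as the target here, although the text above describes
# predicting liver disease; the results printed below correspond to this choice.
X = df.drop(['Gender'], axis=1)
y = df['Gender']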

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [6]:
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

# Import AdaBoostClassifier
from sklearn.ensemble import AdaBoostClassifier

# Instantiate dt
dt = DecisionTreeClassifier(max_depth=2, random_state=1)

# Instantiate ada
# scikit-learn >= 1.2 names this parameter `estimator`;
# older releases use `base_estimator` instead
ada = AdaBoostClassifier(estimator=dt, n_estimators=180, random_state=1)

## Train the AdaBoost classifier
Now that we've instantiated the AdaBoost classifier ada, it's time to train it. We will also predict the probabilities of obtaining the positive class in the test set.

In [7]:
# Fit ada to the training set
ada.fit(X_train, y_train)
# Compute the probabilities of obtaining the positive class
y_pred_proba = ada.predict_proba(X_test)[:,1]
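The `[:, 1]` indexing above relies on the positive class being the second entry of the classifier's `classes_` attribute. The following sketch, on synthetic data from `make_classification` (an assumption for illustration, not the liver dataset), shows one way to check that instead of hard-coding the column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic binary problem with labels 0 and 1 (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

clf = AdaBoostClassifier(n_estimators=20, random_state=1).fit(X_demo, y_demo)

# predict_proba returns one column per class, ordered as in clf.classes_
proba = clf.predict_proba(X_demo)

# Look up the column that belongs to the positive label (assumed to be 1)
pos_col = list(clf.classes_).index(1)
pos_proba = proba[:, pos_col]

print(clf.classes_, proba.shape, pos_col)
```

Looking up the column through `classes_` keeps the code correct even if the labels are encoded differently, for example as strings.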

## Evaluate the AdaBoost classifier
Now that we're done training ada and predicting the probabilities of obtaining the positive class in the test set, it's time to evaluate ada's ROC AUC score. 

In [8]:
# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Evaluate test-set roc_auc_score
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)

# Print roc_auc_score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))

ROC AUC score: 0.61


## Define the GB regressor
We'll now revisit the Bike Sharing Demand dataset that was introduced in the previous exercise. Recall that our task is to predict the bike rental demand using historical weather data from the Capital Bikeshare program in Washington, D.C. For this purpose, we'll use a gradient boosting regressor.

In [9]:
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate gb
gb = GradientBoostingRegressor(max_depth=4, 
            n_estimators=200,
            random_state=2)

## Train the GB regressor
We'll now train the gradient boosting regressor gb that we instantiated in the previous step and predict the test set labels.

In [10]:
# Fit gb to the training set
gb.fit(X_train, y_train)

# Predict test set labels
y_pred = gb.predict(X_test)

In [11]:
y_pred

array([ 5.49384972e-01,  6.28323124e-01, -2.35110911e-02,  2.79604865e-01,
        5.10372249e-01, -4.20944318e-03,  4.31618511e-01,  1.51298396e-01,
        1.18445540e-01,  3.20002758e-01, -1.51899002e-01, -8.94373077e-02,
        3.96799111e-01,  1.87078526e-01,  5.16332200e-02,  7.11248698e-02,
        3.80745715e-01,  4.54389272e-01,  5.92319353e-02,  2.29626495e-02,
        2.30458165e-01,  7.26692704e-01,  2.31226216e-01, -5.33120152e-02,
        7.43992681e-03,  6.64506219e-01,  4.76930021e-01, -3.01071764e-02,
        1.69521510e-01,  3.30784288e-01,  4.50630510e-01,  5.11871714e-01,
        1.45185019e-01,  3.14289539e-01,  3.99178022e-02, -4.42850384e-02,
        2.33438000e-01,  4.79954963e-02,  1.13304342e-01,  1.96454992e-01,
        1.48842718e-02,  7.88983168e-01, -9.99936947e-03,  1.29475755e-01,
        3.17281469e-01, -1.11236959e-01,  2.99971389e-01,  1.64559320e-01,
        7.27357557e-03,  3.54325811e-01,  8.07287022e-02,  6.72040871e-01,
        5.94712345e-01,  

## Evaluate the GB regressor
Now that the test set predictions are available, we can use them to evaluate the test set Root Mean Squared Error (RMSE) of gb.

In [12]:
# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute MSE
mse_test = MSE(y_test, y_pred)

# Compute RMSE
rmse_test = mse_test**(1/2)

# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))

Test set RMSE of gb: 0.430
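The RMSE computed above is the square root of the mean squared residual. As a small worked example on made-up numbers (the arrays below are hypothetical, not model output), the manual formula and the scikit-learn route agree:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical targets and predictions, chosen so the arithmetic is easy to follow
y_true_demo = np.array([3.0, 5.0, 2.0, 7.0])
y_hat_demo = np.array([2.0, 5.0, 4.0, 8.0])

# Residuals are 1, 0, -2, -1; their squares are 1, 0, 4, 1; the mean is 1.5
rmse_manual = np.sqrt(np.mean((y_true_demo - y_hat_demo) ** 2))

# Same quantity via scikit-learn's MSE followed by a square root
rmse_sklearn = mean_squared_error(y_true_demo, y_hat_demo) ** 0.5

print('Manual RMSE:  {:.4f}'.format(rmse_manual))
print('sklearn RMSE: {:.4f}'.format(rmse_sklearn))
```

Because RMSE is expressed in the same units as the target, it is easier to interpret than the raw MSE.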


## Regression with SGB


In [14]:
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4, 
            subsample=0.9,
            max_features=0.75,
            n_estimators=200,
            random_state=2)
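
As a rough sketch of how a stochastic gradient boosting regressor with these settings could be trained and evaluated, the example below follows the same fit, predict, and RMSE steps used for gb. It runs on synthetic data from `make_regression`; the data and the `_demo` names are assumptions for illustration, so the resulting RMSE says nothing about the datasets discussed here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative stand-in for a real dataset)
X_demo, y_demo = make_regression(n_samples=300, n_features=8,
                                 noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

# Same hyperparameters as sgbr: each tree is fit on a random 90% of the rows,
# and each split considers a random 75% of the features
sgbr_demo = GradientBoostingRegressor(max_depth=4,
                                      subsample=0.9,
                                      max_features=0.75,
                                      n_estimators=200,
                                      random_state=2)

# Fit on the training split and predict the held-out split
sgbr_demo.fit(X_tr, y_tr)
y_pred_demo = sgbr_demo.predict(X_te)

# Compute the test set RMSE
rmse_demo = mean_squared_error(y_te, y_pred_demo) ** 0.5
print('Test set RMSE of sgbr_demo: {:.3f}'.format(rmse_demo))
```

Setting `subsample` below 1.0 is what makes the boosting stochastic; `max_features` adds a second source of randomness at each split.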