<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/images/IDSNlogo.png" width="300" alt="cognitiveclass.ai logo">
</center>

# Adaptive Boosting (AdaBoost) for classification with Python

Estimated time needed: **45** minutes

## Objectives

After completing this lab you will be able to:

*   Understand that AdaBoost is a linear combination of $T$ weak classifiers
*   Apply AdaBoost
*   Understand hyperparameter selection in AdaBoost


In this notebook, you will learn about AdaBoost, short for Adaptive Boosting, a classification algorithm that belongs to the family of boosting algorithms. Like Bagging and Random Forest (RF), AdaBoost combines the outputs of many classifiers into an ensemble, but there are some differences. In both Bagging and RF, each classifier in the ensemble is powerful but prone to overfitting. As Bagging or RF aggregate more and more classifiers, they reduce overfitting.

With AdaBoost, each classifier usually performs only slightly better than random guessing. Such a classifier is referred to as a weak learner or weak classifier. AdaBoost combines these weak classifiers into a strong classifier. Unlike Bagging and Random Forest, adding more learners in AdaBoost can cause overfitting. As a result, AdaBoost requires hyperparameter tuning, which makes training take longer. One advantage of AdaBoost is that each classifier is smaller, so predictions are faster.


In AdaBoost, the strong classifier $H(x)$ is the sign of a linear combination of $T$ weak classifiers $h_t(x)$ with coefficients $\alpha_t$, as shown in (1). Although each classifier $h_t(x)$ appears independent, each $\alpha_t$ carries information about the errors of the earlier classifiers $h_1(x), \ldots, h_{t-1}(x)$. As we add more classifiers, the training accuracy generally increases. What is not so apparent in (1) is that, during training, the weights of the training samples are modified before each $h_t(x)$ is fitted, so that later classifiers concentrate on the samples that earlier ones misclassified. For a more in-depth look at the theory behind AdaBoost, check out <a href="https://hastie.su.domains/Papers/ESLII.pdf?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML241ENSkillsNetwork31576874-2022-01-01#page=356">The Elements of Statistical Learning: Data Mining, Inference, and Prediction</a>.


$H(x) = \text{sign}\left( \sum_{t=1}^T \alpha_t h_t(x) \right)$ (1)
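
To make the roles of $\alpha_t$ and the sample weights concrete, here is a minimal from-scratch sketch of discrete AdaBoost with decision stumps. It is an illustration under stated assumptions, not scikit-learn's internal implementation: the data are synthetic, the labels are mapped to $-1$ and $1$, and the names `X_toy`, `y_toy`, `stumps`, `alphas`, and `H` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration); map labels {0, 1} -> {-1, +1}
X_toy, y01 = make_classification(n_samples=200, n_features=5, random_state=0)
y_toy = 2 * y01 - 1

T = 5                                  # number of weak classifiers
w = np.full(len(y_toy), 1 / len(y_toy))  # start with uniform sample weights
stumps, alphas = [], []

for t in range(T):
    # Fit a weak classifier (a depth-1 tree) on the weighted training samples
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_toy, y_toy, sample_weight=w)
    pred = stump.predict(X_toy)

    # Weighted error of this classifier (the weights sum to 1)
    err = np.clip(np.sum(w[pred != y_toy]), 1e-10, 1 - 1e-10)

    # alpha_t is large when the weighted error is small
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weight of misclassified samples, decrease the rest, renormalize
    w = w * np.exp(-alpha * y_toy * pred)
    w = w / w.sum()

    stumps.append(stump)
    alphas.append(alpha)

def H(X):
    """Strong classifier: sign of the alpha-weighted vote (ties go to +1)."""
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(votes >= 0, 1, -1)

train_acc_toy = np.mean(H(X_toy) == y_toy)
print("alphas:", np.round(alphas, 3))
print("training accuracy of H:", train_acc_toy)
```

Because the weights are updated after every round, each stump is fitted to a different weighted view of the same training set, which is why the $h_t(x)$ are not independent even though (1) treats them as separate terms.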


<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="https://#RFvsBag">What's the difference between RF and Bagging </a></li>
        <li><a href="https://#Example">Cancer Data Example</a></li>
        <li><a href="https://practice/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML241ENSkillsNetwork31576874-2022-01-01">Practice</a></li>
    </ol>
</div>
<br>
<hr>


In [64]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import scipy.optimize as opt

In [65]:
from sklearn.metrics import accuracy_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

In [66]:
import warnings

warnings.filterwarnings('ignore')

In [67]:
def get_accuracy(X_train, X_test, y_train, y_test, model):
    return  {"test accuracy": accuracy_score(y_test, model.predict(X_test)), 
             "train accuracy": accuracy_score(y_train, model.predict(X_train))}

In [110]:
def get_accuracy_bag(X, y, title, times=20, xlabel='Num Estimators', learning_rate=[0.2, 0.4, 0.6, 1]):
    """Plot mean train/test accuracy of AdaBoost versus the number of estimators."""
    lines_array = ['solid', '--', '-.', ':']
    n_estimators = list(range(1, 200))
    np_shape = (times, len(learning_rate), len(n_estimators))
    train_acc = np.zeros(np_shape)
    test_acc = np.zeros(np_shape)

    for n in tqdm(range(times)):
        # Vary the split across repetitions; a fixed seed would repeat the same experiment
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=n)

        for k, num_trees in enumerate(n_estimators):
            for j, lr in enumerate(learning_rate):
                model = AdaBoostClassifier(
                    n_estimators=num_trees,
                    random_state=0,
                    learning_rate=lr
                )
                model.fit(X_train, y_train)
                accuracy = get_accuracy(X_train, X_test, y_train, y_test, model)
                # Store each score in the matching array (they were swapped before)
                train_acc[n, j, k] = accuracy['train accuracy']
                test_acc[n, j, k] = accuracy['test accuracy']

    fig, ax1 = plt.subplots()
    mean_test = test_acc.mean(axis=0)
    mean_train = train_acc.mean(axis=0)
    ax2 = ax1.twinx()

    for j, (lr, line) in enumerate(zip(learning_rate, lines_array)):
        # Plot against the actual number of estimators rather than the array index
        ax1.plot(n_estimators, mean_train[j, :], linestyle=line, color='b', label=f"Learning rate {lr}")
        ax2.plot(n_estimators, mean_test[j, :], linestyle=line, color='r', label=str(lr))

    ax1.set_title(title)
    ax1.set_ylabel('Training accuracy', color='b')
    ax1.set_xlabel(xlabel)
    ax1.legend()
    ax2.set_ylabel('Testing accuracy', color='r')
    ax2.legend()
    plt.show()

### About the dataset

We will use a telecommunications dataset for predicting customer churn. This is a historical customer dataset where each row represents one customer. The data is relatively easy to understand, and you may uncover insights you can use immediately. Typically, it is less expensive to keep customers than to acquire new ones, so the focus of this analysis is to predict the customers who will stay with the company.

This data set provides information to help you predict what behavior will help you to retain customers. You can analyze all relevant customer data and develop focused customer retention programs.

The dataset includes information about:

*   Customers who left within the last month – the column is called Churn
*   Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
*   Customer account information – how long they have been a customer, contract, payment method, paperless billing, monthly charges, and total charges
*   Demographic info about customers – gender, age range, and if they have partners and dependents


In [111]:
churn_df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/data/ChurnData.csv")
churn_df.head()

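|   | tenure | age | address | income | ed | employ | equip | callcard | wireless | longmon | ... | pager | internet | callwait | confer | ebill | loglong | logtoll | lninc | custcat | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 4.4 | ... | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.482 | 3.033 | 4.913 | 4.0 | 1.0 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.246 | 3.24 | 3.497 | 1.0 | 1.0 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 6.3 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.841 | 3.24 | 3.401 | 3.0 | 0.0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 6.05 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.8 | 3.807 | 4.331 | 4.0 | 0.0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 7.1 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.96 | 3.091 | 4.382 | 3.0 | 0.0 |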


In [112]:
churn_df.dtypes

tenure      float64
age         float64
address     float64
income      float64
ed          float64
employ      float64
equip       float64
callcard    float64
wireless    float64
longmon     float64
tollmon     float64
equipmon    float64
cardmon     float64
wiremon     float64
longten     float64
tollten     float64
cardten     float64
voice       float64
pager       float64
internet    float64
callwait    float64
confer      float64
ebill       float64
loglong     float64
logtoll     float64
lninc       float64
custcat     float64
churn       float64
dtype: object

In [113]:
(churn_df.max() == 1).all()

False

In [114]:
(churn_df.min() == -1).all()

False

## Data Preprocessing

In [115]:
Y_COLUMN = 'churn'

In [116]:
churn_df[Y_COLUMN] = churn_df[Y_COLUMN].astype(int)
churn_df[Y_COLUMN].dtype

dtype('int64')

In [117]:
FEATURE_COLUMNS = ['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless', 'churn']
churn_df = churn_df[FEATURE_COLUMNS]
churn_df.head()

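|   | tenure | age | address | income | ed | employ | equip | callcard | wireless | churn |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 1 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 0 |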


## Train Test Dataset

In [118]:
X=churn_df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip']]
y = churn_df[Y_COLUMN]

In [119]:
y.value_counts(normalize=True)

churn
0    0.71
1    0.29
Name: proportion, dtype: float64
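
Since 71% of the customers belong to class 0, a model that always predicts "no churn" would already reach about 71% accuracy, so any reported accuracy should be read against that baseline. The sketch below computes this majority-class baseline on synthetic labels that mimic the proportions above; `y_demo` and its size of 200 rows are assumptions made for illustration.

```python
import pandas as pd

# Synthetic labels with the same class proportions as above (size is an assumption)
y_demo = pd.Series([0] * 142 + [1] * 58, name="churn")

# The majority class and the accuracy of always predicting it
majority_class = y_demo.value_counts().idxmax()
baseline_accuracy = (y_demo == majority_class).mean()

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```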

In [120]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
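
With imbalanced classes and a small dataset, a purely random split can leave the train and test sets with noticeably different churn rates. One option, shown here only as a sketch, is the `stratify` argument of `train_test_split`, which keeps the class proportions roughly equal in both parts. The arrays `X_demo` and `y_demo` are synthetic placeholders, not the churn data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labels: 200 rows, 29% positives (assumed sizes)
y_demo = np.array([0] * 142 + [1] * 58)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

print(f"train churn rate: {y_tr.mean():.3f}")
print(f"test churn rate:  {y_te.mean():.3f}")
```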

## AdaBoost Model Selection

In [121]:
n_estimators=5
random_state=0

In [122]:
ada_boost_classifier = AdaBoostClassifier(n_estimators=n_estimators, random_state=random_state)

If the class labels were $-1$ and $1$, the classifier would take the form:

$H(x) = \text{sign}\left( \alpha_1 h_1(x)+ \alpha_2 h_2(x)+ \alpha_3 h_3(x)+ \alpha_4 h_4(x)+ \alpha_5 h_5(x) \right)$

We can fit the object, which finds all the $\alpha_t$ and $h_t(x)$, and then make a prediction:


In [123]:
model = AdaBoostClassifier(n_estimators=n_estimators,random_state=random_state)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred 

array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0])

In [124]:
ada_boost_classifier.fit(X_train, y_train)

In [125]:
ada_boost_classifier.estimators_

[DecisionTreeClassifier(max_depth=1, random_state=209652396),
 DecisionTreeClassifier(max_depth=1, random_state=398764591),
 DecisionTreeClassifier(max_depth=1, random_state=924231285),
 DecisionTreeClassifier(max_depth=1, random_state=1478610112),
 DecisionTreeClassifier(max_depth=1, random_state=441365315)]
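
The fitted stumps correspond to the $h_t(x)$; the fitted object also exposes `estimator_weights_` and `estimator_errors_`, which hold the weight given to each stump in the vote and its weighted error during boosting. The sketch below prints them for a model trained on synthetic data (`X_demo`, `y_demo`, and `clf` are illustrative names). Note that the exact values depend on the boosting variant used by your scikit-learn version; with the older SAMME.R variant the weights are all 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic data standing in for the churn features (an assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=7, random_state=0)

clf = AdaBoostClassifier(n_estimators=5, random_state=0)
clf.fit(X_demo, y_demo)

# One line per fitted weak classifier: its vote weight and its weighted error
for i, (stump, weight, error) in enumerate(
    zip(clf.estimators_, clf.estimator_weights_, clf.estimator_errors_), start=1
):
    print(f"h_{i}: max_depth={stump.max_depth}, weight={weight:.3f}, weighted error={error:.3f}")
```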

In [126]:
y_pred_test = ada_boost_classifier.predict(X_test)
y_pred_test

array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0])

In [127]:
accuracy_score(y_test, y_pred_test)

0.7666666666666667

In [128]:
print(get_accuracy(X_train, X_test, y_train, y_test, ada_boost_classifier))

{'test accuracy': 0.7666666666666667, 'train accuracy': 0.7642857142857142}


In [129]:
for i, weak_classifier in enumerate(ada_boost_classifier.estimators_, start=1):
    print(f"Weak classifier {i}: {get_accuracy(X_train, X_test, y_train, y_test, weak_classifier)}")

Weak classifier 1: {'test accuracy': 0.7, 'train accuracy': 0.7428571428571429}
Weak classifier 2: {'test accuracy': 0.6, 'train accuracy': 0.6214285714285714}
Weak classifier 3: {'test accuracy': 0.6333333333333333, 'train accuracy': 0.6642857142857143}
Weak classifier 4: {'test accuracy': 0.35, 'train accuracy': 0.4642857142857143}
Weak classifier 5: {'test accuracy': 0.43333333333333335, 'train accuracy': 0.5}

In [130]:
get_accuracy_bag(X, y, title="Training and Test Accuracy vs Weak Classifiers", learning_rate=[1], times=20, xlabel='Number Estimators')

 40%|████      | 8/20 [08:41<13:01, 65.14s/it]


KeyboardInterrupt: 
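
Refitting a separate model for every number of estimators is expensive, which is why the run above had to be interrupted. A cheaper alternative, sketched here on synthetic data, is to fit one model with the largest number of estimators and read the accuracy after each boosting round from `staged_score`. The names `X_demo`, `y_demo`, `train_curve`, and `test_curve` are illustrative, and the curves should approximate, rather than exactly reproduce, what separate refits would give.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the churn features (an assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# Fit once with the maximum number of estimators of interest
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
clf.fit(X_tr, y_tr)

# staged_score yields the accuracy after 1, 2, ..., len(estimators_) boosting rounds
train_curve = list(clf.staged_score(X_tr, y_tr))
test_curve = list(clf.staged_score(X_te, y_te))

for n_rounds in (1, 10, len(test_curve)):
    print(f"{n_rounds:3d} estimators: "
          f"train={train_curve[n_rounds - 1]:.3f}, test={test_curve[n_rounds - 1]:.3f}")
```

This costs one fit per learning rate and split instead of one fit per number of estimators, so the same accuracy-versus-estimators plot can be produced in a fraction of the time.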

## Changing the Base Classifier
