# AdaBoost

$\textit{Indra Ikauniece}$

AdaBoost is an ensemble method that builds a strong classifier from a number of weak classifiers. A weak classifier performs poorly, but better than random guessing.


The most common weak learner used with AdaBoost is a decision tree with one level. Because these trees are so short and contain only one decision, they are often called decision stumps. AdaBoost can, however, be applied to any classification algorithm.

AdaBoost trains several weak classifiers on different random subsets of the training data. Before the first round, all data points are assigned equal weight. After each round, AdaBoost assigns a weight to each training data point, and points with higher weights have a higher probability of being included in the next training subset. Examples that were misclassified are assigned higher weights, so they are more likely to be corrected in the next round.
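The weighted selection of a training subset can be sketched in a few lines of plain Python. This is only an illustration: the helper name `sample_training_subset`, the toy points, and the weight values are assumptions, not part of the original notebook.

```python
import random

def sample_training_subset(points, weights, subset_size, seed=None):
    """Draw a training subset with replacement, favouring high-weight points."""
    rng = random.Random(seed)
    return rng.choices(points, weights=weights, k=subset_size)

points = ["a", "b", "c", "d"]

# Before the first round every point is equally likely to be drawn.
uniform = [1 / len(points)] * len(points)
first_subset = sample_training_subset(points, uniform, subset_size=4, seed=0)

# Suppose "c" was misclassified, so its weight was raised (illustrative numbers).
reweighted = [0.1, 0.1, 0.7, 0.1]
draws = sample_training_subset(points, reweighted, subset_size=1000, seed=0)
share_c = draws.count("c") / len(draws)
print(f"share of draws that are 'c': {share_c:.2f}")
```

Because sampling is done with replacement, a heavily weighted point can appear several times in the same subset.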

Each trained classifier is also given a weight based on its accuracy. A classifier with 50% accuracy is assigned weight zero (because 50% accuracy is the same as guessing), a classifier with accuracy above 50% is assigned a positive weight, and a classifier with accuracy below 50% is assigned a negative weight (because it tends to predict the opposite of the correct label).
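To see how such weights act when the classifiers are combined, here is a minimal sketch of a weighted vote over labels in $\{-1, +1\}$. The function name `weighted_vote`, the example weights, and the tie-breaking rule (a score of exactly zero maps to $+1$) are assumptions made for illustration.

```python
def weighted_vote(alphas, predictions):
    """Combine {-1, +1} predictions from several classifiers into one label."""
    score = sum(a * h for a, h in zip(alphas, predictions))
    # Assumption: a tie (score == 0) is resolved in favour of +1.
    return 1 if score >= 0 else -1

# Illustrative classifier weights: one good, one useless, one worse than chance.
alphas = [1.1, 0.0, -0.4]
predictions = [-1, +1, +1]

combined = weighted_vote(alphas, predictions)
print(combined)
```

The zero-weight classifier has no influence on the result, and the negative weight flips the vote of the classifier that is usually wrong.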

$w_0(x_i)=1/N$ - weights before training the first model, where $N$ is the number of training data points

Create classifier $h_j(x)$, $j=1,\dots,K$, from the training subset.

Then compute the output weight for that classifier: $\alpha_j=\frac{1}{2}\ln\big(\frac{1-\epsilon_{j}}{ \epsilon_{j}}\big)$, where $\epsilon_j$ is the (weighted) error rate of $h_j$ on the training data.

In this plot of $\alpha$ we can see that at an error rate of $0.5$ the assigned weight is $0$, for error rates less than $0.5$ the assigned weight is positive, and for error rates larger than $0.5$ the assigned weight is negative.

![title](adaboost_alphacurve.png)
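The shape of the curve can be checked numerically. The snippet below is a small sketch that evaluates the $\alpha$ formula for a few error rates; the function name `classifier_weight` and the chosen error rates are illustrative.

```python
import math

def classifier_weight(error_rate):
    """alpha = 0.5 * ln((1 - eps) / eps), defined for 0 < eps < 1."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

# Tabulate alpha on both sides of the 0.5 error rate.
for eps in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"error rate {eps:.1f} -> alpha {classifier_weight(eps):+.3f}")
```

The formula is antisymmetric around $0.5$: an error rate of $0.9$ gives the same magnitude as $0.1$, with the opposite sign. It diverges as the error rate approaches $0$ or $1$, which is why the sketch rejects those values.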


After computing the classifier weight, we then compute the new weight for each training data point.

$
w_j(x_i)=\frac{w_{j-1}(x_i) e^{- y_i \alpha_j h_j(x_i)}}{Z_j}
$

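$w_j$ is the vector of weights, one for each point in the training data set. Here $y_i$ and $h_j(x_i)$ take values in $\{-1,+1\}$, so the exponent is negative for correctly classified points and positive for misclassified ones (when $\alpha_j>0$).

$Z_j=\sum_i w_{j-1}(x_i) e^{- y_i \alpha_j h_j(x_i)}$ is the sum of all unnormalized weights, used to normalize the weights so that they sum up to $1$.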
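A worked example of one reweighting step, written as a minimal plain-Python sketch. The function name `update_weights` and the four-point toy data are assumptions for illustration, and the error rate is taken here to be the total weight of the misclassified points.

```python
import math

def update_weights(prev_weights, labels, predictions, alpha):
    """One AdaBoost reweighting step for labels and predictions in {-1, +1}."""
    unnormalized = [
        w * math.exp(-y * alpha * h)
        for w, y, h in zip(prev_weights, labels, predictions)
    ]
    z = sum(unnormalized)  # normalization constant Z_j
    return [u / z for u in unnormalized]

labels      = [+1, +1, -1, -1]
predictions = [+1, -1, -1, -1]   # the second point is misclassified
weights = [0.25] * 4             # w_0(x_i) = 1/N

# Weighted error rate: total weight of the misclassified points.
epsilon = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
alpha = 0.5 * math.log((1 - epsilon) / epsilon)

new_weights = update_weights(weights, labels, predictions, alpha)
print(new_weights)
```

With this choice of $\alpha_j$, the misclassified points together carry half of the total weight after the update, so the next classifier cannot do well by repeating the same mistakes.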

### Testing AdaBoost on the Churn problem

In [37]:
%reset -f
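# Load the churn data (the last column holds the class label)
import pandas as pd

df = pd.read_csv("files/churn_curated_numerical.csv", header=None)
df.head()
data = df.values
X = data[:, :-1]        # features: every column except the last
y = 2*data[:, -1] - 1   # map labels to {-1, +1}, assuming they are coded as 0/1
print('Loading ok.')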

Loading ok.


In [63]:
import numpy as np
from sklearn import model_selection
from sklearn import tree
from sklearn import metrics
from sklearn import ensemble
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

First predict churn with a decision tree.

In [79]:
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
scores = cross_val_score(clf, X, y)
scores.mean()  

0.90969096909690972

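Now tackle the same problem with AdaBoost, using decision trees as the base classifier. Note that these trees are grown without a depth limit, so they are not decision stumps.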

In [80]:
clf = AdaBoostClassifier(tree.DecisionTreeClassifier(random_state=0),algorithm="SAMME",n_estimators=200)
clf.fit(X, y)
scores = cross_val_score(clf, X, y)
scores.mean()  

0.91179117911791174

We can see a small improvement in accuracy of about $0.2$ percentage points (from $0.9097$ to $0.9118$).
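For comparison with the decision stumps mentioned at the start, the following sketch boosts depth-one trees instead of full trees. It uses synthetic data from `make_classification` rather than the churn file, and the sample size, number of estimators, and fold count are arbitrary assumptions, so the scores are not comparable to the ones above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the churn data set).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# A decision stump is a tree limited to a single split.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
boosted = AdaBoostClassifier(stump, n_estimators=50, random_state=0)

stump_score = cross_val_score(stump, X_demo, y_demo, cv=5).mean()
boosted_score = cross_val_score(boosted, X_demo, y_demo, cv=5).mean()
print(f"single stump: {stump_score:.3f}, boosted stumps: {boosted_score:.3f}")
```

Boosting usually has more room to help with a weak base learner such as a stump than with a fully grown tree, which may already fit the training data closely.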