## AdaBoost Classifier 

$\bullet$ Boosting is an ensemble approach that primarily reduces model bias. <br>
$\bullet$ The basic idea of boosting is to form a powerful classifier out of many weak classifiers. <br>
$\bullet$ Boosting algorithms keep track of the training samples that earlier models failed to predict accurately. <br>
$\bullet$ The models are trained sequentially, and each subsequent model attempts to fix the errors of its predecessor.

There are many boosting methods available, but AdaBoost and Gradient Boosting are the most popular. Here, we implement AdaBoost on the Iris dataset.

$\bullet$ AdaBoost combines multiple weak classifiers to increase the overall classification accuracy.<br>
$\bullet$ The basic mechanism of AdaBoost is to assign a weight to each classifier and to reweight the training samples in each iteration, so that the samples that were misclassified get emphasized in the next round. The AdaBoost algorithm involves the following steps:

1. Initialize the training dataset, $\mathcal{D}=\{(x_{i},y_{i}),~ i=1,...,n\}$
2. Initialize each sample with weight $w_{i} = 1/n$
3. For $m=1$ to $M:$ <br/>
      (i) Fit a classifier $h_{m}(x)$ to the training data using weights $w_{i}$ <br>
      
      (ii) Compute the weighted error
 $$ err_{m} = \frac{\sum_{i=1}^{n}w_{i}\textbf{1}(y_{i} \neq h_{m}(x_{i}))}{\sum_{i=1}^{n}w_{i}}$$

     (iii) Compute $\alpha_{m} = \log((1-err_{m})/err_{m}) + \log(K-1)$ <br>
   
     (iv) Set $w_{i} \leftarrow w_{i}~ e^{\alpha_{m}\textbf{1}(y_{i} \neq h_{m}(x_{i}))}, ~ i=1,...,n$ <br>
   
   
4. Predict: $h(x) = \text{argmax}_{j}\sum_{m=1}^{M} \alpha_{m} \textbf{1}(h_{m}(x) = j)$  
   

Here $w_{i}$ is the weight of training observation $i=1,...,n$; $h_{m}$ is the $m$-th weak classifier and $err_{m}$ is its weighted error rate on the training samples, $m=1,...,M$; $\alpha_{m}$ weights the contribution of each $h_{m}$; the indicator $\textbf{1}(y_{i} \neq h_{m}(x_{i}))$ equals $1$ if the classifier $h_{m}$ misclassifies sample $i$ and $0$ otherwise; $K$ denotes the number of classes; and $h(x)$ is the final classifier, a weighted combination of the weak classifiers. This multiclass version of AdaBoost is known as "SAMME".
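
The steps above can be sketched directly in NumPy, using scikit-learn decision stumps as the weak classifiers $h_{m}$. This is a minimal illustration, not the implementation inside `AdaBoostClassifier`: the helper names `samme_fit` and `samme_predict` are made up for this example, the labels are assumed to be the integers $0,...,K-1$, and the early-stopping rule and the weight renormalization are our own choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def samme_fit(X, y, n_rounds=50, n_classes=3):
    """Hypothetical helper: fit stumps with the SAMME updates (steps 2-3)."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                          # step 2: w_i = 1/n
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=1)
        stump.fit(X, y, sample_weight=w)             # step 3(i)
        miss = stump.predict(X) != y                 # indicator 1(y_i != h_m(x_i))
        err = np.sum(w * miss) / np.sum(w)           # step 3(ii)
        # Assumption: stop if the stump is perfect or no better than chance,
        # since alpha would be infinite or non-positive in those cases.
        if err <= 0.0 or err >= 1.0 - 1.0 / n_classes:
            break
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)  # step 3(iii)
        w = w * np.exp(alpha * miss)                 # step 3(iv)
        w = w / w.sum()  # rescaling only; err_m already divides by sum(w)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def samme_predict(stumps, alphas, X, n_classes=3):
    """Hypothetical helper: weighted vote over the stumps (step 4)."""
    votes = np.zeros((X.shape[0], n_classes))
    rows = np.arange(X.shape[0])
    for stump, alpha in zip(stumps, alphas):
        votes[rows, stump.predict(X)] += alpha       # alpha_m * 1[h_m(x) = j]
    return votes.argmax(axis=1)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
stumps, alphas = samme_fit(X_train, y_train, n_rounds=50, n_classes=3)
y_hat = samme_predict(stumps, alphas, X_test, n_classes=3)
acc = np.mean(y_hat == y_test)
print("rounds kept:", len(stumps), "test accuracy:", acc)
```

Every stored $\alpha_{m}$ is positive here, because a round is only kept when $err_{m} < 1 - 1/K$.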


In [1]:
import numpy as np
import pandas as pd 
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import AdaBoostClassifier

from sklearn.metrics import accuracy_score


In [2]:
iris = datasets.load_iris()
print(dir(iris))

['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']


In [3]:
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.DataFrame(iris.target, columns = ["Species"])

In [4]:
X.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2


In [5]:
y.shape

(150, 1)

In [6]:
print(y)

     Species
0          0
1          0
2          0
3          0
4          0
..       ...
145        2
146        2
147        2
148        2
149        2

[150 rows x 1 columns]


In [7]:
le = LabelEncoder()
# Pass the 1-D "Species" column: a one-column DataFrame triggers a DataConversionWarning
y = le.fit_transform(y["Species"])
y.shape

(150,)

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

## Decision Stump (depth-1)

In [9]:
tree_clf = DecisionTreeClassifier(criterion='entropy', max_depth=1, random_state=1)

#### Model evaluation of decision stump

In [10]:
tree_clf.fit(X_train, y_train)

y_pred_stmp = tree_clf.predict(X_test)
# Model Accuracy
print("Accuracy with decision stump = {} %".format(accuracy_score(y_test, y_pred_stmp)*100))

Accuracy with decision stump = 60.0 %
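
One plausible reason for the low score: a depth-1 tree has a single split and therefore only two leaves, so it can predict at most two of the three Iris classes. The short check below illustrates this on the same split. It is a sketch that reloads the data so it runs on its own; the variable names `stump`, `n_leaves` and `predicted_classes` are ours.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Same settings as the decision stump above
stump = DecisionTreeClassifier(criterion="entropy", max_depth=1, random_state=1)
stump.fit(X_train, y_train)

n_leaves = stump.get_n_leaves()                       # one split -> two leaves
predicted_classes = np.unique(stump.predict(X_test))  # classes the stump can output
print("leaves:", n_leaves)
print("classes predicted on the test set:", predicted_classes)
print("classes present in the test set:  ", np.unique(y_test))
```

Any class that never appears among the predictions is misclassified in full, which caps the accuracy of a single stump on a three-class problem.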


## Applying AdaBoost

The accuracy with a single decision stump is around 60%, but we can increase it by applying the AdaBoost algorithm, as follows.


In [11]:
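# Note: newer scikit-learn releases rename `base_estimator` to `estimator` and
# deprecate algorithm="SAMME.R"; the call below matches the version used in this notebook.
aboost_clf = AdaBoostClassifier(base_estimator=tree_clf, n_estimators=200, algorithm="SAMME.R", learning_rate=0.5)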

model_ab = aboost_clf.fit(X_train, y_train)

Prediction (Test data)

In [12]:
y_pred = model_ab.predict(X_test)

Model Evaluation

In [13]:
# Model Accuracy
print("Accuracy with AdaBoosting = {} %".format(accuracy_score(y_test, y_pred)*100))

Accuracy with AdaBoosting = 95.55555555555556 %


In [14]:
print("Test accuracy: %0.2f" % aboost_clf.score(X_test, y_test))

Test accuracy: 0.96
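
To see how the test accuracy evolves as weak learners are added, `staged_score` yields the score after each boosting round. The sketch below is self-contained and makes a few assumptions of its own: it uses 50 estimators instead of 200, leaves `algorithm` at the library default (which differs between scikit-learn versions), and passes the stump positionally so the call does not depend on the `base_estimator`/`estimator` keyword name. Its numbers will therefore not necessarily match the ones above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

stump = DecisionTreeClassifier(max_depth=1, random_state=1)
clf = AdaBoostClassifier(stump, n_estimators=50, learning_rate=0.5, random_state=1)
clf.fit(X_train, y_train)

# One test-set accuracy per boosting round actually performed
staged = list(clf.staged_score(X_test, y_test))
final_score = clf.score(X_test, y_test)

for m in sorted({1, 5, 10, 25, len(staged)}):
    if m <= len(staged):
        print("after {:3d} estimators: {:.3f}".format(m, staged[m - 1]))
```

The last staged value is the score of the full ensemble, so it agrees with `clf.score(X_test, y_test)`.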


In [15]:
aboost_clf.estimator_weights_

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
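
The array of ones is expected with `algorithm="SAMME.R"`: that variant combines class-probability estimates and, as far as we know, gives every estimator the same weight in scikit-learn. Under the discrete SAMME rule from step 3(iii), the weights would instead depend on each classifier's error. The sketch below evaluates that formula for a few made-up error rates; `samme_alpha` is a hypothetical helper, not a library function.

```python
import numpy as np


def samme_alpha(err, n_classes):
    """Hypothetical helper: alpha_m = log((1 - err_m) / err_m) + log(K - 1)."""
    return np.log((1.0 - err) / err) + np.log(n_classes - 1.0)


# Illustrative error rates, not taken from the fitted model
errors = np.array([0.10, 0.30, 0.50, 0.60])
alphas = samme_alpha(errors, n_classes=3)

for e, a in zip(errors, alphas):
    print("err = {:.2f} -> alpha = {:.3f}".format(e, a))
```

With $K=3$ the weight stays positive for any error below $1 - 1/K = 2/3$, so a classifier only has to beat random guessing to contribute. For $K=2$ the extra $\log(K-1)$ term vanishes and the formula reduces to the binary AdaBoost weight.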