<a class="anchor" id="0"></a>
# **AdaBoost Classifier Tutorial in Python**


### 6.1 Import libraries <a class="anchor" id="6.1"></a>

In [9]:
import time
from datetime import datetime


### 6.2 Load dataset <a class="anchor" id="6.2"></a>

In [2]:
import torchvision

# Load Data
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True,
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False)



### 6.4 Split dataset into training set and test set <a class="anchor" id="6.4"></a>

In [4]:
# Use the first 5000 samples of each split; `data` and `targets` replace the
# deprecated `train_data` / `train_labels` / `test_data` / `test_labels` attributes
training_data = train_dataset.data.numpy()[:5000].reshape(5000, -1)
# (5000, 28, 28) -> (5000, 784)
training_label = train_dataset.targets[:5000].numpy()

test_data = test_dataset.data.numpy()[:5000].reshape(5000, -1)
test_label = test_dataset.targets[:5000].numpy()

In [5]:
print('Training data size: ', training_data.shape)
print('Training data label size:', training_label.shape)
print('Test data size: ', test_data.shape)
print('Test data label size:', test_label.shape)

Training data size:  (5000, 784)
Training data label size: (5000,)
Test data size:  (5000, 784)
Test data label size: (5000,)


### 6.5 Build the AdaBoost model <a class="anchor" id="6.5"></a>

In [6]:
# Import the AdaBoost classifier
from sklearn.ensemble import AdaBoostClassifier


# Create AdaBoost classifier object
clf = AdaBoostClassifier()

# Train AdaBoost classifier
model1 = clf.fit(training_data, training_label)


# Predict the response for test dataset
y_pred = model1.predict(test_data)

### Create AdaBoost Classifier

- The most important parameters are `estimator` (named `base_estimator` before scikit-learn 1.2), `n_estimators` and `learning_rate`.

- **estimator** is the learning algorithm used to train the weak models. It rarely needs to be changed, because by far the most common learner to use with AdaBoost is a shallow decision tree, which is this parameter's default (a `DecisionTreeClassifier` with `max_depth=1`).

- **n_estimators** is the maximum number of models to train iteratively.

- **learning_rate** scales the contribution of each model to the weight updates and defaults to 1. Reducing the learning rate means the weights are increased or decreased by a smaller amount in each round, so the model trains more slowly (but sometimes reaches better performance scores).

- **loss** is exclusive to `AdaBoostRegressor` and sets the loss function used when updating weights. It defaults to a linear loss function, but can be changed to square or exponential.
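
To make the role of `learning_rate` more concrete, here is a minimal sketch of a single SAMME-style boosting round in plain Python. It is an illustration under stated assumptions rather than scikit-learn's actual code: the helper name `boosting_round` and the toy inputs are hypothetical, and `n_classes=10` is chosen only because MNIST has ten digit classes.

```python
import math

def boosting_round(weights, incorrect, learning_rate=1.0, n_classes=10):
    """One SAMME-style reweighting step (illustrative helper, not a scikit-learn API).

    weights   -- current sample weights, summing to 1
    incorrect -- booleans, True where the weak learner misclassified the sample
    """
    # Weighted error of the weak learner on the current sample distribution
    err = sum(w for w, wrong in zip(weights, incorrect) if wrong)
    # Estimator weight (its say in the final vote), scaled by learning_rate
    alpha = learning_rate * (math.log((1.0 - err) / err) + math.log(n_classes - 1.0))
    # Misclassified samples are up-weighted by exp(alpha); the rest are unchanged
    new_weights = [w * math.exp(alpha) if wrong else w
                   for w, wrong in zip(weights, incorrect)]
    # Renormalise so the weights sum to 1 again
    total = sum(new_weights)
    return alpha, [w / total for w in new_weights]

# Toy round: 10 equally weighted samples, the weak learner gets 3 of them wrong
weights = [0.1] * 10
incorrect = [True] * 3 + [False] * 7

alpha_full, w_full = boosting_round(weights, incorrect, learning_rate=1.0)
alpha_half, w_half = boosting_round(weights, incorrect, learning_rate=0.5)

print(f"learning_rate=1.0: alpha={alpha_full:.3f}, misclassified weight={w_full[0]:.3f}")
print(f"learning_rate=0.5: alpha={alpha_half:.3f}, misclassified weight={w_half[0]:.3f}")
```

Halving the learning rate halves the estimator weight, so the misclassified samples gain less weight in that round and the ensemble needs more rounds to shift its focus.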




### 6.6 Evaluate Model <a class="anchor" id="6.6"></a>

Let's estimate how accurately the classifier can predict the handwritten digit in each test image.

In [7]:
#import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# calculate and print model accuracy
print("Accuracy without best param:", metrics.accuracy_score(y_true=test_label, y_pred=y_pred), "\n")

Accuracy without best param: 0.4606 
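
For reference, `accuracy_score` is simply the fraction of predictions that match the true labels. With ten roughly balanced digit classes, uniform random guessing would score about 10%, so an accuracy near 46% is above chance but leaves most of the errors unexplained. A minimal plain-Python sketch of the same calculation, with a per-class breakdown that can show which classes are being confused, might look like this; the helper names and toy labels are made up for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true class (per-class recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}

# Toy labels standing in for test_label and y_pred
toy_true = [0, 0, 1, 1, 2, 2, 2, 3]
toy_pred = [0, 1, 1, 1, 2, 0, 2, 3]

print("overall:", accuracy(toy_true, toy_pred))
print("per class:", per_class_accuracy(toy_true, toy_pred))
```

The same two helpers can be called on `test_label` and `y_pred` after converting them to lists.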



### 6.7 The effect of `n_estimators`
Let's see how the number of estimators affects the accuracy of the same model.

In [10]:
print("start")
StartTime = time.time()

for i in range(10,200,10):
    clf = AdaBoostClassifier(n_estimators=i)

    # Train AdaBoost classifier
    model1 = clf.fit(training_data, training_label)


    #Predict the response for test dataset
    y_pred = model1.predict(test_data)

    acc_rf = metrics.accuracy_score(y_true=test_label, y_pred=y_pred)
    print("n_estimators = %d, accuracy:%f" % (i, acc_rf))

EndTime = time.time()
print('Total time %.2f s' % (EndTime - StartTime))


start
n_estimators = 10, accuracy:0.558400
n_estimators = 20, accuracy:0.545800
n_estimators = 30, accuracy:0.525000
n_estimators = 40, accuracy:0.508400
n_estimators = 50, accuracy:0.460600
n_estimators = 60, accuracy:0.453200
n_estimators = 70, accuracy:0.441000
n_estimators = 80, accuracy:0.447000
n_estimators = 90, accuracy:0.438800
n_estimators = 100, accuracy:0.447200
n_estimators = 110, accuracy:0.438400
n_estimators = 120, accuracy:0.446000
n_estimators = 130, accuracy:0.438200
n_estimators = 140, accuracy:0.446000
n_estimators = 150, accuracy:0.438200
n_estimators = 160, accuracy:0.446000
n_estimators = 170, accuracy:0.438400
n_estimators = 180, accuracy:0.446000
n_estimators = 190, accuracy:0.438600
Total time 256.14 s


- In this case, the best accuracy is 55.84%, obtained with `n_estimators=10`. Accuracy falls as more estimators are added and levels off at around 44%.
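
The loop above refits the whole ensemble for every value of `n_estimators`, which is why it takes several minutes. Because boosting is sequential, a single fit with the largest value already contains the smaller ensembles as prefixes (given the same random seed), and scikit-learn's `staged_predict` yields the predictions after each boosting round. The sketch below uses a small synthetic dataset from `make_classification` as a stand-in for the MNIST subset, so its numbers are not comparable to the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for this sketch, not the MNIST subset)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit once with the largest ensemble size of interest
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# staged_predict yields test predictions after 1, 2, ..., k boosting rounds
staged_accuracy = [accuracy_score(y_test, stage_pred)
                   for stage_pred in clf.staged_predict(X_test)]

# Index of the best round, converted to a 1-based number of estimators
best_round = max(range(len(staged_accuracy)), key=staged_accuracy.__getitem__) + 1
print("rounds fitted:", len(staged_accuracy))
print("best n_estimators: %d, accuracy: %.4f"
      % (best_round, staged_accuracy[best_round - 1]))
```

Picking the best round on the test set gives an optimistic estimate; in practice the selection would normally be made on a separate validation split.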

### 6.7 Further evaluation with SVC base estimator


- For further evaluation, we will use SVC as a base estimator as follows:

In [11]:
# load required classifier
from sklearn.ensemble import AdaBoostClassifier


# import Support Vector Classifier
from sklearn.svm import SVC


# import scikit-learn metrics module for accuracy calculation
from sklearn.metrics import accuracy_score


# linear-kernel SVC used as the base estimator
svc = SVC(probability=True, kernel='linear')


# create adaboost classifier object
clf2 = AdaBoostClassifier(estimator=svc)


# train adaboost classifier
model2 = clf2.fit(training_data, training_label)


# predict the response for test dataset
y_pred2 = model2.predict(test_data)


# calculate and print model accuracy
print("Model Accuracy with SVC Base Estimator:", accuracy_score(test_label, y_pred2))


Model Accuracy with SVC Base Estimator: 0.8874
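
AdaBoost reweights the training samples on every round, so in scikit-learn the base estimator's `fit` method has to accept a `sample_weight` argument; otherwise `AdaBoostClassifier.fit` raises an error. One quick way to check a candidate before training is `has_fit_parameter` from `sklearn.utils.validation`. The three estimators below are only examples chosen for this sketch.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import has_fit_parameter

# Candidate base estimators (chosen for illustration only)
candidates = {
    "SVC": SVC(kernel="linear"),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=1),
    "KNeighborsClassifier": KNeighborsClassifier(),
}

# True when the estimator's fit() signature includes sample_weight
supports_weights = {name: has_fit_parameter(est, "sample_weight")
                    for name, est in candidates.items()}

for name, ok in supports_weights.items():
    print(f"{name}: accepts sample_weight = {ok}")
```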



### 6.8 Further evaluation with SVC base estimator + `n_estimators`


In [15]:

# create adaboost classifier object
# (`estimator` replaces the older `base_estimator` keyword, matching the previous cell)
clf3 = AdaBoostClassifier(n_estimators=10, estimator=svc)


# train adaboost classifier
model3 = clf3.fit(training_data, training_label)


# predict the response for test dataset
y_pred3 = model3.predict(test_data)


# calculate and print model accuracy
print("Model Accuracy with SVC Base Estimator + n_estimator:", accuracy_score(test_label, y_pred3))




Model Accuracy with SVC Base Estimator + n_estimator: 0.8884


### 6.9 Further evaluation with Decision Tree base estimator  <a class="anchor" id="6.7"></a>

In [14]:
# load required classifier
from sklearn.ensemble import AdaBoostClassifier


# import Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier


# import scikit-learn metrics module for accuracy calculation
from sklearn.metrics import accuracy_score


# decision tree with default settings (no depth limit) used as the base estimator
DT = DecisionTreeClassifier()


# create adaboost classifier object
clf4 = AdaBoostClassifier(estimator=DT)


# train adaboost classifier
model4 = clf4.fit(training_data, training_label)


# predict the response for test dataset
y_pred4 = model4.predict(test_data)


# calculate and print model accuracy
print("Model Accuracy with Decision Tree Estimator:", accuracy_score(test_label, y_pred4))


Model Accuracy with Decision Tree Estimator: 0.747


- In this case, we obtained the following classification accuracies:
  - 46.06% with all default settings.
  - 55.84% with the best number of estimators (`n_estimators=10`).
  - 88.74% with SVC as the base estimator.
  - 88.84% with SVC as the base estimator and `n_estimators=10`.
  - 74.70% with a decision tree as the base estimator.



- The SVC base estimator gives much better accuracy than the default base estimator.
- The SVC base estimator also gives better accuracy than the decision tree base estimator.
- Setting `n_estimators=10` with the SVC base estimator changes accuracy only marginally (88.84% vs. 88.74% for SVC alone).



# **7. Advantages and disadvantages of AdaBoost** <a class="anchor" id="7"></a>

[Back to Notebook Contents](#0.1)


- The advantages are as follows:

   1. AdaBoost is easy to implement.
  
   2. It iteratively corrects the mistakes of the weak classifiers and improves accuracy by combining weak learners.
  
   3. We can use many base classifiers with AdaBoost.
  
   4. AdaBoost is often fairly resistant to overfitting on clean data, although noisy labels can still cause it to overfit.


- The disadvantages are as follows:

   1. AdaBoost is sensitive to noisy data.
  
   2. It is strongly affected by outliers, because points that keep being misclassified receive ever larger weights.
  
   3. AdaBoost is typically slower than XGBoost.

# **8. Results and Conclusion** <a class="anchor" id="8"></a>

[Back to Notebook Contents](#0.1)


- In this kernel, we have discussed AdaBoost classifier.

- We have discussed how the base-learners are classified.

- Then, we moved on to discuss the intuition behind AdaBoost classifier.

- We have also discussed the differences between AdaBoost classifier and GBM.

- Then, we presented the implementation of AdaBoost classifier using a subset of the MNIST dataset.

- Lastly, we have discussed the advantages and disadvantages of AdaBoost classifier.

[Go to Top](#0)