* AdaBoost (Adaptive Boosting) is an ensemble boosting classifier that combines multiple weak classifiers into a single, more accurate classifier.


* AdaBoost is an iterative ensemble method: it builds a strong, high-accuracy classifier by combining multiple poorly performing (weak) classifiers.


* The basic concept behind AdaBoost is to set the weights of the classifiers and to re-weight the training samples in each iteration, so that unusual (hard-to-classify) observations are predicted accurately.


* AdaBoost should meet two conditions:
1. The classifier should be trained iteratively on various weighted training examples.


2. In each iteration, the classifier tries to provide an excellent fit for these examples by minimizing the training error.


* To build an AdaBoost classifier, imagine that we train a decision tree as the first base classifier and use it to make predictions on our training data.


* Now, following the methodology of AdaBoost, the weight of the misclassified training instances is increased.


* The second classifier is then trained using the updated weights, and the procedure is repeated over and over again.


* At the end of every model prediction we end up boosting the weights of the misclassified instances so that the next model does a better job on them, and so on.


* AdaBoost adds predictors to the ensemble one at a time, gradually making it better. The main disadvantage of this algorithm is that training cannot be parallelized, since each predictor can only be trained after the previous one has been trained and evaluated.


* Below are the steps for performing the AdaBoost algorithm:
1. Initially, all observations are given equal weights.


2. A model is built on a subset of data.


3. Using this model, predictions are made on the whole dataset.


4. Errors are calculated by comparing the predictions and actual values.


5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.


6. The weights are determined from the error: for instance, the higher the error, the larger the weight assigned to the observation.


7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.
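The steps above can be sketched in a few lines of plain Python. This is a minimal illustration of discrete, binary AdaBoost with labels in {-1, +1} and one-dimensional threshold rules ("stumps") as weak learners; it is not scikit-learn's implementation. The helper names `fit_stump`, `adaboost_fit`, and `adaboost_predict`, as well as the toy data, are hypothetical and chosen only for this example.

```python
import math

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best 1-D threshold rule.

    The rule predicts `sign` when x > threshold and `-sign` otherwise.
    """
    sorted_x = sorted(set(xs))
    # Candidate thresholds: one below the minimum, plus midpoints between neighbours
    thresholds = [sorted_x[0] - 1.0] + [(a + b) / 2 for a, b in zip(sorted_x, sorted_x[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost_fit(xs, ys, n_estimators=3):
    n = len(xs)
    weights = [1.0 / n] * n                        # step 1: equal weights
    ensemble = []
    for _ in range(n_estimators):                  # step 7: repeat
        err, t, sign = fit_stump(xs, ys, weights)  # steps 2-4: fit and measure error
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # model weight derived from its error
        ensemble.append((alpha, t, sign))
        # Steps 5-6: raise the weights of misclassified points, lower the others
        weights = [w * math.exp(-alpha * y * (sign if x > t else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]     # renormalize to sum to 1
    return ensemble

def adaboost_predict(ensemble, x):
    # Weighted vote of all stumps
    score = sum(alpha * (sign if x > t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold rule can separate
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]

for rounds in (1, 3):
    model = adaboost_fit(xs, ys, n_estimators=rounds)
    correct = sum(adaboost_predict(model, x) == y for x, y in zip(xs, ys))
    print(rounds, "round(s):", correct, "of", len(xs), "training points correct")
```

On this toy data a single stump misclassifies three points, while the weighted vote of three stumps classifies all ten training points correctly, which illustrates how re-weighting steers later models toward the earlier mistakes.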

## Parameters of AdaBoost

* The most important parameters are base_estimator, n_estimators, and learning_rate.


* **base_estimator** is the learning algorithm used to train the weak models. It rarely needs to be changed, because by far the most common learner used with AdaBoost is a decision tree, which is this parameter's default (in scikit-learn, a tree of depth 1, also called a decision stump). Note that scikit-learn 1.2 and later name this parameter `estimator`.


* **n_estimators** is the maximum number of models to train iteratively.


* **learning_rate** scales the contribution of each model to the weights and defaults to 1. Reducing the learning rate means the weights are increased or decreased by a smaller amount in each iteration, forcing the model to train more slowly (but sometimes resulting in better performance scores).


* **loss** is exclusive to AdaBoostRegressor and sets the loss function used when updating the weights. It defaults to a linear loss function, but can be changed to square or exponential.
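As a rough illustration of these parameters, the sketch below fits an `AdaBoostRegressor` with each of the three `loss` options. The synthetic sine data, the value `learning_rate=0.5`, and the variable names are assumptions made for this example only; they are not part of the notebook's Iris workflow.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic 1-D regression data (assumption: chosen only for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

scores = {}
for loss in ("linear", "square", "exponential"):
    reg = AdaBoostRegressor(
        n_estimators=50,      # maximum number of weak models
        learning_rate=0.5,    # smaller steps than the default of 1
        loss=loss,            # loss used when updating the sample weights
        random_state=0,
    )
    reg.fit(X, y)
    scores[loss] = reg.score(X, y)  # R^2 on the training data
    print(loss, round(scores[loss], 3))
```

The scores here are measured on the training data, so they only show that each setting runs and fits; comparing the settings fairly would need a held-out set or cross-validation.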

## Advantages

1. AdaBoost is easy to implement.


2. It iteratively corrects the mistakes of the weak classifier and improves accuracy by combining weak learners.


3. We can use many base classifiers with AdaBoost.


4. AdaBoost is often less prone to overfitting than many other learners, although it can still overfit noisy data.

## Disadvantages

1. AdaBoost is sensitive to noisy data.


2. It is highly affected by outliers because it tries to fit each point perfectly.


3. AdaBoost is typically slower than XGBoost.

In [1]:
#Importing the necessary libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

In [2]:
#Loading the dataset
data = pd.read_csv('D:\\SLIIT\\3rd year 2nd sem\\Machine Learning amd Optimization Methods\\Coding\\Iris.csv')
data.head(10)

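|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 7 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 8 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 9 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 10 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |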


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


In [4]:
data['Id'] = data['Id'].astype(str)

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    object 
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), object(2)
memory usage: 7.2+ KB


In [6]:
data.describe()

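|       | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|-------|---------------|--------------|---------------|--------------|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |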


## Declaring feature vector and target variable

In [7]:
x = data[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
x.head(10)

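|   | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|---------------|--------------|---------------|--------------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 |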


In [8]:
y = data['Species']
y.head(10)

0    Iris-setosa
1    Iris-setosa
2    Iris-setosa
3    Iris-setosa
4    Iris-setosa
5    Iris-setosa
6    Iris-setosa
7    Iris-setosa
8    Iris-setosa
9    Iris-setosa
Name: Species, dtype: object

* **LabelEncoder** encodes target labels as integers between 0 and n_classes-1. This transformer should be used to encode the target values, i.e. y, and not the input X.

In [9]:
le = LabelEncoder()
y = le.fit_transform(y)

In [10]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
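To see which species each integer stands for, the fitted encoder can be inspected and reversed. The short sketch below uses a hand-written list of labels (an assumption for illustration, not the notebook's `y`) and a hypothetical variable name `demo_le` so that it does not clash with `le` above.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written labels standing in for the Species column
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

demo_le = LabelEncoder()
codes = demo_le.fit_transform(labels)   # classes are sorted, then numbered from 0

print(codes)                            # integer code for each label
print(demo_le.classes_)                 # position in this array = integer code
print(demo_le.inverse_transform([2, 0]))  # map codes back to the original names
```

Keeping the fitted encoder around makes it possible to report predictions with the original species names by calling `inverse_transform` on the predicted codes.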

In [11]:
#Splitting the dataset into training and testing
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3)

In [12]:
#Building the AdaBoost model
Ada = AdaBoostClassifier(n_estimators=50,learning_rate=1,random_state=0)

In [13]:
#Training the AdaBoost model
AdaModel = Ada.fit(x_train,y_train)

In [14]:
#Predict the response for the dataset
y_pred = AdaModel.predict(x_test)

In [15]:
y_pred

array([0, 2, 1, 1, 1, 0, 1, 0, 2, 0, 2, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1,
       1, 1, 1, 1, 2, 0, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 0, 1, 1, 2,
       2])

In [16]:
#Checking the model accuracy
print('AdaBoost Classifier Model Accuracy:',accuracy_score(y_test,y_pred))

AdaBoost Classifier Model Accuracy: 0.9555555555555556
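A single accuracy figure hides how the ensemble improves as estimators are added. The sketch below uses `staged_score` to record the test accuracy after each boosting round. It loads scikit-learn's built-in copy of the Iris data instead of the CSV file, and the fixed `random_state` and `stratify` arguments are assumptions added for reproducibility, so the numbers will not necessarily match the output above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Built-in Iris data (assumption: used here in place of the CSV file)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
clf.fit(X_train, y_train)

# staged_score yields the accuracy after 1, 2, 3, ... boosting rounds
staged = list(clf.staged_score(X_test, y_test))
best_round = max(range(len(staged)), key=staged.__getitem__) + 1

print("Rounds evaluated:", len(staged))
print("Accuracy after the first round:", staged[0])
print("Best accuracy:", max(staged), "first reached at round", best_round)
```

If the best accuracy is reached long before the final round, a smaller `n_estimators` would likely give a similar model at lower training cost.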


In [17]:
#Creating a linear SVC to use as the base estimator (probability=True enables predict_proba)
svc = SVC(probability=True, kernel='linear')

In [19]:
#Creating adaboost classifier object
#Note: scikit-learn 1.2+ names this argument `estimator`; `base_estimator` was removed in 1.4
AdaModel = AdaBoostClassifier(n_estimators=50, base_estimator=svc, learning_rate=1, random_state=0)

In [20]:
#Training the adaboost classifier
modelAdaSVC = AdaModel.fit(x_train,y_train)

In [21]:
#Making predictions
y_pred = modelAdaSVC.predict(x_test)

In [22]:
y_pred

array([0, 2, 1, 1, 1, 0, 1, 0, 1, 0, 2, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1,
       1, 1, 1, 1, 2, 0, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 0, 1, 1, 2,
       2])

In [23]:
# Checking the model accuracy
print("Model Accuracy with SVC Base Estimator:",accuracy_score(y_test, y_pred))

Model Accuracy with SVC Base Estimator: 0.9333333333333333
