## Key Idea:
AdaBoost (Adaptive Boosting) builds an accurate classifier out of a sequence of weak learners (such as shallow decision trees) by increasing the importance (weight) of the data points that were misclassified in previous iterations.

### How AdaBoost Works:
1. Initialize Weights:
Initially, each data point in the training set is given equal weight.

2. Train a Weak Learner:
A weak learner (e.g., a shallow decision tree, meaning one whose depth is limited) is trained on the weighted data.

3. Calculate Error:
The learner's error is the weighted fraction of training points it misclassifies. This error also sets the learner's own weight in the final model: the lower the error, the more its vote counts.

4. Update Weights:
The misclassified points are given higher weights, meaning they will be "focused" on more in the next iteration. Correctly classified points get reduced weights.

5. Repeat:
Another weak learner is trained, now focusing more on the misclassified points. This process repeats, with each new learner trying to correct the errors made by the previous ones.

6. Final Model:
The final prediction is a weighted sum (for classification, a weighted vote) of the predictions from all weak learners.
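
The six steps above can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation: it assumes the classic two-class form of AdaBoost with labels in {-1, +1}, uses one-feature threshold rules (decision stumps) as weak learners, and all function names (`fit_stump`, `adaboost_fit`, and so on) and the toy data are made up for the example.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    # Predict -polarity on one side of the threshold and +polarity on the other
    return np.where(X[:, feature] <= threshold, -polarity, polarity)

def fit_stump(X, y, w):
    # Exhaustively pick the stump with the lowest weighted error
    best = (0, 0.0, 1, np.inf)  # feature, threshold, polarity, error
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = stump_predict(X, j, t, polarity)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, polarity, err)
    return best

def adaboost_fit(X, y, n_rounds=3):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # step 1: equal weights
    ensemble = []
    for _ in range(n_rounds):                    # step 5: repeat
        j, t, p, err = fit_stump(X, y, w)        # steps 2-3: train, weighted error
        err = np.clip(err, 1e-10, 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # learner weight from its error
        pred = stump_predict(X, j, t, p)
        w = w * np.exp(-alpha * y * pred)        # step 4: raise misclassified, lower correct
        w /= w.sum()
        ensemble.append((alpha, j, t, p))
    return ensemble

def adaboost_predict(X, ensemble):
    # Step 6: sign of the weighted sum of the weak learners' votes
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy 1-D data that no single threshold can separate
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, 1])

one_round = adaboost_predict(X, adaboost_fit(X, y, n_rounds=1))
three_rounds = adaboost_predict(X, adaboost_fit(X, y, n_rounds=3))
print("correct after 1 round: ", int((one_round == y).sum()), "of", len(y))
print("correct after 3 rounds:", int((three_rounds == y).sum()), "of", len(y))
```

On this toy data a single stump gets four of the six points right, while three boosted stumps together classify all six correctly, even though each individual stump is still wrong on two points.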

AdaBoost is sensitive to noise and outliers in the data: because it assigns high weights to misclassified samples, mislabeled or atypical points can end up dominating the later rounds.
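
To get a feel for how quickly a misclassified sample gains weight, the short calculation below runs a single weight update by hand. It is a sketch under stated assumptions: the update rule is the classic two-class AdaBoost formula (scikit-learn's multi-class variant differs in detail), and the scenario of ten points with exactly one misclassified is hypothetical.

```python
import math

n_points = 10
weights = [1.0 / n_points] * n_points
# Hypothetical outcome of one round: only point 0 is misclassified
misclassified = [True] + [False] * (n_points - 1)

# Weighted error and the resulting learner weight
error = sum(w for w, m in zip(weights, misclassified) if m)
alpha = 0.5 * math.log((1 - error) / error)

# Raise the weight of misclassified points, lower the rest, then renormalize
updated = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
total = sum(updated)
updated = [w / total for w in updated]

print(f"weighted error: {error:.2f}, learner weight: {alpha:.3f}")
print(f"misclassified point: {weights[0]:.3f} -> {updated[0]:.3f}")
print(f"correct point:       {weights[1]:.3f} -> {updated[1]:.3f}")
```

After one round the single misclassified point carries half of the total weight (0.100 to 0.500), while each correctly classified point drops to about 0.056. If that point is a labeling error, the next weak learner is pushed hard toward fitting it.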

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import warnings
warnings.filterwarnings("ignore")


In [3]:
# Read the dataset from the CSV file
# (comma-separated, which is the pandas default, so no sep argument is needed)
data = pd.read_csv("Iris.csv")

# Printing the shape of the dataset
print(data.shape)


(150, 6)


In [5]:
data.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

In [6]:
data = data.drop('Id',axis=1)
X = data.iloc[:,:-1]
y = data.iloc[:,-1]
print("Shape of X is %s and shape of y is %s"%(X.shape,y.shape))

Shape of X is (150, 4) and shape of y is (150,)



In [7]:
total_classes = y.nunique()
print("Number of unique species in the dataset:", total_classes)


Number of unique species in the dataset: 3


In [8]:
distribution = y.value_counts()
print(distribution)


Species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: count, dtype: int64


In [9]:
X_train, X_val, Y_train, Y_val = train_test_split(
    X, y, test_size=0.25, random_state=28)


In [10]:
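# Create an AdaBoost classifier with default settings
# (up to 50 weak learners, each a depth-1 decision tree) and fit it on the training set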
adb = AdaBoostClassifier()
adb_model = adb.fit(X_train,Y_train)
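
The defaults used above can also be written out explicitly, and the fitted ensemble can be inspected to confirm what the weak learners look like. The sketch below is an illustration, not part of the original notebook: it assumes scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for `Iris.csv` so that it runs on its own, and `random_state=0` is an added assumption to make the run repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for Iris.csv
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=28)

# n_estimators and learning_rate are spelled out at their default values
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
model.fit(X_train, y_train)

# Inspect the fitted weak learners
first = model.estimators_[0]
print("Weak learner type:", type(first).__name__)
print("Depth of first weak learner:", first.get_depth())
print("Weak learners fitted:", len(model.estimators_))
print("Weighted error of first learner:", model.estimator_errors_[0])
```

Lowering `learning_rate` shrinks each learner's contribution and usually calls for a larger `n_estimators`; the two are typically tuned together.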


In [11]:
print("The accuracy of the model on the validation set is", adb_model.score(X_val,Y_val))


The accuracy of the model on the validation set is 0.9210526315789473
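
A single accuracy figure hides how the ensemble got there. With 150 rows and `test_size=0.25` the validation set holds 38 samples, so the score above corresponds to 35 correct predictions. As a hedged sketch, `staged_score` can be used to see the validation accuracy after each boosting round. The example again assumes scikit-learn's bundled Iris data in place of `Iris.csv` and an added `random_state=0`, so its numbers need not match the run above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for Iris.csv
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=28)

model = AdaBoostClassifier(random_state=0).fit(X_train, y_train)

# staged_score yields the validation accuracy after each boosting round
staged = list(model.staged_score(X_val, y_val))

for round_no in (1, 5, 10, len(staged)):
    if round_no <= len(staged):
        print(f"after {round_no:2d} rounds: {staged[round_no - 1]:.3f}")
```

If the accuracy stops improving after a handful of rounds, a smaller `n_estimators` gives the same result with less work.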
