# AdaBoost Algorithm
- AdaBoost, short for Adaptive Boosting, is a machine learning algorithm that belongs to the family of ensemble methods.
- Ensemble methods combine multiple models into a single, more robust model.
- AdaBoost specifically trains weak learners (models that perform only slightly better than random guessing) one after another and combines them into a strong learner.

**Weighted Training Instances:** It starts by assigning equal weights to all instances in the training data.

**Train a Weak Learner:** The algorithm then trains a weak learner on the weighted data. This can be any simple classification algorithm, typically a very shallow decision tree such as a one-split "decision stump".

**Evaluate Learner Performance:** It evaluates the weak learner on the training data using the current instance weights. The learner's weighted error determines how much say it gets in the final vote, and the instances it misclassified receive higher weights in the next iteration. This forces the next learner to focus on the "harder" examples.

**Repeat and Combine Learners:** This process of training a weak learner, evaluating it, and adjusting weights is repeated for multiple iterations. Each iteration creates a new weak learner that focuses on the previously misclassified examples. Finally, the predictions of all the weak learners are combined by a weighted vote, in which more accurate learners count for more, to make the final prediction.
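
The four steps above can be sketched from scratch for the simplest case. The code below is a minimal illustration, not scikit-learn's implementation: it assumes binary labels coded as -1/+1, uses one-feature threshold rules (decision stumps) as weak learners, and applies the classic update in which a learner with weighted error `err` gets the vote `alpha = 0.5 * ln((1 - err) / err)`. The function names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the toy data are hypothetical and chosen only for this example.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Predict +1 on one side of the threshold and -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)

def fit_stump(X, y, w):
    """Search all features/thresholds for the rule with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # step 1: equal instance weights
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)  # step 2
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # vote of this learner
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)        # step 3: up-weight the mistakes
        w = w / w.sum()                          # keep the weights a distribution
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble

def adaboost_predict(X, ensemble):
    # step 4: weighted vote of all weak learners
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy 1-D data that no single threshold can separate
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

for rounds in (1, 3):
    model = adaboost_fit(X_toy, y_toy, n_rounds=rounds)
    acc = np.mean(adaboost_predict(X_toy, model) == y_toy)
    print(f"rounds={rounds}: training accuracy={acc:.2f}")
```

On this toy set a single stump classifies 7 of the 10 points correctly, while the weighted vote of three stumps fits all ten. The Iris data used below has three classes, which needs a multiclass variant of this scheme; `AdaBoostClassifier` takes care of that.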

![image_30_919a085e3a.png](attachment:image_30_919a085e3a.png)

# Implementation

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns

In [2]:
data = sns.load_dataset('iris')

In [3]:
data.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [9]:
X = data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]

X.head()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [10]:
y = data['species']

y.head()

0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
Name: species, dtype: object

In [11]:
from sklearn.preprocessing import LabelEncoder

# Encode the species names as integer class labels
le = LabelEncoder()

y = le.fit_transform(y)
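
To see which integer each species name is mapped to, the fitted encoder can be inspected. The snippet below is a small self-contained sketch; the short `species` list is made up for illustration and stands in for the `species` column.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the species column
species = ["setosa", "versicolor", "virginica", "setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the original labels in sorted order; the position is the code
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns integer codes back into the original names
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Because `LabelEncoder` sorts the class names, `setosa`, `versicolor`, and `virginica` become 0, 1, and 2. The same `inverse_transform` call can be used later to turn predicted codes back into species names.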

In [12]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

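# Split dataset into training set (70%) and test set (30%)
# No random_state is set, so the split, and the accuracy reported below, varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)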
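
If a repeatable split is wanted, one option is to pass a fixed `random_state` and to stratify on the labels so that each species keeps the same share in both sets. The sketch below is an alternative to the cell above, not what the notebook ran: the seed value `42` is arbitrary, and the data is loaded with `sklearn.datasets.load_iris` only to keep the example self-contained.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Same Iris measurements, loaded from scikit-learn's bundled copy
X_iris, y_iris = load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify keeps the class proportions equal
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris
)

print(Xs_train.shape, Xs_test.shape)
print(Counter(ys_test.tolist()))
```

With 150 rows and `test_size=0.3`, the test set has 45 rows, and stratification gives it 15 rows of each species.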

In [13]:
# Import the AdaBoost classifier
from sklearn.ensemble import AdaBoostClassifier


# Create AdaBoost classifier object (the default weak learner is a depth-1 decision tree)
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)

# Train AdaBoost classifier (fit returns the fitted estimator itself)
model1 = abc.fit(X_train, y_train)


# Predict the response for test dataset
y_pred = model1.predict(X_test)
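
To watch the ensemble improve as weak learners are added, `AdaBoostClassifier` provides `staged_score`, which yields the accuracy after each boosting round. The following is a hedged, self-contained sketch: it reloads Iris through `load_iris` and uses its own seeded split, so its numbers will not match the notebook's run.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris
)

clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(Xa_train, ya_train)

# One test accuracy per boosting round, using the first 1, 2, 3, ... learners
staged = list(clf.staged_score(Xa_test, ya_test))

for n_learners, score in enumerate(staged, start=1):
    if n_learners in (1, 5, 10, len(staged)):
        print(f"{n_learners:>2} learners: accuracy={score:.3f}")
```

Boosting can stop before `n_estimators` rounds, so the number of staged scores may be smaller than 50.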

In [14]:
# Import accuracy_score from scikit-learn's metrics module
from sklearn.metrics import accuracy_score


# Calculate and print model accuracy
print("AdaBoost Classifier Model Accuracy:", accuracy_score(y_test, y_pred))

AdaBoost Classifier Model Accuracy: 0.9777777777777777
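
A single 45-row test set gives a noisy estimate, since each misclassified flower moves the accuracy by about 2.2 percentage points. One way to get a steadier number is k-fold cross-validation. The sketch below assumes 5 stratified folds and a fixed seed, both arbitrary choices, and reloads Iris through `load_iris` to stay self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_iris, y_iris = load_iris(return_X_y=True)

clf_cv = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)

# Shuffled, stratified folds: every fold keeps the three species balanced
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf_cv, X_iris, y_iris, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The mean over folds is usually a better summary than any single split, and the standard deviation shows how much the score depends on which rows land in the test fold.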
