# **AdaBoost - Boosting Technique in Machine Learning**

AdaBoost, short for Adaptive Boosting, is a supervised machine learning technique that combines many weak learners into a single strong learner. The base learner/classifier can be any basic classifier, from a decision tree (often the default) to logistic regression, etc. If the classifier is a decision tree, AdaBoost builds trees with a depth of one, i.e. a single split, called ‘Decision Stumps’. In each round, the number of candidate stumps depends on the number of features in the dataset: with M features, the algorithm builds M candidate stumps and keeps the best one. The total number of stumps in the final model is chosen separately (in scikit-learn, through `n_estimators`).
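To make the idea of a decision stump concrete, here is a minimal sketch that fits a depth-one tree on the iris data used later in this notebook. It assumes scikit-learn's `DecisionTreeClassifier` with `max_depth=1` as the stump; the variable names and `random_state=0` are illustrative choices, not part of the original walkthrough.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the iris data (4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# A decision stump is a tree limited to a single split (depth 1)
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X, y)

print("depth:", stump.get_depth())                      # one level of splitting
print("nodes:", stump.tree_.node_count)                 # root plus two leaves
print("split feature index:", stump.tree_.feature[0])   # feature used at the root
```

A single stump is a weak classifier on its own; boosting gets its strength from combining many of them.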
 
1.	We build the weak learners/classifiers sequentially from the dataset, for example three weak learners one after another, starting with the first. All the base learners are decision trees, but they are not full trees like the ones in a random forest: each tree has only one depth / a single split. These decision trees are called ‘Decision Stumps’.

2.	Initially we assign every sample the same weight, w = 1/n (n = total number of samples in the dataset). For example, in the picture above, the total number of samples is 10, so the initial weight is 1/10.
3.	We create one such stump for each feature in the dataset.
4.	We select the first decision tree base model. How do we select it? A decision tree can score a split by entropy or by the Gini index / coefficient. After comparing the stumps for all the features, we select the one with the lowest entropy (or Gini index) as the first base learner in the sequence of base models.
5.	After the first base learner is built, we check how many records it has misclassified. If there are N observations and T of them are misclassified, the Total Error (TE) is T / N (N = total number of observations, T = number of misclassified observations). For example, if the total number of records is 10 and the number of misclassified records is 3, then the Total Error is 3/10 = 0.3.
6.	We need to find the ‘Performance of Stump’, that is, how well the stump has classified the samples: Performance of Stump = $\frac{1}{2}\ln\left(\frac{1-TE}{TE}\right)$. For example, with a Total Error of 3/10:

Performance of Stump = $\frac{1}{2}\ln\left(\frac{1-3/10}{3/10}\right) = \frac{1}{2}\ln(2.3333) = \frac{1}{2}\times 0.8473 = 0.4236$

7.	After calculating the Total Error and the Performance of Stump, we need to update the weight of every record: the weights of incorrectly classified samples go up and the weights of correctly classified samples go down. For an incorrectly classified record, the formula is

New sample weight = old weight $\times\, e^{\text{Performance of Stump}}$

With the old weight of 1/10 from step 2:

New sample weight = $\frac{1}{10} \times e^{0.4236} \approx 0.1528$

Assign the new weight 0.1528 to each incorrectly classified sample; its weight has increased.
8.	Calculate the new weight for the correctly classified samples. The formula is

New sample weight = old weight $\times\, e^{-\text{Performance of Stump}}$

New sample weight = $\frac{1}{10} \times e^{-0.4236} \approx 0.0655$
9.	Assign the new weight 0.0655 to each correctly classified sample; its weight has decreased.
10.	The updated weights no longer sum to 1, whereas the initial weights did: with 3 misclassified and 7 correctly classified records, the total is 3 × 0.1528 + 7 × 0.0655 ≈ 0.917. So we need to normalize the updated weights of all the samples.

Normalized weight = Updated weight / Total updated weight

0.1528 / 0.917 ≈ 0.17 for each incorrectly classified record

0.0655 / 0.917 ≈ 0.07 for each correctly classified record

After this, the normalized weights sum to 1.

11.	After this, we build the second decision stump. To prepare for it, we divide the range 0 to 1 into class intervals (buckets), one per sample, each as wide as that sample's normalized weight.

12.	The second weak model needs a sample dataset to train on. To build it, we run N iterations. On each iteration we draw a random number between 0 and 1, find the class interval it falls into, and select that interval's row for the new sample dataset. The new dataset therefore also has N observations, and because misclassified rows have wider intervals, they are likely to be selected more often.

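13.	This whole process repeats until the chosen number of decision stumps has been built. The final prediction combines all the stumps in the sequence: each stump votes for a class, its vote is weighted by its Performance of Stump, and the class with the largest weighted vote wins.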
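The arithmetic in steps 5 to 10 can be checked with a short sketch. It assumes the 10-sample example with 3 misclassified records and starts every sample at the initial weight 1/N from step 2; which rows count as misclassified is arbitrary here, and the variable names are illustrative.

```python
import numpy as np

n_samples = 10          # N in the text
n_misclassified = 3     # T in the text

# Step 2: every sample starts with weight 1/N
weights = np.full(n_samples, 1 / n_samples)

# Assumption: mark the first three rows as the misclassified ones
misclassified = np.zeros(n_samples, dtype=bool)
misclassified[:n_misclassified] = True

# Step 5: with uniform weights, the weighted error equals T / N
total_error = weights[misclassified].sum()

# Step 6: Performance of Stump = 1/2 * ln((1 - TE) / TE)
performance = 0.5 * np.log((1 - total_error) / total_error)

# Steps 7-9: raise the weights of misclassified rows, lower the others
updated = np.where(misclassified,
                   weights * np.exp(performance),
                   weights * np.exp(-performance))

# Step 10: normalize so the weights sum to 1 again
normalized = updated / updated.sum()

print(round(total_error, 4), round(performance, 4))
print(np.round(updated, 4))
print(np.round(normalized, 4))
```

After normalization each misclassified record carries a weight of about 0.17 and each correctly classified record about 0.07, so the three misclassified rows together hold half of the total weight.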

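The class intervals and random draws of steps 11 and 12 can be sketched as follows. This is one possible way to do it, using NumPy's `cumsum` for the interval edges and `searchsorted` to locate each random number; the seed is fixed only to make the run repeatable, and the weights are the normalized values from the 10-sample example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability

# Normalized weights from the example: 3 misclassified rows, 7 correct rows
normalized = np.array([1 / 6] * 3 + [1 / 14] * 7)

# Upper edge of each class interval: running total of the weights
upper_edges = np.cumsum(normalized)
upper_edges[-1] = 1.0   # guard against floating-point drift in the last edge

# Draw N random numbers in [0, 1) and find the interval each one falls into
draws = rng.random(len(normalized))
picked_rows = np.searchsorted(upper_edges, draws, side="right")

# picked_rows holds the row indices of the new N-row training sample
print(picked_rows)
```

Rows 0 to 2 have wider intervals than the rest, so they tend to appear more often in `picked_rows`, which is how the next stump is pushed to focus on the earlier mistakes.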

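Putting the steps together, here is a hedged from-scratch sketch of the boosting loop and the final weighted vote. It makes several simplifying assumptions that are not in the walkthrough above: a two-class problem with labels -1 and +1 (iris classes 1 and 2), a hypothetical `n_rounds = 5`, and the reweighting variant, where the weights are passed to the stump through `sample_weight` instead of drawing a resampled dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: keep only iris classes 1 and 2 and relabel them as -1 / +1
X, y = load_iris(return_X_y=True)
mask = y != 0
X, y = X[mask], np.where(y[mask] == 1, -1, 1)

n_rounds = 5                               # hypothetical number of stumps
weights = np.full(len(y), 1 / len(y))      # initial weight 1/n for every row
stumps, performances = [], []

for _ in range(n_rounds):
    # Reweighting variant: the stump sees the weights directly
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    wrong = stump.predict(X) != y

    # Weighted total error, clipped to keep the logarithm finite
    total_error = np.clip(weights[wrong].sum(), 1e-10, 1 - 1e-10)
    performance = 0.5 * np.log((1 - total_error) / total_error)

    # Raise the weights of misclassified rows, lower the rest, then normalize
    weights = weights * np.exp(np.where(wrong, performance, -performance))
    weights = weights / weights.sum()

    stumps.append(stump)
    performances.append(performance)

# Final prediction: sign of the performance-weighted sum of the stump votes
votes = sum(a * s.predict(X) for a, s in zip(performances, stumps))
ensemble_pred = np.where(votes >= 0, 1, -1)

first_stump_acc = (stumps[0].predict(X) == y).mean()
ensemble_acc = (ensemble_pred == y).mean()
print("first stump:", first_stump_acc, "ensemble:", ensemble_acc)
```

The accuracies printed here are training accuracies, so they only illustrate the mechanics; a held-out test set is needed to judge the model.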


In [46]:
### Importing libraries

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris

In [47]:
### Getting the data from IRIS dataset from sklearn

iris = load_iris()
x = iris.data
y = iris.target
x

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [48]:
x.shape

(150, 4)

In [49]:
y.shape

(150,)

In [50]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [51]:
### Splitting the data

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3)

In [52]:
### AdaBoost Classifier

abc = AdaBoostClassifier(n_estimators=50, 
                         learning_rate=1)

In [53]:
### Fitting the model - Training the AdaBoost classifier

model = abc.fit(x_train, y_train)

In [54]:
### Predict the response for the test dataset

y_pred = model.predict(x_test)

In [55]:
### Evaluating the model

from sklearn import metrics
accuracy = metrics.accuracy_score(y_test, y_pred)

In [56]:
accuracy

0.9333333333333333
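Because the split above is random, the accuracy changes from run to run. The following sketch fixes the split and uses scikit-learn's `staged_score` to show the test accuracy after each boosting round. The `random_state=42` values are assumptions added here for repeatability, and the variable names are chosen so they do not overwrite the ones used in the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state is an assumption added so the split is repeatable
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42
)

staged_model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0,
                                  random_state=42)
staged_model.fit(X_tr, y_tr)

# staged_score yields the test accuracy after each boosting round
staged_accuracy = list(staged_model.staged_score(X_te, y_te))

print("rounds fitted:", len(staged_model.estimators_))
print("accuracy after first round:", staged_accuracy[0])
print("accuracy after last round:", staged_accuracy[-1])
```

Boosting can stop before `n_estimators` rounds, so the number of fitted estimators is printed rather than assumed.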

### **Using Logistic Regression as weak learner**

In [57]:
### importing Logistic regression 

from sklearn.linear_model import LogisticRegression

logisticRegressionModel = LogisticRegression()

In [58]:
### Creating adaboost classifier object

### scikit-learn 1.2 and later name this parameter `estimator`; older versions use `base_estimator`
ada_Logistic = AdaBoostClassifier(n_estimators=50, learning_rate=1, estimator=logisticRegressionModel)

In [60]:
### Train adaboost classifier

model = ada_Logistic.fit(x_train, y_train)

In [61]:
### Predict test data

y_pred = model.predict(x_test)

In [62]:
### Evaluating the model

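### Store the score; otherwise `accuracy` would still hold the first model's result
accuracy = metrics.accuracy_score(y_test, y_pred)
accuracy

(The value depends on the random train/test split, which is not seeded.)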

**Advantages:**

1.	It is fast, simple and easy to program
2.	It can be used to improve the accuracy of weak classifiers
3.	It is often less prone to overfitting than a single complex model, although it can still overfit when the data is noisy

**Disadvantages:**
1.	It is extremely sensitive to noisy data and outliers. If you want to use AdaBoost, it is recommended to remove noisy data and outliers first.
2.	AdaBoost is comparatively slower than XGBoost