# <p style='text-align: center;'> AdaBoost Algorithm in Machine Learning </p>

- AdaBoost is also known as Adaptive Boosting.


- AdaBoost is an ensemble learning algorithm from the boosting family that combines several weak classifiers to create a strong classifier.


- It combines many weak decision trees, and the final result is a weighted sum of their individual predictions.


- Each of these decision trees is called a stump.


- AdaBoost adapts by using the mistakes of previously created **weak learners**: the instances they misclassified are given more weight when the next **weak learner** is created.


- Unlike a single decision tree, which fits all the data at once (fitting the data hard), AdaBoost aggregates multiple weak learners, allowing the overall ensemble model to learn slowly from the features.


- AdaBoost Algorithm is a Boosting technique.


- In the AdaBoost algorithm, each decision tree is limited to max_depth=1, which is why it is called a **stump**.


<b> What is a stump?
- A decision tree with one node (the root node) and two leaves (terminal nodes) is known as a **stump**.
    
    
![image.png](attachment:image.png)
    
    
- Each stump splits on a single feature, so it essentially represents how well that one feature predicts the target.
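
As a minimal sketch, a stump can be written as a single threshold test on one feature. The function name `stump_predict`, the feature names, and the threshold values below are hypothetical and chosen only for illustration.

```python
def stump_predict(row, feature, threshold):
    """One root node, two leaves: +1 if the feature exceeds the threshold, else -1."""
    return 1 if row[feature] > threshold else -1


# Hypothetical record with two numeric features.
record = {"age": 42, "income": 30_000}

print(stump_predict(record, feature="age", threshold=35))         # 1
print(stump_predict(record, feature="income", threshold=50_000))  # -1
```

Because the stump looks at one feature only, its accuracy reflects how useful that feature is by itself.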
    
    
<b> What is a weak learner?
- A weak learner is a model that is too simple to perform well on its own.
    
    
- AdaBoost combines multiple weak learners to create a strong classifier and predict the result, as follows:
    
![image-3.png](attachment:image-3.png)
    
    


### Steps in Boosting Algorithms:
<b> Step 1: Assign Equal Weights to all the observations:
- Initially, assign the same weight to each record in the dataset.

   Sample weight = 1/N
   
- Where N = Number of records
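
A minimal sketch of this initialization, assuming the weights are kept in a plain Python list (the helper name `initial_weights` is hypothetical):

```python
def initial_weights(n_records):
    """Give every record the same starting weight, 1/N."""
    return [1.0 / n_records] * n_records


weights = initial_weights(7)
print(round(weights[0], 4))  # 0.1429
```

The weights always sum to 1, so they can be read as selection probabilities in the next step.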
    
    
<b> Step 2: Classify random samples using stumps:
- Draw random samples with replacement from the original data, with selection probabilities equal to the sample weights. The base learner used in AdaBoost is a decision tree of depth one (one node and two leaves), also referred to as a stump. Fit the stump to the random samples and predict the classes for the original data.
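
The sampling part of this step can be sketched with `random.choices` from the Python standard library, which draws with replacement in proportion to the given weights. The record names, the weights, and the helper name `resample` are hypothetical.

```python
import random


def resample(records, weights, rng):
    """Draw len(records) records with replacement, proportional to their weights."""
    return rng.choices(records, weights=weights, k=len(records))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
records = ["r1", "r2", "r3", "r4"]
weights = [0.1, 0.1, 0.7, 0.1]

sample = resample(records, weights, rng)
print(sample)  # "r3" is expected to appear most often because of its weight
```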
    
    
<b> Step 3: Calculate Total Error:
- The total error is the sum of the weights of the misclassified records.
    
    Total Error = Sum of the weights of the misclassified records
    
    
- The total error always lies between 0 and 1, where 0 represents a perfect stump (every record classified correctly) and 1 represents a stump that misclassifies every record.
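
A minimal sketch of the total error, assuming labels and predictions are stored in parallel lists (the function name `total_error` and the sample values are hypothetical):

```python
def total_error(weights, y_true, y_pred):
    """Sum the weights of the records whose prediction differs from the label."""
    return sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, -1, 1, -1]
y_pred = [1, 1, 1, -1]  # only the second record is misclassified

print(total_error(weights, y_true, y_pred))  # 0.25
```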
    
    
<b> Step 4: Calculate Performance of the Stump:
- Using the total error, determine the performance of the base learner. The performance of the stump (α) is used to update the weights in the next iteration and to weight the stump's vote in the final prediction.
    
    Performance of the stump(α) = ½ ln ((1 – Total error) / Total error)
    
    
- If the total error is 0.5, then the performance of the stump will be zero.
- If the total error is 0 or 1, then the performance will become infinity or -infinity respectively.
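
The formula can be sketched as below. Clamping the error with a small `eps` is an assumption added here to avoid the infinite values at 0 and 1; it is not part of the formula itself, and the function name is hypothetical.

```python
import math


def stump_performance(total_error, eps=1e-10):
    """alpha = 1/2 * ln((1 - error) / error), with the error clamped away from 0 and 1."""
    err = min(max(total_error, eps), 1 - eps)  # assumption: avoids division by zero
    return 0.5 * math.log((1 - err) / err)


print(stump_performance(0.5))            # 0.0
print(round(stump_performance(0.1), 3))  # 1.099
print(round(stump_performance(0.9), 3))  # -1.099
```

A stump that is better than random guessing gets a positive α, and one that is worse gets a negative α, so its vote is effectively flipped.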
    
    
<b> Step 5: Update Weights:
- Update the weights based on the performance of the stump (α). To help the next stump classify the misclassified records correctly, increase their sample weights and decrease the sample weights of the correctly classified records.
    
    New weight (updated weight) = Weight * e^(performance) → misclassified records

    New weight (updated weight) = Weight * e^(-performance) → correctly classified records
    
    
- After updating the weights of the misclassified and the correctly classified records, normalize them so that they sum to 1.
    
    
    Normalized weight = Updated weight / Sum of updated weight
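
The update and the normalization can be sketched together. The function name `update_weights` and the sample values are hypothetical.

```python
import math


def update_weights(weights, y_true, y_pred, alpha):
    """Scale weights by e^alpha (misclassified) or e^-alpha (correct), then normalize."""
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return [w / total for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, -1, 1, -1]
y_pred = [1, 1, 1, -1]         # second record is misclassified
alpha = 0.5 * math.log(3)      # performance for a total error of 0.25

new_weights = update_weights(weights, y_true, y_pred, alpha)
print([round(w, 3) for w in new_weights])  # [0.167, 0.5, 0.167, 0.167]
```

After normalization the misclassified records together carry half of the total weight, which is what forces the next stump to pay attention to them.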
    
    
<b> Step 6: Update weights in iteration:
- Use the normalized weights to build the second stump. Create a new dataset of the same size as the original dataset by sampling with repetition according to the newly updated sample weights, so that the misclassified records have a higher probability of being selected. Repeat steps 2 to 5 for a chosen number of iterations, updating the weights each time.
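
The whole loop can be sketched on a tiny one-feature dataset. To keep the sketch deterministic, it uses the sample weights directly when scoring candidate stumps instead of drawing a resampled dataset, which is a simplification of the procedure described above. All names and data here are hypothetical.

```python
import math


def best_stump(xs, ys, weights):
    """Return (threshold, polarity, error) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x > threshold else -polarity for x in xs]
            err = sum(w for w, y, p in zip(weights, ys, preds) if y != p)
            if best is None or err < best[2]:
                best = (threshold, polarity, err)
    return best


def fit_adaboost(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n                      # Step 1
    stumps = []
    for _ in range(n_rounds):
        thr, pol, err = best_stump(xs, ys, weights)   # Steps 2 and 3
        err = min(max(err, 1e-10), 1 - 1e-10)         # assumption: avoid infinite alpha
        alpha = 0.5 * math.log((1 - err) / err)       # Step 4
        stumps.append((thr, pol, alpha))
        preds = [pol if x > thr else -pol for x in xs]
        weights = [                                   # Step 5
            w * math.exp(alpha if y != p else -alpha)
            for w, y, p in zip(weights, ys, preds)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return stumps


def predict(stumps, x):
    """Sign of the alpha-weighted sum of the stump predictions."""
    score = sum(a * (pol if x > thr else -pol) for thr, pol, a in stumps)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # no single threshold separates these labels

stumps = fit_adaboost(xs, ys, n_rounds=3)
print([predict(stumps, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

No single stump can classify this data, but three weighted stumps together do.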
    
    
<b> Step 7: Final Predictions:
- The final prediction is the sign of the weighted sum of the predictions made by all the stumps.

    Final prediction = sign(∑ (αi * (predicted value at iteration i)))
    
    
- For example, 5 weak classifiers may predict the values 1.0, 1.0, -1.0, 1.0, -1.0. From a majority vote, it looks like the model will predict a value of 1.0, or the first class. These same 5 weak classifiers may have the performance (α) values 0.2, 0.5, 0.8, 0.2 and 0.9 respectively. The weighted sum of the predictions is then 0.2 + 0.5 - 0.8 + 0.2 - 0.9 = -0.8, so the ensemble predicts -1.0, or the second class.
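
The example above can be checked with a short sketch that compares a plain majority vote with the α-weighted vote (variable names are hypothetical):

```python
predictions = [1.0, 1.0, -1.0, 1.0, -1.0]
alphas = [0.2, 0.5, 0.8, 0.2, 0.9]

# Plain majority vote: every classifier counts the same.
majority = 1.0 if sum(predictions) >= 0 else -1.0

# AdaBoost vote: every classifier counts in proportion to its alpha.
weighted_sum = sum(a * p for a, p in zip(alphas, predictions))
final = 1.0 if weighted_sum >= 0 else -1.0

print(majority)                # 1.0
print(round(weighted_sum, 2))  # -0.8
print(final)                   # -1.0
```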
    


### AdaBoost Calculation with one dataset:
<b> Let us suppose that we have a dataset with features F1, F2, F3, and an output (1 means "Yes" and 0 means "No"). The dataset contains seven records. We will now walk through the AdaBoost calculation on this dataset.
    
![image.png](attachment:image.png)


<b> Step-1:
    
In the first step, each record is assigned the same sample weight.
    
    Sample weight = 1/N
    
    The weight for each sample is 1/7.
    
![image-2.png](attachment:image-2.png)

<b> Step-2:
    
In this step, we will create the base learners sequentially. Each base learner is a decision tree of depth one, called a stump. We will create one stump for each feature and choose the best one based on its entropy value (the lower the entropy, the better the split). Let’s suppose the stump for F1 is the best.
    
![image.png](attachment:image.png)
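
As a rough sketch of how the stumps could be compared, the snippet below computes the weighted entropy of the two leaves for each feature and picks the lowest. It assumes binary (0/1) features, and the feature values and labels are made up for illustration; they are not the values from the table above.

```python
import math


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result


def split_entropy(feature_values, labels):
    """Size-weighted average entropy of the two leaves of a stump on a 0/1 feature."""
    left = [y for x, y in zip(feature_values, labels) if x == 0]
    right = [y for x, y in zip(feature_values, labels) if x == 1]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


# Hypothetical data: seven records, three binary features.
labels = [1, 1, 0, 0, 1, 0, 1]
features = {
    "F1": [1, 1, 0, 0, 1, 0, 0],
    "F2": [1, 0, 1, 0, 1, 0, 1],
    "F3": [0, 1, 1, 0, 0, 1, 1],
}

scores = {name: split_entropy(values, labels) for name, values in features.items()}
best_feature = min(scores, key=scores.get)
print(best_feature)  # F1
```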

<b> Step-3:

Now we will check how many records the stump has classified incorrectly. Let’s suppose that it has misclassified 2 records. From these, we will calculate the total error.
    
    Total Error = Sum of the weights of the misclassified records
    Total Error = 1/7 + 1/7 = 2/7
    
![image.png](attachment:image.png)
    

<b> Step-4:
    
Now that we have calculated the total error, we will calculate the performance of the stump (how well the stump has classified the records) using the following formula. The performance is needed to update the weights.
    
    Performance of the stump(α) = ½ ln ((1 – Total error) / Total error)
    
    Performance of the stump(α) = ½ ln ((1 – 2/7) / (2/7))
    
    Performance of the stump(α) = ½ ln (2.5) ≈ 0.46
    
    
**Note:** The wrongly predicted records are not the only ones passed to the next decision tree stump. Instead, we will increase the weights of the wrongly predicted records and decrease the weights of the correctly classified records, so that the wrongly predicted records are more likely to be selected when the next stump is trained.
    
    
<b> Step-5:
    
In the previous step, we calculated the performance of the stump. Now we will update the weights of both the wrongly classified and the correctly classified records, using the following formulas.
    
<b> For calculating the new weight (updated weight) of a wrongly classified point:
    
    New weight (updated weight) = Weight * e^(performance)
    
    New weight (updated weight) = 1/7 * e^(0.46)
    
    New weight (updated weight) ≈ 0.23
    
    
<b> For calculating the new weight (updated weight) of a correctly classified point:
    
    New weight (updated weight) = Weight * e^(-performance)
    
    New weight (updated weight) = 1/7 * e^(-0.46)
    
    New weight (updated weight) ≈ 0.09
    
    
After updating the weights of the wrongly classified and the correctly classified records, our data will look something like this. Note that we have also calculated the normalized weights, so that the weights sum to 1.
    
    Normalized weight = Updated weight / Sum of updated weight
    
![image.png](attachment:image.png)
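
The arithmetic of Steps 1 to 5 can be reproduced with a short sketch. It assumes the natural logarithm, a starting weight of 1/7 for every record, and two misclassified records; which two records are wrong is a hypothetical choice made here.

```python
import math

n_records = 7
misclassified = {2, 5}  # hypothetical positions of the two wrong records

weights = [1 / n_records] * n_records
total_error = sum(w for i, w in enumerate(weights) if i in misclassified)
alpha = 0.5 * math.log((1 - total_error) / total_error)

updated = [
    w * math.exp(alpha if i in misclassified else -alpha)
    for i, w in enumerate(weights)
]
normalized = [w / sum(updated) for w in updated]

print(round(total_error, 3))                             # 0.286
print(round(alpha, 3))                                   # 0.458
print(round(updated[2], 3), round(updated[0], 3))        # 0.226 0.09
print(round(normalized[2], 3), round(normalized[0], 3))  # 0.25 0.1
```

Under these assumptions each misclassified record ends up with a normalized weight of 0.25 and each correctly classified record with 0.10.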
    
    
    
    
    


<b> Step-6:
    
Now we will remove the updated weight column and sample weight column and keep the normalized column.
    
![image.png](attachment:image.png)
    
    
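Based on the normalized weights, we will divide the range from 0 to 1 into buckets, one per record, where each bucket is as wide as that record's normalized weight (for example, if the first two normalized weights were 0.12 and 0.18, their buckets would be 0 to 0.12 and 0.12 to 0.30, and so on).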
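
The bucket boundaries are running totals of the normalized weights, which can be sketched with `itertools.accumulate`. The helper name `weight_buckets` and the four weights are hypothetical.

```python
from itertools import accumulate


def weight_buckets(normalized_weights):
    """Return one (lower, upper) range per record, built from running totals."""
    uppers = list(accumulate(normalized_weights))
    lowers = [0.0] + uppers[:-1]
    return list(zip(lowers, uppers))


buckets = weight_buckets([0.12, 0.18, 0.30, 0.40])
for lower, upper in buckets:
    print(round(lower, 2), "to", round(upper, 2))
```

A record with a larger weight gets a wider bucket, so a random number between 0 and 1 is more likely to land in it.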

<b> Step-7:
    
Now, based on these normalized weights, we will create a new dataset for training the next stump. Because the wrongly classified records have wider buckets, they are more likely to be selected. The algorithm will run 8 iterations to select records from the previous dataset: in each iteration it draws a random number between 0 and 1, checks which bucket the number falls into, and copies the corresponding record into the new dataset. The same record may be selected more than once, and correctly classified records may be selected as well. This is how each new decision tree stump is built from the results of the previous one.
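
A minimal sketch of this selection, assuming the buckets are represented by their upper bounds and looked up with `bisect` from the standard library. The function name `draw_indices`, the weights, and the number of draws are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def draw_indices(normalized_weights, n_draws, rng):
    """Pick record positions by finding the bucket each random number falls into."""
    uppers = list(accumulate(normalized_weights))
    picks = []
    for _ in range(n_draws):
        r = rng.random() * uppers[-1]          # random point in the bucket range
        index = bisect.bisect_right(uppers, r)  # first bucket whose upper bound exceeds r
        picks.append(min(index, len(uppers) - 1))
    return picks


rng = random.Random(0)
weights = [0.10, 0.10, 0.25, 0.10, 0.10, 0.25, 0.10]

picks = draw_indices(weights, n_draws=7, rng=rng)
print(picks)  # positions 2 and 5 are the most likely to appear
```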
    
    
So in this way, we will reduce the error as compared to the initial state. Now let us suppose that we have created 4 decision tree stumps, D1, D2, D3, and D4, by applying all the above steps, and that they give the results 1, 1, 0, and 1. AdaBoost combines the outputs of the stumps, each weighted by its performance (α), to give the final result; if the four stumps have similar α values, the majority output, 1, is the final result.
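
Since this dataset uses 0 and 1 as outputs while the sign formula works with -1 and +1, one way to combine the four stumps is to map 0 to -1 before voting. The α values below are hypothetical, as is the function name.

```python
def weighted_vote(outputs, alphas):
    """Combine 0/1 stump outputs into one 0/1 result using their alpha values."""
    score = sum(a * (1 if out == 1 else -1) for out, a in zip(outputs, alphas))
    return 1 if score >= 0 else 0


outputs = [1, 1, 0, 1]  # results of D1, D2, D3, D4

print(weighted_vote(outputs, [0.5, 0.5, 0.5, 0.5]))  # 1
print(weighted_vote(outputs, [0.2, 0.1, 0.9, 0.3]))  # 0
```

With equal α values the result matches the majority, but a single stump with a large α can outvote the other three.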

### Advantages of AdaBoost Algorithm:

1. AdaBoost is fast, simple, and easy to program.
2. In practice, boosting has often been shown to be fairly resistant to overfitting.
3. It has been extended to learning problems beyond binary classification, and it can be used with text or numeric data.


### Disadvantages of AdaBoost Algorithm:

1. AdaBoost can be sensitive to noisy data and outliers.
2. Weak classifiers that are too weak can lead to low margins and overfitting.

