# [AdaBoost](https://towardsdatascience.com/a-mathematical-explanation-of-adaboost-4b0c20ce4382) 
AdaBoost, short for Adaptive Boosting, is an ensemble learning algorithm that combines weak learners (classifiers that perform only slightly better than random guessing) into a strong classifier. It is an iterative algorithm that maintains a weight for each training sample and increases the weights of misclassified samples so that subsequent iterations focus on them.

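Step 1: Initialize the weights for each training sample as $w_i^{(1)} = \frac{1}{N}$, where $N$ is the total number of samples. Labels are assumed to be encoded as $y_i \in \{-1, +1\}$, which the weight update in Step 2d relies on.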

Step 2: For $t = 1$ to $T$ (the number of iterations):
  
  a) Train a weak learner $G_t(x)$ on the training data using the current sample weights.
  
  b) Calculate the weighted error of the weak learner as $\epsilon_t = \sum_{i=1}^{N} w_i^{(t)} \cdot I(y_i \neq G_t(x_i))$, where $w_i^{(t)}$ is the weight of sample $i$ at iteration $t$ and $I(\cdot)$ is the indicator function.
  
  c) Calculate the weight of the weak learner as $\alpha_t = \frac{1}{2} \ln \left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
  
  d) Update the sample weights as $w_i^{(t+1)} = \frac{w_i^{(t)} \cdot \exp(-\alpha_t \cdot y_i \cdot G_t(x_i))}{Z_t}$, where $Z_t$ is the normalization factor (the sum of all updated weights).
  
Step 3: Repeat Step 2 until all $T$ iterations are completed.

Step 4: The final boosted model is given by $F(x) = \text{sign} \left(\sum_{t=1}^{T} \alpha_t \cdot G_t(x)\right)$.
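
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes labels in $\{-1, +1\}$, uses scikit-learn's `DecisionTreeClassifier(max_depth=1)` as the weak learner, and the helper names `adaboost_fit` and `adaboost_predict`, the synthetic dataset, and the clipping of $\epsilon_t$ are choices made here for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, T=20):
    """Fit discrete AdaBoost; y must contain labels in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # Step 1: uniform weights
    learners, alphas = [], []
    for _ in range(T):                           # Step 2
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)         # 2a: train on weighted data
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))            # 2b: weighted error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)    # 2c: learner weight
        w = w * np.exp(-alpha * y * pred)        # 2d: re-weight samples
        w /= w.sum()                             # divide by Z_t
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)


def adaboost_predict(learners, alphas, X):
    """Step 4: sign of the alpha-weighted sum of weak predictions."""
    scores = sum(a * g.predict(X) for a, g in zip(alphas, learners))
    return np.where(scores >= 0, 1, -1)


# Synthetic binary problem (hypothetical data, for illustration only).
X, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y = 2 * y01 - 1                                  # map {0, 1} to {-1, +1}

learners, alphas = adaboost_fit(X, y, T=20)
train_acc = np.mean(adaboost_predict(learners, alphas, X) == y)
print(f"training accuracy: {train_acc:.3f}")
```

Passing `sample_weight` to the stump is one way to "train on the current sample weights"; resampling the data in proportion to the weights is another.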

The derivation of AdaBoost treats it as stage-wise minimization of the exponential loss: each iteration adds the weak learner and coefficient that most reduce the loss, given the learners already chosen. The weight update ensures that subsequent weak learners focus more on the samples misclassified in previous iterations. The weight $\alpha_t$ of each weak learner depends on its weighted error: the lower the error, the larger $\alpha_t$.
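
As a sketch of where the formula for $\alpha_t$ comes from, assume labels and weak-learner outputs in $\{-1, +1\}$ and write $f_{t-1}(x) = \sum_{s=1}^{t-1} \alpha_s G_s(x)$ (notation introduced here for the partial sum). Unrolling the weight update gives $w_i^{(t)} \propto \exp(-y_i f_{t-1}(x_i))$, so the exponential loss after adding $\alpha\, G_t$ is proportional to a weighted sum that splits into correctly and incorrectly classified samples:

$$
\sum_{i=1}^{N} \exp\bigl(-y_i\,[f_{t-1}(x_i) + \alpha\, G_t(x_i)]\bigr)
\;\propto\; \sum_{i=1}^{N} w_i^{(t)} \exp\bigl(-\alpha\, y_i\, G_t(x_i)\bigr)
= (1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha}
$$

Setting the derivative of the right-hand side with respect to $\alpha$ to zero:

$$
-(1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha} = 0
\;\Longrightarrow\; e^{2\alpha} = \frac{1-\epsilon_t}{\epsilon_t}
\;\Longrightarrow\; \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
$$

Substituting $\alpha_t$ back in shows that the normalization factor is $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$, which is below 1 whenever $\epsilon_t \neq \frac{1}{2}$.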

The final prediction is obtained by aggregating the predictions of all weak learners, where each weak learner's contribution is weighted by its importance (determined by $\alpha_t$). The sign function ensures that the predictions are binary.

AdaBoost thus combines the weak learners into an ensemble that predicts better than any of them alone. It is widely used in classification tasks and often generalizes well in practice.

Please note that this is a brief summary of the derivation process. The actual derivation involves more mathematical details and proofs, which can be found in the original paper by Freund and Schapire (1997) titled "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting."

         +---------------------+
         |                     |
         |    Initialize       |
         |   sample weights    |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Training          |
         |   Iterations        |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Weak Learner      |
         |   Training          |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Calculate         |
         |   Weighted Error    |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Calculate         |
         |   Weak Learner      |
         |   Weight            |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Update            |
         |   Sample Weights    |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Repeat            |
         |   Iterations        |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Final             |
         |   Prediction        |
         |                     |
         +---------+-----------+
                   |
                   v
         +---------------------+
         |                     |
         |   Ensemble          |
         |   Model             |
         |                     |
         +---------------------+
         
         
---



Here's how it works in simple terms:

1. Imagine you have a group of people who are not very good at solving a problem individually. These people are like the weak learners in Adaboost.

2. You ask the first person to make predictions on the whole dataset. On their own, these predictions might not be very accurate.

3. Now, you analyze the predictions and pay more attention to the examples that were predicted wrongly. You want to focus on the areas where that person struggled.

4. In the next round, you give more importance to the examples that were predicted incorrectly. A new person then works on the same dataset, but concentrates on those harder examples.

5. You repeat this process for several rounds, each time re-marking which examples matter most based on the latest person's mistakes.

6. Finally, you combine the predictions from all the weak learners, giving more weight to those who performed better overall.

By doing this, AdaBoost creates a strong model that learns from the mistakes of the weak learners and improves its accuracy over time. It's like having a group of people work together, where each new person focuses on the cases the earlier ones got wrong, and the group as a whole becomes better at solving the problem.
         
         

1. Initialize Sample Weights:
   - Each training sample is assigned an initial weight, typically set to 1/N, where N is the total number of samples.

2. Training Iteration:
   - AdaBoost performs a series of training iterations.

3. Weak Learner Training:
   - In each iteration, a weak learner (e.g., decision stump) is trained on the training data.
   - The weak learner aims to minimize the weighted error, taking into account the sample weights.

4. Calculate Weighted Error:
   - The weighted error of the weak learner is calculated as the sum of weights of misclassified samples.
   - It measures the performance of the weak learner on the weighted training data.

5. Calculate Weak Learner Weight:
   - The weight of the weak learner is determined based on its weighted error.
   - The weight emphasizes more accurate weak learners by assigning higher weights to those with lower errors.
   - The weight is calculated using a formula: alpha = 0.5 * ln((1 - weighted_error) / weighted_error).

6. Update Sample Weights:
   - The sample weights are updated to emphasize the misclassified samples from the current weak learner.
   - The weights of correctly classified samples are reduced, while the weights of misclassified samples are increased.
   - The updated weights ensure that the misclassified samples have higher weights for the next iteration.

7. Repeat Iterations:
   - Steps 3 to 6 are repeated for a specified number of iterations or until a stopping criterion is met.

8. Final Prediction:
   - After all iterations are completed, the weak learners are combined to form the final boosted model.
   - The prediction for a new sample is determined by aggregating the predictions of all weak learners, weighted by their respective alpha values.
   - Typically, a weighted majority vote or weighted sum is used to make the final prediction.

9. Ensemble Model:
   - The ensemble model consists of the combination of weak learners, each with its respective weight.
   - The ensemble model is capable of making more accurate predictions than individual weak learners.

The diagram above illustrates the step-by-step process of AdaBoost, highlighting the training iterations, the calculation of the weighted error and the weak learner weight, and the updating of sample weights. Ultimately, AdaBoost creates an ensemble model by combining weak learners to make accurate predictions on unseen data.
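
To make "the weak learner aims to minimize the weighted error" concrete, here is a minimal sketch of a one-dimensional decision stump fitted by brute force. The function name `fit_stump`, the polarity convention, and the six-point toy dataset are all assumptions made for this illustration.

```python
import numpy as np


def fit_stump(x, y, w):
    """Return (threshold, polarity, weighted_error) of the best 1-D stump.

    The stump predicts `polarity` where x < threshold and `-polarity` elsewhere.
    """
    xs = np.unique(x)  # sorted unique values
    # Candidate thresholds: one below all points, plus midpoints between neighbours.
    thresholds = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0))
    best = (None, None, np.inf)
    for thr in thresholds:
        for polarity in (+1, -1):
            pred = np.where(x < thr, polarity, -polarity)
            err = np.sum(w[pred != y])  # sum of weights of misclassified samples
            if err < best[2]:
                best = (thr, polarity, err)
    return best


# Hypothetical toy data: no single threshold separates the classes perfectly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([+1, +1, -1, +1, -1, -1])

# Uniform weights, as in the first iteration.
w_uniform = np.full(len(x), 1.0 / len(x))
thr1, pol1, err1 = fit_stump(x, y, w_uniform)
print("uniform weights   ->", thr1, pol1, round(err1, 3))

# Triple the weight of the sample at x = 4 and refit.
w_skewed = np.array([1, 1, 1, 3, 1, 1], dtype=float)
w_skewed /= w_skewed.sum()
thr2, pol2, err2 = fit_stump(x, y, w_skewed)
print("x=4 weighted up   ->", thr2, pol2, round(err2, 3))
```

With uniform weights the stump splits at 2.5 and misclassifies the point at $x = 4$. Once that point carries more weight, the best split moves to 4.5 so that it is classified correctly, which is exactly how the sample weights steer later weak learners.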

-------------

Given: training data $x$ with $N$ labeled samples  
**Step 1:**  
- Initialize weights $w_{i} = \frac{1}{N}$ for every $i$ (here $N = 10$, so each weight is 0.1)  
- Start with the null classifier $f_{0}(x) = g_{0}(x) = 0$  
  


|Row# | F1|  F2   | F3   | Y  | Wgt|
|-----|---|-------|------|----|----|
|1    | 83|  0.30 | 73   | +  |0.1 |
|2    | 91|  0.06 |  7   | +  |0.1 |
|3    | 98|  0.41 | 42   |+   |0.1 |
|4    | 95| 0.16  |29    |+   |0.1 |
|5    | 89|   0.71|  99  |  + |0.1 |
|6    | 73|   0.81| 37   |  - |0.1 |
|7    | 58|   0.66|  82  | -  |0.1 |
|8    | 32|  0.65 | 36   | -  |0.1 |
|9    | 13|  0.11 |91    | -  |0.1 |
|10   | 82| 0.28  |91    |-   |0.1 |


**Step 2:**
- For t = 1 to T:  
    - Generate a training dataset by sampling with probabilities given by the weights $w_i$:  
        - Correctly predicted data points have lower weights, so they are undersampled.    
        - Incorrectly predicted data points have higher weights, so they are oversampled.      
    - Fit a weak learner $g_t$ on the sampled data ($g_t$ is the algorithm of choice; in this case, a decision tree).  
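
The sampling step can be sketched with NumPy as follows. The weights are hypothetical (rows 3 to 5 are given 2.5 times the weight of the others) and the seed is arbitrary; the sketch assumes sampling with replacement, which the text above does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical weights for 10 rows: rows 3-5 were misclassified earlier,
# so they carry more weight than the rest.
w = np.array([1, 1, 2.5, 2.5, 2.5, 1, 1, 1, 1, 1], dtype=float)
w /= w.sum()  # sampling probabilities must sum to 1

# Draw a new training set of the same size, with replacement.
idx = rng.choice(len(w), size=len(w), replace=True, p=w)
print("rows in the resampled training set:", np.sort(idx) + 1)

# Over many draws, each row appears in proportion to its weight.
many = rng.choice(len(w), size=100_000, replace=True, p=w)
freq = np.bincount(many, minlength=len(w)) / len(many)
print("empirical frequencies:", np.round(freq, 3))
print("sampling weights:     ", np.round(w, 3))
```

Heavily weighted rows tend to appear several times in the resampled set (oversampling), while lightly weighted rows may not appear at all (undersampling).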
 
 
**Step 3:**
- Let's assume $g_t$ makes the following predictions:     

|Row# | F1|  F2   | F3   | Y  | Wgt|$g_t$ Prediction| Error|
|-----|---|-------|------|----|----|----------------|------|
|1    | 83|  0.30 | 73   | +  |0.1 |+               |0     |
|2    | 91|  0.06 |  7   | +  |0.1 |+               |0     |
|3    | 98|  0.41 | 42   |+   |0.1 |-               |1     |
|4    | 95| 0.16  |29    |+   |0.1 |-               |1     | 
|5    | 89|   0.71|  99  |  + |0.1 |-               |1     |
|6    | 73|   0.81| 37   |  - |0.1 |-               |0     |
|7    | 58|   0.66|  82  | -  |0.1 |-               |0     |
|8    | 32|  0.65 | 36   | -  |0.1 |-               |0     |
|9    | 13|  0.11 |91    | -  |0.1 |-               |0     |
|10   | 82| 0.28  |91    |-   |0.1 |-               |0     |  

- In the table above, the Error column is 1 for an incorrect prediction and 0 for a correct prediction.  
- The AdaBoost loss function is the exponential loss, with labels encoded as $y_i \in \{-1, +1\}$:  
$L(f) = \frac{1}{N}\sum_{i=1}^{N} e^{-y_i f(x_i)}$    
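
As a quick numerical check of this loss, the sketch below evaluates it on the ten rows of the table, encoding `+` as $+1$ and `-` as $-1$. Scaling the predictions by a coefficient `c` is an assumption made here to show how the loss responds to the weight given to the learner.

```python
import numpy as np

# True labels Y and predictions g_t from the table above, as +1 / -1.
y = np.array([+1, +1, +1, +1, +1, -1, -1, -1, -1, -1])
g = np.array([+1, +1, -1, -1, -1, -1, -1, -1, -1, -1])


def exp_loss(y, f):
    """Exponential loss L(f) = mean(exp(-y_i * f(x_i)))."""
    return np.mean(np.exp(-y * f))


# The null classifier f_0 = 0 has loss exp(0) = 1 on every sample.
loss_null = exp_loss(y, np.zeros(len(y)))
print("f = 0      ->", loss_null)

# Loss of f = c * g_t for a few hypothetical coefficients c.
losses = {c: exp_loss(y, c * g) for c in (0.2, 0.42, 0.8)}
for c, value in losses.items():
    print(f"f = {c} * g ->", round(value, 4))
```

Among the three coefficients tried, the loss is lowest at 0.42, which matches the "Amount of Say" computed in Step 4.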

  
**Step 4:**
- Set $\lambda_t = \frac{1}{2}\log\frac{1 - e_t}{e_t}$ (natural logarithm). $\lambda_t$ is the weight of model $g_t$, also called its "Amount of Say".  
- $e_t$ is the total error: the sum of the weights of the incorrectly classified samples.

So in this example, the error $e_t$ is 0.1 + 0.1 + 0.1 = 0.3,  
and $\lambda_t = \frac{1}{2}\log\frac{0.7}{0.3} \approx 0.42$.  
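
The arithmetic of this step can be reproduced in a few lines. The weights and error flags are taken from the table in Step 3; the extra error values in the loop are hypothetical and only show how the amount of say varies with the error.

```python
import numpy as np

w = np.full(10, 0.1)                                   # Wgt column
error_flags = np.array([0, 0, 1, 1, 1, 0, 0, 0, 0, 0])  # Error column

e_t = np.sum(w * error_flags)                # total error
lam_t = 0.5 * np.log((1 - e_t) / e_t)        # amount of say (natural log)
print(f"e_t = {e_t:.2f}, lambda_t = {lam_t:.2f}")

# Hypothetical errors: the amount of say shrinks to 0 at e = 0.5
# and turns negative for learners that are worse than chance.
say = {e: 0.5 * np.log((1 - e) / e) for e in (0.1, 0.3, 0.5, 0.7)}
for e, value in say.items():
    print(f"e = {e}: amount of say = {value:+.3f}")
```
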
**Step 5:**  
- Update the weights:
    - $w_i = w_i \cdot e^{\lambda_t}$ if wrongly classified by $g_t$
    - $w_i = w_i \cdot e^{- \lambda_t}$ if correctly classified by $g_t$  
    
    
|Row# | F1|  F2   | F3   | Y  | Wgt|New Weight| Error|
|-----|---|-------|------|----|----|----------|------|
|1    | 83|  0.30 | 73   | +  |0.1 |0.065     |0     |
|2    | 91|  0.06 |  7   | +  |0.1 |0.065     |0     |
|3    | 98|  0.41 | 42   |+   |0.1 |0.153     |1     |
|4    | 95| 0.16  |29    |+   |0.1 |0.153     |1     | 
|5    | 89|   0.71|  99  |  + |0.1 |0.153     |1     |
|6    | 73|   0.81| 37   |  - |0.1 |0.065     |0     |
|7    | 58|   0.66|  82  | -  |0.1 |0.065     |0     |
|8    | 32|  0.65 | 36   | -  |0.1 |0.065     |0     |
|9    | 13|  0.11 |91    | -  |0.1 |0.065     |0     |
|10   | 82| 0.28  |91    |-   |0.1 |0.065     |0     |  

So the new weight of each correctly classified sample is 0.065, and the new weight of each incorrectly classified sample is 0.153.  

**Step 6:**  
- Normalize the $w_i$ so that they sum to one.  
- To do so, divide each new weight by the sum of all new weights:  
    - Normalized Weights = $\frac{New\ Weight_i}{Total\ of\ New\ Weight}$
    - The normalized weights add up to 1.  
    
|Row# | F1|  F2   | F3   | Y  | Wgt|New Weight   | Normalized Sample weights|
|-----|---|-------|------|----|----|-------------|--------------------------|
|1    | 83|  0.30 | 73   | +  |0.1 |0.065        |0.071                     |
|2    | 91|  0.06 |  7   | +  |0.1 |0.065        |0.071                     |
|3    | 98|  0.41 | 42   |+   |0.1 |0.153        |0.167                     |
|4    | 95| 0.16  |29    |+   |0.1 |0.153        |0.167                     | 
|5    | 89|   0.71|  99  |  + |0.1 |0.153        |0.167                     |
|6    | 73|   0.81| 37   |  - |0.1 |0.065        |0.071                     |
|7    | 58|   0.66|  82  | -  |0.1 |0.065        |0.071                     |
|8    | 32|  0.65 | 36   | -  |0.1 |0.065        |0.071                     |
|9    | 13|  0.11 |91    | -  |0.1 |0.065        |0.071                     |
|10   | 82| 0.28  |91    |-   |0.1 |0.065        |0.071                     |  
|     |   |       |      |    |    |Total = 0.917|  Total = 1               |  
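
The New Weight and Normalized Sample weights columns can be reproduced with the short sketch below, which assumes the error flags from Step 3 and $e_t = 0.3$ from Step 4.

```python
import numpy as np

w = np.full(10, 0.1)                                       # current weights
wrong = np.array([0, 0, 1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)  # Error column

lam_t = 0.5 * np.log(0.7 / 0.3)                            # amount of say, ~0.42

# Step 5: scale misclassified rows up and correctly classified rows down.
new_w = np.where(wrong, w * np.exp(lam_t), w * np.exp(-lam_t))

# Step 6: divide by the total so the weights sum to one again.
total = new_w.sum()
norm_w = new_w / total

print("new weights:       ", np.round(new_w, 3))
print("total:             ", round(total, 3))
print("normalized weights:", np.round(norm_w, 3))
print("weight on misclassified rows:", round(norm_w[wrong].sum(), 3))
```

After normalization the three misclassified rows together hold half of the total weight, so the next weak learner cannot do well by repeating the same mistakes.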


**Step 7:**  
- The new model is $f_t = f_{t-1} + \lambda_t g_t$  
- The output of the final model is $\text{sgn}(f_T(x)) = \text{sgn}\left(\sum_{t=1}^{T}\lambda_t g_t(x)\right)$.   
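
A small sketch of the final weighted vote, using three hypothetical weak learners whose amounts of say and predictions are invented for illustration. It also shows that the weighted vote can differ from a simple majority vote.

```python
import numpy as np

# Hypothetical amounts of say for T = 3 weak learners (illustrative values only).
lambdas = np.array([0.42, 0.90, 0.30])

# Hypothetical predictions g_t(x) in {-1, +1}:
# one row per weak learner, one column per sample.
G = np.array([
    [+1, -1, +1, -1],
    [-1, -1, +1, +1],
    [+1, +1, -1, -1],
])

scores = lambdas @ G                      # f_T(x) = sum_t lambda_t * g_t(x)
weighted = np.where(scores >= 0, 1, -1)   # sgn(f_T(x)), ties broken towards +1
majority = np.sign(G.sum(axis=0))         # unweighted majority vote (T is odd)

print("scores:       ", np.round(scores, 2))
print("weighted vote:", weighted)
print("majority vote:", majority)
```

For the first and last samples, the second learner is outvoted two to one, but its larger amount of say (0.90 versus 0.42 + 0.30 = 0.72) decides the weighted vote.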

 
--------
Summary:  


- Initialize weights $w_{i} = \frac{1}{N}$ for every $i$  
- Start with the null classifier $f_{0}(x) = g_{0}(x) = 0$  
- For t = 1 to T:  
    - Generate a training dataset by sampling with probabilities given by $w_i$ 
    - Fit a weak learner $g_t$ and compute its total error $e_t$ 
    - Set $\lambda_t = \frac{1}{2}\log\frac{1 - e_t}{e_t}$ 
    - Update the weights:
        - $w_i = w_i \cdot e^{\lambda_t}$ if wrongly classified by $g_t$
        - $w_i = w_i \cdot e^{- \lambda_t}$ if correctly classified by $g_t$
    - Normalize the $w_i$ to sum to one  
    - The new model is $f_t = f_{t-1} + \lambda_t g_t$  

- The output of the final model is $\text{sgn}(f_T(x)) = \text{sgn}\left(\sum_{t=1}^{T}\lambda_t g_t(x)\right)$.   

    



```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an AdaBoost classifier
adaboost_classifier = AdaBoostClassifier(n_estimators=50, random_state=42)

# Fit the classifier to the training data
adaboost_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = adaboost_classifier.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)
```

```
Accuracy:  1.0
```
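
To connect the scikit-learn model back to the quantities discussed earlier, the fitted classifier can be inspected as sketched below. The block refits the same model so that it runs on its own. Note that Iris has three classes, so scikit-learn applies a multi-class generalization of AdaBoost here, and its learner weights need not equal the binary formula for the amount of say used in the worked example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# Boosting may stop early, so fewer than n_estimators learners can be fitted.
n_fitted = len(clf.estimators_)
print("weak learners fitted:", n_fitted)

# Per-learner weight and weighted training error for the first few rounds.
print("learner weights:", clf.estimator_weights_[:3])
print("learner errors: ", clf.estimator_errors_[:3])

# Test accuracy of the ensemble after each boosting round.
staged = list(clf.staged_score(X_test, y_test))
print("accuracy after round 1:   ", staged[0])
print("accuracy after last round:", staged[-1])
```

`staged_score` is a convenient way to check how many boosting rounds are actually needed before the test accuracy levels off.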
