# Adaboost (Ada)
***

1. Adaboost, like boosting in general, combines a series of weak learners into a strong learner. A weak learner is any classifier that performs slightly better than random guessing (accuracy above 50% for binary classification), which means it captures at least some structure in the underlying distribution of the dataset. The output of the final strong learner is a weighted combination of the outputs of the weak learners.
2. Adaboost works by repeatedly fitting a base model to the training instances under different instance weights. First, every training instance is given an equal weight; then the algorithm runs for M iterations. In each iteration, the base model is fit to the training instances with the current weights, and its error rate is computed as the fraction of the total weight that falls on misclassified instances. The error rate is then used to compute the classifier coefficient, which increases as the error rate decreases. At the end of each iteration, the instance weights are updated so that misclassified instances gain weight relative to correctly classified ones, which makes the next classifier focus on the instances that are still being misclassified. After M iterations, we have M classifiers and their coefficients. To make a prediction for an instance, the strong learner multiplies each classifier's output by its coefficient, sums the products, and takes the sign of the sum as the final output.
3. Adaboost assumes that each weak learner has a weighted training accuracy above 50% (otherwise its coefficient would not be positive) and that the output classes are encoded as 1 and -1. A very shallow decision tree with a single split, called a decision stump, is commonly used as the weak learner.
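
To make the relationship between error rate, coefficient, and weights concrete, here is a minimal sketch of a single boosting round in plain Python. The toy labels and predictions are hypothetical, the helper names `weighted_error` and `coefficient` are made up for illustration, and the sketch assumes the variant of the update in which only the misclassified instances are scaled up by $e^{\alpha}$; the final normalization is optional and is done here only to make the weights easy to read.

```python
import math

# Hypothetical toy round: 5 instances with equal weights, labels in {-1, +1}.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]  # the weak learner misclassifies instance 1 only
weights = [1 / len(y_true)] * len(y_true)


def weighted_error(weights, y_true, y_pred):
    """Share of the total weight that sits on misclassified instances."""
    wrong = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    return wrong / sum(weights)


def coefficient(error):
    """Classifier coefficient; it grows as the error rate shrinks toward 0."""
    return math.log((1 - error) / error)


error = weighted_error(weights, y_true, y_pred)
alpha = coefficient(error)

# Scale up only the misclassified instances, then normalize for readability.
new_weights = [w * math.exp(alpha) if t != p else w
               for w, t, p in zip(weights, y_true, y_pred)]
total = sum(new_weights)
new_weights = [w / total for w in new_weights]

print(error, alpha, new_weights)
```

In this toy round the error rate is 0.2 and the coefficient is $\log 4 \approx 1.39$. After normalization the single misclassified instance carries half of the total weight, so the classifier that was just fit would score no better than random guessing on the reweighted data.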

1. **[Adaboost algorithm]** Here we show the Adaboost algorithm for binary classification problems ($y \in \{-1, 1\}$).
    1. For a dataset with $N$ instances, initialize the observation weight of each instance: $w_i=\frac{1}{N}, i=1,2,\ldots,N$.
    2. For $m = 1, \ldots, M$:
        1. Fit a classifier $G_m(x)$ to the training instances with weights $w_i$.
        2. Compute the weighted error rate
            $$ E_m=\frac{\sum_{i=1}^{N} w_i \mathbf{1}(y_i\neq G_m(x_i))}{\sum_{i=1}^{N}w_i} $$
        3. Compute the classifier coefficient
            $$ \alpha_m = \log\left(\frac{1-E_m}{E_m}\right) $$
        4. Set
            $$ w_i \gets w_i \cdot e^{\alpha_m \mathbf{1}(y_i\neq G_m(x_i))}, \quad i=1,2,\ldots,N $$
    3. Final output of Adaboost: 
        $$ G(x) = \textrm{sign}\left(\sum_{m=1}^M \alpha_m G_m(x)\right) $$
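
The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the weak learner is a brute-force decision stump, and the weight update scales up only the misclassified instances by $e^{\alpha_m}$. The function names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`), the `eps` clamp on the error rate, the per-round weight normalization, the tie-break at a score of exactly 0, and the toy data are all choices made for this example.

```python
import math


def fit_stump(X, y, w):
    """Brute-force search for the (feature, threshold, polarity) stump
    with the smallest weighted error under the current weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best[1:]


def stump_predict(stump, row):
    j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity


def adaboost_fit(X, y, M=10, eps=1e-10):
    n = len(X)
    w = [1.0 / n] * n  # step 1: equal initial weights
    ensemble = []
    for _ in range(M):  # step 2: M boosting rounds
        stump = fit_stump(X, y, w)
        pred = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, t in zip(w, pred, y) if p != t) / sum(w)
        err = min(max(err, eps), 1 - eps)  # assumption: clamp to avoid log(0)
        alpha = math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Scale up the weights of misclassified instances only.
        w = [wi * math.exp(alpha) if p != t else wi
             for wi, p, t in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]  # optional: keeps the weights summing to 1
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1  # assumption: a score of exactly 0 maps to +1


# Hypothetical 1-D toy data that no single stump can classify perfectly.
X_toy = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y_toy = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost_fit(X_toy, y_toy, M=3)
print([adaboost_predict(model, row) for row in X_toy])
```

No single stump gets more than 7 of these 10 toy labels right, but the weighted vote of three stumps fits all of them. The exhaustive stump search is quadratic in the number of instances per feature, which is acceptable for a sketch but would need a sorted, incremental scan on real data.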

## References
***

1. https://koalaverse.github.io/machine-learning-in-R/gradient-boosting-machines.html  
1. https://arxiv.org/pdf/1403.1452.pdf