# AdaBoost Classifier

The basic principle of the AdaBoost algorithm is to combine several weak learners and make an aggregated prediction.

## How is AdaBoost different from a normal decision tree algorithm?

Depth restrictions:
- In a normal decision tree algorithm, or even in the random forest algorithm, we tend to build trees with no restriction on the depth of each tree.
- In AdaBoost, we typically create trees with just one node and two leaves (such a tree is called a stump).

Composed of weak learners:
- As a stump has just a single decision node, it can classify the data using only one feature at a time.
- This makes stumps very weak learners.
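
To make the idea of a stump concrete, here is a minimal sketch in plain Python. The `Stump` class, its field names, and the toy data are made up for illustration and are not taken from any library. The stump looks at exactly one feature, so it does better than guessing on this data but cannot get every sample right.

```python
from dataclasses import dataclass

@dataclass
class Stump:
    feature: int      # index of the single feature the stump looks at
    threshold: float  # split point on that feature
    left: int         # label predicted when value <= threshold
    right: int        # label predicted when value > threshold

    def predict(self, row):
        # Only one feature of the row is ever inspected.
        return self.left if row[self.feature] <= self.threshold else self.right

# Toy data (made up): the label is 1 only when both features are large.
rows = [(1.0, 1.0), (1.0, 5.0), (5.0, 1.0), (5.0, 5.0)]
labels = [0, 0, 0, 1]

# A stump that splits on feature 0 and ignores feature 1 entirely.
stump = Stump(feature=0, threshold=3.0, left=0, right=1)
predictions = [stump.predict(r) for r in rows]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

print(predictions)  # [0, 0, 1, 1]
print(accuracy)     # 0.75
```

The third sample is misclassified because the stump never sees the second feature.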

Unequal priority to stumps:
- In a normal random forest implementation, the decision made by each tree gets equal priority when the results are aggregated.
- In AdaBoost, variable weights are built right into the algorithm. This means that not all stumps have an equal say in predicting the output.

The stumps created before influence the stumps created after:
- In a random forest implementation, each tree is built independently of the others.
- In an AdaBoost implementation, the errors made by an earlier stump influence how the later stumps are created.

## How does AdaBoost work?

1. Assign a weight to every sample record we have. Initially, each sample is assigned a weight of 1/(no_of_samples).
2. Find how well each feature, on its own, is able to predict the outcome. Using the data, we compute the <b>weighted gini index</b> (weighted using the sample_weight) for each feature and choose the feature with the lowest value for our first stump.
3. Find the total error for this stump by adding up the weights of the samples it classified incorrectly.
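4. Using the total error, we can determine how much influence this stump has on the final result using the formula below, where log is the natural logarithm. A total error close to 0 gives a large positive amount of say, and a total error of 0.5 (no better than guessing) gives an amount of say of 0.
    
    <b>Amount of say = (1/2) * log((1 - total_error) / total_error)</b>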
5. As the stump classified a few samples wrongly, we make the next stumps focus on these specific samples by increasing their weights and decreasing the weights of the correctly classified samples.
    
    <b>new_sample_weight for incorrect classifications = old_weight * e^(amount_of_say)</b>
    
    If the amount of say for the stump is large, the incorrectly classified samples get a correspondingly larger increase in sample_weight.
    
    <b>new_sample_weight for correct classifications = old_weight * e^(-amount_of_say)</b>
    
    <b>NOTE: as the new weights generally no longer add up to 1, we normalize each weight by dividing it by the sum of all new weights</b>

6. After the new weights are calculated, we can either keep the existing samples and use the new weights in the weighted gini index, or build a new data set of the same size by randomly picking samples (with duplicates allowed) from the existing ones. The random picking uses the sample_weights as a probability distribution. As samples with higher weights take up a larger share of the distribution, they are more likely to be picked, often more than once.
7. If a new data set was formed by resampling, we reset all the sample_weights to 1/(no_of_samples) and repeat the process from step 2.
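
Steps 3 to 7 can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `adaboost_round`, the toy labels, the hypothetical stump output in `y_pred`, and the clipping constant `eps` are all assumptions made for the example.

```python
import math
import random

def adaboost_round(y_true, y_pred, weights):
    """Steps 3-5 for one stump: total error, amount of say, new weights."""
    # Step 3: total error = summed weight of the misclassified samples.
    total_error = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    # Assumption: clip the error so a perfect stump does not divide by zero.
    eps = 1e-10
    total_error = min(max(total_error, eps), 1 - eps)
    # Step 4: amount of say (natural logarithm).
    say = 0.5 * math.log((1 - total_error) / total_error)
    # Step 5: raise the weights of mistakes, lower the rest, then normalize.
    raw = [w * math.exp(say if t != p else -say)
           for t, p, w in zip(y_true, y_pred, weights)]
    total = sum(raw)
    return say, [w / total for w in raw]

# Step 1: eight samples, each starting with weight 1/8.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0]  # hypothetical stump output; index 3 is wrong
n = len(y_true)
weights = [1 / n] * n

say, new_weights = adaboost_round(y_true, y_pred, weights)
print(round(say, 4))                       # 0.973
print([round(w, 4) for w in new_weights])  # 0.5 at index 3, 0.0714 elsewhere

# Step 6: draw a new data set of the same size, using the weights
# as a probability distribution (duplicates are allowed).
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
resampled_indices = rng.choices(range(n), weights=new_weights, k=n)

# Step 7: the resampled data set starts again with equal weights.
weights = [1 / n] * n
```

After normalization, the single misclassified sample carries half of the total weight, so it is likely to appear several times in the resampled data set.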

How can a forest of stumps (created using AdaBoost) classify the samples?

- We pass the new sample to all the stumps.
- Some stumps will predict 0 and the remaining stumps will predict 1 (the two labels can be any pair of output classes).
- For each class, we add up the amounts of say of the stumps that predicted it. The class with the higher total amount of say is the final result.
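
The voting procedure above can be sketched in a few lines. The function name `weighted_vote` and the numbers are made up for illustration. The example is chosen so that the class backed by fewer stumps still wins because its one stump has a larger amount of say.

```python
from collections import defaultdict

def weighted_vote(stump_predictions, amounts_of_say):
    """Return the class whose stumps have the largest summed amount of say."""
    totals = defaultdict(float)
    for label, say in zip(stump_predictions, amounts_of_say):
        totals[label] += say
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

# Hypothetical forest of three stumps classifying one new sample.
stump_predictions = [1, 0, 0]
amounts_of_say = [1.5, 0.4, 0.6]

winner, totals = weighted_vote(stump_predictions, amounts_of_say)
print(winner)  # 1: one confident stump outweighs two weaker ones (1.5 vs 1.0)
```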

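How do the stumps created earlier influence the ones to come?

- The samples an earlier stump got wrong receive higher weights. As the next stump is chosen using the weighted gini index (or a data set resampled by those weights), it is pushed towards features that classify these samples correctly. In this way each stump indirectly influences the stumps which are yet to come.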
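
A small sketch of this effect, assuming a weighted Gini index in which each sample counts by its weight rather than as one unit. The helper names `weighted_gini` and `split_impurity`, the two toy features, and the thresholds are all hypothetical. With equal weights both features look equally good. Once the sample that a stump on feature A got wrong is weighted up, feature B scores lower and would be chosen next.

```python
def weighted_gini(labels, weights):
    """Gini impurity where each sample counts by its weight instead of 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    class_weight = {}
    for label, w in zip(labels, weights):
        class_weight[label] = class_weight.get(label, 0.0) + w
    return 1.0 - sum((cw / total) ** 2 for cw in class_weight.values())

def split_impurity(values, labels, weights, threshold):
    """Weighted Gini index of a stump splitting one feature at `threshold`."""
    total = sum(weights)
    impurity = 0.0
    for on_left in (True, False):
        side = [(l, w) for v, l, w in zip(values, labels, weights)
                if (v <= threshold) == on_left]
        if side:
            side_labels, side_weights = zip(*side)
            share = sum(side_weights) / total
            impurity += share * weighted_gini(side_labels, side_weights)
    return impurity

labels = [0, 0, 1, 1]
feature_a = [1.0, 2.0, 3.0, 1.0]  # split at 2.5 puts sample 3 with the 0s
feature_b = [3.0, 1.0, 3.0, 3.0]  # split at 2.0 puts sample 0 with the 1s

uniform = [0.25] * 4
a_uniform = split_impurity(feature_a, labels, uniform, 2.5)
b_uniform = split_impurity(feature_b, labels, uniform, 2.0)

# Weights after a stump on feature A misclassified sample 3 (normalized).
reweighted = [1 / 6, 1 / 6, 1 / 6, 1 / 2]
a_reweighted = split_impurity(feature_a, labels, reweighted, 2.5)
b_reweighted = split_impurity(feature_b, labels, reweighted, 2.0)

print(round(a_uniform, 4), round(b_uniform, 4))        # 0.3333 0.3333
print(round(a_reweighted, 4), round(b_reweighted, 4))  # 0.4 0.2667
```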

# Source

- https://www.youtube.com/watch?v=LsK-xG1cLYA
- https://www.youtube.com/watch?v=NLRO1-jp5F8

# Example

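    # Imports: cross-validation helper, the iris data set, and the classifier
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier

    # Load the features and labels, then build an AdaBoost model with 100 estimators
    X, y = load_iris(return_X_y=True)
    clf = AdaBoostClassifier(n_estimators=100)
    scores = cross_val_score(clf, X, y, cv=5)  # one accuracy score per fold
    scores.mean()

Output:

    0.9466666666666665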