***Boosting***

Boosting is an ensemble technique that combines multiple weak learners (typically decision trees) to create a strong predictive model.

A weak learner is a model that performs only slightly better than random guessing. In boosting, shallow decision trees (often single-split "stumps") are the usual weak learners. Each one is simple and fast to train and captures only a simple pattern, but many of them combined can model complex patterns in the data.
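To make "weak learner" concrete, here is a minimal decision stump: a one-split tree that picks a single feature and threshold to minimise weighted error. This is a sketch on a tiny hypothetical dataset. `fit_stump` and `predict_stump` are illustrative helper names, not a library API, and labels are assumed to be -1/+1.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, polarity) with the lowest weighted error.

    X: (n, d) features, y: labels in {-1, +1}, w: non-negative sample weights.
    """
    best = (0, 0.0, 1, np.inf)  # feature, threshold, polarity, weighted error
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                # predict +polarity when x_j <= t, otherwise -polarity
                pred = np.where(X[:, j] <= t, polarity, -polarity)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, polarity, err)
    return best[:3]

def predict_stump(stump, X):
    j, t, polarity = stump
    return np.where(X[:, j] <= t, polarity, -polarity)

# Hypothetical toy data: no single split separates the classes perfectly
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, 1, -1, -1])
w = np.full(len(y), 1 / len(y))  # uniform weights

stump = fit_stump(X, y, w)
acc = (predict_stump(stump, X) == y).mean()
print(stump, acc)
```

The stump splits at x <= 2 and gets 5 of the 6 points right: better than random guessing, but it cannot fit this data on its own.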

The key idea behind boosting is to train a sequence of weak learners, where each subsequent learner focuses on the mistakes made by the previous ones. In AdaBoost, this is done by increasing the weights of the misclassified instances in the training data, so that the next learner pays more attention to those difficult cases.
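A sketch of one reweighting step, assuming the classic discrete AdaBoost update for labels in {-1, +1}: from the learner's weighted error ε, compute its weight α = ½ ln((1 − ε)/ε), multiply misclassified samples by e^α and correct ones by e^(−α), then renormalise. The predictions below are hypothetical. scikit-learn's SAMME variant uses a multi-class generalisation of this formula.

```python
import numpy as np

y    = np.array([1, 1, -1, 1, -1, -1])   # true labels
pred = np.array([1, 1, -1, -1, -1, -1])  # hypothetical weak-learner output (1 mistake)
w = np.full(len(y), 1 / len(y))          # start with uniform weights

eps = w[pred != y].sum()                 # weighted error = 1/6
alpha = 0.5 * np.log((1 - eps) / eps)    # learner weight

# y * pred is +1 when correct and -1 when wrong, so wrong samples grow
w = w * np.exp(-alpha * y * pred)
w = w / w.sum()                          # renormalise to a distribution

print(np.round(w, 3))
```

After the update, the single misclassified sample carries half of the total weight (0.5), and the five correct ones share the other half (0.1 each). The next learner therefore cannot ignore it.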

The final prediction is made by combining the predictions of all the weak learners, often through a weighted majority vote (for classification) or a weighted average (for regression). The weights are usually determined based on the performance of each learner, with better-performing learners receiving higher weights.
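For binary labels in {-1, +1}, a weighted majority vote is the sign of Σ α_m h_m(x), where h_m is the m-th learner's prediction and α_m its weight. The sketch below assumes three hypothetical learners with made-up α values and shows a confident learner outvoting two weaker ones.

```python
import numpy as np

# Hypothetical predictions of 3 weak learners on 4 samples (rows = learners)
preds = np.array([
    [ 1, -1,  1, -1],
    [ 1,  1, -1, -1],
    [-1,  1,  1, -1],
])
# Hypothetical learner weights: better learners get larger alpha
alphas = np.array([0.9, 0.3, 0.4])

scores = alphas @ preds          # weighted sum of votes per sample
final = np.sign(scores)          # weighted majority vote
print(scores, final)
```

On the second sample, two of the three learners vote +1, but the first learner's larger weight (0.9 against 0.3 + 0.4) makes the final prediction -1.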

***AdaBoost (Adaptive Boosting)***

Adaptive Boosting (AdaBoost) trains weak learners (typically decision stumps) one after another.
After each round it increases the weights of the samples the current learner misclassified, so the next learner concentrates on them. Each learner also gets a say in the final vote that grows with its accuracy.

Let’s say you train 3 weak models in sequence (illustrative numbers):

| Round | Ensemble accuracy | What happens next                                    |
| ----- | ----------------- | ---------------------------------------------------- |
| 1st   | 60%               | Increase the weights of misclassified points         |
| 2nd   | 70%               | Next learner focuses on the previously wrong samples |
| 3rd   | 85%               | Weighted vote of all models → strong ensemble        |
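The round-by-round loop above can be sketched from scratch. This is a minimal discrete AdaBoost for labels in {-1, +1}, assuming decision stumps as weak learners and the classic α = ½ ln((1 − ε)/ε) update. `best_stump`, `adaboost_fit` and `adaboost_predict` are hypothetical names, not scikit-learn functions. The data are synthetic, so the printed accuracies will differ from the illustrative numbers in the table.

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustively find the (feature, threshold, polarity) with lowest weighted error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err

def stump_predict(stump, X):
    j, t, s = stump
    return np.where(X[:, j] <= t, s, -s)

def adaboost_fit(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1 / n)                     # round 1: uniform weights
    learners = []
    for _ in range(n_rounds):
        stump, eps = best_stump(X, y, w)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = stump_predict(stump, X)
        w = w * np.exp(-alpha * y * pred)     # up-weight mistakes, down-weight hits
        w = w / w.sum()
        learners.append((alpha, stump))
    return learners

def adaboost_predict(learners, X):
    # weighted vote: sign of the alpha-weighted sum of stump predictions
    scores = sum(alpha * stump_predict(stump, X) for alpha, stump in learners)
    return np.sign(scores)

# Synthetic data: a circular boundary that no single stump can represent
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where((X ** 2).sum(axis=1) < 0.5, 1, -1)

learners = adaboost_fit(X, y, n_rounds=20)
for m in (1, 5, 20):
    acc = (adaboost_predict(learners[:m], X) == y).mean()
    print(f"after {m:2d} rounds: training accuracy {acc:.2f}")
```

Because each stump is chosen over both polarities, its weighted error never exceeds 0.5, so every α is non-negative and each round can only add a vote in a useful direction.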


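# Implementation of the AdaBoost algorithm with scikit-learn

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data (synthetic, for illustration only)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base learner (weak model): a decision stump
base_model = DecisionTreeClassifier(max_depth=1)

# AdaBoost (`estimator` needs scikit-learn >= 1.2; older versions use `base_estimator`)
ada = AdaBoostClassifier(
    estimator=base_model,
    n_estimators=50,
    learning_rate=0.1,
    random_state=42
)

# Train, then evaluate on held-out data
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))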

***Key Hyperparameters***

| Parameter       | Meaning                              | Typical Range                                 |
| --------------- | ------------------------------------ | --------------------------------------------- |
| `n_estimators`  | Number of weak learners              | 50–500                                        |
| `learning_rate` | Shrinks contribution of each learner | 0.01–1.0                                      |
| `estimator`     | Base weak learner                    | `DecisionTreeClassifier(max_depth=1)` (the default when `None`) |


***Order of implementation for the AdaBoost algorithm***

1. Define problem & goal  
    - Classification or regression?  
    - Choose metrics (Accuracy/F1 for classification, MAE/RMSE for regression).

2. Perform EDA (Exploratory Data Analysis)  
    - Check data distribution, missing values, class imbalance, and feature correlations.

3. Preprocess the data  
    - Handle missing values and encode categorical variables. Scaling numerical features is optional for tree-based base learners, which are insensitive to feature scale.

4. Split dataset  
    - Use train_test_split; for classification, pass stratify=y so both splits keep the same class proportions.

5. Select base estimator  
    - Usually a DecisionTreeClassifier(max_depth=1), a one-split weak learner called a “stump” (AdaBoostRegressor defaults to DecisionTreeRegressor(max_depth=3)).

6. Train baseline AdaBoost model  
    - Initialize AdaBoostClassifier or AdaBoostRegressor with default parameters.

7. Cross-validation  
    - Use KFold / StratifiedKFold to check stability across folds.

8. Hyperparameter tuning  
    - Tune parameters such as:  
      - n_estimators (number of weak learners)  
      - learning_rate (shrinks each learner's contribution)  
      - depth of the base tree (estimator__max_depth; 1 = stump)

9. Train final model & evaluate  
    - Retrain with best parameters and evaluate using chosen metrics.  
    - Plot learning curves or feature importances.

10. Interpret results  
     - Analyze misclassifications, feature importance, and contribution of weak learners.

11. Save & monitor  
     - Serialize the model (joblib or pickle) and monitor its performance over time.
