
NAME: SANJANA ASRANI

DIV: D17B

ROLL NO.: 01

ML LAB 05 : BAGGING AND BOOSTING

DOP : 25/09/23

# BAGGING VS. BOOSTING

**Bagging (Bootstrap Aggregating)**:

1. **Base Model Training**:
   - Trains multiple base models independently in parallel.
   - Each base model is trained on a random subset of the training data (with replacement).
   - No dependency among the base models.

2. **Weighting of Models**:
   - All base models contribute equally to the final prediction; no model is weighted above another.
   - Final prediction is typically an average (in Bagging Regressor) or a majority vote (in Bagging Classifier) of the base model predictions.

3. **Variance vs. Bias**:
   - Aims to reduce variance.
   - Reduces the variability of predictions and improves model stability.

4. **Parallelism**:
   - Suitable for parallel and distributed computing because base models are trained independently.

5. **Robustness to Overfitting**:
   - Tends to reduce overfitting, making it less prone to overfit the training data.
   - More likely to generalize well to unseen data.
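
The bagging steps above can be sketched by hand. This is a minimal illustration, not the scikit-learn implementation: it assumes decision trees as base models and the iris data, and the helper names `fit_bagged_trees` and `predict_majority` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_models=25, seed=0):
    """Train n_models trees, each on its own bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models


def predict_majority(models, X):
    """Every model casts one equal vote; the most frequent class wins."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


X_demo, y_demo = load_iris(return_X_y=True)
bagged_models = fit_bagged_trees(X_demo, y_demo)
bagged_pred = predict_majority(bagged_models, X_demo)
print("Training accuracy of the hand-rolled ensemble:", (bagged_pred == y_demo).mean())
```

Because no tree depends on another, the loop in `fit_bagged_trees` could run in parallel without changing the result.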

**Boosting**:

1. **Base Model Training**:
   - Trains multiple base models sequentially in an adaptive manner.
   - Each base model corrects the errors made by the previous models.
   - Base models are usually weak learners (models that perform slightly better than random chance).

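2. **Weighting of Data and Models**:
   - Training samples that earlier models got wrong receive more emphasis in later rounds (AdaBoost raises their sample weights; gradient boosting fits the remaining residuals).
   - Base models are weighted by their performance: in AdaBoost, a model with lower weighted training error gets a higher weight, and a model with higher error gets a lower weight.
   - Each base model's contribution to the final prediction is weighted according to its competence.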

3. **Variance vs. Bias**:
   - Aims to reduce bias.
   - Focuses on correcting errors made by previous models.
   - Can result in models with low bias but potentially higher variance.

4. **Parallelism**:
   - Limited parallelism because base models are trained sequentially.
   - Each base model depends on the output of the previous model, which limits parallelization.

5. **Robustness to Overfitting**:
   - Can be more prone to overfitting, especially with too many iterations or complex base models.
   - Requires careful tuning to prevent overfitting.
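
The sequential, weight-driven training described above can be sketched as a small AdaBoost-style loop. This is an illustrative sketch under stated assumptions: a binary problem built from two iris species, depth-1 trees (stumps) as weak learners, and 10 rounds; none of these choices come from the lab itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Binary toy problem (an assumption for illustration): versicolor vs. virginica
X_all, y_all = load_iris(return_X_y=True)
mask = y_all > 0
X_bin = X_all[mask]
y_bin = np.where(y_all[mask] == 1, -1, 1)  # labels in {-1, +1}

n_rounds = 10
sample_w = np.full(len(X_bin), 1 / len(X_bin))  # start with uniform data weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_bin, y_bin, sample_weight=sample_w)
    pred = stump.predict(X_bin)
    # Weighted training error of this round, clipped to keep the log finite
    err = np.clip(sample_w[pred != y_bin].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # model weight: lower error, larger say
    sample_w *= np.exp(-alpha * y_bin * pred)  # up-weight misclassified points
    sample_w /= sample_w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the stump outputs
scores = sum(a * s.predict(X_bin) for a, s in zip(alphas, stumps))
boosted_pred = np.sign(scores)
print("Training accuracy:", (boosted_pred == y_bin).mean())
```

Each round needs the sample weights produced by the previous round, which is why this loop cannot be parallelized the way bagging can.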

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

In [None]:
iris = load_iris()
X = pd.DataFrame(data=iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)  # Target variable (species)

In [None]:
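# The iris target is already encoded as the integers 0, 1, 2, so this
# mapping leaves the values unchanged; it only makes the encoding explicit
class_mapping = {
    0: 0,
    1: 1,
    2: 2
}
y = y.map(class_mapping)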

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
n_estimators = 5000  # Number of base estimators (decision trees by default)
# Note: BaggingRegressor treats the class labels 0, 1, 2 as numeric targets
bagging_regressor = BaggingRegressor(n_estimators=n_estimators, random_state=42)
bagging_regressor.fit(X_train, y_train)

In [None]:
y_pred = bagging_regressor.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.0009295680000000002
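
Since the iris target is a class label, the same experiment can also be framed as classification and scored with accuracy. The sketch below is one possible variant, not part of the original lab: it assumes `BaggingClassifier` with its default decision-tree base estimator and an arbitrary choice of 100 estimators.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42
)

# The default base estimator is a decision tree; 100 estimators is an arbitrary choice
bagging_classifier = BaggingClassifier(n_estimators=100, random_state=42)
bagging_classifier.fit(Xc_train, yc_train)

clf_accuracy = accuracy_score(yc_test, bagging_classifier.predict(Xc_test))
print(f"Accuracy: {clf_accuracy:.3f}")
```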


# HYPERPARAMETER TUNING

In [None]:
bagging_regressor = BaggingRegressor(random_state=42)

# Define hyperparameters to tune. Each list holds distinct candidate values;
# repeated values would only make GridSearchCV refit identical configurations.
param_grid = {
    'n_estimators': [25, 100, 150, 250],  # Number of base estimators (Decision Trees)
    'max_samples': [0.4],                 # Fraction of samples to be used for each base estimator
    'max_features': [0.5, 0.8, 1.0],      # Fraction of features to be used for each base estimator
}

In [None]:
# Create a GridSearchCV object
grid_search = GridSearchCV(
    estimator=bagging_regressor,
    param_grid=param_grid,
    scoring='neg_mean_squared_error',
    cv=5,
)

# Perform grid search
grid_search.fit(X_train, y_train)

In [None]:
# Get the best hyperparameters
best_params = grid_search.best_params_
print("Best Hyperparameters:")
print(best_params)

Best Hyperparameters:
{'max_features': 0.8, 'max_samples': 0.4, 'n_estimators': 250}
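
Besides the single best combination, `GridSearchCV` keeps the cross-validated score of every combination it tried in `cv_results_`. The following self-contained sketch ranks them; it assumes a deliberately smaller grid and `cv=3` for speed, so its numbers will not match the search above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X_small, y_small = load_iris(return_X_y=True)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_small, y_small, test_size=0.2, random_state=42
)

# Reduced grid (an assumption, chosen only to keep the example fast)
small_grid = {"n_estimators": [10, 25], "max_features": [0.5, 1.0]}
search = GridSearchCV(
    BaggingRegressor(random_state=42),
    small_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(Xs_train, ys_train)

results = pd.DataFrame(search.cv_results_)
results["mean_mse"] = -results["mean_test_score"]  # flip the sign back to a plain MSE
ranking = results.sort_values("rank_test_score")[
    ["param_n_estimators", "param_max_features", "mean_mse", "std_test_score"]
]
print(ranking)
```

Looking at the whole table shows how close the runner-up configurations are, which is useful when the best score wins by a margin smaller than the fold-to-fold spread in `std_test_score`.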


In [None]:
# Evaluate the model with the best hyperparameters
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (Best Model): {mse}")

Mean Squared Error (Best Model): 0.0046261333333333324


# **ASSESSMENT**

**Q1. What is the difference between BAGGING REGRESSOR AND BAGGING CLASSIFIER**

(i) Bagging (Bootstrap Aggregating) is an ensemble learning technique that improves the performance of machine learning models by reducing variance and overfitting, which makes predictions more stable.

(ii) It does this by introducing diversity among the base models: each one is trained on a different bootstrap sample of the training data.

(iii) Bagging can be applied to both regression and classification problems, resulting in Bagging Regressors and Bagging Classifiers, respectively. The main difference between them lies in the type of prediction task they are designed for:

1. Bagging Regressor:
   - Bagging Regressor is used for regression tasks.
   - It aggregates the predictions of multiple regression models, typically by averaging them, to make the final prediction.
   - The goal is to predict a continuous numeric value (e.g., house prices, temperature, stock prices).
   - It trains multiple base regression models on different subsets of the training data, created through bootstrapping (SRSWR: simple random sampling with replacement).
   - Common base models: decision trees or linear regression.

2. Bagging Classifier:
   - Bagging Classifier is used for classification tasks.
   - The final prediction is determined by aggregating the base classifiers' predictions, usually through majority (plurality) voting or by averaging their predicted class probabilities. The same rule applies to binary and multi-class problems.
   - The goal is to classify input data into one of several predefined classes or categories (e.g., spam detection, image recognition).
   - Like the Bagging Regressor, it trains multiple base classifiers on bootstrapped subsets of the training data.
   - Common base classifiers include decision trees, k-nearest neighbors, and support vector machines. (A random forest is itself a bagging-style ensemble of decision trees rather than a typical base classifier.)
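
The contrast between the two estimators shows up in what they return. The sketch below is a small illustration under assumptions not taken from the lab: petal width is used as a continuous target for the regressor, species as the class target for the classifier, and 20 estimators is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, BaggingRegressor

data, species = load_iris(return_X_y=True)
features, petal_width = data[:, :3], data[:, 3]  # continuous target: petal width (cm)

# Regression: the ensemble output is the mean of the base regressors' outputs
reg = BaggingRegressor(n_estimators=20, random_state=0).fit(features, petal_width)
manual_mean = np.mean(
    [
        est.predict(features[:, cols])
        for est, cols in zip(reg.estimators_, reg.estimators_features_)
    ],
    axis=0,
)
print("Manual average matches predict():", np.allclose(manual_mean, reg.predict(features)))

# Classification: the ensemble output is a class label, with per-class probabilities available
clf = BaggingClassifier(n_estimators=20, random_state=0).fit(features, species)
proba = clf.predict_proba(features[:5])
print("Predicted labels:", clf.predict(features[:5]))
print("Probability rows sum to:", proba.sum(axis=1))
```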


---


**Q2. What is the difference between BOOSTING REGRESSOR AND BOOSTING CLASSIFIER**

Boosting is another ensemble learning technique used to improve the performance of machine learning models. Like bagging, boosting can be applied to both regression and classification problems, resulting in Boosting Regressors and Boosting Classifiers. Both train and combine base models in the same sequential way; the key difference between them is the prediction task, and with it the error measure that each round tries to reduce:

1. Boosting Regressor:
   - Boosting Regressor is used for regression tasks, where the objective is to predict a continuous numeric value (e.g., predicting house prices, stock prices).
   - It works by training a sequence of base regression models, with each subsequent model focusing on the mistakes made by the previous ones.
   - Base models are typically decision trees with limited depth (weak learners).
   - In AdaBoost-style boosting, each base model is assigned a weight based on its performance during training; in gradient boosting, each model's contribution is scaled by a learning rate. Either way, the final prediction combines the models' weighted outputs.
   - The process continues until a predefined number of base models (iterations) are trained or until the performance stops improving.

2. Boosting Classifier:
   - Boosting Classifier is used for classification tasks, where the goal is to classify input data into one of several predefined classes or categories (e.g., spam detection, image recognition).
   - Similar to Boosting Regressor, it trains a sequence of base classifiers, with each one focusing on correcting the mistakes of its predecessors.
   - Base classifiers are typically decision trees with limited depth (weak learners).
   - The final prediction in Boosting Classifier is determined by a weighted combination of the base classifiers' predictions; in AdaBoost, more weight is given to the models with lower weighted error on the training data.
   - The boosting process continues for a set number of iterations or until no further improvements in performance can be achieved.
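
The round-by-round improvement described above can be observed directly. The sketch below uses scikit-learn's gradient boosting estimators as one concrete boosting family (an assumption, since the answer does not name an algorithm), with petal width as an illustrative regression target and 50 shallow trees.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

data, species = load_iris(return_X_y=True)
inputs, petal_width = data[:, :3], data[:, 3]  # continuous target: petal width (cm)

# Boosting regressor: shallow trees are added one after another
gb_reg = GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=0)
gb_reg.fit(inputs, petal_width)
# staged_predict yields the ensemble's prediction after 1, 2, ..., 50 trees
reg_mse_by_stage = [
    mean_squared_error(petal_width, p) for p in gb_reg.staged_predict(inputs)
]

# Boosting classifier: same sequential scheme, but the output is a class label
gb_clf = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
gb_clf.fit(inputs, species)
clf_acc_by_stage = [accuracy_score(species, p) for p in gb_clf.staged_predict(inputs)]

print(f"Regressor training MSE: {reg_mse_by_stage[0]:.4f} -> {reg_mse_by_stage[-1]:.4f}")
print(f"Classifier training accuracy: {clf_acc_by_stage[0]:.3f} -> {clf_acc_by_stage[-1]:.3f}")
```

These are training-set figures, so they only show that later rounds keep correcting earlier errors; judging overfitting would need the same curves on held-out data.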


---

**Q3. What are the different ways to combine classifiers.**

The choice of method depends on the nature of the problem, the characteristics of the data, and the behavior of the individual base classifiers. Ensemble methods are powerful tools for improving predictive performance and robustness in various machine learning tasks.

1. **Voting Methods**:
   - **Majority Voting**: In binary classification, each classifier's prediction is treated as a vote, and the class with the majority of votes is chosen as the final prediction. In multi-class classification, the class with the highest number of votes can be selected.
   - **Weighted Voting**: Similar to majority voting, but each classifier's vote is weighted based on its reliability or performance on the validation data.

2. **Averaging Methods**:
   - **Simple Average**: For regression tasks, the predictions of multiple regressors are averaged to obtain the final prediction.
   - **Weighted Average**: Similar to simple averaging, but each regressor's prediction is weighted based on its performance or reliability.

3. **Bagging (Bootstrap Aggregating)**:
   - Bagging combines multiple base classifiers by training them independently on bootstrapped subsets of the training data. For classification tasks, the final prediction can be obtained by majority voting.

4. **Boosting**:
   - Boosting combines multiple base classifiers sequentially, with each classifier trained to correct the errors made by the previous ones. The final prediction is typically a weighted combination of the base classifier predictions.

5. **Random Forests**:
   - Random Forests combine multiple decision trees. Each tree is trained on a bootstrap sample of the data and considers only a random subset of features at each split. The final prediction is obtained through majority voting (for classification) or averaging (for regression).

6. **Adaptive Methods**:
   - Methods like AdaBoost adaptively assign weights to data points during training to emphasize the samples that were misclassified by previous classifiers.

**Stacking (Stacked Generalization, also called Stacked Ensembles), Bayesian Model Averaging, and Clustering Ensembles** are a few other methods.
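
The majority and weighted voting methods listed above can be tried with scikit-learn's `VotingClassifier`. This is a minimal sketch: the three member classifiers and the weights `[1, 2, 2]` are illustrative assumptions, not values tuned on validation data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_vote, y_vote = load_iris(return_X_y=True)
Xv_train, Xv_test, yv_train, yv_test = train_test_split(
    X_vote, y_vote, test_size=0.2, random_state=42
)

# Three different classifiers act as ensemble members (an illustrative choice)
members = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# Majority voting: each classifier casts one vote for a class label
hard_vote = VotingClassifier(estimators=members, voting="hard")
hard_vote.fit(Xv_train, yv_train)

# Weighted voting: votes are scaled by per-classifier weights (illustrative values)
weighted_vote = VotingClassifier(estimators=members, voting="hard", weights=[1, 2, 2])
weighted_vote.fit(Xv_train, yv_train)

hard_acc = hard_vote.score(Xv_test, yv_test)
weighted_acc = weighted_vote.score(Xv_test, yv_test)
print(f"Majority voting accuracy: {hard_acc:.3f}")
print(f"Weighted voting accuracy: {weighted_acc:.3f}")
```

In practice the weights would come from each member's score on a validation set rather than being fixed by hand.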

---
