<font color="red" size="6">Ensemble methods</font>
<p> <font color="Yellow" size="5"><b>1_BAGGING</b></font></p>

Ensemble methods combine several models and often achieve better accuracy, robustness, and generalization than any single model. The trade-off is computational cost, which grows with the size of the dataset and the number of base learners.

Bagging (Bootstrap Aggregating) is an ensemble method that improves a learning algorithm mainly by reducing variance, which in turn limits overfitting. The main idea is to train multiple base models on resampled versions of the training data and combine their predictions into a more accurate and stable one. The base models are typically low-bias, high-variance learners such as fully grown decision trees, rather than the shallow "weak learners" commonly used in boosting.

<font color="pink" size=4>Key Concepts of Bagging:</font>
<ol>
   <li> <font color="orange">Bootstrap Sampling:</font>
       Bagging uses bootstrap sampling to generate multiple training sets by sampling the data with replacement. Each bootstrap sample is usually the same size as the original training set, so some rows appear more than once while others (about 37% on average for large datasets) are left out.</li>
    <li><font color="orange">Parallel Training:</font> Bagging trains multiple base models (such as decision trees) independently, each on a different bootstrap sample of the data, so training can run in parallel.</li>
   <li> <font color="orange">Prediction Aggregation:</font>
        For classification, bagging combines the predictions of individual models by majority voting.
        For regression, bagging combines the predictions by averaging.</li>
    <li><font color="orange">Reduces Overfitting:</font> By combining multiple models, bagging reduces the variance of the final model without significantly increasing bias. It is particularly useful when the base model is prone to overfitting (e.g., decision trees).</li></ol>
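The concepts above can be sketched by hand in a few lines. This is only an illustrative sketch, not how `BaggingClassifier` is implemented internally: the number of models (25), the random seeds, and the helper name `majority_vote` are assumptions chosen for the example. It draws bootstrap samples with NumPy, fits one decision tree per sample, and combines the predictions by majority vote. It also records what fraction of the training rows each bootstrap sample actually contains, which should land near 63%.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the sketch
n_models = 25                    # illustrative ensemble size
n_train = len(X_train)
n_classes = len(np.unique(y_train))

models = []
unique_fractions = []
for _ in range(n_models):
    # Bootstrap sampling: draw n_train row indices with replacement
    idx = rng.integers(0, n_train, size=n_train)
    unique_fractions.append(len(np.unique(idx)) / n_train)

    # Each base model sees only its own bootstrap sample
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Stack predictions into an array of shape (n_models, n_test)
all_preds = np.array([m.predict(X_test) for m in models])


def majority_vote(votes):
    """Return the class label that receives the most votes."""
    return np.bincount(votes, minlength=n_classes).argmax()


# Prediction aggregation: one majority vote per test sample (per column)
y_pred_manual = np.apply_along_axis(majority_vote, 0, all_preds)

manual_accuracy = (y_pred_manual == y_test).mean()
mean_unique_fraction = float(np.mean(unique_fractions))

print(f"Mean fraction of distinct rows per bootstrap sample: {mean_unique_fraction:.3f}")
print(f"Accuracy of the hand-rolled ensemble: {manual_accuracy:.4f}")
```

For regression, the same loop would apply with a regressor as the base model and `all_preds.mean(axis=0)` in place of the vote.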

In [1]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


In [2]:
# 1. Load Wine Dataset
data = load_wine()
X = data.data
y = data.target

# 2. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


In [3]:
# 3. Create the Bagging Classifier using a Decision Tree as the base estimator
# Note: `estimator` replaces `base_estimator` (deprecated in scikit-learn 1.2, removed in 1.4)
bagging_model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)

In [7]:
# 4. Train the Bagging model
bagging_model.fit(X_train, y_train)



In [5]:
# 5. Make predictions on the test set
y_pred = bagging_model.predict(X_test)

In [6]:
# 6. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Display the classification report and confusion matrix for better understanding
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Accuracy: 0.9630

Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.95      0.97        19
           1       0.91      1.00      0.95        21
           2       1.00      0.93      0.96        14

    accuracy                           0.96        54
   macro avg       0.97      0.96      0.96        54
weighted avg       0.97      0.96      0.96        54


Confusion Matrix:
[[18  1  0]
 [ 0 21  0]
 [ 0  1 13]]
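The accuracy above comes from a single test split of 54 samples, so it is a fairly noisy estimate. Because every bootstrap sample leaves some training rows out, bagging offers a second estimate without touching the test set: each row can be scored by the trees that never saw it. The sketch below assumes the same data split as above and uses the `oob_score=True` option of `BaggingClassifier`. The variable names are illustrative, and the base estimator is passed positionally so the call works regardless of whether the installed version names that parameter `estimator` or `base_estimator`.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# oob_score=True asks the ensemble to score each training row using
# only the trees whose bootstrap sample did not contain that row.
bagging_oob = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True,
    random_state=42,
)
bagging_oob.fit(X_train, y_train)

oob_accuracy = bagging_oob.oob_score_
test_accuracy = bagging_oob.score(X_test, y_test)

print(f"Out-of-bag accuracy: {oob_accuracy:.4f}")
print(f"Test accuracy:       {test_accuracy:.4f}")
```

If the two numbers are close, that is some reassurance that the test result is not a fluke of one particular split.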


<font color="pink" size=4>Advantages of Bagging:</font>
<ol>
    <li><font color="orange">Improved Accuracy:</font> By combining multiple models, bagging generally improves predictive performance.</li>
    <li><font color="orange">Reduced Overfitting:</font> Bagging reduces the risk of overfitting by averaging over multiple models.</li>
    <li><font color="orange">Parallelizable:</font> Since each base model is trained independently, bagging can be parallelized across CPU cores, which shortens training time on large datasets.</li>
    <li><font color="orange">Robust to Noise:</font> Bagging helps make models more robust against noisy data and small fluctuations in the training set.</li></ol>
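The parallelization point can be tried directly with the `n_jobs` parameter of `BaggingClassifier`. The following is a rough sketch, with the helper name `fit_and_time` and the choice of two workers as assumptions for illustration. On a dataset as small as Wine, the overhead of starting worker processes may well outweigh any gain, so the timings are printed only to show how the comparison would be set up.

```python
import time

from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def fit_and_time(n_jobs):
    """Fit a 100-tree bagging ensemble; return (model, seconds elapsed)."""
    model = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=100,
        n_jobs=n_jobs,        # number of workers used for fit and predict
        random_state=42,
    )
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return model, time.perf_counter() - start


serial_model, serial_seconds = fit_and_time(n_jobs=1)
parallel_model, parallel_seconds = fit_and_time(n_jobs=2)

serial_accuracy = serial_model.score(X_test, y_test)
parallel_accuracy = parallel_model.score(X_test, y_test)

print(f"n_jobs=1: {serial_seconds:.3f}s, accuracy {serial_accuracy:.4f}")
print(f"n_jobs=2: {parallel_seconds:.3f}s, accuracy {parallel_accuracy:.4f}")
```

Setting `n_jobs=-1` would use all available cores. The benefit is expected to show up mainly with larger datasets or more expensive base models.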

<font color="pink" size=4>Disadvantages of Bagging:</font>
<ol>
    <li><font color="orange">Computationally Expensive:</font> Training and storing many models takes more compute and memory than a single model, especially with large datasets. Prediction is also slower, because every base model must be evaluated.</li>
    <li><font color="orange">Interpretability:</font> As an ensemble of many models, bagging reduces the interpretability of the resulting model. For example, a Random Forest (a bagging-based model) is harder to interpret than a single decision tree.</li>
    <li><font color="orange">Not Useful for All Models:</font> Bagging works best with models that have high variance (e.g., decision trees). For models with low variance (e.g., regularized linear models), bagging may not provide significant benefits.</li></ol>
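The last point can be checked empirically. The sketch below is one possible comparison, not a definitive benchmark: it cross-validates a decision tree (high variance) and a logistic regression (lower variance), each alone and inside a bagging ensemble. The choice of logistic regression as the low-variance model, the scaling pipeline, 50 estimators, and 5 folds are all assumptions made for the example.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# High-variance base model
tree = DecisionTreeClassifier(random_state=0)
# Lower-variance base model; features are scaled so the solver converges
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

candidates = {
    "single tree": tree,
    "bagged trees": BaggingClassifier(tree, n_estimators=50, random_state=0),
    "single logistic regression": logreg,
    "bagged logistic regression": BaggingClassifier(
        logreg, n_estimators=50, random_state=0
    ),
}

results = {}
for name, model in candidates.items():
    # 5-fold cross-validated accuracy for each candidate
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = (scores.mean(), scores.std())
    print(f"{name:28s} mean={scores.mean():.4f}  std={scores.std():.4f}")
```

One would typically expect bagging to help the tree noticeably and to change the logistic regression scores very little, but the exact numbers depend on the data and the folds.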