# Bagging in Machine Learning | [Link](Bagging/Bagging.ipynb)

Bagging (Bootstrap Aggregating) is an ensemble learning technique used to improve the stability and accuracy of machine learning models. It trains multiple versions of a predictor on bootstrapped samples of the original data, then aggregates their predictions.

## Key Concepts

- **Bootstrap Sampling:**  
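  Generate multiple training datasets by randomly sampling (with replacement) from the original dataset. When each bootstrap sample has the same size as the original dataset, it contains on average about 63.2% (1 - 1/e) of the distinct original data points; the rest of the sample is made up of repeats, and the remaining ~36.8% of points are left out of that sample ("out-of-bag").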

- **Base Learners:**  
  Train a base model (e.g., decision tree, neural network) on each bootstrap sample independently.

- **Aggregation:**  
  Combine the predictions from all base models:
  - For regression: Average the outputs.
  - For classification: Use majority voting.
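
The 63.2% figure for bootstrap sampling can be checked empirically. The snippet below is a minimal sketch using NumPy; the dataset size `n_samples` and the seed are arbitrary values chosen for illustration, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n_samples = 10_000              # hypothetical dataset size

# Draw n_samples row indices with replacement: one bootstrap sample
indices = rng.integers(0, n_samples, size=n_samples)

# Fraction of distinct original rows that appear in the sample
unique_fraction = np.unique(indices).size / n_samples

# Theoretical limit as the dataset grows: 1 - 1/e
expected_fraction = 1 - np.exp(-1)

print(f"observed unique fraction: {unique_fraction:.3f}")
print(f"theoretical limit:        {expected_fraction:.3f}")
```

The observed fraction should land close to the theoretical limit, and the gap shrinks as `n_samples` grows.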

## Mathematical Formulation

For **regression**, the aggregated prediction is calculated as:

<p>
y = (1/N) &sum;<sub>i=1</sub><sup>N</sup> f<sub>i</sub>(x)
</p>

Where:  
- <code>N</code> is the number of base models,  
- <code>f<sub>i</sub>(x)</code> is the prediction from the <code>i</code><sup>th</sup> model, and  
- <code>y</code> is the final aggregated prediction.
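
As a small illustration of the averaging rule, the sketch below applies it to made-up predictions from three base models; the numbers in `predictions` are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

# Hypothetical predictions from N = 3 base models for 4 inputs
# (rows = models f_i, columns = inputs x)
predictions = np.array([
    [2.0, 3.5, 5.0, 1.0],
    [2.4, 3.1, 4.6, 1.2],
    [1.6, 3.3, 5.4, 0.8],
])

# y = (1/N) * sum_i f_i(x): average over the model axis
y = predictions.mean(axis=0)
print(y)
```

Each column is averaged independently, so the aggregated predictions here are approximately 2.0, 3.3, 5.0, and 1.0.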

For **classification**, the final class is determined by the mode (majority vote) of the predictions:

<p>
y = mode { f<sub>1</sub>(x), f<sub>2</sub>(x), ..., f<sub>N</sub>(x) }
</p>
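
The majority vote can be sketched the same way. The `votes` array and `n_classes` below are hypothetical values for illustration. Ties are broken in favor of the lowest class label, which is a choice made by this sketch and not something the formula prescribes.

```python
import numpy as np

# Hypothetical class labels predicted by N = 5 base models for 4 inputs
# (rows = models f_i, columns = inputs x)
votes = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 1, 2, 0],
    [0, 2, 2, 1],
    [0, 1, 0, 1],
])
n_classes = 3

# Count the votes each class receives per input; result has shape (n_classes, n_inputs)
counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)

# The mode is the class with the most votes (argmax picks the lowest label on ties)
y = counts.argmax(axis=0)
print(y)
```

For these votes, the winning classes are 0, 1, 2, and 1.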

## Advantages of Bagging

- **Variance Reduction:**  
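  Averaging many models trained on different bootstrap samples cancels out part of their individual errors, so the ensemble has lower variance than a single model. The benefit is largest for high-variance base learners such as deep decision trees.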
  
- **Robustness:**  
  Bagging helps to mitigate overfitting and improves model stability.

- **Parallel Training:**  
  Since each model is trained independently, the training can be parallelized to save time.
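
In scikit-learn, parallel training can be requested through the `n_jobs` parameter of `BaggingClassifier`. The snippet below is a minimal sketch on the Wine dataset; the choice of 50 estimators and the variable names are assumptions made for illustration, and the actual speedup depends on the machine and the dataset size.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# n_jobs=-1 asks scikit-learn to use all available CPU cores for fit and predict.
# With no estimator given, BaggingClassifier defaults to a decision tree.
parallel_bagging = BaggingClassifier(n_estimators=50, n_jobs=-1, random_state=42)
parallel_bagging.fit(X_train, y_train)

test_accuracy = parallel_bagging.score(X_test, y_test)
print("Accuracy:", test_accuracy)
```

On a dataset as small as Wine, the overhead of starting worker processes may outweigh the gain; the setting tends to pay off with larger datasets or more expensive base learners.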

## Python Code Example

Below is an example using the `BaggingClassifier` from scikit-learn with a decision tree as the base estimator on the Wine dataset.

```python
# Import necessary libraries
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the base classifier (a decision tree in this case)
base_classifier = DecisionTreeClassifier()

# Initialize the BaggingClassifier with 10 base estimators
# (the parameter is `estimator` in scikit-learn >= 1.2; older releases call it `base_estimator`)
bagging_classifier = BaggingClassifier(estimator=base_classifier, n_estimators=10, random_state=42)

# Train the BaggingClassifier
bagging_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bagging_classifier.predict(X_test)

# Calculate and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

## Summary

Bagging is a simple yet powerful ensemble method that:
- **Reduces variance** by aggregating predictions from multiple models (averaging for regression, voting for classification).
- **Improves robustness** against overfitting and noisy data.
- **Is highly parallelizable**, making it scalable for large datasets.