**Ensemble methods** combine the predictions of multiple models into a single prediction that is usually more accurate or more stable than that of any individual model. Instead of relying on one model, an ensemble trains several models (often weak learners) and aggregates their outputs.

The main ensemble methods are:

- **Bagging (Bootstrap Aggregating)**
- **Boosting**
- **Stacking**
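
Stacking is listed above but has no example in this notebook. It is usually described as training a final "meta" model on the predictions of several different base models. The following is a minimal sketch using scikit-learn's `StackingClassifier`; the choice of base learners (a shallow tree and k-nearest neighbors), the logistic-regression final estimator, and all hyperparameters are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Base learners of different types (illustrative choice, not from the original text)
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The final estimator is fit on cross-validated predictions of the base learners
stacking_model = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacking_model.fit(X_train, y_train)

stacking_accuracy = accuracy_score(y_test, stacking_model.predict(X_test))
print(f"Stacking Classifier Accuracy: {stacking_accuracy:.2f}")
```

Unlike bagging and boosting, which typically reuse one model type, stacking tends to benefit from base learners that differ from each other.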


Combining models helps because different models make different errors. When those errors are only partly correlated, aggregating the predictions cancels some of them out, so the ensemble is stronger and more robust than its individual members.
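
To see why aggregation can help, consider an idealized case: a binary classification task where each model is correct with the same probability and the models err independently of each other. Real ensemble members are never fully independent, so this is only a sketch of the intuition. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, n_models):
    """Probability that a strict majority of n_models independent voters,
    each correct with probability p, is correct (use an odd n_models)."""
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


# Accuracy of the vote as more weak learners (60% accurate each) are added
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With three voters at 60% accuracy the vote is correct with probability 0.648, and the value keeps rising as voters are added. The same formula shows the limit of the idea: if each model is worse than chance (`p < 0.5`), voting makes the result worse.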

### Common Applications
- **Classification**
- **Regression**

### Why Ensemble Methods?

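- **Reduce overfitting and variance**: averaging many models smooths out the fluctuations of any single one.
- **Improve model performance**: the combined prediction is often more accurate than that of the best individual model.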




**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple models (typically of the same type) independently on different bootstrap samples of the training data, each drawn at random with replacement. The final prediction is the average of the individual predictions for regression tasks, or a majority vote for classification tasks.

Bagging helps to **reduce variance** and is particularly effective with high-variance models, such as decision trees.
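
The variance claim can be checked empirically. The sketch below repeatedly draws fresh training sets from a synthetic problem, fits a single tree and a bagged ensemble on each, and measures how much the predictions at fixed points change from one training set to the next. The noisy sine data, the sample sizes, and the helper names `sample_training_set` and `prediction_variance` are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_eval = np.linspace(0, 6, 50).reshape(-1, 1)  # fixed points to predict at


def sample_training_set(n=80):
    """Draw a fresh noisy sample from y = sin(x) + noise (synthetic data)."""
    X = rng.uniform(0, 6, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)
    return X, y


def prediction_variance(make_model, n_repeats=30):
    """Average variance of the predictions at X_eval across training sets."""
    preds = []
    for _ in range(n_repeats):
        X, y = sample_training_set()
        preds.append(make_model().fit(X, y).predict(X_eval))
    return float(np.var(np.stack(preds), axis=0).mean())


single_var = prediction_variance(lambda: DecisionTreeRegressor(random_state=0))
bagged_var = prediction_variance(
    lambda: BaggingRegressor(
        DecisionTreeRegressor(random_state=0), n_estimators=25, random_state=0
    )
)

print(f"Single tree prediction variance:  {single_var:.4f}")
print(f"Bagged trees prediction variance: {bagged_var:.4f}")
```

The bagged ensemble should show a clearly lower variance than the single unpruned tree. This sketch also illustrates the regression case, where the aggregation step is an average rather than a vote.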

### Steps of Bagging:

1. **Generate multiple subsets** of the training dataset using random sampling with replacement (bootstrapping).
2. **Train the same type of model** separately on each subset.
3. **Aggregate the predictions**: use a majority vote for classification, or the average of the predictions for regression.
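
The steps above can also be written out by hand. The following is a minimal sketch for classification on the Iris data, assuming 25 trees and a hard majority vote; the variable names are illustrative and this is not how any particular library implements bagging internally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(42)
n_models = 25  # illustrative choice
n_samples = len(X_train)

models = []
for _ in range(n_models):
    # Step 1: bootstrap sample (same size as the training set, with replacement)
    idx = rng.integers(0, n_samples, size=n_samples)
    # Step 2: train the same type of model on that sample
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Step 3: majority vote over the per-model predictions
all_preds = np.stack([m.predict(X_test) for m in models])  # (n_models, n_test)
manual_pred = np.array([np.bincount(col).argmax() for col in all_preds.T])

manual_accuracy = float((manual_pred == y_test).mean())
print(f"Manual bagging accuracy: {manual_accuracy:.2f}")
```

Note that scikit-learn's `BaggingClassifier` averages predicted class probabilities when the base estimator provides `predict_proba`, and falls back to voting otherwise, so its predictions can differ slightly from a hard vote like this one.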


In [1]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

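# Load the Iris dataset (150 samples, 4 features, 3 classes)
data = load_iris()
X = data.data
y = data.target

# Hold out 30% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base learner: an unpruned decision tree (a high-variance model)
base_model = DecisionTreeClassifier()

# Train 50 trees, each on its own bootstrap sample of the training set
bagging_model = BaggingClassifier(base_model, n_estimators=50, random_state=42)
bagging_model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = bagging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)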
print(f'Bagging Classifier Accuracy: {accuracy:.2f}')


Bagging Classifier Accuracy: 1.00



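**Boosting** is an ensemble technique in which models are trained sequentially, with each new model focusing on the mistakes made by the previous ones. AdaBoost does this by giving more weight to misclassified points in subsequent iterations, while gradient boosting fits each new model to the residual errors (more precisely, the negative gradient of the loss) of the current ensemble. In both cases, the final prediction combines all models in the sequence.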
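
The reweighting idea can be made concrete with one round of the classic binary (discrete) AdaBoost update, with labels in {-1, +1}. This is a sketch of the textbook formulation, not of scikit-learn's internals; the function name `adaboost_reweight` and the toy labels are hypothetical.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of the discrete AdaBoost weight update.

    Assumes labels in {-1, +1} and a weighted error strictly between 0 and 1.
    Returns the learner's vote weight (alpha) and the normalized new weights.
    """
    # Weighted error of the current weak learner
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 for a correct prediction and -1 for a mistake,
    # so mistakes are scaled up and correct points are scaled down
    updated = [
        w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy round: 10 equally weighted points, the weak learner gets 3 of them wrong
y_true = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, 1]
weights = [1 / 10] * 10

alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
for t, p, w in zip(y_true, y_pred, new_weights):
    print("wrong  " if t != p else "correct", round(w, 4))
```

After the update, the three misclassified points together carry half of the total weight, so the next weak learner cannot do well by repeating the same mistakes.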

### Popular Boosting Algorithms:
- **AdaBoost** (Adaptive Boosting)
- **Gradient Boosting**
- **XGBoost**
- **LightGBM**


In [2]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

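# Load the Iris dataset and use the same 70/30 split as in the bagging example
data = load_iris()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base learner: a decision stump (a tree with a single split), a typical weak learner
base_model = DecisionTreeClassifier(max_depth=1)

# Train 50 stumps sequentially, each one reweighted toward earlier mistakes
boosting_model = AdaBoostClassifier(base_model, n_estimators=50, random_state=42)
boosting_model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = boosting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)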
print(f'Boosting Classifier Accuracy: {accuracy:.2f}')


Boosting Classifier Accuracy: 1.00
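
The cell above uses AdaBoost. For comparison, here is a minimal sketch of gradient boosting on the same split using scikit-learn's `GradientBoostingClassifier`. The hyperparameters (`n_estimators=50`, `learning_rate=0.1`, `max_depth=2`) are assumptions chosen for illustration, not tuned values. XGBoost and LightGBM are separate packages and are not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Shallow trees are added one at a time, each fit to the current errors
gb_model = GradientBoostingClassifier(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=42
)
gb_model.fit(X_train, y_train)

gb_accuracy = accuracy_score(y_test, gb_model.predict(X_test))
print(f"Gradient Boosting Accuracy: {gb_accuracy:.2f}")

# staged_predict yields the ensemble's predictions after each boosting stage,
# which makes the sequential nature of boosting visible
staged_accuracy = [
    accuracy_score(y_test, stage_pred)
    for stage_pred in gb_model.staged_predict(X_test)
]
print(f"Accuracy after 1 stage:   {staged_accuracy[0]:.2f}")
print(f"Accuracy after 50 stages: {staged_accuracy[-1]:.2f}")
```

Inspecting the staged accuracy is one simple way to judge whether adding more estimators is still helping on held-out data.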
