<a href="https://colab.research.google.com/github/KhotNoorin/Machine-Learning-/blob/main/Ensemble_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensemble Learning


---

Ensemble learning is a machine learning technique in which multiple models (called "base learners", or "weak learners" when each one is only slightly better than chance) are combined into a single, stronger model. Individual models tend to make different mistakes, so combining their predictions can cancel out some of those errors and improve accuracy and robustness.



---

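Why Ensemble Learning Works:
- Can reduce bias (error due to wrong or overly simple assumptions), which is the main effect of boosting
- Can reduce variance (error due to sensitivity to small changes in the training data), which is the main effect of bagging
- Often improves generalization on unseen data, provided the base models do not all make the same mistakes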
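The error-cancelling effect can be made concrete with a small calculation. The sketch below assumes a binary classification task and models whose errors are fully independent, which real models rarely are, so treat the numbers as an upper bound on the benefit. The function name `majority_vote_accuracy` is made up for this illustration.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that the majority of n independent models is correct,
    when each model is correct with probability p (n should be odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Each model alone is right 60% of the time
for n in (1, 3, 11, 51):
    print(f"{n:2d} models -> majority is correct with probability "
          f"{majority_vote_accuracy(0.6, n):.3f}")
```

With three such models the majority is right 64.8% of the time, and the figure keeps rising as more independent models are added. The same formula shows the flip side: if each model is worse than chance (p < 0.5), voting makes the ensemble worse.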



---

# Main Types of Ensemble Learning
1. Bagging (Bootstrap Aggregating)
   - Goal: Reduce variance
   - How: Train multiple models on random subsets of the data (with replacement)
   - Final Prediction: Averaging (regression) or majority vote (classification)
   - Example: Random Forest

2. Boosting
   - Goal: Reduce bias
   - How: Models are trained sequentially, each trying to correct the errors of the previous one
   - Final Prediction: Weighted vote or sum of outputs
   - Examples:
      - AdaBoost
      - Gradient Boosting
      - XGBoost
      - LightGBM
      - CatBoost

3. Stacking (Stacked Generalization)
   - Goal: Combine strengths of multiple different models
   - How: Use a meta-model to learn how to best combine base models
   - Final Prediction: Meta-model output based on predictions from base models

4. Voting
   - Goal: Combine predictions from multiple models
   - How: Simple voting (hard or soft) across different models
   - Types:
      - Hard Voting: Majority vote
      - Soft Voting: Average probabilities (works only if models output probabilities)
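Hard and soft voting can disagree on the same inputs. The toy example below uses made-up class probabilities from three hypothetical models for a single sample; the numbers are illustrative only and do not come from any model trained in this notebook.

```python
import numpy as np

# Hypothetical predicted probabilities for ONE sample, two classes.
# Rows are models, columns are classes 0 and 1.
probas = np.array([
    [0.90, 0.10],  # model A is very confident in class 0
    [0.40, 0.60],  # model B leans slightly towards class 1
    [0.45, 0.55],  # model C leans slightly towards class 1
])

# Hard voting: each model casts one vote for its most likely class
hard_votes = probas.argmax(axis=1)
hard_pred = np.bincount(hard_votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class
soft_pred = probas.mean(axis=0).argmax()

print("Hard voting predicts class", hard_pred)  # class 1 (two votes to one)
print("Soft voting predicts class", soft_pred)  # class 0 (mean probability 0.58 vs 0.42)
```

Soft voting lets one confident model outweigh two hesitant ones, which helps only when the probabilities are reasonably well calibrated.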

---

Summary:

| Ensemble Type | Purpose        | Key Feature           | Example Models           |
| ------------- | -------------- | --------------------- | ------------------------ |
| Bagging       | ↓ Variance     | Parallel learners     | Random Forest            |
| Boosting      | ↓ Bias         | Sequential correction | AdaBoost, XGBoost        |
| Stacking      | ↑ Performance  | Meta-model combiner   | Blend of any classifiers |
| Voting        | Combine models | Hard or soft voting   | VotingClassifier         |


In [1]:
# Import Libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
# Ensemble Methods
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, VotingClassifier, StackingClassifier

In [3]:
# Base Models
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# 1. Load and Split Data


In [4]:
iris = load_iris()
X = iris.data
y = iris.target

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Random Forest (Bagging)


In [6]:
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_acc = accuracy_score(y_test, rf_pred)
print(f"Random Forest Accuracy: {rf_acc:.2f}")

Random Forest Accuracy: 1.00
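To show what bagging does under the hood, here is a minimal hand-rolled sketch: bootstrap samples, one decision tree per sample, and a majority vote. It is not how `RandomForestClassifier` is implemented (a random forest also picks a random subset of features at each split), and the choice of 25 trees is an arbitrary assumption for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(42)
n_models = 25  # arbitrary choice for this sketch
trees = []
for _ in range(n_models):
    # Bootstrap sample: draw len(X_train) row indices WITH replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# One row of predictions per tree: shape (n_models, n_test_samples)
all_preds = np.stack([tree.predict(X_test) for tree in trees])

# Majority vote across trees for each test sample
n_classes = len(np.unique(y))
bagged_pred = np.array(
    [np.bincount(column, minlength=n_classes).argmax() for column in all_preds.T]
)

manual_bagging_acc = accuracy_score(y_test, bagged_pred)
print(f"Manual bagging accuracy: {manual_bagging_acc:.2f}")
```

Each tree sees a slightly different version of the training set, so their individual quirks tend to average out in the vote.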


# 3. AdaBoost (Boosting)


In [7]:
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
ada_pred = ada.predict(X_test)
ada_acc = accuracy_score(y_test, ada_pred)
print(f"AdaBoost Accuracy: {ada_acc:.2f}")

AdaBoost Accuracy: 0.93
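Because boosting adds learners one at a time, it is possible to look at the ensemble after each round. The sketch below uses scikit-learn's `staged_score`, which yields the accuracy after every boosting iteration; the data split and hyperparameters mirror the cell above, and the choice of which stages to print is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)

# Test accuracy of the partial ensemble after 1, 2, 3, ... learners.
# Boosting may stop early, so there can be fewer than 50 stages.
staged_acc = list(ada.staged_score(X_test, y_test))

for i in (0, len(staged_acc) // 2, len(staged_acc) - 1):
    print(f"After {i + 1:2d} learners: test accuracy = {staged_acc[i]:.2f}")

# Weight each learner receives in the final vote
print("First five learner weights:", ada.estimator_weights_[:5].round(2))
```

The last staged value matches the score of the full model, so this is a cheap way to judge whether fewer learners would have been enough.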


# 4. Voting Classifier


In [8]:
# SVC must have `probability=True` for soft voting
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=True)

voting = VotingClassifier(
    estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
    voting='soft',  # Change to 'hard' for hard voting
)

voting.fit(X_train, y_train)
voting_pred = voting.predict(X_test)
voting_acc = accuracy_score(y_test, voting_pred)
print(f"Voting Classifier Accuracy: {voting_acc:.2f}")


Voting Classifier Accuracy: 1.00


# 5. Stacking Classifier


In [9]:
# Base learners
estimators = [
    ('lr', LogisticRegression()),
    ('dt', DecisionTreeClassifier()),
    ('nb', GaussianNB())
]

In [10]:
# Meta-learner
stacking = StackingClassifier(estimators=estimators, final_estimator=SVC())

stacking.fit(X_train, y_train)
stacking_pred = stacking.predict(X_test)
stacking_acc = accuracy_score(y_test, stacking_pred)
print(f"Stacking Classifier Accuracy: {stacking_acc:.2f}")

Stacking Classifier Accuracy: 1.00
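The idea behind stacking can be sketched by hand: the meta-model is trained on out-of-fold predictions of the base models, so it never sees predictions made on data the base models were fitted on. The version below is a simplified approximation of that procedure, not a reproduction of `StackingClassifier` internals. The 5-fold setting, `max_iter=1000`, and `random_state=42` are assumptions added for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base_models = [
    LogisticRegression(max_iter=1000),  # higher max_iter is an assumption
    DecisionTreeClassifier(random_state=42),
    GaussianNB(),
]

# Step 1: out-of-fold class probabilities become the meta-model's training features
meta_train = np.hstack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")
    for model in base_models
])

# Step 2: refit each base model on the full training set, then predict on the test set
meta_test = np.hstack([
    model.fit(X_train, y_train).predict_proba(X_test) for model in base_models
])

# Step 3: the meta-model learns how to combine the base models' outputs
meta_model = SVC()
meta_model.fit(meta_train, y_train)
manual_stacking_pred = meta_model.predict(meta_test)

manual_stacking_acc = accuracy_score(y_test, manual_stacking_pred)
print(f"Manual stacking accuracy: {manual_stacking_acc:.2f}")
```

With three base models and three classes, the meta-model sees nine input features per sample, one probability per model and class.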
