<a href="https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/Machine%20Learning%20Interview%20Prep%20Questions/Supervised%20Learning%20Algorithms/Ensemble%20Learning/ensemble_learning_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensemble Learning from Scratch (No ML Libraries)

In this notebook, you'll learn:

- What ensemble learning is and why it works
- Key types: Bagging, Boosting, and Stacking
- How to implement simplified versions of **Bagging (Random Forest style)** and **Boosting (AdaBoost)** using only NumPy


## What is Ensemble Learning?

Ensemble learning combines predictions from **multiple models** (often weak learners) to create a **stronger** final model.

### Three Common Types:
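- **Bagging**: Train models independently on bootstrap samples (random subsets drawn with replacement), then average or vote on their predictions (e.g., Random Forest)
- **Boosting**: Train models sequentially, with each new model giving more weight to the examples the previous ones misclassified (e.g., AdaBoost)
- **Stacking**: Train a meta-model that takes the base models' predictions as input and learns how to combine them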
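
Stacking is the only one of the three types that this notebook does not implement in its own section, so here is a minimal sketch. It assumes three hand-picked stump thresholds (2, 5, 8) as base models and a least-squares linear meta-model fit with `np.linalg.lstsq`. Both choices, and the `_demo` names, are illustrative assumptions. Real stacking normally trains the meta-model on out-of-fold predictions rather than on the training data itself.

```python
import numpy as np

# Toy data built the same way as in the Dataset section:
# along x the labels follow the pattern 1, 0, 1, 0
X_demo = np.linspace(0, 10, 100).reshape(-1, 1)
y_demo = (X_demo[:, 0] > 5).astype(int)
y_demo[:20] = 1
y_demo[-20:] = 0

def base_predictions(X, thresholds):
    # One column of 0/1 stump outputs per threshold
    return np.column_stack([(X[:, 0] > t).astype(int) for t in thresholds])

base_thresholds = [2.0, 5.0, 8.0]  # hypothetical, hand-picked base models
H = base_predictions(X_demo, base_thresholds)

# Meta-model: bias plus a linear combination of the base outputs,
# fit by ordinary least squares
A = np.column_stack([np.ones(len(H)), H])
meta_w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# Threshold the meta-model's score at 0.5 to get class labels
y_pred_stack = (A @ meta_w > 0.5).astype(int)
stack_accuracy = np.mean(y_pred_stack == y_demo)
print(f"Stacking accuracy (training data): {stack_accuracy:.2f}")
```

The meta-model here learns signed weights (approximately 1, -1, 1, -1 for the bias and the three stumps), which lets it represent the alternating label pattern that an unweighted vote of the same stumps cannot.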

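Why it works:
- Averaging many models reduces variance (overfitting), which is the main effect of bagging
- Adding models that correct earlier mistakes reduces bias, which is the main effect of boosting
- The combined prediction is more accurate and robust when the models' errors are not strongly correlated, because a single model's mistakes get outvoted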
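
To make the voting argument concrete, the sketch below computes the probability that a strict majority of `n_models` classifiers is correct when each one is right with probability `p`. It assumes the models' errors are fully independent, which real ensembles only approximate, and the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_models, p):
    """Probability that a strict majority of n_models independent
    classifiers, each correct with probability p, is correct."""
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )

# Assumption: every model is correct 60% of the time, independently
for n_models in (1, 5, 25):
    print(n_models, round(majority_vote_accuracy(n_models, 0.6), 4))
```

Under the independence assumption the vote's accuracy grows with the number of models. When errors are correlated the gain shrinks, which is why bagging and Random Forests inject randomness into each model.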


## Dataset
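We'll use a simple 1D binary classification task for illustration. The clean rule is `y = 1` when `x > 5`; the labels of the first 20 and last 20 points are then flipped, so no single threshold separates the classes perfectly: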


In [1]:
import numpy as np
import matplotlib.pyplot as plt

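# Generate synthetic 1D binary classification data
np.random.seed(42)  # fix the seed for reproducibility (the data below is deterministic)
X = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 evenly spaced points in [0, 10]
y = (X[:, 0] > 5).astype(int)  # clean labels: 1 when x > 5
y[:20] = 1   # label noise: flip the first 20 points (x < 2) to class 1
y[-20:] = 0  # label noise: flip the last 20 points (x > 8) to class 0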

In [2]:
print(X)
print(y)

[[ 0.        ]
 [ 0.1010101 ]
 [ 0.2020202 ]
 [ 0.3030303 ]
 [ 0.4040404 ]
 [ 0.50505051]
 [ 0.60606061]
 [ 0.70707071]
 [ 0.80808081]
 [ 0.90909091]
 [ 1.01010101]
 [ 1.11111111]
 [ 1.21212121]
 [ 1.31313131]
 [ 1.41414141]
 [ 1.51515152]
 [ 1.61616162]
 [ 1.71717172]
 [ 1.81818182]
 [ 1.91919192]
 [ 2.02020202]
 [ 2.12121212]
 [ 2.22222222]
 [ 2.32323232]
 [ 2.42424242]
 [ 2.52525253]
 [ 2.62626263]
 [ 2.72727273]
 [ 2.82828283]
 [ 2.92929293]
 [ 3.03030303]
 [ 3.13131313]
 [ 3.23232323]
 [ 3.33333333]
 [ 3.43434343]
 [ 3.53535354]
 [ 3.63636364]
 [ 3.73737374]
 [ 3.83838384]
 [ 3.93939394]
 [ 4.04040404]
 [ 4.14141414]
 [ 4.24242424]
 [ 4.34343434]
 [ 4.44444444]
 [ 4.54545455]
 [ 4.64646465]
 [ 4.74747475]
 [ 4.84848485]
 [ 4.94949495]
 [ 5.05050505]
 [ 5.15151515]
 [ 5.25252525]
 [ 5.35353535]
 [ 5.45454545]
 [ 5.55555556]
 [ 5.65656566]
 [ 5.75757576]
 [ 5.85858586]
 [ 5.95959596]
 [ 6.06060606]
 [ 6.16161616]
 [ 6.26262626]
 [ 6.36363636]
 [ 6.46464646]
 [ 6.56565657]
 [ 6.66666

## Bagging Example: Random Forest (Simplified)
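We'll mimic the voting step of a Random Forest using several decision stumps (one-split trees). To keep the code short, each stump's threshold is drawn at random instead of being fit on a bootstrap sample, so this illustrates the majority vote rather than full bagging: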

In [3]:
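def stump_predict(X, threshold):
    # Decision stump: predict class 1 when the feature exceeds the threshold
    return (X[:, 0] > threshold).astype(int)

def bagging_predict(X, thresholds):
    # Collect one vector of 0/1 predictions per stump
    predictions = [stump_predict(X, t) for t in thresholds]
    # Majority vote: average the votes and round (use an odd number of stumps to avoid ties)
    return np.round(np.mean(predictions, axis=0)).astype(int)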

# Create a forest of 5 random decision stumps
np.random.seed(0)
thresholds = np.random.uniform(2, 8, size=5)
y_pred_bag = bagging_predict(X, thresholds)

accuracy_bag = np.mean(y_pred_bag == y)
print(f"Bagging Accuracy: {accuracy_bag:.2f}")

Bagging Accuracy: 0.57
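
For comparison, here is a sketch that is closer to actual bagging: each stump is fit on its own bootstrap sample (rows drawn with replacement) and the stumps then vote. The helper names `fit_stump`, `bootstrap_bagging`, and `vote`, the choice of 25 estimators, and the seed are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def fit_stump(X, y):
    """Return the candidate threshold with the lowest error on (X, y)."""
    candidates = np.unique(X[:, 0])
    errors = [np.mean((X[:, 0] > t).astype(int) != y) for t in candidates]
    return candidates[int(np.argmin(errors))]

def bootstrap_bagging(X, y, n_estimators=25, seed=0):
    """Fit one stump per bootstrap sample and return the learned thresholds."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    thresholds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        thresholds.append(fit_stump(X[idx], y[idx]))
    return np.array(thresholds)

def vote(X, thresholds):
    """Majority vote over stumps; votes has shape (n_samples, n_estimators)."""
    votes = (X[:, [0]] > thresholds[None, :]).astype(int)
    return (votes.mean(axis=1) > 0.5).astype(int)

# Same toy data as in the Dataset section
X_bag = np.linspace(0, 10, 100).reshape(-1, 1)
y_bag = (X_bag[:, 0] > 5).astype(int)
y_bag[:20] = 1
y_bag[-20:] = 0

boot_thresholds = bootstrap_bagging(X_bag, y_bag)
y_pred_boot = vote(X_bag, boot_thresholds)
boot_accuracy = np.mean(y_pred_boot == y_bag)
print(f"Bootstrap bagging accuracy: {boot_accuracy:.2f}")
```

A majority vote over stumps that all predict 1 above their threshold is itself a single threshold rule (the median threshold when the number of stumps is odd). On this dataset it therefore cannot beat the best single stump, which reaches 0.60, no matter how the thresholds are chosen.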


## Boosting Example: AdaBoost with Decision Stumps

In [4]:
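def adaboost(X, y, num_rounds=5):
    n = X.shape[0]
    weights = np.ones(n) / n  # start with uniform sample weights
    y_signed = 2 * y - 1      # the weight update expects labels in {-1, +1}
    stumps = []
    alphas = []

    for _ in range(num_rounds):
        best_error = float('inf')
        best_threshold = None
        best_pred = None

        # Pick the threshold with the lowest weighted error
        for t in np.linspace(0, 10, 100):
            pred = stump_predict(X, t)
            error = np.sum(weights * (pred != y))
            if error < best_error:
                best_error = error
                best_threshold = t
                best_pred = pred

        # Stump weight; the 1e-10 terms guard against division by zero and log(0)
        alpha = 0.5 * np.log((1 - best_error + 1e-10) / (best_error + 1e-10))
        # Increase the weights of misclassified points and decrease the rest
        weights *= np.exp(-alpha * y_signed * (2 * best_pred - 1))
        weights /= np.sum(weights)

        stumps.append(best_threshold)
        alphas.append(alpha)

    return stumps, alphas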

def adaboost_predict(X, stumps, alphas):
    pred_sum = np.zeros(X.shape[0])
    for t, a in zip(stumps, alphas):
        pred = stump_predict(X, t)
        pred = 2 * pred - 1  # Convert {0,1} to {-1,1}
        pred_sum += a * pred
    return (pred_sum > 0).astype(int)

# Train and predict
stumps, alphas = adaboost(X, y, num_rounds=5)
y_pred_boost = adaboost_predict(X, stumps, alphas)

accuracy_boost = np.mean(y_pred_boost == y)
print(f"AdaBoost Accuracy: {accuracy_boost:.2f}")

AdaBoost Accuracy: 0.60
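
The sketch below walks through one round of the AdaBoost weight update by hand. It assumes the standard formulation with labels and predictions mapped to {-1, +1}, and it uses a stump at threshold 5.0 chosen purely for illustration. All `demo`-style names are hypothetical.

```python
import numpy as np

# Same toy data as in the Dataset section
X_ada = np.linspace(0, 10, 100).reshape(-1, 1)
y_ada = (X_ada[:, 0] > 5).astype(int)
y_ada[:20] = 1
y_ada[-20:] = 0

n_points = len(y_ada)
w = np.ones(n_points) / n_points  # uniform starting weights

# One stump at an illustrative threshold of 5.0
pred_demo = (X_ada[:, 0] > 5.0).astype(int)
miss = pred_demo != y_ada

# Weighted error and the resulting stump weight alpha
round_error = np.sum(w * miss)
alpha_demo = 0.5 * np.log((1 - round_error) / round_error)

# Update with labels and predictions in {-1, +1}, then renormalize
y_signed = 2 * y_ada - 1
h_signed = 2 * pred_demo - 1
w_new = w * np.exp(-alpha_demo * y_signed * h_signed)
w_new /= w_new.sum()

miss_mass = w_new[miss].sum()
print(f"error={round_error:.2f}, alpha={alpha_demo:.4f}, "
      f"weight on misclassified points={miss_mass:.2f}")
```

After the update, the misclassified points carry half of the total weight. This is a general property of the AdaBoost update: the stump just chosen has a weighted error of exactly 0.5 under the new weights, so the next round has to find a different split to make progress.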


## Summary

- Learned what ensemble learning is
- Implemented a **Bagging**-style majority vote over stumps with random thresholds (the Random Forest idea, simplified)
- Implemented **Boosting** using AdaBoost and decision stumps
- Used only NumPy and basic functions, with no ML libraries

This notebook builds intuition for how real ensemble algorithms like Random Forest, AdaBoost, and Gradient Boosting work.
