# 📅 Day 11: Random Forests & Ensemble Methods

## 🎯 Objective
Learn what ensemble learning is and how to use a Random Forest classifier to improve performance over a single decision tree.

## 🌲 What is a Random Forest?
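- An ensemble method that trains many decision trees, each on a bootstrap sample of the training data and with a random subset of features considered at each split, then combines their predictions (a vote or averaged class probabilities for classification, the mean for regression) to improve accuracy and reduce overfitting.
- Works for both classification and regression problems.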
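
To make the idea concrete, here is a minimal hand-rolled sketch of the two ingredients above: bootstrap sampling and a majority vote. It is an illustration only, not how `RandomForestClassifier` is implemented internally; the names `n_trees`, `trees`, `votes`, and `forest_pred`, and the choice of 25 trees, are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rng = np.random.default_rng(42)
n_trees = 25  # illustrative value; odd, so a binary vote cannot tie
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw as many rows as the training set has, with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split consider a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Every tree predicts every test row; the majority class wins
votes = np.stack([t.predict(X_test) for t in trees])  # shape: (n_trees, n_test)
forest_pred = (votes.mean(axis=0) >= 0.5).astype(int)
manual_accuracy = (forest_pred == y_test).mean()
print("Hand-rolled forest accuracy:", manual_accuracy)
```

Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, so its predictions can differ slightly from this sketch.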

## 🤝 Why Ensemble Learning?
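- Combines multiple models (base learners) so that the ensemble predicts better than any one of them alone
- Averaging many decorrelated models reduces variance, which improves generalization
- Types: Bagging (e.g. Random Forest: models trained independently on bootstrap samples), Boosting (models trained sequentially, each correcting its predecessors' errors), Stacking (a meta-model learns how to combine the base models' predictions)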
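
As a rough sketch of how the three ensemble types look in scikit-learn, the snippet below fits one example of each on the same split. The specific estimators and hyperparameters (`GradientBoostingClassifier` as the boosting example, the `rf` and `lr` base models for stacking) are assumptions chosen for illustration; other combinations are equally valid.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

ensembles = {
    # Bagging: independent trees, each fit on a bootstrap sample
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=42), n_estimators=50, random_state=42
    ),
    # Boosting: shallow trees added one after another to fix earlier errors
    "boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
    # Stacking: a logistic regression learns how to weight the base models
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ],
        final_estimator=LogisticRegression(),
    ),
}

ensemble_scores = {}
for name, model in ensembles.items():
    model.fit(X_train, y_train)
    ensemble_scores[name] = model.score(X_test, y_test)  # test-set accuracy
    print(f"{name:>8}: {ensemble_scores[name]:.3f}")
```

A single train/test split is noisy on a dataset this small, so small differences between the three scores should not be read as a ranking.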

## 📦 Step 1 – Load & Prepare Data (Breast Cancer Dataset)

In [None]:
from sklearn.datasets import load_breast_cancer
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load data
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

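# Split into train/test sets; stratify keeps the class proportions equal in both
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale features: fit on the training set only so no test statistics leak in.
# Tree-based models do not require scaling; it is harmless here and only
# matters for scale-sensitive models.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)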

## 🌳 Step 2 – Train a Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

rf_model = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42)
rf_model.fit(X_train_scaled, y_train)
y_pred = rf_model.predict(X_test_scaled)

print('Random Forest Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=data.target_names))
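
The objective mentions improving on a single decision tree, so a direct comparison is useful. The sketch below uses 5-fold cross-validation instead of the single split above, to get a more stable estimate. The names `models` and `cv_results` and the choice of `cv=5` are assumptions for this example. Scaling is skipped because tree splits are unaffected by monotonic rescaling of features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(
        n_estimators=100, max_depth=6, random_state=42
    ),
}

cv_results = {}
for name, model in models.items():
    # Accuracy on each of 5 stratified folds (the default for classifiers)
    scores = cross_val_score(model, X, y, cv=5)
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name:>13}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing both the mean and the spread across folds shows whether the forest is more accurate and whether it is more consistent from fold to fold.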

## 📊 Step 3 – Feature Importance

In [None]:
import matplotlib.pyplot as plt
import numpy as np

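# Impurity-based importances: one value per feature, summing to 1
importances = rf_model.feature_importances_
# Feature indices sorted from most to least important
indices = np.argsort(importances)[::-1]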

plt.figure(figsize=(12, 6))
plt.title('Feature Importances')
plt.bar(range(X.shape[1]), importances[indices], align='center')
# Label with X.columns (features only); df.columns also contains 'target'
plt.xticks(range(X.shape[1]), X.columns[indices], rotation=90)
plt.tight_layout()
plt.show()
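
The `feature_importances_` attribute is computed from impurity reductions on the training data. As a cross-check, permutation importance measures how much the test accuracy drops when one feature's values are shuffled. The sketch below is one way to compute it with `sklearn.inspection.permutation_importance`; the names `model`, `perm`, and `top5` and the choice of `n_repeats=10` are assumptions for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

model = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42)
model.fit(X_train, y_train)

# Shuffle each feature 10 times on the held-out set and record the score drop
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)

# Indices of the five features whose shuffling hurts accuracy the most
top5 = perm.importances_mean.argsort()[::-1][:5]
for i in top5:
    print(
        f"{data.feature_names[i]:<25} "
        f"{perm.importances_mean[i]:.4f} +/- {perm.importances_std[i]:.4f}"
    )
```

When features are strongly correlated, as several are in this dataset, shuffling one of them may barely change the score because the model can fall back on the others, so low permutation importance does not necessarily mean a feature is uninformative.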

## 🔁 Step 4 – Try Different Parameters

In [None]:
# Fewer and shallower trees: compare this accuracy with the Step 2 model
rf_model_alt = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=42)
rf_model_alt.fit(X_train_scaled, y_train)
alt_pred = rf_model_alt.predict(X_test_scaled)
print('Alt Random Forest Accuracy:', accuracy_score(y_test, alt_pred))
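
Rather than trying parameter combinations by hand, a cross-validated grid search can do it systematically. This is a minimal sketch with `GridSearchCV`; the grid values and the names `param_grid` and `search` are assumptions for illustration, not recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Small illustrative grid: 2 x 3 = 6 combinations, each scored with 5-fold CV
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],  # None lets trees grow until leaves are pure
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # tuning only ever sees the training set

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
# GridSearchCV refits the best model on all training data by default
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```

Keeping the test set out of the search means the final test accuracy remains an honest estimate of performance on unseen data.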

## ✅ Summary
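- Random Forests combine many decorrelated trees, which reduces the overfitting a single deep tree is prone to.
- Predictions are more stable than a single Decision Tree's because the individual trees' errors partly cancel out.
- Impurity-based feature importances help interpret the model, though they are computed on training data and can favor high-cardinality features.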