## Day 34 — SVM Case Study & Ensemble Learning (Bagging, Voting, Stacking)

This notebook is part of my **Machine Learning Learning Journey** and focuses on a
practical **SVM case study** followed by an introduction to **Ensemble Learning**.

The session explains:
- Why SVM optimization is solved using the dual form
- Soft-margin SVM and hinge loss
- Kernel trick intuition (recap)
- Why single models fail
- How ensemble methods improve performance


## 1. Recap: Support Vector Machines (SVM)

Key ideas:
- Maximize margin between classes
- Decision boundary defined by support vectors
- Only a few critical points affect the model

Objective:
\[
\min \frac{1}{2}\|w\|^2
\]
subject to:
\[
y_i(w^Tx_i + b) \ge 1
\]
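To make the objective and constraint concrete, here is a minimal sketch in plain Python. The toy points and the two candidate weight vectors are hypothetical, chosen only for illustration. It checks whether a candidate \( (w, b) \) satisfies every constraint and reports the margin width \( 2 / \|w\| \).

```python
import math

def satisfies_hard_margin(w, b, X, y):
    """Return True if every point satisfies y_i (w^T x_i + b) >= 1."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 1
        for xi, yi in zip(X, y)
    )

def margin_width(w):
    """Distance between the two margin hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical toy data: two points per class along the first axis.
X = [(2.0, 0.0), (3.0, 0.0), (-2.0, 0.0), (-3.0, 0.0)]
y = [1, 1, -1, -1]

w_small = (0.5, 0.0)  # smaller norm, wider margin
w_large = (1.0, 0.0)  # larger norm, narrower margin

for w in (w_small, w_large):
    print(w, satisfies_hard_margin(w, 0.0, X, y), margin_width(w))
```

Both candidates are feasible here, but the one with the smaller \( \|w\| \) has the wider margin, which is why the objective minimizes \( \frac{1}{2}\|w\|^2 \).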


## 2. Why Hard Margin SVM Is Not Practical

Hard Margin assumptions:
- Perfectly separable data
- No noise or outliers

Reality:
- Real-world data is noisy
- Outliers violate constraints

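Hence, **Hard Margin SVM is rarely usable in practice**: if the data is not linearly separable, no \( w, b \) can satisfy every constraint, so the optimization problem has no feasible solution.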


## 3. Soft Margin SVM

Soft Margin introduces slack variables \( \xi_i \):

\[
y_i(w^Tx_i + b) \ge 1 - \xi_i
\]

Objective:
\[
\min \frac{1}{2}\|w\|^2 + C \sum \xi_i
\]

Where:
- \( \xi_i \ge 0 \) for every point (slack is never negative)
- \( \xi_i > 0 \) → margin violation
- \( \xi_i > 1 \) → misclassification


## 4. Interpretation of Slack Variable \( \xi \)

- \( \xi = 0 \) → correctly classified, on or outside the margin
- \( 0 < \xi < 1 \) → inside the margin, but still correctly classified
- \( \xi > 1 \) → misclassified

Soft margin allows controlled mistakes.
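As a rough illustration, the slack of a point for a fixed \( (w, b) \) can be computed as \( \max(0, 1 - y_i(w^Tx_i + b)) \), which is the smallest value that satisfies the soft-margin constraint. The sketch below uses a hypothetical weight vector and three made-up points, one for each region.

```python
def slack(w, b, x, y):
    """Smallest xi >= 0 satisfying y (w^T x + b) >= 1 - xi."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def region(xi):
    """Map a slack value to the region it describes."""
    if xi == 0:
        return "on or outside margin"
    if xi < 1:
        return "inside margin, correctly classified"
    if xi == 1:
        return "on the decision boundary"
    return "misclassified"

# Hypothetical separator and points (all labelled +1).
w, b = (1.0, 0.0), 0.0
points = [(2.0, 0.0), (0.5, 0.0), (-0.5, 0.0)]

for x in points:
    xi = slack(w, b, x, 1)
    print(x, xi, region(xi))
```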


## 5. Hinge Loss in SVM

Hinge loss:
\[
\max(0, 1 - y_i(w^Tx_i + b))
\]

Properties:
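- Zero loss only for points classified correctly with margin at least 1, i.e. \( y_i(w^Tx_i + b) \ge 1 \)
- Penalizes margin violations linearly, including correctly classified points that fall inside the margin
- Plays the role in SVM that log loss plays in logistic regression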
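To compare the two losses, the sketch below evaluates both as a function of the signed margin \( m = y_i(w^Tx_i + b) \). The name `margin` and the sample values are assumptions for illustration only, and log loss is written in its \( \log(1 + e^{-m}) \) form for labels in \( \{-1, +1\} \).

```python
import math

def hinge_loss(margin):
    """Hinge loss as a function of the signed margin m = y * f(x)."""
    return max(0.0, 1.0 - margin)

def log_loss(margin):
    """Logistic (log) loss for labels in {-1, +1}: log(1 + exp(-m))."""
    return math.log(1.0 + math.exp(-margin))

# Sample margins: misclassified, on the boundary, inside the margin,
# exactly on the margin, and well outside it.
for m in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"m={m:+.1f}  hinge={hinge_loss(m):.3f}  log={log_loss(m):.3f}")
```

Hinge loss is exactly zero once the margin reaches 1, while log loss keeps shrinking but never reaches zero. This is one way to see why only points on or inside the margin influence the SVM solution.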


## 6. Role of Regularization Parameter (C)

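- Large C (violations penalized heavily):
  - Low bias, high variance
  - Small margin
  - Risk of overfitting

- Small C (violations tolerated):
  - High bias, low variance
  - Large margin
  - Often better generalization on noisy data, but risk of underfitting if C is too small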
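The trade-off can be seen by evaluating the soft-margin objective \( \frac{1}{2}\|w\|^2 + C \sum \xi_i \) for two fixed candidates. This is only a sketch: the 1-D data and the two candidate weights are hypothetical, and nothing is optimized here; the code just compares objective values.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * w^2 + C * sum of slacks, for 1-D inputs (w is a float)."""
    slacks = [max(0.0, 1.0 - yi * (w * xi + b)) for xi, yi in zip(X, y)]
    return 0.5 * w * w + C * sum(slacks)

# Hypothetical 1-D data: two points far from the boundary, two close to it.
X = [3.0, 0.5, -3.0, -0.5]
y = [1, 1, -1, -1]

NARROW = 2.0  # margin width 2/|w| = 1, no point violates the margin
WIDE = 0.5    # margin width 4, the points at +-0.5 fall inside the margin

def preferred(C):
    """Return which candidate has the lower objective for this C."""
    obj_narrow = soft_margin_objective(NARROW, 0.0, X, y, C)
    obj_wide = soft_margin_objective(WIDE, 0.0, X, y, C)
    return "wide" if obj_wide < obj_narrow else "narrow"

for C in (0.1, 10.0):
    print(f"C={C}: preferred candidate = {preferred(C)}")
```

With the small C the wide-margin candidate has the lower objective despite its violations; with the large C the violation penalty dominates and the narrow-margin candidate is preferred.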


## 7. Kernel Trick (Recap)

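Idea:
- Implicitly map data from the low-dimensional input space to a high-dimensional feature space via \( \phi \)
- A linear hyperplane in that feature space corresponds to a nonlinear boundary in the input space

The kernel replaces the dot product \( x_i^Tx_j \) in the dual form, so \( \phi \) is never computed explicitly:
\[
K(x_i, x_j) = \phi(x_i)^T\phi(x_j)
\]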

Common kernels:
- Linear
- Polynomial
- RBF (Gaussian)
- Sigmoid
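A small worked check of the idea, assuming 2-D inputs and the homogeneous degree-2 polynomial kernel \( K(x, z) = (x^Tz)^2 \): its explicit feature map is \( \phi(x) = (x_1^2, \sqrt{2}x_1x_2, x_2^2) \). The sketch confirms that the kernel value computed in the input space matches the dot product in the feature space. The function names and sample vectors are illustrative.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z)^2."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit 3-D feature map whose dot product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly2_kernel(x, z)      # computed in the 2-D input space
k_explicit = dot(phi(x), phi(z))     # computed in the 3-D feature space
print(k_implicit, k_explicit)
```

The two values agree up to floating-point rounding, yet the kernel never builds \( \phi(x) \). For kernels such as RBF, whose feature space is infinite-dimensional, that shortcut is what makes the computation possible at all.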


## 8. Multiclass Classification in SVM

Binary SVM extended using:

- One-vs-Rest (OvR)
  - One classifier per class
- One-vs-One (OvO)
  - Classifier for every pair of classes

Example with 26 classes:
- OvR → 26 classifiers (one per class)
- OvO → \( \frac{26 \cdot 25}{2} = 325 \) classifiers (one per pair)
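The counts follow from simple combinatorics: OvR needs one classifier per class, and OvO needs one per unordered pair, i.e. \( k(k-1)/2 \). A short sketch, using the letters A to Z as hypothetical class labels:

```python
from itertools import combinations

def n_ovr(k):
    """One-vs-Rest: one binary classifier per class."""
    return k

def n_ovo(k):
    """One-vs-One: one binary classifier per unordered pair of classes."""
    return k * (k - 1) // 2

# Hypothetical labels: 26 classes named A..Z.
classes = [chr(ord("A") + i) for i in range(26)]
pairs = list(combinations(classes, 2))  # every pair an OvO scheme would train on

print(n_ovr(len(classes)), n_ovo(len(classes)), len(pairs))
```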


## 9. Why Ensemble Learning?

A single model often suffers from one of:
- High bias (underfitting)
- High variance (overfitting)

Ensembles:
- Combine multiple learners
- Reduce variance by averaging out the errors of individual models
- Improve generalization
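The variance-reduction claim can be illustrated with a small simulation. It is a sketch under a strong assumption: each model's prediction is the true value plus independent unit-variance noise. Real models trained on the same data are correlated, so the reduction is smaller in practice.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_prediction():
    """One model's prediction: true value 0 plus unit-variance noise."""
    return rng.gauss(0.0, 1.0)

def ensemble_prediction(n_models):
    """Average of n_models independent noisy predictions."""
    return sum(noisy_prediction() for _ in range(n_models)) / n_models

single = [noisy_prediction() for _ in range(2000)]
averaged = [ensemble_prediction(10) for _ in range(2000)]

single_var = statistics.variance(single)
averaged_var = statistics.variance(averaged)
print(f"single model variance:     {single_var:.3f}")
print(f"10-model average variance: {averaged_var:.3f}")
```

Under the independence assumption, the variance of the average is roughly one tenth of the single-model variance.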


## 10. Ensemble Learning

Ensemble = Combine predictions from multiple models

Base learners can be any model, for example:
- Logistic Regression
- Decision Tree
- KNN
- SVM

Goal:
Build a **strong learner** from multiple **weak learners**.
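One simple way to combine predictions is hard majority voting. The sketch below uses hard-coded, hypothetical predictions from three models; they are not results from real classifiers, and the models' mistakes are placed on different samples on purpose.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model prediction lists into one list by majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels and model outputs (each model is wrong on one sample).
y_true = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 1, 0, 1]
model_c = [1, 0, 1, 0, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(y_true, pred))
```

Because no two models err on the same sample here, the vote corrects every individual mistake. When models make the same errors, voting helps much less.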


## 11. Weak vs Strong Learners

Weak learner:
- Performs only slightly better than random guessing
- High bias (e.g., decision stump)

Strong learner:
- Achieves high accuracy on unseen data
- Low bias and reasonable variance


## 12. Types of Ensemble Methods

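Based on execution:
1. Parallel ensembles (learners are trained independently, e.g., bagging)
2. Sequential ensembles (each learner is trained after, and depends on, the previous ones)

Based on learners:
1. Homogeneous (same model type for every learner)
2. Heterogeneous (different model types, e.g., voting or stacking)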
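As an example of a parallel, homogeneous ensemble, here is a minimal bagging sketch in plain Python. Everything in it is an assumption for illustration: the 1-D toy data, the very simple threshold stump used as the base learner, and the choice of 11 models. Each stump is fit on its own bootstrap sample, independently of the others, and predictions are combined by majority vote.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Pick the threshold t with the fewest training errors for the
    rule 'predict 1 if x > t else 0'."""
    best_t, best_err = None, None
    for t in sorted(set(X)):
        err = sum((1 if x > t else 0) != label for x, label in zip(X, y))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(X, y, n_models, rng):
    """Fit n_models stumps, each on a bootstrap sample of the data."""
    n = len(X)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        thresholds.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return thresholds

def bagging_predict(thresholds, x):
    """Majority vote over the stumps' predictions for a single input x."""
    votes = [1 if x > t else 0 for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 1-D data: class 0 on the left, class 1 on the right.
X = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 0, 0, 1, 1, 1, 1]

rng = random.Random(0)
thresholds = bagging_fit(X, y, n_models=11, rng=rng)  # odd count avoids ties
print(bagging_predict(thresholds, 0.2), bagging_predict(thresholds, 5.0))
```

The loop in `bagging_fit` has no dependency between iterations, which is what makes the ensemble parallel; a sequential method would instead use the errors of earlier learners when fitting later ones.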
