# Module 09: Support Vector Machines

**Difficulty**: ⭐⭐ Intermediate  
**Estimated Time**: 75 minutes  
**Prerequisites**: 
- [Module 04: Logistic Regression](04_logistic_regression.ipynb)
- [Module 07: Cross-Validation and Hyperparameter Tuning](07_cross_validation_hyperparameter_tuning.ipynb)

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand the intuition behind Support Vector Machines (maximum margin)
2. Apply Linear SVM for classification problems
3. Use the kernel trick (RBF, polynomial) for non-linear problems
4. Tune hyperparameters: C (regularization) and gamma (kernel coefficient)
5. Distinguish between SVC and SVR (classification vs regression)
6. Visualize decision boundaries in 2D

## 1. SVM Intuition: Maximum Margin Classifier

### The Problem

**Given**: Two classes of data points  
**Goal**: Find the "best" line to separate them

### Many Lines Can Separate!

```
    O  O  O         Different lines:
   ________         Line A: Close to O's
    X  X  X         Line B: Close to X's
                    Line C: In the middle ✓
```

**Question**: Which line is best?

### SVM's Answer: Maximum Margin!

**Margin**: Distance from decision boundary to nearest data points

**SVM Strategy**: Find the line with the LARGEST margin
- Maximizes distance to both classes
- More robust to new data
- Better generalization
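
To make the comparison concrete, the sketch below scores three candidate lines on a six-point toy dataset that mirrors the diagram above. The coordinates and the lines A, B and C are made up for illustration. The margin is taken as the smallest perpendicular distance from any point to the line.

```python
import numpy as np

# Hypothetical toy points: three O's on top, three X's below
points = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0],   # class O
                   [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # class X

def margin(w, b, pts):
    """Smallest perpendicular distance from any point to the line w.x + b = 0."""
    return np.min(np.abs(pts @ w + b) / np.linalg.norm(w))

# Three horizontal candidate lines y = 0.8, y = 0.2 and y = 0.5
candidates = {
    "A": (np.array([0.0, 1.0]), -0.8),  # close to the O's
    "B": (np.array([0.0, 1.0]), -0.2),  # close to the X's
    "C": (np.array([0.0, 1.0]), -0.5),  # in the middle
}

margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, value in margins.items():
    print(f"Line {name}: margin = {value:.2f}")
print(f"Largest margin: line {best}")
```

All three lines separate the classes, but only the middle one keeps both classes as far away as possible, which is the line an SVM would prefer.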

### Key Terms

**Decision Boundary**: The line (or hyperplane) that separates classes  
**Support Vectors**: Data points closest to decision boundary  
**Margin**: Distance from decision boundary to support vectors  

### Real-World Analogy

**Building a road between two cities**:
- Many possible routes
- Best route: Maximizes distance from obstacles on both sides
- Support vectors = closest obstacles that determine the route

In [None]:
# Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Visualization settings
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✓ Setup complete!")

In [None]:
# Create simple 2D dataset for visualization
from sklearn.datasets import make_classification

X_simple, y_simple = make_classification(
    n_samples=100, n_features=2, n_redundant=0, n_informative=2,
    n_clusters_per_class=1, random_state=42
)

# Visualize data
plt.figure(figsize=(8, 6))
plt.scatter(X_simple[y_simple==0, 0], X_simple[y_simple==0, 1], 
           c='blue', marker='o', s=50, alpha=0.7, label='Class 0')
plt.scatter(X_simple[y_simple==1, 0], X_simple[y_simple==1, 1], 
           c='red', marker='s', s=50, alpha=0.7, label='Class 1')
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Sample Dataset\nGoal: Find best separating line', fontsize=13, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Two classes that are largely separable by a straight line")
print("SVM will find the line with maximum margin!")

## 2. Linear SVM

### How Linear SVM Works

**Step 1**: Find decision boundary (hyperplane)  
**Step 2**: Maximize margin to nearest points  
**Step 3**: Only support vectors matter!
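
One way to check Step 3 is to refit the model on the support vectors alone and compare it with the model trained on every point. The sketch below does this on hypothetical blob data (not the notebook's dataset). The tight `tol` is an added assumption so that both fits converge closely enough to compare.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated blobs
X_demo, y_demo = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                            cluster_std=0.8, random_state=0)

# Fit on all points (tight tolerance so the two fits are comparable)
full_model = SVC(kernel='linear', C=1.0, tol=1e-6).fit(X_demo, y_demo)

# Refit using only the support vectors of the first model
sv_idx = full_model.support_
sv_model = SVC(kernel='linear', C=1.0, tol=1e-6).fit(X_demo[sv_idx], y_demo[sv_idx])

same_predictions = np.array_equal(full_model.predict(X_demo),
                                  sv_model.predict(X_demo))

print(f"Points used: {len(X_demo)} vs {len(sv_idx)} support vectors")
print("w (all points):      ", full_model.coef_[0])
print("w (support vectors): ", sv_model.coef_[0])
print("Same predictions on all points:", same_predictions)
```

The two weight vectors should agree up to solver tolerance: dropping the points that are not support vectors leaves the boundary unchanged.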

### Mathematical Formulation

```
Minimize: (1/2)||w||² + C × Σ(slack_i)
            ↑              ↑
      Maximize margin   Allow some errors

Subject to: y_i (w·x_i + b) ≥ 1 − slack_i,  slack_i ≥ 0   (labels y_i ∈ {−1, +1})
```
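
The objective can be evaluated directly for any candidate `(w, b)`. The sketch below is an illustration on hypothetical overlapping blobs, with the slack of each point computed as `max(0, 1 - y_i (w·x_i + b))` and labels mapped to -1/+1. It compares the fitted solution with two perturbed alternatives. The helper `svm_objective` is not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical overlapping toy data, so that some slack is unavoidable
X_demo, y_demo = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                            cluster_std=1.0, random_state=0)
y_signed = np.where(y_demo == 1, 1, -1)  # the formulation uses labels in {-1, +1}

def svm_objective(w, b, C):
    """(1/2)||w||^2 + C * sum of slacks, with slack_i = max(0, 1 - y_i (w.x_i + b))."""
    slack = np.maximum(0, 1 - y_signed * (X_demo @ w + b))
    return 0.5 * w @ w + C * slack.sum()

C = 1.0
clf = SVC(kernel='linear', C=C).fit(X_demo, y_demo)
w_fit, b_fit = clf.coef_[0], clf.intercept_[0]

obj_fit = svm_objective(w_fit, b_fit, C)
obj_shifted = svm_objective(w_fit, b_fit + 0.5, C)   # boundary moved sideways
obj_scaled = svm_objective(2 * w_fit, 2 * b_fit, C)  # same boundary, half the margin

print(f"Fitted solution:   {obj_fit:.2f}")
print(f"Shifted intercept: {obj_shifted:.2f}")
print(f"Scaled weights:    {obj_scaled:.2f}")
```

Both alternatives score worse than the fitted solution: shifting the boundary adds slack, and shrinking the margin increases the `||w||²` term faster than it removes slack.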

### C Parameter (Regularization)

**C**: Controls the tradeoff between margin width and training errors. It is the inverse of regularization strength.

- **Small C** (e.g., 0.1):
  - Wide margin
  - More margin violations tolerated
  - Stronger regularization, smoother model
  - May underfit

- **Large C** (e.g., 100):
  - Narrow margin
  - Few margin violations tolerated
  - Weaker regularization, boundary follows the training data more closely
  - May overfit

**Default**: C=1 (balanced)
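
The effect of C can be measured rather than eyeballed. For a linear SVM the distance between the two margin lines is `2 / ||w||`, so the sketch below fits three values of C on hypothetical overlapping blobs (not the notebook's dataset) and reports the margin width together with the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical overlapping toy data
X_demo, y_demo = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                            cluster_std=1.0, random_state=0)

results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel='linear', C=C).fit(X_demo, y_demo)
    margin_width = 2 / np.linalg.norm(clf.coef_[0])  # distance between the margin lines
    results[C] = (margin_width, len(clf.support_))
    print(f"C={C:<6} margin width={margin_width:.2f}  support vectors={len(clf.support_)}")
```

On overlapping data like this, a smaller C gives a wider margin, and because every point on or inside the margin is a support vector, it also gives more support vectors.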

In [None]:
# Train Linear SVM
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Scale features (IMPORTANT for SVM!)
scaler_simple = StandardScaler()
X_simple_scaled = scaler_simple.fit_transform(X_simple)

# Train Linear SVM
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_simple_scaled, y_simple)

# Get support vectors
support_vectors = svm_linear.support_vectors_
n_support = svm_linear.n_support_

print(f"Number of support vectors: {len(support_vectors)}")
print(f"  Class 0: {n_support[0]}")
print(f"  Class 1: {n_support[1]}")
print(f"\nTraining accuracy: {svm_linear.score(X_simple_scaled, y_simple):.3f}")
print("\nOnly these support vectors determine the decision boundary!")

In [None]:
# Visualize Linear SVM decision boundary
def plot_svm_boundary(model, X, y, title):
    """Helper function to plot SVM decision boundary and margins"""
    plt.figure(figsize=(10, 7))
    
    # Create mesh for decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    
    # Predict for mesh
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot decision boundary and margins
    plt.contourf(xx, yy, Z, levels=[-999, -1, 1, 999], 
                colors=['lightblue', 'white', 'lightcoral'], alpha=0.3)
    plt.contour(xx, yy, Z, levels=[-1, 0, 1], 
               linestyles=['--', '-', '--'], linewidths=[2, 3, 2], 
               colors=['blue', 'black', 'red'])
    
    # Plot data points
    plt.scatter(X[y==0, 0], X[y==0, 1], c='blue', marker='o', s=50, 
               alpha=0.7, edgecolors='k', label='Class 0')
    plt.scatter(X[y==1, 0], X[y==1, 1], c='red', marker='s', s=50, 
               alpha=0.7, edgecolors='k', label='Class 1')
    
    # Highlight support vectors
    plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
               s=200, linewidth=2, facecolors='none', edgecolors='green',
               label='Support Vectors')
    
    plt.xlabel('Feature 1 (scaled)', fontsize=12)
    plt.ylabel('Feature 2 (scaled)', fontsize=12)
    plt.title(title, fontsize=13, fontweight='bold')
    plt.legend(fontsize=11)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

plot_svm_boundary(svm_linear, X_simple_scaled, y_simple, 
                 'Linear SVM\nSolid line = Decision boundary, Dashed lines = Margins')

print("Green circles = Support vectors (determine the boundary)")
print("Dashed lines = Margin boundaries")
print("Solid black line = Decision boundary (maximum margin!)")

In [None]:
# Demonstrate effect of C parameter
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
C_values = [0.1, 1, 100]

for idx, C in enumerate(C_values):
    model = SVC(kernel='linear', C=C, random_state=42)
    model.fit(X_simple_scaled, y_simple)
    
    # Create mesh
    x_min, x_max = X_simple_scaled[:, 0].min() - 1, X_simple_scaled[:, 0].max() + 1
    y_min, y_max = X_simple_scaled[:, 1].min() - 1, X_simple_scaled[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot
    axes[idx].contourf(xx, yy, Z, levels=20, cmap='RdBu', alpha=0.3)
    axes[idx].contour(xx, yy, Z, levels=[0], linewidths=3, colors='black')
    axes[idx].scatter(X_simple_scaled[y_simple==0, 0], X_simple_scaled[y_simple==0, 1],
                     c='blue', marker='o', s=50, alpha=0.7, edgecolors='k')
    axes[idx].scatter(X_simple_scaled[y_simple==1, 0], X_simple_scaled[y_simple==1, 1],
                     c='red', marker='s', s=50, alpha=0.7, edgecolors='k')
    axes[idx].scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
                     s=200, linewidth=2, facecolors='none', edgecolors='green')
    
    n_sv = len(model.support_vectors_)
    title = f"C = {C}\n"
    if C == 0.1:
        title += f"Wide margin\n({n_sv} support vectors)"
    elif C == 1:
        title += f"Balanced\n({n_sv} support vectors)"
    else:
        title += f"Narrow margin\n({n_sv} support vectors)"
    
    axes[idx].set_title(title, fontsize=11, fontweight='bold')
    axes[idx].set_xlabel('Feature 1', fontsize=10)
    axes[idx].set_ylabel('Feature 2', fontsize=10)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Observation: Larger C → Narrower margin → Typically fewer support vectors")

## 3. The Kernel Trick

### The Problem: Non-Linear Data

**A linear SVM performs poorly when the classes cannot be separated by a straight line!**

Example:
```
    O O O
   O X X O
    O O O
```
No straight line can separate X's from O's!

### The Solution: Kernel Trick

**Key Idea**: Transform data to higher dimension where it becomes linearly separable!

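**Magic**: SVM only needs inner products between pairs of points, and a kernel function returns those inner products in the higher-dimensional space without ever computing the transformation explicitly!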
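
A small worked example shows what "without explicitly computing the transformation" means. For the degree-2 polynomial kernel `(x·z + 1)²` on 2D points, one matching explicit feature map has six dimensions. The functions `phi` and `poly_kernel` below are illustrative helpers, not scikit-learn functions.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2D point (6 dimensions)."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """Polynomial kernel (x.z + 1)^2, computed in the original 2D space."""
    return (np.dot(x, z) + 1) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # inner product after transforming both points
implicit = poly_kernel(x, z)       # same value, no transformation needed

print(f"Explicit feature map: {explicit:.6f}")
print(f"Kernel function:      {implicit:.6f}")
```

Both routes give the same number (4 here, up to floating-point rounding), but the kernel never builds the 6-dimensional vectors. For the RBF kernel the implied feature space is infinite-dimensional, so the kernel route is the only practical one.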

### Common Kernels

1. **Linear**: `kernel='linear'`
   - No transformation
   - Use when: Data is linearly separable

2. **RBF (Radial Basis Function)**: `kernel='rbf'` (default)
   - Gaussian kernel
   - Can model very flexible, smooth boundaries
   - Most popular choice
   - Use when: Non-linear patterns

3. **Polynomial**: `kernel='poly'`
   - Polynomial transformation
   - Degree parameter
   - Use when: Polynomial relationships

4. **Sigmoid**: `kernel='sigmoid'`
   - Based on tanh, so it resembles a neural-network activation
   - Rarely used

### Gamma Parameter (for RBF kernel)

**Gamma**: Defines how far the influence of a single training point reaches

- **Small gamma** (e.g., 0.01):
  - Far reach
  - Smooth decision boundary
  - May underfit

- **Large gamma** (e.g., 10):
  - Close reach
  - Wiggly decision boundary
  - May overfit

**Default**: gamma='scale', i.e. 1 / (n_features × X.var()), where X.var() is the variance over all feature values
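
The default can be reproduced by hand, and the "reach" of a point can be read off the RBF formula `exp(-gamma × distance²)`. The sketch below uses `make_moons` as a hypothetical stand-in for the notebook's CSV data.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical stand-in data (the notebook loads its moons data from a CSV)
X_demo, y_demo = make_moons(n_samples=200, noise=0.2, random_state=0)

# gamma='scale' corresponds to 1 / (n_features * X.var())
gamma_scale = 1.0 / (X_demo.shape[1] * X_demo.var())

auto_model = SVC(kernel='rbf', gamma='scale').fit(X_demo, y_demo)
manual_model = SVC(kernel='rbf', gamma=gamma_scale).fit(X_demo, y_demo)
same = np.allclose(auto_model.decision_function(X_demo),
                   manual_model.decision_function(X_demo))
print(f"gamma='scale' -> {gamma_scale:.3f}, same model: {same}")

# RBF similarity exp(-gamma * d^2) between two points at distance d = 1
similarity = {g: float(np.exp(-g * 1.0 ** 2)) for g in [0.01, 1, 10]}
for g, s in similarity.items():
    print(f"gamma={g:<5} similarity at distance 1: {s:.4f}")
```

With a small gamma, two points one unit apart still look almost identical to the kernel (far reach). With a large gamma their similarity is close to zero, so each training point only influences its immediate neighbourhood.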

In [None]:
# Load non-linear dataset (moons)
moons_df = pd.read_csv('data/sample/moons_nonlinear.csv')

X_moons = moons_df[['feature_1', 'feature_2']].values
y_moons = moons_df['target'].values

# Scale features
scaler_moons = StandardScaler()
X_moons_scaled = scaler_moons.fit_transform(X_moons)

# Visualize
plt.figure(figsize=(8, 6))
plt.scatter(X_moons_scaled[y_moons==0, 0], X_moons_scaled[y_moons==0, 1],
           c='blue', marker='o', s=50, alpha=0.7, label='Class 0')
plt.scatter(X_moons_scaled[y_moons==1, 0], X_moons_scaled[y_moons==1, 1],
           c='red', marker='s', s=50, alpha=0.7, label='Class 1')
plt.xlabel('Feature 1 (scaled)', fontsize=12)
plt.ylabel('Feature 2 (scaled)', fontsize=12)
plt.title('Non-Linear Dataset (Moons)\nNo straight line can separate these!', 
         fontsize=13, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("This dataset requires a non-linear decision boundary!")

In [None]:
# Compare Linear vs RBF kernel
svm_linear_moons = SVC(kernel='linear', C=1, random_state=42)
svm_rbf_moons = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)

svm_linear_moons.fit(X_moons_scaled, y_moons)
svm_rbf_moons.fit(X_moons_scaled, y_moons)

linear_score = svm_linear_moons.score(X_moons_scaled, y_moons)
rbf_score = svm_rbf_moons.score(X_moons_scaled, y_moons)

print("Performance on Non-Linear Data (training accuracy):\n")
print(f"Linear Kernel: {linear_score:.3f} accuracy")
print(f"RBF Kernel:    {rbf_score:.3f} accuracy")
print(f"\nRBF vs. linear: {(rbf_score-linear_score)*100:+.1f} percentage points")

In [None]:
# Visualize Linear vs RBF
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

models = [svm_linear_moons, svm_rbf_moons]
titles = [f'Linear Kernel\nAccuracy: {linear_score:.3f}', 
         f'RBF Kernel\nAccuracy: {rbf_score:.3f}']

for idx, (model, title) in enumerate(zip(models, titles)):
    # Create mesh
    x_min, x_max = X_moons_scaled[:, 0].min() - 0.5, X_moons_scaled[:, 0].max() + 0.5
    y_min, y_max = X_moons_scaled[:, 1].min() - 0.5, X_moons_scaled[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot decision boundary
    axes[idx].contourf(xx, yy, Z, alpha=0.3, cmap='RdBu')
    axes[idx].scatter(X_moons_scaled[y_moons==0, 0], X_moons_scaled[y_moons==0, 1],
                     c='blue', marker='o', s=50, alpha=0.7, edgecolors='k', label='Class 0')
    axes[idx].scatter(X_moons_scaled[y_moons==1, 0], X_moons_scaled[y_moons==1, 1],
                     c='red', marker='s', s=50, alpha=0.7, edgecolors='k', label='Class 1')
    axes[idx].set_xlabel('Feature 1', fontsize=11)
    axes[idx].set_ylabel('Feature 2', fontsize=11)
    axes[idx].set_title(title, fontsize=12, fontweight='bold')
    axes[idx].legend(fontsize=10)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("RBF kernel follows the curved boundary much more closely than the linear kernel!")

In [None]:
# Demonstrate effect of gamma parameter
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
gamma_values = [0.1, 1, 10]

for idx, gamma in enumerate(gamma_values):
    model = SVC(kernel='rbf', C=1, gamma=gamma, random_state=42)
    model.fit(X_moons_scaled, y_moons)
    
    # Create mesh
    x_min, x_max = X_moons_scaled[:, 0].min() - 0.5, X_moons_scaled[:, 0].max() + 0.5
    y_min, y_max = X_moons_scaled[:, 1].min() - 0.5, X_moons_scaled[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot
    axes[idx].contourf(xx, yy, Z, alpha=0.3, cmap='RdBu')
    axes[idx].scatter(X_moons_scaled[y_moons==0, 0], X_moons_scaled[y_moons==0, 1],
                     c='blue', marker='o', s=30, alpha=0.7, edgecolors='k')
    axes[idx].scatter(X_moons_scaled[y_moons==1, 0], X_moons_scaled[y_moons==1, 1],
                     c='red', marker='s', s=30, alpha=0.7, edgecolors='k')
    
    score = model.score(X_moons_scaled, y_moons)
    title = f"gamma = {gamma}\nTrain accuracy: {score:.3f}\n"
    if gamma == 0.1:
        title += "Smooth boundary"
    elif gamma == 1:
        title += "Balanced"
    else:
        title += "Wiggly boundary (overfitting risk)"
    
    axes[idx].set_title(title, fontsize=11, fontweight='bold')
    axes[idx].set_xlabel('Feature 1', fontsize=10)
    axes[idx].set_ylabel('Feature 2', fontsize=10)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Observation: Large gamma → Complex boundary → Risk of overfitting")
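
Training accuracy alone cannot reveal overfitting, because a very wiggly boundary scores well on the data it was fitted to. A minimal sketch of a held-out check follows, using `make_moons` as a hypothetical stand-in for the moons CSV. The extra value `gamma=1000` is added only to make the effect obvious.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the moons CSV used above
X_demo, y_demo = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

scores = {}
for gamma in [0.1, 1, 10, 1000]:
    model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1, gamma=gamma))
    model.fit(X_tr, y_tr)
    scores[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"gamma={gamma:<5} train={scores[gamma][0]:.3f}  test={scores[gamma][1]:.3f}")
```

A growing gap between the train and test columns as gamma increases is the signature of overfitting. The gamma to prefer is the one with the best held-out score, not the best training score.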

## 4. SVM on Real Dataset

### Breast Cancer Classification

Let's apply SVM to a real medical dataset!

In [None]:
# Load breast cancer dataset
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix

cancer_df = pd.read_csv('data/sample/breast_cancer.csv')

X_cancer = cancer_df.drop(['target', 'diagnosis'], axis=1)
y_cancer = cancer_df['target']

# Split data
X_train_cancer, X_test_cancer, y_train_cancer, y_test_cancer = train_test_split(
    X_cancer, y_cancer, test_size=0.3, random_state=42, stratify=y_cancer
)

# Scale features (CRUCIAL for SVM!)
scaler_cancer = StandardScaler()
X_train_cancer_scaled = scaler_cancer.fit_transform(X_train_cancer)
X_test_cancer_scaled = scaler_cancer.transform(X_test_cancer)

print(f"Dataset shape: {X_cancer.shape}")
print(f"Class distribution:\n{y_cancer.value_counts()}")
print("\n0 = Malignant (cancer), 1 = Benign (no cancer)")

In [None]:
# Train SVM with default parameters
svm_cancer = SVC(kernel='rbf', random_state=42)
svm_cancer.fit(X_train_cancer_scaled, y_train_cancer)

# Evaluate
train_score_cancer = svm_cancer.score(X_train_cancer_scaled, y_train_cancer)
test_score_cancer = svm_cancer.score(X_test_cancer_scaled, y_test_cancer)

print("SVM with Default Parameters:")
print(f"Train accuracy: {train_score_cancer:.3f}")
print(f"Test accuracy:  {test_score_cancer:.3f}")

# Cross-validation
cv_scores = cross_val_score(svm_cancer, X_train_cancer_scaled, y_train_cancer, cv=5)
print(f"\nCross-validation: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

In [None]:
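# Hyperparameter tuning with GridSearchCV
# gamma only affects the RBF kernel, so the linear kernel gets its own sub-grid
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100],
     'gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1]},
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
]

print("Performing Grid Search...")
print(f"Testing {len(ParameterGrid(param_grid))} combinations\n")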

grid_search = GridSearchCV(
    SVC(random_state=42),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=0
)

grid_search.fit(X_train_cancer_scaled, y_train_cancer)

print("Best hyperparameters:")
for param, value in grid_search.best_params_.items():
    print(f"  {param}: {value}")

print(f"\nBest cross-validation score: {grid_search.best_score_:.3f}")
print(f"Test score: {grid_search.score(X_test_cancer_scaled, y_test_cancer):.3f}")
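
One refinement worth considering: when the features are scaled once before cross-validation, each validation fold has already influenced the scaler. Wrapping the scaler and the SVM in a `Pipeline` refits the scaler inside every fold. The sketch below assumes scikit-learn's bundled `load_breast_cancer` data as a stand-in for the CSV, and the smaller grid is chosen only to keep the example quick.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy of the data stands in for the CSV
X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_bc, y_bc, test_size=0.3, random_state=42, stratify=y_bc
)

# The scaler is refit on the training part of every CV fold
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf'))])

# The step-name prefix 'svc__' routes each parameter to the SVC step
pipe_grid = {'svc__C': [0.1, 1, 10, 100],
             'svc__gamma': ['scale', 0.001, 0.01, 0.1]}

pipe_search = GridSearchCV(pipe, pipe_grid, cv=5, scoring='accuracy')
pipe_search.fit(X_tr, y_tr)  # raw features go in; scaling happens inside

print("Best parameters:", pipe_search.best_params_)
print(f"Best CV accuracy: {pipe_search.best_score_:.3f}")
print(f"Test accuracy:    {pipe_search.score(X_te, y_te):.3f}")
```

On a dataset this size the scores usually change little, but the pipeline version keeps the cross-validation estimate honest and makes the fitted search object usable on raw, unscaled inputs.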

In [None]:
# Detailed evaluation of best model
best_svm = grid_search.best_estimator_
y_pred_cancer = best_svm.predict(X_test_cancer_scaled)

print("Classification Report:\n")
print(classification_report(y_test_cancer, y_pred_cancer, 
                           target_names=['Malignant', 'Benign']))

# Confusion matrix
cm = confusion_matrix(y_test_cancer, y_pred_cancer)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
           xticklabels=['Malignant', 'Benign'],
           yticklabels=['Malignant', 'Benign'])
plt.xlabel('Predicted', fontsize=12)
plt.ylabel('Actual', fontsize=12)
plt.title('Confusion Matrix - Optimized SVM', fontsize=13, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"\nFalse Negatives (missed cancers): {cm[0, 1]}")
print(f"False Positives (false alarms): {cm[1, 0]}")

## 5. SVR (Support Vector Regression)

### SVM for Regression!

**SVR**: Support Vector Regression (same idea, different goal)

**Goal**: Fit as many points as possible within a margin (epsilon-tube)

### Key Differences

**SVC (Classification)**:
- Maximize margin between classes
- Output: Class labels

**SVR (Regression)**:
- Fit within epsilon margin
- Output: Continuous values

### Additional Parameter

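**epsilon**: Half-width of the tube around the prediction, in the units of the target (default=0.1)
- Points within epsilon of the prediction: No penalty
- Points outside the tube: Penalized in proportion to how far outside they lie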
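
The penalty described above is the epsilon-insensitive loss. The sketch below writes it out in NumPy and then fits `SVR` on hypothetical noisy sine data to show how the tube width changes the number of support vectors. The helper `epsilon_insensitive_loss` and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube, linear in the excess outside it."""
    return np.maximum(0, np.abs(y_true - y_pred) - epsilon)

# Residuals of 0.05, 0.2 and 0.5 with a tube of half-width 0.1
losses = epsilon_insensitive_loss(np.array([1.0, 1.0, 1.0]),
                                  np.array([1.05, 0.8, 1.5]), epsilon=0.1)
print(losses)  # the first point lies inside the tube, so its loss is 0

# Hypothetical noisy sine data
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 6, size=100)).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, size=100)

n_sv = {}
for eps in [0.01, 0.1, 0.5]:
    svr = SVR(kernel='rbf', C=10, epsilon=eps).fit(X_demo, y_demo)
    n_sv[eps] = len(svr.support_)
    print(f"epsilon={eps:<5} support vectors: {n_sv[eps]}")
```

A wider tube leaves fewer points on or outside it, so fewer support vectors are needed and the fitted function becomes simpler.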

In [None]:
# SVR example on California housing
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

housing_df = pd.read_csv('data/sample/california_housing.csv')

X_housing = housing_df.drop('median_house_value', axis=1)
y_housing = housing_df['median_house_value']

# Split and scale
X_train_h, X_test_h, y_train_h, y_test_h = train_test_split(
    X_housing, y_housing, test_size=0.3, random_state=42
)

scaler_h = StandardScaler()
X_train_h_scaled = scaler_h.fit_transform(X_train_h)
X_test_h_scaled = scaler_h.transform(X_test_h)

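# Train SVR
# Note: C and epsilon are relative to the scale of the target. If the target is
# in raw dollars, epsilon=0.1 is effectively zero and C=100 acts like a very small C,
# so consider rescaling the target or tuning both values.
svr_model = SVR(kernel='rbf', C=100, gamma='scale', epsilon=0.1)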
svr_model.fit(X_train_h_scaled, y_train_h)

# Evaluate
y_pred_h = svr_model.predict(X_test_h_scaled)

rmse = np.sqrt(mean_squared_error(y_test_h, y_pred_h))
r2 = r2_score(y_test_h, y_pred_h)

print("SVR Performance:")
print(f"RMSE: ${rmse:,.2f}")
print(f"R²:   {r2:.3f}")
print(f"\nTypical prediction error (RMSE): about ${rmse:,.0f}")
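
If the target is in raw dollars (an assumption, since the CSV contents are not shown here), `C` and `epsilon` end up tiny relative to the target's scale. One option is to standardize the target during fitting with `TransformedTargetRegressor`. The sketch below illustrates the difference on hypothetical synthetic data with a dollar-like target, not on the housing data.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical synthetic data with a target on a "dollar-like" scale
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
y_demo = 200_000 + 1_000 * y_demo
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# Same SVR settings, fitted on the raw target
raw_svr = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=100, epsilon=0.1))
raw_svr.fit(X_tr, y_tr)

# Same SVR settings, but the target is standardized for fitting
# and predictions are mapped back to the original units
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel='rbf', C=100, epsilon=0.1)),
    transformer=StandardScaler(),
)
scaled_svr.fit(X_tr, y_tr)

r2_raw = r2_score(y_te, raw_svr.predict(X_te))
r2_scaled = r2_score(y_te, scaled_svr.predict(X_te))
print(f"R² with raw target:    {r2_raw:.3f}")
print(f"R² with scaled target: {r2_scaled:.3f}")
```

Predictions from `scaled_svr` come back in the original units, so RMSE can still be reported in dollars.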

In [None]:
# Visualize predictions
plt.figure(figsize=(10, 6))
plt.scatter(y_test_h, y_pred_h, alpha=0.3, s=20)
plt.plot([y_test_h.min(), y_test_h.max()], 
        [y_test_h.min(), y_test_h.max()], 
        'r--', linewidth=2, label='Perfect predictions')
plt.xlabel('Actual House Value', fontsize=12)
plt.ylabel('Predicted House Value', fontsize=12)
plt.title(f'SVR Predictions\nR² = {r2:.3f}', fontsize=13, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Exercises

### Exercise 1: Kernel Comparison

Compare different kernels on the moons dataset:
1. Train SVM with kernels: linear, rbf, poly (degree=3), sigmoid
2. Use the same C=1 for all kernels
3. Evaluate with cross-validation (5-fold)
4. Create a bar chart comparing performance
5. Visualize decision boundaries for each kernel
6. Which kernel works best?

In [None]:
# Your code here


### Exercise 2: C and Gamma Interaction

Explore how C and gamma interact:
1. Use the wine dataset
2. Create a grid: C = [0.1, 1, 10, 100], gamma = [0.001, 0.01, 0.1, 1]
3. For each combination, compute cross-validation score
4. Create a heatmap showing performance
5. What patterns do you observe?

In [None]:
# Your code here


### Exercise 3: Multiclass SVM

Apply SVM to the iris dataset (3 classes):
1. Load the iris dataset
2. Split into train/test (70/30)
3. Scale features
4. Use GridSearchCV to find best C and gamma
5. Train final model and evaluate
6. Create confusion matrix
7. Which classes are most confused?

In [None]:
# Your code here


### Exercise 4: SVR Hyperparameter Tuning

Optimize SVR for California housing:
1. Use RandomizedSearchCV (50 iterations)
2. Search over:
   - C: log-uniform from 0.1 to 1000
   - gamma: ['scale', 'auto'] + log-uniform from 0.0001 to 1
   - epsilon: uniform from 0.01 to 1
3. Use R² as scoring metric
4. Print best parameters
5. Compare with default SVR
6. How much improvement did you get?

In [None]:
# Your code here


## Summary

### Key Concepts

1. **SVM Intuition**:
   - Find decision boundary with **maximum margin**
   - Only **support vectors** matter (points closest to boundary)
   - Robust and generalizes well

2. **Linear SVM**:
   - Works when data is linearly separable
   - Simple and interpretable
   - Controlled by C parameter

3. **Kernel Trick**:
   - Transforms data to higher dimension
   - Enables non-linear decision boundaries
   - Computationally efficient (no explicit transformation)

4. **Common Kernels**:
   - **Linear**: For linearly separable data
   - **RBF**: Most versatile (default choice)
   - **Polynomial**: For polynomial relationships
   - **Sigmoid**: Rarely used

5. **Hyperparameters**:
   - **C**: Regularization (small C = large margin, large C = small margin)
   - **gamma**: RBF kernel coefficient, the inverse of the kernel width (small = smooth, large = wiggly)
   - Both affect overfitting/underfitting tradeoff

6. **SVC vs SVR**:
   - **SVC**: Classification (predict class labels)
   - **SVR**: Regression (predict continuous values)
   - Same principles, different objectives

7. **Best Practices**:
   - **Always scale features!** (SVM is sensitive to scale)
   - Start with RBF kernel (good default)
   - Use GridSearchCV or RandomizedSearchCV for tuning
   - Cross-validate to prevent overfitting
   - Consider computation time (kernel SVM training slows down sharply as the number of samples grows)

### When to Use SVM

✓ **Good for**:
- Small to medium datasets
- High-dimensional data (many features)
- Non-linear patterns
- Clear margin of separation

✗ **Not ideal for**:
- Very large datasets (slow training)
- Heavily overlapping (noisy) classes
- When interpretability is crucial

### What's Next?

In **Module 10: K-Nearest Neighbors**, you'll learn:
- Distance-based classification
- Different distance metrics (Euclidean, Manhattan)
- Choosing optimal K value
- Weighted vs uniform neighbors
- Impact of feature scaling
- Curse of dimensionality

### Additional Resources

- [SVM Explained - StatQuest](https://www.youtube.com/watch?v=efR1C6CvhmE)
- [Kernel Trick - Andrew Ng](https://www.youtube.com/watch?v=XUj5JbQihlU)
- [scikit-learn SVM Guide](https://scikit-learn.org/stable/modules/svm.html)
- [SVM Tutorial](https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf)