# Support Vector Machines (SVM)

Linear and non-linear classification using support vector machines.

## Contents
1. Linear SVM (Support Vector Classifier)
2. Hyperparameter Tuning with Cross-Validation
3. Non-Linear SVM (Radial Basis Function Kernel)
4. ROC Curves
5. Multi-Class SVM
6. Application to Gene Expression Data

## Setup and Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## 1. Linear SVM (Support Vector Classifier)

Find the hyperplane that best separates two classes.
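
As a minimal sketch of what the hyperplane looks like in code: for a linear kernel, a fitted `SVC` exposes a weight vector `coef_` and an offset `intercept_`, and the predicted class follows the sign of w·x + b. The toy points and the names `X_toy`, `y_toy`, and `clf` below are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, clearly separable points (illustrative only)
X_toy = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
                  [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset

# Score each point by hand: f(x) = w . x + b
manual_scores = X_toy @ w + b
manual_pred = np.where(manual_scores > 0, 1, -1)

print("w =", w, " b =", b)
print("manual scores match decision_function:",
      np.allclose(manual_scores, clf.decision_function(X_toy)))
print("manual predictions match predict:",
      np.array_equal(manual_pred, clf.predict(X_toy)))
```

The hyperplane itself is the set of points where the score is exactly zero.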

In [None]:
# Generate synthetic data
np.random.seed(1)

# Create 20 observations in 2 dimensions
X = np.random.randn(20, 2)
y = np.array([-1]*10 + [1]*10)

# Shift positive class slightly
X[y == 1] = X[y == 1] + 1

print(f"Dataset shape: {X.shape}")
print(f"Class distribution: Class -1: {(y == -1).sum()}, Class +1: {(y == 1).sum()}")

In [None]:
# Visualize data
plt.figure(figsize=(8, 6))
plt.scatter(X[y == -1, 0], X[y == -1, 1], c='red', s=100, marker='o', 
           edgecolors='black', label='Class -1')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', s=100, marker='s', 
           edgecolors='black', label='Class +1')
plt.xlabel('X1', fontsize=12)
plt.ylabel('X2', fontsize=12)
plt.title('Training Data (two overlapping classes)', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Fit Linear SVM with cost=10
svm_linear = SVC(kernel='linear', C=10)
svm_linear.fit(X, y)

print(f"Linear SVM (C=10):")
print(f"Number of support vectors: {len(svm_linear.support_)}")
print(f"Support vector indices: {svm_linear.support_}")
print(f"Training accuracy: {svm_linear.score(X, y):.4f}")

In [None]:
# Plot decision boundary
def plot_svm_decision_boundary(X, y, model, title=''):
    """
    Plot the decision regions, decision boundary, and support vectors
    of a fitted two-class SVC. Works for any pair of class labels
    (e.g. -1/+1 or 1/2); X and y are the data to overlay on the plot.
    """
    # Create mesh
    h = 0.02  # step size
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    
    # Signed score on the mesh: negative -> model.classes_[0], positive -> model.classes_[1]
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Shade the two predicted regions and draw the boundary (score = 0)
    levels = [min(Z.min(), -1), 0, max(Z.max(), 1)]
    plt.contourf(xx, yy, Z, alpha=0.3, levels=levels, colors=['red', 'blue'])
    plt.contour(xx, yy, Z, levels=[0], colors='black', linewidths=2)
    
    # Plot data points, using the model's own class labels
    neg_label, pos_label = model.classes_
    plt.scatter(X[y == neg_label, 0], X[y == neg_label, 1], c='red', s=100, marker='o',
               edgecolors='black', label=f'Class {neg_label}')
    plt.scatter(X[y == pos_label, 0], X[y == pos_label, 1], c='blue', s=100, marker='s',
               edgecolors='black', label=f'Class {pos_label}')
    
    # Highlight support vectors (stored on the fitted model)
    plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
               s=300, facecolors='none', edgecolors='yellow', linewidths=3,
               label='Support Vectors')
    
    plt.xlabel('X1', fontsize=12)
    plt.ylabel('X2', fontsize=12)
    plt.title(title, fontsize=14)
    plt.legend()
    plt.grid(True, alpha=0.3)

plt.figure(figsize=(10, 6))
plot_svm_decision_boundary(X, y, svm_linear, 'Linear SVM (C=10)')
plt.tight_layout()
plt.show()

print(f"Support vectors (yellow circles): {len(svm_linear.support_)} observations")

### Effect of Cost Parameter (C)

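- **Large C**: Margin violations are penalized heavily, giving a narrow margin (approaching a hard margin), typically fewer support vectors, and more sensitivity to outliers
- **Small C**: Margin violations are cheap, giving a wide, soft margin, typically more support vectors, and a more robust fit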
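
The trade-off above can be sketched numerically. For a linear kernel the margin width is 2/||w||, so it can be read off `coef_` for several values of C. The data and the names `X_demo`, `y_demo`, and `margin_width` are illustrative assumptions, not the notebook's dataset.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian classes (illustrative data)
X_demo = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
                    rng.normal(1.5, 1.0, size=(30, 2))])
y_demo = np.array([-1] * 30 + [1] * 30)

margin_width = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel='linear', C=C).fit(X_demo, y_demo)
    w = clf.coef_[0]
    # Distance between the two margin lines f(x) = -1 and f(x) = +1
    margin_width[C] = 2.0 / np.linalg.norm(w)
    print(f"C={C:<6} margin width={margin_width[C]:.3f}  "
          f"support vectors={clf.n_support_.sum()}")
```

The margin should not get wider as C grows. The support vector count usually shrinks with larger C, but that is a tendency rather than a guarantee.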

In [None]:
# Fit with smaller cost (C=0.1)
svm_small_c = SVC(kernel='linear', C=0.1)
svm_small_c.fit(X, y)

print(f"Linear SVM (C=0.1):")
print(f"Number of support vectors: {len(svm_small_c.support_)}")
print(f"Support vector indices: {svm_small_c.support_}")
print(f"Training accuracy: {svm_small_c.score(X, y):.4f}")

print(f"\nNumber of support vectors, C=10 vs C=0.1: {len(svm_linear.support_)} → {len(svm_small_c.support_)} (a smaller C usually gives more)")

In [None]:
# Compare decision boundaries
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

plt.sca(axes[0])
plot_svm_decision_boundary(X, y, svm_linear, 'Linear SVM (C=10, narrower margin)')

plt.sca(axes[1])
plot_svm_decision_boundary(X, y, svm_small_c, 'Linear SVM (C=0.1, wider margin)')

plt.tight_layout()
plt.show()

print("Observation: Smaller C creates a wider margin, typically with more support vectors")

## 2. Hyperparameter Tuning with Cross-Validation

In [None]:
# Grid search for best C
np.random.seed(1)

param_grid = {'C': [0.001, 0.01, 0.1, 1, 5, 10, 100]}

grid_search = GridSearchCV(SVC(kernel='linear'), param_grid, cv=10, 
                          scoring='accuracy', return_train_score=True)
grid_search.fit(X, y)

print("Grid Search Results:")
print(f"\nBest C: {grid_search.best_params_['C']}")
print(f"Best CV accuracy: {grid_search.best_score_:.4f}")

In [None]:
# Plot CV results
results = pd.DataFrame(grid_search.cv_results_)

plt.figure(figsize=(10, 6))
plt.semilogx(results['param_C'], results['mean_train_score'], 
            marker='o', label='Train', linewidth=2, markersize=8)
plt.semilogx(results['param_C'], results['mean_test_score'], 
            marker='s', label='CV (10-fold)', linewidth=2, markersize=8)
plt.xlabel('Cost (C)', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.title('Cross-Validation: Accuracy vs Cost', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nBest model has C = {grid_search.best_params_['C']}")

### Test Set Evaluation

In [None]:
# Generate test data
np.random.seed(2)
X_test = np.random.randn(20, 2)
y_test = np.random.choice([-1, 1], 20)
X_test[y_test == 1] = X_test[y_test == 1] + 1

# Predict with best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
accuracy = (y_test == y_pred).mean()

print("Test Set Evaluation:")
print("\nConfusion Matrix:")
print(pd.DataFrame(cm,
                   index=['Actual: -1', 'Actual: +1'],
                   columns=['Predicted: -1', 'Predicted: +1']))
print(f"\nTest Accuracy: {accuracy:.4f} ({accuracy*100:.1f}%)")

## 3. Non-Linear SVM (Radial Basis Function Kernel)

Use RBF kernel for non-linearly separable data.
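
To illustrate why a kernel is needed, the sketch below compares a linear and an RBF kernel on concentric circles, a standard example where no straight line separates the classes. It uses `make_circles` rather than the notebook's data, and the `gamma` and `C` values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Inner circle = one class, outer ring = the other (illustrative data)
X_c, y_c = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_c, y_c, test_size=0.5,
                                          random_state=0, stratify=y_c)

# Same data, same C; only the kernel differs
linear_acc = SVC(kernel='linear', C=1).fit(X_tr, y_tr).score(X_te, y_te)
rbf_acc = SVC(kernel='rbf', gamma=1, C=1).fit(X_tr, y_tr).score(X_te, y_te)

print(f"Linear kernel test accuracy: {linear_acc:.3f}")
print(f"RBF kernel test accuracy:    {rbf_acc:.3f}")
```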

In [None]:
# Generate non-linearly separable data
np.random.seed(1)

X_nl = np.random.randn(200, 2)
X_nl[:100] = X_nl[:100] + 2  # Shift first 100 points
X_nl[100:150] = X_nl[100:150] - 2  # Shift next 50 points

y_nl = np.array([1]*150 + [2]*50)

print(f"Dataset shape: {X_nl.shape}")
print(f"Class distribution: Class 1: {(y_nl == 1).sum()}, Class 2: {(y_nl == 2).sum()}")

In [None]:
# Visualize non-linear data
plt.figure(figsize=(10, 6))
plt.scatter(X_nl[y_nl == 1, 0], X_nl[y_nl == 1, 1], 
           c='red', s=50, alpha=0.6, label='Class 1')
plt.scatter(X_nl[y_nl == 2, 0], X_nl[y_nl == 2, 1], 
           c='blue', s=50, alpha=0.6, label='Class 2')
plt.xlabel('X1', fontsize=12)
plt.ylabel('X2', fontsize=12)
plt.title('Non-Linearly Separable Data', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Data is NOT linearly separable - need non-linear kernel!")

In [None]:
# Train/test split
np.random.seed(1)
train_idx = np.random.choice(200, 100, replace=False)
test_idx = np.setdiff1d(np.arange(200), train_idx)  # all indices not used for training

X_train_nl = X_nl[train_idx]
y_train_nl = y_nl[train_idx]
X_test_nl = X_nl[test_idx]
y_test_nl = y_nl[test_idx]

print(f"Training set: {len(X_train_nl)}")
print(f"Test set: {len(X_test_nl)}")

In [None]:
# Fit RBF kernel SVM with gamma=1, C=1
svm_rbf = SVC(kernel='rbf', gamma=1, C=1)
svm_rbf.fit(X_train_nl, y_train_nl)

print(f"RBF SVM (gamma=1, C=1):")
print(f"Number of support vectors: {len(svm_rbf.support_)}")
print(f"Training accuracy: {svm_rbf.score(X_train_nl, y_train_nl):.4f}")
print(f"Test accuracy: {svm_rbf.score(X_test_nl, y_test_nl):.4f}")

In [None]:
# Visualize RBF decision boundary
plt.figure(figsize=(10, 6))
plot_svm_decision_boundary(X_train_nl, y_train_nl, svm_rbf, 
                          'RBF SVM (gamma=1, C=1) - Training Data')
plt.tight_layout()
plt.show()

print("Non-linear decision boundary separates class 2 (center) from the two class 1 clusters on either side")

### Effect of Cost on Non-Linear SVM

In [None]:
# Fit with very large C (risk of overfitting)
svm_rbf_large_c = SVC(kernel='rbf', gamma=1, C=1000)
svm_rbf_large_c.fit(X_train_nl, y_train_nl)

print(f"RBF SVM (gamma=1, C=1000):")
print(f"Number of support vectors: {len(svm_rbf_large_c.support_)}")
print(f"Training accuracy: {svm_rbf_large_c.score(X_train_nl, y_train_nl):.4f}")
print(f"Test accuracy: {svm_rbf_large_c.score(X_test_nl, y_test_nl):.4f}")

print(f"\nSupport vectors: C=1000 -> {len(svm_rbf_large_c.support_)}, C=1 -> {len(svm_rbf.support_)}")
print("A larger C usually raises training accuracy but may overfit; compare the test accuracies above")

In [None]:
# Compare decision boundaries
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

plt.sca(axes[0])
plot_svm_decision_boundary(X_train_nl, y_train_nl, svm_rbf, 
                          'RBF SVM (C=1, flexible)')

plt.sca(axes[1])
plot_svm_decision_boundary(X_train_nl, y_train_nl, svm_rbf_large_c, 
                          'RBF SVM (C=1000, very flexible)')

plt.tight_layout()
plt.show()

print("Large C creates more complex boundary (risk of overfitting)")

### Tuning Both C and Gamma

In [None]:
# Grid search over C and gamma
np.random.seed(1)

param_grid_rbf = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [0.5, 1, 2, 3, 4]
}

grid_search_rbf = GridSearchCV(SVC(kernel='rbf'), param_grid_rbf, cv=10,
                              scoring='accuracy', return_train_score=True)
grid_search_rbf.fit(X_train_nl, y_train_nl)

print("Grid Search Results (RBF Kernel):")
print(f"\nBest parameters: C={grid_search_rbf.best_params_['C']}, gamma={grid_search_rbf.best_params_['gamma']}")
print(f"Best CV accuracy: {grid_search_rbf.best_score_:.4f}")

# Test accuracy
best_rbf = grid_search_rbf.best_estimator_
test_acc_rbf = best_rbf.score(X_test_nl, y_test_nl)
print(f"Test accuracy: {test_acc_rbf:.4f}")

In [None]:
# Heatmap of CV accuracy
results_rbf = pd.DataFrame(grid_search_rbf.cv_results_)
pivot_table = results_rbf.pivot_table(values='mean_test_score', 
                                     index='param_gamma', 
                                     columns='param_C')

plt.figure(figsize=(10, 6))
sns.heatmap(pivot_table, annot=True, fmt='.3f', cmap='viridis', 
           cbar_kws={'label': 'CV Accuracy'})
plt.xlabel('C', fontsize=12)
plt.ylabel('Gamma', fontsize=12)
plt.title('Grid Search: CV Accuracy for Different C and Gamma', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# Confusion matrix on test set
y_pred_rbf = best_rbf.predict(X_test_nl)
cm_rbf = confusion_matrix(y_test_nl, y_pred_rbf)

print("\nConfusion Matrix (Test Set):")
print(pd.DataFrame(cm_rbf,
                   index=['Actual: 1', 'Actual: 2'],
                   columns=['Predicted: 1', 'Predicted: 2']))
print(f"\nTest Accuracy: {test_acc_rbf:.4f}")

## 4. ROC Curves

Evaluate classifier performance across different thresholds.
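
As a small worked example of what "across different thresholds" means, the sketch below computes the true and false positive rates by hand for four toy scores and compares the result with `roc_curve`. The labels, scores, and the helper `rates_at` are illustrative and not part of the notebook's models.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy labels and classifier scores (illustrative values)
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

def rates_at(threshold):
    """TPR and FPR when predicting positive for score >= threshold."""
    pred = scores >= threshold
    tpr = (pred & (y_true == 1)).sum() / (y_true == 1).sum()
    fpr = (pred & (y_true == 0)).sum() / (y_true == 0).sum()
    return tpr, fpr

# Lowering the threshold moves along the ROC curve from (0, 0) to (1, 1)
for t in [0.8, 0.4, 0.35, 0.1]:
    tpr, fpr = rates_at(t)
    print(f"threshold={t:<5} TPR={tpr:.2f} FPR={fpr:.2f}")

# roc_curve performs the same sweep over the distinct score values
fpr_all, tpr_all, thresholds = roc_curve(y_true, scores)
toy_auc = auc(fpr_all, tpr_all)
print(f"AUC = {toy_auc:.2f}")  # 0.75: 3 of the 4 (positive, negative) pairs are ranked correctly
```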

In [None]:
# Fit two models with different gamma
# (the ROC curves below use decision_function scores, so probability=True is not needed)
svm_opt = SVC(kernel='rbf', gamma=2, C=1)
svm_opt.fit(X_train_nl, y_train_nl)

svm_flex = SVC(kernel='rbf', gamma=50, C=1)
svm_flex.fit(X_train_nl, y_train_nl)

print(f"Model 1 (gamma=2): Train acc = {svm_opt.score(X_train_nl, y_train_nl):.4f}")
print(f"Model 2 (gamma=50): Train acc = {svm_flex.score(X_train_nl, y_train_nl):.4f}")

In [None]:
# Get decision function scores (signed scores; larger values favor class 2)
train_scores_opt = svm_opt.decision_function(X_train_nl)
train_scores_flex = svm_flex.decision_function(X_train_nl)

test_scores_opt = svm_opt.decision_function(X_test_nl)
test_scores_flex = svm_flex.decision_function(X_test_nl)

# Compute ROC curves
fpr_train_opt, tpr_train_opt, _ = roc_curve(y_train_nl, train_scores_opt, pos_label=2)
fpr_train_flex, tpr_train_flex, _ = roc_curve(y_train_nl, train_scores_flex, pos_label=2)

fpr_test_opt, tpr_test_opt, _ = roc_curve(y_test_nl, test_scores_opt, pos_label=2)
fpr_test_flex, tpr_test_flex, _ = roc_curve(y_test_nl, test_scores_flex, pos_label=2)

# Calculate AUC
auc_train_opt = auc(fpr_train_opt, tpr_train_opt)
auc_train_flex = auc(fpr_train_flex, tpr_train_flex)
auc_test_opt = auc(fpr_test_opt, tpr_test_opt)
auc_test_flex = auc(fpr_test_flex, tpr_test_flex)

print(f"\nAUC Scores:")
print(f"  Train - gamma=2: {auc_train_opt:.4f}")
print(f"  Train - gamma=50: {auc_train_flex:.4f}")
print(f"  Test - gamma=2: {auc_test_opt:.4f}")
print(f"  Test - gamma=50: {auc_test_flex:.4f}")

In [None]:
# Plot ROC curves
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Training ROC
axes[0].plot(fpr_train_opt, tpr_train_opt, 'b-', linewidth=2, 
            label=f'gamma=2 (AUC={auc_train_opt:.3f})')
axes[0].plot(fpr_train_flex, tpr_train_flex, 'r-', linewidth=2, 
            label=f'gamma=50 (AUC={auc_train_flex:.3f})')
axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random')
axes[0].set_xlabel('False Positive Rate', fontsize=12)
axes[0].set_ylabel('True Positive Rate', fontsize=12)
axes[0].set_title('ROC Curve - Training Data', fontsize=14)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Test ROC
axes[1].plot(fpr_test_opt, tpr_test_opt, 'b-', linewidth=2, 
            label=f'gamma=2 (AUC={auc_test_opt:.3f})')
axes[1].plot(fpr_test_flex, tpr_test_flex, 'r-', linewidth=2, 
            label=f'gamma=50 (AUC={auc_test_flex:.3f})')
axes[1].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random')
axes[1].set_xlabel('False Positive Rate', fontsize=12)
axes[1].set_ylabel('True Positive Rate', fontsize=12)
axes[1].set_title('ROC Curve - Test Data', fontsize=14)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nObservation: compare each model's train and test AUC.")
print("A large drop from train to test AUC (expected for gamma=50) indicates overfitting.")

## 5. Multi-Class SVM

SVM can handle more than 2 classes using one-vs-one or one-vs-rest.
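
A quick way to see the two schemes is the shape of the score matrix returned by `decision_function`. The sketch below uses four synthetic classes from `make_blobs` (an assumption for illustration; the notebook's data has three classes, where both schemes happen to give three columns). Note that `SVC` always trains one-vs-one classifiers internally; `decision_function_shape='ovr'` only converts the scores to one column per class.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Four Gaussian blobs, so K = 4 classes (illustrative data)
X_b, y_b = make_blobs(n_samples=200, centers=4, random_state=0)

ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X_b, y_b)
ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(X_b, y_b)

# One column per pair of classes: K(K-1)/2 = 6
print("ovo score shape:", ovo.decision_function(X_b).shape)
# One column per class: K = 4
print("ovr score shape:", ovr.decision_function(X_b).shape)
```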

In [None]:
# Add third class to data
np.random.seed(1)

# Generate new class
X_class3 = np.random.randn(50, 2)
X_class3[:, 1] = X_class3[:, 1] + 2

# Combine with existing data
X_multi = np.vstack([X_nl, X_class3])
y_multi = np.concatenate([y_nl, np.array([0]*50)])

print(f"Multi-class dataset shape: {X_multi.shape}")
print(f"Class distribution:")
for cls in np.unique(y_multi):
    print(f"  Class {cls}: {(y_multi == cls).sum()}")

In [None]:
# Visualize multi-class data
plt.figure(figsize=(10, 6))
colors = ['green', 'red', 'blue']
for cls, color in zip([0, 1, 2], colors):
    mask = y_multi == cls
    plt.scatter(X_multi[mask, 0], X_multi[mask, 1], 
               c=color, s=50, alpha=0.6, label=f'Class {cls}')
plt.xlabel('X1', fontsize=12)
plt.ylabel('X2', fontsize=12)
plt.title('Multi-Class Data (3 classes)', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Fit multi-class SVM
svm_multi = SVC(kernel='rbf', C=10, gamma=1)
svm_multi.fit(X_multi, y_multi)

print(f"Multi-Class SVM:")
print(f"Number of support vectors: {len(svm_multi.support_)}")
print(f"Support vectors per class: {svm_multi.n_support_}")
print(f"Training accuracy: {svm_multi.score(X_multi, y_multi):.4f}")

In [None]:
# Visualize multi-class decision boundaries
h = 0.02
x_min, x_max = X_multi[:, 0].min() - 1, X_multi[:, 0].max() + 1
y_min, y_max = X_multi[:, 1].min() - 1, X_multi[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

Z = svm_multi.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.figure(figsize=(10, 6))
# Level edges sit between the integer labels so each class gets its own color
plt.contourf(xx, yy, Z, alpha=0.3, levels=[-0.5, 0.5, 1.5, 2.5], 
            colors=['green', 'red', 'blue'])

for cls, color in zip([0, 1, 2], colors):
    mask = y_multi == cls
    plt.scatter(X_multi[mask, 0], X_multi[mask, 1], 
               c=color, s=50, edgecolors='black', label=f'Class {cls}')

plt.scatter(X_multi[svm_multi.support_, 0], X_multi[svm_multi.support_, 1],
           s=300, facecolors='none', edgecolors='yellow', linewidths=3,
           label='Support Vectors')

plt.xlabel('X1', fontsize=12)
plt.ylabel('X2', fontsize=12)
plt.title('Multi-Class SVM Decision Boundaries', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 6. Application to Gene Expression Data

High-dimensional data: many features (genes), few samples.
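
One consequence of having far more features than samples can be sketched with pure noise: a linear SVM can fit labels that carry no information at all, so a perfect training score by itself says little. The sizes `n` and `p` and all variable names below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, p = 20, 200                      # far more features than samples

# Pure noise features and labels that carry no information about them
X_noise = rng.normal(size=(n, p))
y_noise = rng.permutation([0] * 10 + [1] * 10)

clf = SVC(kernel='linear', C=10).fit(X_noise, y_noise)
train_acc = clf.score(X_noise, y_noise)

# Fresh noise with equally meaningless labels
X_new = rng.normal(size=(200, p))
y_new = rng.integers(0, 2, size=200)
new_acc = clf.score(X_new, y_new)

print(f"Training accuracy: {train_acc:.2f}")
print(f"Accuracy on fresh noise: {new_acc:.2f}")
```

This is why the evaluation below reports test accuracy separately from training accuracy.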

In [None]:
# Synthetic stand-in for the Khan gene expression dataset.
# The real dataset must be downloaded separately, so random data with
# matching dimensions (2308 genes, 63 training and 20 test samples,
# 4 classes) is generated here instead.
print("Note: Using synthetic high-dimensional data")
print("Real Khan dataset requires specific download")

# Create synthetic gene expression data
np.random.seed(1)

# Training data: 63 samples, 2308 genes
n_train = 63
n_features = 2308
X_train_gene = np.random.randn(n_train, n_features) * 0.5

# 4 cancer types (0, 1, 2, 3)
y_train_gene = np.array([0]*8 + [1]*23 + [2]*12 + [3]*20)

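# Add class-specific signal to each class's block of 500 genes: extra noise
# plus a mean shift of +1. Zero-mean noise alone gives a linear classifier
# nothing to detect; the shift size is an arbitrary choice for this synthetic data.
for cls in range(4):
    mask = y_train_gene == cls
    X_train_gene[mask, cls*500:(cls+1)*500] += 1 + np.random.randn(mask.sum(), 500) * 2

# Test data: 20 samples
n_test = 20
X_test_gene = np.random.randn(n_test, n_features) * 0.5
y_test_gene = np.array([0]*3 + [1]*6 + [2]*6 + [3]*5)

# Same signal as in the training data
for cls in range(4):
    mask = y_test_gene == cls
    X_test_gene[mask, cls*500:(cls+1)*500] += 1 + np.random.randn(mask.sum(), 500) * 2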

print(f"Gene Expression Data:")
print(f"Training: {X_train_gene.shape} ({n_train} samples, {n_features} genes)")
print(f"Test: {X_test_gene.shape}")
print(f"\nTraining class distribution: {np.bincount(y_train_gene)}")
print(f"Test class distribution: {np.bincount(y_test_gene)}")

In [None]:
# Fit linear SVM (linear kernel works well for high-dimensional data)
svm_gene = SVC(kernel='linear', C=10)
svm_gene.fit(X_train_gene, y_train_gene)

print(f"Linear SVM on Gene Expression Data:")
print(f"Number of support vectors: {len(svm_gene.support_)}")
print(f"Training accuracy: {svm_gene.score(X_train_gene, y_train_gene):.4f}")

In [None]:
# Training confusion matrix
y_pred_train_gene = svm_gene.predict(X_train_gene)
cm_train_gene = confusion_matrix(y_train_gene, y_pred_train_gene)

print("Training Confusion Matrix:")
print(pd.DataFrame(cm_train_gene,
                   index=[f'Actual: {i}' for i in range(4)],
                   columns=[f'Predicted: {i}' for i in range(4)]))
print(f"\nTraining errors: {(y_train_gene != y_pred_train_gene).sum()}")

In [None]:
# Test set evaluation
y_pred_test_gene = svm_gene.predict(X_test_gene)
cm_test_gene = confusion_matrix(y_test_gene, y_pred_test_gene)
test_acc_gene = (y_test_gene == y_pred_test_gene).mean()

print("Test Confusion Matrix:")
print(pd.DataFrame(cm_test_gene,
                   index=[f'Actual: {i}' for i in range(4)],
                   columns=[f'Predicted: {i}' for i in range(4)]))
print(f"\nTest accuracy: {test_acc_gene:.4f}")
print(f"Test errors: {(y_test_gene != y_pred_test_gene).sum()}")

In [None]:
# Visualize confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm_test_gene, annot=True, fmt='d', cmap='Blues', 
           xticklabels=[f'Class {i}' for i in range(4)],
           yticklabels=[f'Class {i}' for i in range(4)],
           cbar_kws={'label': 'Count'})
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.title('Gene Expression: Test Set Confusion Matrix', fontsize=14)
plt.tight_layout()
plt.show()

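print("\nKey Insight: a linear kernel is often sufficient for high-dimensional data.")
print("With p >> n (features >> samples), the training data is usually linearly separable,")
print("so judge the model by its test accuracy rather than its training accuracy.")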

## Summary

This notebook covered:

### **Linear SVM (Support Vector Classifier)**
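- Finds the maximum-margin separating hyperplane
- **Support vectors**: Points on the margin, inside it, or on the wrong side of the hyperplane
- **Cost (C)**: Penalty on margin violations; trades margin width against violations
  - Large C: Narrow margin (approaches a hard margin), typically fewer support vectors
  - Small C: Wide, soft margin, typically more support vectors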
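
The definition of support vectors above can be checked numerically. In the sketch below, the functional margin y·f(x) should be at most 1 for every support vector and at least 1 for every other point, up to the solver's tolerance. The data and names (`X_m`, `y_m`, `func_margin`) are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Two Gaussian classes that are likely to overlap (illustrative data)
X_m = np.vstack([rng.normal(0.0, 1.0, size=(25, 2)),
                 rng.normal(2.0, 1.0, size=(25, 2))])
y_m = np.array([-1] * 25 + [1] * 25)

clf = SVC(kernel='linear', C=1.0).fit(X_m, y_m)

# Functional margin y_i * f(x_i): 1 on the margin, < 1 inside it, < 0 if misclassified
func_margin = y_m * clf.decision_function(X_m)

# Boolean mask marking which training points are support vectors
is_sv = np.zeros(len(X_m), dtype=bool)
is_sv[clf.support_] = True

print("largest margin among support vectors:  ", func_margin[is_sv].max())
print("smallest margin among the other points:", func_margin[~is_sv].min())
```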

### **Non-Linear SVM (Kernel Trick)**
- **RBF Kernel**: K(x, x') = exp(-γ||x - x'||²)
- **Gamma (γ)**: Controls decision boundary complexity
  - Small γ: Smooth boundary (risk of underfitting)
  - Large γ: Wiggly boundary (risk of overfitting)
- **Polynomial, sigmoid** kernels also available
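
The RBF formula above can be evaluated by hand and compared with scikit-learn's `rbf_kernel`. The three points in `A` and the value of `gamma` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.5
A = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])

# Squared Euclidean distances between every pair of rows
sq_dists = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)

# K(x, x') = exp(-gamma * ||x - x'||^2)
K_manual = np.exp(-gamma * sq_dists)

print(np.round(K_manual, 4))
print("matches sklearn:", np.allclose(K_manual, rbf_kernel(A, gamma=gamma)))
```

Nearby points get a kernel value close to 1 and distant points a value close to 0; a larger `gamma` makes that decay faster, which is what allows a more wiggly boundary.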

### **Hyperparameter Tuning**
- Use **cross-validation** to select C and γ
- Grid search over parameter space
- Select parameters by CV score, then report performance once on a held-out test set
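
A minimal sketch of this workflow, assuming synthetic data from `make_classification` and an arbitrary small grid: scaling is placed inside a `Pipeline` so that it is re-fit on each cross-validation training fold, and the test set is touched only once at the end. The step names `scale` and `svc` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data; the held-out test set is never seen during tuning
X_d, y_d = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, test_size=0.3,
                                          random_state=0, stratify=y_d)

# Scaling lives inside the pipeline so it is re-fit on each CV training fold
pipe = Pipeline([('scale', StandardScaler()),
                 ('svc', SVC(kernel='rbf'))])
grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': [0.01, 0.1, 1]}

search = GridSearchCV(pipe, grid, cv=5, scoring='accuracy')
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {search.score(X_te, y_te):.3f}")
```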

### **Multi-Class SVM**
- **One-vs-One**: K(K-1)/2 binary classifiers
- **One-vs-Rest**: K binary classifiers
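- sklearn's `SVC` always trains one-vs-one classifiers internally; its default `decision_function_shape='ovr'` only changes the shape of the returned scores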

### **High-Dimensional Data**
- When p >> n, a **linear kernel** is often sufficient
- Training data is usually linearly separable when p >> n, so training accuracy alone says little
- Avoid overfitting by selecting C with cross-validation

### **Key Takeaways**
- SVMs find optimal decision boundary (maximum margin)
- Kernel trick enables non-linear boundaries
- Tuning C and γ is crucial
- Works well for high-dimensional data
- Fairly robust to outliers when C is small (soft margin)
- **Less interpretable** than trees/linear regression
- **Computationally expensive** for large n