# Exercise-3: Support Vector Machines (SVM) [30 points]

This notebook walks through implementing and analyzing Support Vector Machines (SVMs) for classification and regression tasks.

## Learning Objectives
- Understanding SVM algorithms and their mathematical foundations
- Implementing SVMs for classification and regression
- Exploring different kernel functions and their effects
- Hyperparameter tuning for optimal performance
- Analyzing decision boundaries and support vectors
- Handling non-linearly separable data

## Instructions
Complete the exercises below by implementing the required code in the designated cells.

## 1. Import Required Libraries

Import the necessary libraries for SVM implementation and analysis.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC, SVR
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from mlxtend.plotting import plot_decision_regions  # third-party package: pip install mlxtend
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

## 2. Load Preprocessed Data

Load the data that was prepared in Exercise-1.

In [None]:
# TODO: Load the preprocessed data from Exercise-1
# Example:
# X_train = pd.read_csv('../data/X_train_processed.csv')
# X_test = pd.read_csv('../data/X_test_processed.csv')
# y_train = pd.read_csv('../data/y_train.csv').squeeze()
# y_test = pd.read_csv('../data/y_test.csv').squeeze()

# TODO: Display the shape of the datasets
# Ensure data is properly scaled for SVM (SVMs are sensitive to feature scaling)

## 3. Data Scaling for SVM

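SVMs depend on distances and inner products between samples, so features measured on larger scales dominate the result. Standardize the features before training, fitting the scaler on the training set only.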
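
As a minimal sketch of train-only scaling, the example below uses synthetic data from `make_classification` as a stand-in for the Exercise-1 dataset. All names here (`X_demo`, `demo_scaler`, and so on) are hypothetical and chosen so they do not clash with the notebook's own variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Exercise-1 data (assumption, not the real dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=42)
X_demo[:, 0] *= 1000.0  # exaggerate one feature's scale to mimic unscaled data

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

demo_scaler = StandardScaler()
X_tr_s = demo_scaler.fit_transform(X_tr)  # learn mean and std from training rows only
X_te_s = demo_scaler.transform(X_te)      # reuse the training statistics on the test rows

print("Train means:", X_tr_s.mean(axis=0).round(3))
print("Train stds: ", X_tr_s.std(axis=0).round(3))
print("Test means: ", X_te_s.mean(axis=0).round(3))
```

The training columns end up with zero mean and unit variance by construction, while the test columns are only approximately standardized because they reuse the training statistics.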

In [None]:
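# TODO: Scale the features if not already done in Exercise-1
# Fit the scaler on the training set only, then reuse it on the test set to avoid data leakage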
# scaler = StandardScaler()
# X_train_scaled = scaler.fit_transform(X_train)
# X_test_scaled = scaler.transform(X_test)

# TODO: Convert back to DataFrame for easier handling
# X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns)
# X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns)

## 4. Linear SVM Implementation

Start with a linear SVM to establish baseline performance.

In [None]:
# TODO: Create a linear SVM classifier/regressor
# linear_svm = SVC(kernel='linear', random_state=42)  # for regression use SVR(kernel='linear'); SVR takes no random_state

# TODO: Train the model
# linear_svm.fit(X_train_scaled, y_train)

# TODO: Make predictions
# y_pred_linear = linear_svm.predict(X_test_scaled)

## 5. Linear SVM Evaluation

Evaluate the performance of the linear SVM model.
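
The following sketch shows one way the classification metrics could be computed. It trains on synthetic data from `make_classification` (an assumption standing in for the real dataset), and the names `demo_svm`, `demo_acc`, and `demo_cm` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary problem (assumption, not the Exercise-1 data)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

demo_svm = SVC(kernel="linear").fit(X_tr, y_tr)
y_hat = demo_svm.predict(X_te)

demo_acc = accuracy_score(y_te, y_hat)
demo_cm = confusion_matrix(y_te, y_hat)  # rows: true class, columns: predicted class

print(f"Accuracy: {demo_acc:.3f}")
print(classification_report(y_te, y_hat))  # per-class precision, recall, F1
print(demo_cm)
```

The diagonal of the confusion matrix counts correct predictions, so its trace divided by the total equals the accuracy.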

In [None]:
# TODO: Calculate and display evaluation metrics
# For classification: accuracy, precision, recall, F1-score
# For regression: MSE, RMSE, R²

# TODO: Display confusion matrix (for classification)
# TODO: Analyze support vectors
# print(f"Support vectors per class: {linear_svm.n_support_}")
# print(f"Support vector indices: {linear_svm.support_[:10]}...")  # First 10

## 6. Kernel SVM Implementation

Implement SVMs with different kernel functions.
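
To make the four kernels concrete, the sketch below computes each one by hand with NumPy and checks the result against scikit-learn's pairwise kernel helpers. The values of `gamma`, `coef0`, and `degree` are arbitrary illustration choices, not recommended settings.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))        # five samples, three features
gamma, coef0, degree = 0.5, 1.0, 3

dot = A @ A.T                      # pairwise inner products x . z

K_lin = dot                                  # linear:  x . z
K_poly = (gamma * dot + coef0) ** degree     # poly:    (gamma * x . z + coef0) ** degree
K_sig = np.tanh(gamma * dot + coef0)         # sigmoid: tanh(gamma * x . z + coef0)

# rbf: exp(-gamma * ||x - z||^2), using broadcasting for pairwise squared distances
sq_dists = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)

print("linear matches: ", np.allclose(K_lin, linear_kernel(A)))
print("poly matches:   ", np.allclose(K_poly, polynomial_kernel(A, degree=degree, gamma=gamma, coef0=coef0)))
print("rbf matches:    ", np.allclose(K_rbf, rbf_kernel(A, gamma=gamma)))
print("sigmoid matches:", np.allclose(K_sig, sigmoid_kernel(A, gamma=gamma, coef0=coef0)))
```

Each kernel matrix is 5 x 5 here because it holds one similarity value per pair of samples; the SVM only ever sees the data through these values.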

In [None]:
# TODO: Implement SVMs with different kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
svm_models = {}
svm_predictions = {}
svm_scores = {}

# TODO: Train models with each kernel
# for kernel in kernels:
#     svm = SVC(kernel=kernel, random_state=42)
#     svm.fit(X_train_scaled, y_train)
#     
#     predictions = svm.predict(X_test_scaled)
#     score = svm.score(X_test_scaled, y_test)
#     
#     svm_models[kernel] = svm
#     svm_predictions[kernel] = predictions
#     svm_scores[kernel] = score
#     
#     print(f"{kernel.upper()} kernel - Test score: {score:.4f}")

## 7. Kernel Comparison and Analysis

Compare the performance of different kernel functions.

In [None]:
# TODO: Create a comparison chart of kernel performances
# kernel_comparison = pd.DataFrame({
#     'Kernel': list(svm_scores.keys()),
#     'Test Score': list(svm_scores.values())
# })

# plt.figure(figsize=(10, 6))
# sns.barplot(data=kernel_comparison, x='Kernel', y='Test Score')
# plt.title('SVM Performance Comparison by Kernel Type')
# plt.ylabel('Test Score')
# plt.show()

## 8. Hyperparameter Tuning

Optimize SVM hyperparameters for the best-performing kernel.
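
Beyond the single best setting, it can help to look at how every grid point scored. The sketch below runs a small grid search on synthetic data (an assumption standing in for the real dataset) and reads the full table from `cv_results_`. The grid values and the names `demo_grid` and `demo_search` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data (assumption, not the Exercise-1 data)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=1)

# Deliberately small grid so the sketch runs quickly
demo_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
demo_search = GridSearchCV(SVC(kernel="rbf"), demo_grid, cv=3, scoring="accuracy")
demo_search.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked first
results = pd.DataFrame(demo_search.cv_results_)
columns = ["param_C", "param_gamma", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score").to_string(index=False))
print("Best parameters:", demo_search.best_params_)
```

If several combinations score within one standard deviation of the best, the simpler one (smaller C, smaller gamma) is often a reasonable choice.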

In [None]:
# TODO: Define hyperparameter grid for the best kernel
# For RBF kernel example:
# param_grid = {
#     'C': [0.1, 1, 10, 100, 1000],
#     'gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1],
#     'kernel': ['rbf']
# }

# TODO: Perform GridSearchCV
# grid_search = GridSearchCV(
#     SVC(random_state=42),
#     param_grid,
#     cv=5,
#     scoring='accuracy',  # or appropriate metric
#     n_jobs=-1,
#     verbose=1
# )
# grid_search.fit(X_train_scaled, y_train)

# TODO: Display best parameters and score
# print("Best parameters:", grid_search.best_params_)
# print("Best cross-validation score:", grid_search.best_score_)

## 9. Optimized SVM Evaluation

Evaluate the performance of the optimized SVM model.

In [None]:
# TODO: Get the best model from GridSearch
# best_svm = grid_search.best_estimator_

# TODO: Make predictions with the optimized model
# y_pred_optimized = best_svm.predict(X_test_scaled)

# TODO: Evaluate the optimized model
# Compare with previous models

## 10. Decision Boundary Visualization

Visualize decision boundaries for different SVM configurations (for 2D data).
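
If `mlxtend` is not available, a decision boundary can also be drawn from a plain mesh grid. The sketch below is an alternative to `plot_decision_regions`, not the notebook's prescribed method; it uses `make_moons` as synthetic 2D data (an assumption) and leaves the plotting calls as comments so the block runs without a display.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic, non-linearly separable 2D data (assumption, not the Exercise-1 data)
X_2d, y_2d = make_moons(n_samples=200, noise=0.2, random_state=42)
clf_2d = SVC(kernel="rbf", gamma="scale").fit(X_2d, y_2d)

# Build a grid that covers the data with a small margin on every side
x_min, x_max = X_2d[:, 0].min() - 0.5, X_2d[:, 0].max() + 0.5
y_min, y_max = X_2d[:, 1].min() - 0.5, X_2d[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
grid_points = np.c_[xx.ravel(), yy.ravel()]

# Signed score per grid point; the zero level set is the decision boundary
Z = clf_2d.decision_function(grid_points).reshape(xx.shape)
print("Grid shape:", Z.shape)

# To draw it with matplotlib (imported as plt in the first cell):
# plt.contourf(xx, yy, Z > 0, alpha=0.3)
# plt.contour(xx, yy, Z, levels=[-1, 0, 1], linestyles=["--", "-", "--"], colors="k")
# plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_2d, edgecolor="k")
# plt.show()
```

The contour levels -1 and 1 mark the margin, so points lying on or between them are the ones that become support vectors.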

In [None]:
# TODO: If dataset has many features, select 2 most important features for visualization
# For demonstration purposes, you can use PCA to reduce to 2D
# from sklearn.decomposition import PCA

# pca = PCA(n_components=2)
# X_train_2d = pca.fit_transform(X_train_scaled)
# X_test_2d = pca.transform(X_test_scaled)

# TODO: Train SVM on 2D data and visualize decision boundaries
# for kernel in ['linear', 'rbf']:
#     svm_2d = SVC(kernel=kernel, random_state=42)
#     svm_2d.fit(X_train_2d, y_train)
#     
#     plt.figure(figsize=(10, 8))
#     # mlxtend expects integer class labels; encode string labels first if needed
#     plot_decision_regions(X_train_2d, y_train.values.astype(int), clf=svm_2d, legend=2)
#     plt.title(f'SVM Decision Boundary - {kernel.upper()} Kernel')
#     plt.xlabel('First Principal Component')
#     plt.ylabel('Second Principal Component')
#     plt.show()

## 11. Support Vector Analysis

Analyze the support vectors and their importance.
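
One way to judge the importance of individual support vectors is through their dual coefficients. The sketch below, on synthetic data (an assumption), separates support vectors whose coefficient reaches the upper bound C from those strictly below it. The names `sv_model` and `C_demo` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem (assumption, not the Exercise-1 data)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=3)
C_demo = 1.0
sv_model = SVC(kernel="rbf", C=C_demo).fit(X_demo, y_demo)

n_sv_total = len(sv_model.support_)           # indices of support vectors in X_demo
alphas = np.abs(sv_model.dual_coef_).ravel()  # |y_i * alpha_i| for each support vector
at_bound = np.isclose(alphas, C_demo)         # alpha_i == C: inside the margin or misclassified

print("Support vectors per class:", sv_model.n_support_)
print("Total support vectors:", n_sv_total)
print(f"Share of training set: {n_sv_total / len(X_demo):.1%}")
print("At the bound (alpha = C):", int(at_bound.sum()))
print("Strictly inside (0 < alpha < C):", int((~at_bound).sum()))
```

Support vectors strictly below the bound lie exactly on the margin, while those at the bound violate it, so a large share of bounded vectors suggests heavily overlapping classes or a small C.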

In [None]:
# TODO: Analyze support vectors from the best model
# print(f"Support vectors per class: {best_svm.n_support_}")
# print(f"Total support vectors: {len(best_svm.support_)}")
# print(f"Percentage of support vectors: {len(best_svm.support_) / len(X_train_scaled) * 100:.2f}%")

# TODO: Examine the characteristics of support vectors
# support_vectors = X_train_scaled.iloc[best_svm.support_]
# print("\nSupport vector statistics:")
# print(support_vectors.describe())

## 12. Regularization Parameter (C) Analysis

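Analyze how the regularization parameter C affects model performance. In scikit-learn, regularization strength is inversely proportional to C, so larger values of C fit the training data more tightly.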
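
For reference, the role of C is visible in the standard soft-margin objective, written here with slack variables $\xi_i$ and a feature map $\phi$ (notation introduced for this sketch, not taken from the notebook):

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^\top \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

A large C makes margin violations expensive, which favors a narrow margin and a tighter fit to the training data. A small C tolerates more violations in exchange for a wider margin, which typically increases the number of support vectors.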

In [None]:
# TODO: Test different C values and plot the effect
# c_values = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
# train_scores = []
# test_scores = []
# n_support_vectors = []

# for c in c_values:
#     svm = SVC(C=c, kernel='rbf', random_state=42)
#     svm.fit(X_train_scaled, y_train)
#     
#     train_score = svm.score(X_train_scaled, y_train)
#     test_score = svm.score(X_test_scaled, y_test)
#     n_sv = len(svm.support_)
#     
#     train_scores.append(train_score)
#     test_scores.append(test_score)
#     n_support_vectors.append(n_sv)

# TODO: Plot the results
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# ax1.semilogx(c_values, train_scores, 'o-', label='Training Score')
# ax1.semilogx(c_values, test_scores, 'o-', label='Test Score')
# ax1.set_xlabel('C (Regularization Parameter)')
# ax1.set_ylabel('Score')
# ax1.set_title('SVM Performance vs Regularization Parameter')
# ax1.legend()
# ax1.grid(True)

# ax2.semilogx(c_values, n_support_vectors, 'ro-')
# ax2.set_xlabel('C (Regularization Parameter)')
# ax2.set_ylabel('Number of Support Vectors')
# ax2.set_title('Support Vectors vs Regularization Parameter')
# ax2.grid(True)

# plt.tight_layout()
# plt.show()

## 13. Cross-Validation Analysis

Perform cross-validation to get robust performance estimates.
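
A stricter estimate refits the scaler inside every fold, which a pipeline handles automatically. The sketch below assumes synthetic data and arbitrary SVC settings; `cv_pipeline`, `cv_splitter`, and `demo_cv_scores` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data (assumption, not the Exercise-1 data)
X_demo, y_demo = make_classification(n_samples=250, n_features=6, random_state=7)

# The scaler is refit on the training part of each fold, so no statistics leak
cv_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
cv_splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

demo_cv_scores = cross_val_score(
    cv_pipeline, X_demo, y_demo, cv=cv_splitter, scoring="accuracy"
)
print("Fold scores:", demo_cv_scores.round(3))
print(f"Mean: {demo_cv_scores.mean():.3f}, std: {demo_cv_scores.std():.3f}")
```

Passing unscaled data to the pipeline, rather than pre-scaled data to the bare model, is what keeps each validation fold unseen during scaling.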

In [None]:
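# TODO: Perform k-fold cross-validation with the best model
# Note: X_train_scaled was scaled using all training rows; for a stricter estimate, wrap the scaler and model in a Pipeline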
# cv_scores = cross_val_score(
#     best_svm,
#     X_train_scaled,
#     y_train,
#     cv=5,
#     scoring='accuracy'  # or appropriate metric
# )

# TODO: Display cross-validation results
# print(f"Cross-validation scores: {cv_scores}")
# print(f"Mean CV score: {cv_scores.mean():.3f} (+/- {cv_scores.std() * 2:.3f})")

# TODO: Compare with other models from previous exercises

## 14. Feature Importance Analysis

Analyze feature importance: use the coefficients for a linear SVM, or permutation importance for non-linear kernels.

In [None]:
# TODO: For linear SVM, analyze feature coefficients
# if best_svm.kernel == 'linear':
#     feature_importance = np.abs(best_svm.coef_[0])  # binary case; multi-class coef_ has one row per class pair
#     importance_df = pd.DataFrame({
#         'feature': X_train_scaled.columns,
#         'importance': feature_importance
#     }).sort_values('importance', ascending=False)
#     
#     plt.figure(figsize=(10, 6))
#     sns.barplot(data=importance_df.head(10), x='importance', y='feature')
#     plt.title('Linear SVM Feature Coefficients (Absolute Values)')
#     plt.show()

# TODO: For non-linear kernels, use permutation importance
# from sklearn.inspection import permutation_importance
# perm_importance = permutation_importance(best_svm, X_test_scaled, y_test, n_repeats=10, random_state=42)
# importance_df = pd.DataFrame({
#     'feature': X_train_scaled.columns,
#     'importance': perm_importance.importances_mean
# }).sort_values('importance', ascending=False)
# 
# plt.figure(figsize=(10, 6))
# sns.barplot(data=importance_df.head(10), x='importance', y='feature')
# plt.title('SVM Permutation Feature Importance')
# plt.show()

## 15. Model Comparison Summary

Compare SVM with the Decision Tree model from Exercise-2.
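
A rough way to fill the comparison table is to time fitting and scoring for both models on the same split. The sketch below uses synthetic data and default-ish model settings (both assumptions); `time_model` is a hypothetical helper, and wall-clock timings vary between runs and machines.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption, not the Exercise-1 data)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=5, stratify=y_demo
)


def time_model(model):
    """Return test accuracy plus wall-clock fit and scoring times in seconds."""
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    fit_s = time.perf_counter() - start

    start = time.perf_counter()
    accuracy = model.score(X_te, y_te)  # predicts on the test set, then computes accuracy
    score_s = time.perf_counter() - start
    return {"accuracy": accuracy, "fit_s": fit_s, "score_s": score_s}


comparison = {
    "SVM (RBF)": time_model(SVC(kernel="rbf")),
    "Decision Tree": time_model(DecisionTreeClassifier(random_state=42)),
}
for name, row in comparison.items():
    print(f"{name:14s} accuracy={row['accuracy']:.3f} fit={row['fit_s']:.4f}s score={row['score_s']:.4f}s")
```

Single timings are noisy, so repeating each measurement several times and reporting the median gives a fairer comparison.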

In [None]:
# TODO: Create a comparison table of SVM vs Decision Tree
# Include metrics like accuracy, training time, prediction time, interpretability

# TODO: Discuss the trade-offs between the two approaches

## Summary and Conclusions

Summarize the findings from the SVM analysis.

## Reflection Questions

1. How do different kernel functions affect SVM performance on your dataset?
2. What is the relationship between the regularization parameter C and model complexity?
3. How does the number of support vectors relate to model generalization?
4. When would you choose SVM over Decision Trees and vice versa?
5. How does feature scaling impact SVM performance compared to Decision Trees?
6. What are the computational considerations when using different SVM kernels?

**TODO: Answer the reflection questions above in markdown cells below.**