# Task 2.4: Support Vector Machine (SVM) Implementation

## Objective

The goal of this task is to implement Support Vector Machine (SVM) classifiers with both RBF (Radial Basis Function) and Linear kernels. An SVM finds the hyperplane that separates the classes with the maximum margin. We set the C and gamma parameters for each kernel and compare performance with the tree-based models (Random Forest and XGBoost).

## Understanding SVM

**What is SVM?**

Support Vector Machine is a supervised learning algorithm that:
1. **Finds the best boundary (hyperplane)** between classes
2. **Maximizes the margin** - the distance between the boundary and nearest data points
3. **Uses support vectors** - the critical data points closest to the boundary
4. **Can handle non-linear patterns** using kernel tricks

**Think of it like this:**
- Imagine drawing a line to separate two groups of points
- SVM finds the line that's as far as possible from both groups
- The closest points to this line are the "support vectors"
- For complex patterns, SVM uses "kernels" to transform the data
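
The picture above can be reproduced on a handful of points. The following is a minimal sketch on made-up 2D data (not the project dataset); the large `C` value is an assumption chosen to approximate a hard margin on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, well-separated groups of 2D points (toy data, not the project dataset)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin when the classes are separable
toy_clf = SVC(kernel='linear', C=1000.0)
toy_clf.fit(X_toy, y_toy)

# The support vectors are the training points that define the boundary
n_support = len(toy_clf.support_)

# For a linear kernel, the margin width is 2 / ||w||
w = toy_clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:\n", toy_clf.support_vectors_)
print(f"Number of support vectors: {n_support}")
print(f"Margin width: {margin_width:.3f}")
```

Only a few of the six points end up as support vectors; moving any of the other points (without crossing the margin) would leave the boundary unchanged.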

## SVM vs Tree-Based Models

| Aspect | SVM | Random Forest | XGBoost |
|--------|-----|---------------|----------|
| Learning Method | Margin maximization | Ensemble of trees | Sequential boosting |
| Decision Boundary | Smooth (flat or curved, depending on kernel) | Axis-aligned rectangular regions | Axis-aligned rectangular regions |
| Feature Scaling | **Required** | Not required | Not required |
| Interpretability | Low | Medium (feature importance) | Medium (feature importance) |
| Training Speed | Slow (large datasets) | Fast | Medium |
| High-Dimensional Data | Excellent | Good | Good |
| Non-Linear Patterns | Excellent (with kernels) | Good | Excellent |

## Why Use SVM?

1. **Effective in high-dimensional spaces** - Works well when you have many features
2. **Memory efficient** - Only uses support vectors (subset of training data)
3. **Versatile** - Different kernel functions for different decision boundaries
4. **Robust to overfitting** - Especially in high-dimensional space with proper regularization
5. **Works well with clear margin of separation** - Excellent when classes are well-separated

## Understanding Kernels

**What is a Kernel?**

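A kernel is a function that measures the similarity (inner product) between two points as if they had been mapped into a higher-dimensional space, where the classes may become linearly separable. The mapping itself is never computed explicitly; this shortcut is the "kernel trick".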

**Linear Kernel:**
- No transformation, works in original space
- Best for linearly separable data
- Faster training, simpler model
- Formula: K(x, y) = x · y

**RBF (Radial Basis Function) Kernel:**
- Corresponds to an implicit mapping into an infinite-dimensional feature space
- Can handle complex, non-linear patterns
- Most popular kernel for general use
- Formula: K(x, y) = exp(-gamma * ||x - y||²)
- Creates smooth, curved decision boundaries that can enclose regions of the feature space
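
The two formulas above can be evaluated directly. This is a small NumPy sketch; the function names and the example vectors are illustrative choices, not part of the project code.

```python
import math
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: K(x, y) = x · y"""
    return float(np.dot(x, y))

def rbf_kernel(x, y, gamma):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)"""
    diff = np.asarray(x) - np.asarray(y)
    return float(np.exp(-gamma * np.dot(diff, diff)))

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.0])

k_lin = linear_kernel(a, b)            # 1*2 + 2*0
k_rbf = rbf_kernel(a, b, gamma=0.5)    # squared distance is 5, so exp(-0.5 * 5)
k_same = rbf_kernel(a, a, gamma=0.5)   # identical points have similarity 1

print(f"Linear kernel: {k_lin}")
print(f"RBF kernel:    {k_rbf:.4f}")
print(f"RBF (same point): {k_same}")
```

The RBF value always lies between 0 and 1 and shrinks as the points move apart; a larger `gamma` makes it shrink faster.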

## Step 1: Environment Setup and Data Loading

In this step, we import the required libraries and load the preprocessed dataset.

### Libraries Used:
- **pandas & numpy:** For data manipulation and numerical operations
- **SVC:** Support Vector Classifier from sklearn
- **sklearn.metrics:** For evaluating model performance
- **LabelEncoder:** To convert categorical target labels to numeric format
- **pickle:** For saving the trained models
- **time:** To measure training time (SVM can be slow)

### Why Scaled Data is CRITICAL for SVM:

Unlike tree-based models, **SVM is extremely sensitive to feature scales** because:
1. **Distance-based algorithm:** SVM uses distances between points
2. **Features with larger scales dominate:** A feature ranging 0-1000 will dominate one ranging 0-1
3. **Affects kernel calculations:** RBF kernel uses distances, so scale matters
4. **Impacts convergence:** Unscaled data makes training slower and less accurate

**Example:**
- Feature A: price (range 50-500)
- Feature B: number of bedrooms (range 1-5)
- Without scaling, price dominates the distance calculation
- With scaling, both features contribute equally

This is why we MUST use the scaled datasets (X_train_landlord_scaled.csv, X_test_landlord_scaled.csv).
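
To make the price/bedrooms example concrete, here is a small sketch with three hypothetical listings (the values are invented for illustration). It standardizes each column by hand, which is the same transformation `StandardScaler` applies.

```python
import numpy as np

# Hypothetical listings: [price, bedrooms] (illustrative values only)
listings = np.array([[100.0, 1.0],
                     [110.0, 5.0],
                     [400.0, 1.0]])

def euclidean(p, q):
    return float(np.linalg.norm(p - q))

# Raw distances: the price difference swamps the bedroom difference
d_raw_01 = euclidean(listings[0], listings[1])  # 10 apart in price, 4 bedrooms apart
d_raw_02 = euclidean(listings[0], listings[2])  # 300 apart in price, same bedrooms

# Standardize each column to zero mean and unit variance
listings_scaled = (listings - listings.mean(axis=0)) / listings.std(axis=0)

d_scaled_01 = euclidean(listings_scaled[0], listings_scaled[1])
d_scaled_02 = euclidean(listings_scaled[0], listings_scaled[2])

print(f"Raw:    d(0,1) = {d_raw_01:.2f}, d(0,2) = {d_raw_02:.2f}")
print(f"Scaled: d(0,1) = {d_scaled_01:.2f}, d(0,2) = {d_scaled_02:.2f}")
```

Before scaling, listing 2 looks far more different from listing 0 than listing 1 does, purely because of the price units. After scaling, a 4-bedroom difference and a 300-unit price difference count about equally.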

In [None]:
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.preprocessing import LabelEncoder
import pickle
import time
import warnings
# Note: this also hides sklearn's ConvergenceWarning, which SVC emits when max_iter is reached
warnings.filterwarnings('ignore')

# Load SCALED data (CRITICAL for SVM!)
X_train = pd.read_csv('../../data/processed/X_train_landlord_scaled.csv')
X_test = pd.read_csv('../../data/processed/X_test_landlord_scaled.csv')
y_train = pd.read_csv('../../data/processed/y_train_landlord.csv')
y_test = pd.read_csv('../../data/processed/y_test_landlord.csv')

# Remove ID columns if present
if 'id' in X_train.columns:
    X_train = X_train.drop('id', axis=1)
if 'id' in X_test.columns:
    X_test = X_test.drop('id', axis=1)

# Encode target labels as integers (SVC also accepts string labels directly)
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train['value_category'])
y_test_encoded = label_encoder.transform(y_test['value_category'])

print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")
print(f"Number of features: {X_train.shape[1]}")
print(f"\nTarget distribution (Training):")
unique, counts = np.unique(y_train_encoded, return_counts=True)
for val, count in zip(unique, counts):
    category = label_encoder.classes_[val]
    print(f"  Class {val} ({category}): {count} samples ({count/len(y_train_encoded)*100:.2f}%)")



## Step 2: SVM with RBF Kernel

We start with the RBF (Radial Basis Function) kernel, which is the most popular and versatile kernel for SVM.

### Understanding RBF Kernel Parameters:

**1. C (Regularization Parameter):**
- Controls the trade-off between smooth decision boundary and classifying training points correctly
- **Small C (e.g., 0.1):**
  - Wider margin, more tolerance for misclassification
  - Simpler model, less overfitting
  - May underfit if too small
- **Large C (e.g., 100):**
  - Narrow margin, tries to classify all training points correctly
  - More complex model, risk of overfitting
  - Better training accuracy but may not generalize well
- **Our choice: C=10** - A moderately large starting value (the scikit-learn default is 1.0)

**2. gamma (Kernel Coefficient):**
- Defines how far the influence of a single training example reaches
- **Small gamma (e.g., 0.001):**
  - Far reach, considers points far away
  - Smoother decision boundary
  - May underfit
- **Large gamma (e.g., 1.0):**
  - Close reach, only considers nearby points
  - More complex, wiggly decision boundary
  - Risk of overfitting
- **Our choice: gamma='scale'** - Automatically calculated as 1/(n_features * X.var())
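
The `gamma='scale'` formula can be checked by computing it by hand and passing the result as a number. The sketch below uses synthetic data from `make_classification` as a stand-in for the project dataset; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (not the project dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=0
)

# gamma='scale' is documented as 1 / (n_features * X.var())
manual_gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

clf_scale = SVC(kernel='rbf', C=10, gamma='scale').fit(X_demo, y_demo)
clf_manual = SVC(kernel='rbf', C=10, gamma=manual_gamma).fit(X_demo, y_demo)

# Both models should make the same predictions
same_predictions = np.array_equal(
    clf_scale.predict(X_demo), clf_manual.predict(X_demo)
)

print(f"Manual gamma: {manual_gamma:.5f}")
print(f"Same predictions: {same_predictions}")
```

On standardized features `X.var()` is close to 1, so `gamma='scale'` is then roughly `1 / n_features`.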

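**3. decision_function_shape='ovr':**
- Controls the shape of the decision function returned for multi-class problems
- 'ovr' (One-vs-Rest) returns one score per class, shape (n_samples, n_classes)
- 'ovo' (One-vs-One) returns one score per pair of classes, shape (n_samples, n_classes * (n_classes - 1) / 2)
- Internally, scikit-learn's SVC always trains one-vs-one classifiers, so this setting does not change how the model is trained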
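
In scikit-learn's `SVC`, `decision_function_shape` changes the shape of the `decision_function` output. The sketch below illustrates this on synthetic 4-class data from `make_blobs` (an assumption for the demo, not the project dataset).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic 4-class data (illustrative only)
X_blobs, y_blobs = make_blobs(n_samples=120, centers=4, random_state=0)

svc_ovr = SVC(kernel='rbf', decision_function_shape='ovr').fit(X_blobs, y_blobs)
svc_ovo = SVC(kernel='rbf', decision_function_shape='ovo').fit(X_blobs, y_blobs)

ovr_shape = svc_ovr.decision_function(X_blobs).shape  # one column per class
ovo_shape = svc_ovo.decision_function(X_blobs).shape  # one column per pair of classes

# With the default break_ties=False, predictions do not depend on the setting
same_predictions = np.array_equal(
    svc_ovr.predict(X_blobs), svc_ovo.predict(X_blobs)
)

print(f"'ovr' decision_function shape: {ovr_shape}")
print(f"'ovo' decision_function shape: {ovo_shape}")
print(f"Same predictions: {same_predictions}")
```

With 4 classes there are 4 * 3 / 2 = 6 class pairs, which is why the 'ovo' output has 6 columns.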

### Why RBF Kernel?

1. **Handles non-linear patterns** - Can create circular/curved decision boundaries
2. **Works well when classes overlap** - Can separate complex patterns
3. **Default choice** - Good starting point for most problems
4. **Flexible** - Can approximate many different decision boundaries

### Expected Training Time:

SVM with RBF kernel can be slow on large datasets (10,000+ samples). Training may take 1-5 minutes depending on your machine.

In [None]:


# Train SVM with RBF kernel
start_time = time.time()

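svm_rbf = SVC(
    kernel='rbf',
    C=10,
    gamma='scale',
    decision_function_shape='ovr',  # affects the shape of decision_function output only
    random_state=42,                # only used when probability=True
    max_iter=1000                   # hard cap on solver iterations; may stop before convergence
)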

svm_rbf.fit(X_train, y_train_encoded)

rbf_training_time = time.time() - start_time

print("\nRBF SVM training complete!")
print(f"Training time: {rbf_training_time:.2f} seconds")
print(f"Number of support vectors: {len(svm_rbf.support_)}")
print(f"Support vectors per class: {svm_rbf.n_support_}")

## Step 3: Evaluate RBF SVM Performance

Now we evaluate how well the RBF SVM performs on both training and testing data.

### Understanding Support Vectors:

- **Support vectors** are the training samples that lie closest to the decision boundary
- They are the "critical" points that define the boundary
- Fewer support vectors = simpler model, faster predictions
- More support vectors = more complex model, may indicate difficult classification

### What to Look For:

1. **Training vs Testing Accuracy Gap:**
   - Small gap (< 5%) = good generalization
   - Large gap (> 10%) = overfitting, consider reducing C or decreasing gamma

2. **Comparison with Tree Models:**
   - SVM may perform better on high-dimensional data
   - Tree models may be faster to train
   - Check which has better test accuracy

3. **Training Time:**
   - SVM is typically slower than Random Forest
   - But may be faster than XGBoost on small datasets
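
The gap thresholds above can be wrapped in a small helper. This is a sketch only: the function name is hypothetical, and treating gaps between 5% and 10% as "slight overfitting" is an assumption that fills the range the list leaves open.

```python
def generalization_label(train_acc, test_acc):
    """Classify the train-test accuracy gap using the 5% and 10% thresholds."""
    gap = abs(train_acc - test_acc)
    if gap < 0.05:
        return "good generalization"
    elif gap < 0.10:
        return "slight overfitting"
    return "significant overfitting"

# Example accuracies (made up for illustration)
label_small = generalization_label(0.90, 0.88)
label_medium = generalization_label(0.95, 0.88)
label_large = generalization_label(0.99, 0.80)

print(label_small)
print(label_medium)
print(label_large)
```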

In [None]:
# Predictions
y_train_pred_rbf = svm_rbf.predict(X_train)
y_test_pred_rbf = svm_rbf.predict(X_test)

# Calculate metrics
train_acc_rbf = accuracy_score(y_train_encoded, y_train_pred_rbf)
test_acc_rbf = accuracy_score(y_test_encoded, y_test_pred_rbf)
test_precision_rbf = precision_score(y_test_encoded, y_test_pred_rbf, average='macro')
test_recall_rbf = recall_score(y_test_encoded, y_test_pred_rbf, average='macro')
test_f1_rbf = f1_score(y_test_encoded, y_test_pred_rbf, average='macro')

print("\nSVM with RBF Kernel Performance:")
print(f"  Training Accuracy: {train_acc_rbf:.4f}")
print(f"  Testing Accuracy:  {test_acc_rbf:.4f}")
print(f"  Precision (Macro): {test_precision_rbf:.4f}")
print(f"  Recall (Macro):    {test_recall_rbf:.4f}")
print(f"  F1-Score (Macro):  {test_f1_rbf:.4f}")
print(f"\n Overfitting Check: {abs(train_acc_rbf - test_acc_rbf):.4f} gap")
if abs(train_acc_rbf - test_acc_rbf) < 0.05:
    print("Good generalization!")
elif abs(train_acc_rbf - test_acc_rbf) < 0.10:
    print("Slight overfitting")
else:
    print("Significant overfitting - consider reducing C")

print("\nClassification Report (RBF Kernel):")
print(classification_report(y_test_encoded, y_test_pred_rbf, 
                          target_names=label_encoder.classes_))

## Step 4: SVM with Linear Kernel

Now we train SVM with a Linear kernel, which works best when data is linearly separable.

### Understanding Linear Kernel:

**What is Linear Kernel?**
- Creates a straight line (2D) or flat hyperplane (higher dimensions) to separate classes
- No transformation of data, works in original feature space
- Often faster to train than the RBF kernel, though this depends on the data and on C
- Simpler model, easier to interpret

**When to Use Linear Kernel:**
1. **Large number of features** - When features >> samples
2. **Text classification** - Sparse, high-dimensional data
3. **Linearly separable data** - When classes can be separated by a straight line
4. **Need for speed** - When training time is critical
5. **Interpretability** - When you need to understand feature weights
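
On the interpretability point: a fitted linear-kernel `SVC` exposes its feature weights through `coef_`. The sketch below uses synthetic binary data and hypothetical feature names; on a multi-class problem `coef_` has one row per pair of classes rather than a single row.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical feature names and synthetic binary data (illustrative only)
feature_names = ['f0', 'f1', 'f2', 'f3']
X_lin, y_lin = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0
)

lin_clf = SVC(kernel='linear', C=1.0).fit(X_lin, y_lin)

# coef_ is only available for the linear kernel; shape (1, n_features) for two classes
weights = lin_clf.coef_[0]

# Rank features by the absolute size of their weight
ranking = sorted(zip(feature_names, weights), key=lambda kv: abs(kv[1]), reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:+.4f}")
```

The weights are only comparable across features when the features are on similar scales, which is another reason to train on the scaled data.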

### Linear Kernel Parameters:

**C (Regularization Parameter):**
- Same meaning as in RBF kernel
- Controls margin width vs classification accuracy
- **Our choice: C=1.0** - Standard default value
- Linear kernel is often less sensitive to C than RBF

**No gamma parameter:**
- Linear kernel doesn't use gamma
- Only needs to tune C parameter
- Simpler hyperparameter tuning

### Linear vs RBF Kernel:

| Aspect | Linear | RBF |
|--------|--------|-----|
| Decision Boundary | Straight line/plane | Curved/circular |
| Training Speed | Fast | Slow |
| Parameters to Tune | C only | C and gamma |
| Overfitting Risk | Lower | Higher |
| Flexibility | Low | High |
| Best For | Linearly separable data | Complex patterns |

### Expected Performance:

- Linear kernel often trains faster than RBF, though the difference depends on the data and on C
- May have lower accuracy than RBF if data is non-linear
- But could perform similarly if data is mostly linear
- Less prone to overfitting

In [None]:
print("\n" + "="*60)
print("SVM WITH LINEAR KERNEL")
print("="*60)

# Train SVM with Linear kernel
start_time = time.time()

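svm_linear = SVC(
    kernel='linear',
    C=1.0,
    decision_function_shape='ovr',  # affects the shape of decision_function output only
    random_state=42,                # only used when probability=True
    max_iter=1000                   # hard cap on solver iterations; may stop before convergence
)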

svm_linear.fit(X_train, y_train_encoded)

linear_training_time = time.time() - start_time

print(f"\nLinear SVM training complete!")
print(f"Training time: {linear_training_time:.2f} seconds")
print(f"Number of support vectors: {len(svm_linear.support_)}")
print(f"Support vectors per class: {svm_linear.n_support_}")
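# Report whichever kernel was actually faster instead of assuming it was Linear
speed_ratio = rbf_training_time / linear_training_time
if speed_ratio >= 1:
    print(f"\nSpeed comparison: Linear is {speed_ratio:.2f}x faster than RBF")
else:
    print(f"\nSpeed comparison: RBF is {1 / speed_ratio:.2f}x faster than Linear")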

## Step 5: Evaluate Linear SVM Performance

Evaluate the Linear SVM and compare it with RBF SVM.

### Key Comparison Points:

1. **Accuracy:** Which kernel gives better test accuracy?
2. **Training Time:** How much faster is Linear?
3. **Overfitting:** Which kernel generalizes better?
4. **Support Vectors:** Fewer support vectors = simpler model

### Decision Guide:

**Choose Linear Kernel if:**
- Accuracy is similar to RBF (within 1-2%)
- Training time is much faster
- You need a simpler, more interpretable model
- You have many features

**Choose RBF Kernel if:**
- Accuracy is significantly better (> 2%)
- Training time is acceptable
- Data has complex, non-linear patterns
- You can afford the computational cost

In [None]:
# Predictions
y_train_pred_linear = svm_linear.predict(X_train)
y_test_pred_linear = svm_linear.predict(X_test)

# Calculate metrics
train_acc_linear = accuracy_score(y_train_encoded, y_train_pred_linear)
test_acc_linear = accuracy_score(y_test_encoded, y_test_pred_linear)
test_precision_linear = precision_score(y_test_encoded, y_test_pred_linear, average='macro')
test_recall_linear = recall_score(y_test_encoded, y_test_pred_linear, average='macro')
test_f1_linear = f1_score(y_test_encoded, y_test_pred_linear, average='macro')

print("\nSVM with Linear Kernel Performance:")
print(f"  Training Accuracy: {train_acc_linear:.4f}")
print(f"  Testing Accuracy:  {test_acc_linear:.4f}")
print(f"  Precision (Macro): {test_precision_linear:.4f}")
print(f"  Recall (Macro):    {test_recall_linear:.4f}")
print(f"  F1-Score (Macro):  {test_f1_linear:.4f}")
print(f"\n  Overfitting Check: {abs(train_acc_linear - test_acc_linear):.4f} gap")
if abs(train_acc_linear - test_acc_linear) < 0.05:
    print("Good generalization!")
elif abs(train_acc_linear - test_acc_linear) < 0.10:
    print("Slight overfitting")
else:
    print("Significant overfitting - consider reducing C")

print("\nClassification Report (Linear Kernel):")
print(classification_report(y_test_encoded, y_test_pred_linear, 
                          target_names=label_encoder.classes_))

## Step 6: RBF vs Linear Kernel Comparison

Direct comparison between the two SVM kernels to determine which performs better for our Airbnb value category classification problem.

### What This Comparison Tells Us:

1. **Data Linearity:**
   - If Linear performs similarly to RBF → A linear boundary is adequate for this data
   - If RBF significantly outperforms Linear → Data has non-linear patterns

2. **Model Complexity:**
   - More support vectors = more complex decision boundary
   - RBF often, but not always, needs more support vectors than Linear

3. **Practical Trade-offs:**
   - Is the accuracy gain of RBF worth the extra training time?
   - For deployment, prediction cost grows with the number of support vectors, so the model with fewer support vectors predicts faster

4. **Overfitting Risk:**
   - Compare training-testing gaps
   - Smaller gap indicates better generalization

In [None]:
print("\n" + "="*60)
print("RBF VS LINEAR KERNEL COMPARISON")
print("="*60)

print("\nPerformance Metrics:")
print(f"\n{'Metric':<25} {'RBF':<15} {'Linear':<15} {'Difference'}")
print("-" * 70)
print(f"{'Training Accuracy':<25} {train_acc_rbf:<15.4f} {train_acc_linear:<15.4f} {train_acc_rbf - train_acc_linear:+.4f}")
print(f"{'Testing Accuracy':<25} {test_acc_rbf:<15.4f} {test_acc_linear:<15.4f} {test_acc_rbf - test_acc_linear:+.4f}")
print(f"{'F1-Score (Macro)':<25} {test_f1_rbf:<15.4f} {test_f1_linear:<15.4f} {test_f1_rbf - test_f1_linear:+.4f}")
print(f"{'Precision (Macro)':<25} {test_precision_rbf:<15.4f} {test_precision_linear:<15.4f} {test_precision_rbf - test_precision_linear:+.4f}")
print(f"{'Recall (Macro)':<25} {test_recall_rbf:<15.4f} {test_recall_linear:<15.4f} {test_recall_rbf - test_recall_linear:+.4f}")

print("\nModel Complexity:")
print(f"{'Support Vectors (RBF)':<25} {len(svm_rbf.support_)}")
print(f"{'Support Vectors (Linear)':<25} {len(svm_linear.support_)}")
print(f"{'Training Time (RBF)':<25} {rbf_training_time:.2f} seconds")
print(f"{'Training Time (Linear)':<25} {linear_training_time:.2f} seconds")
# Report whichever kernel was actually faster
time_ratio = rbf_training_time / linear_training_time
faster, factor = ('Linear', time_ratio) if time_ratio >= 1 else ('RBF', 1 / time_ratio)
print(f"{'Speed Advantage':<25} {faster} is {factor:.2f}x faster")

print("\nWinner Determination:")
acc_diff = test_acc_rbf - test_acc_linear
if abs(acc_diff) < 0.01:
    print("Tie: Both kernels perform similarly")
    print("Recommendation: Use Linear kernel (faster, simpler)")
elif acc_diff > 0.02:
    print(f"RBF wins by {acc_diff:.4f} ({acc_diff*100:.2f}%)")
    print("Recommendation: Use RBF kernel (better accuracy, worth the extra time)")
elif acc_diff > 0:
    print(f"RBF slightly better by {acc_diff:.4f} ({acc_diff*100:.2f}%)")
    print("Recommendation: Consider Linear kernel (marginal difference, much faster)")
elif acc_diff < -0.02:
    print(f"Linear wins by {abs(acc_diff):.4f} ({abs(acc_diff)*100:.2f}%)")
    print("Recommendation: Use Linear kernel (better accuracy AND faster!)")
else:
    print(f"Linear slightly better by {abs(acc_diff):.4f} ({abs(acc_diff)*100:.2f}%)")
    print("Recommendation: Use Linear kernel (better accuracy AND faster!)")

print("\nData Insights:")
# Only a clear RBF advantage points to non-linear structure;
# a Linear win or a near-tie means a linear boundary is adequate
if acc_diff > 0.02:
    print("Data contains non-linear patterns")
    print("Kernel transformation provides benefit")
else:
    print("A linear boundary captures most of the class structure")
    print("Complex non-linear patterns are minimal")

print("="*60)

## Step 7: Comparison with Tree-Based Models

Now we compare SVM performance with Random Forest and XGBoost to determine which algorithm works best for our problem.

### Why Compare Different Algorithm Types?

1. **Different Strengths:** Each algorithm excels in different scenarios
2. **No Free Lunch:** No single algorithm is best for all problems
3. **Practical Considerations:** Training time, interpretability, deployment complexity
4. **Model Selection:** Choose the best model for production deployment

### What to Look For:

**Performance:**
- Which model has the highest test accuracy?
- Which has the best F1-score (balanced metric)?
- Are the differences significant (> 2%) or marginal?

**Generalization:**
- Which model has the smallest train-test gap?
- Lower gap = better generalization

**Efficiency:**
- Training time comparison
- Prediction speed (SVM can be slow with many support vectors)

**Interpretability:**
- Tree models provide feature importance
- SVM is more of a "black box"

### Expected Outcomes:

**SVM might win if:**
- Data is high-dimensional
- Clear margin between classes
- Features are well-scaled

**Tree models might win if:**
- Data has complex interactions
- Features have different scales (though we scaled them)
- Need feature importance for interpretation

In [None]:
print("\n" + "="*70)
print("COMPARISON WITH TREE-BASED MODELS")
print("="*70)

# Load Random Forest and XGBoost results
try:
    rf_results = pd.read_csv('../../data/processed/random_forest_results.csv')
    xgb_results = pd.read_csv('../../data/processed/xgboost_results.csv')
    
    print("\nPerformance Comparison:")
    print(f"\n{'Model':<20} {'Train Acc':<12} {'Test Acc':<12} {'F1-Score':<12} {'Gap':<10}")
    print("-" * 70)
    
    # Random Forest
    rf_gap = rf_results['train_accuracy'].values[0] - rf_results['test_accuracy'].values[0]
    print(f"{'Random Forest':<20} {rf_results['train_accuracy'].values[0]:<12.4f} "
          f"{rf_results['test_accuracy'].values[0]:<12.4f} "
          f"{rf_results['f1_macro'].values[0]:<12.4f} {rf_gap:<10.4f}")
    
    # XGBoost
    xgb_gap = xgb_results['train_accuracy'].values[0] - xgb_results['test_accuracy'].values[0]
    print(f"{'XGBoost':<20} {xgb_results['train_accuracy'].values[0]:<12.4f} "
          f"{xgb_results['test_accuracy'].values[0]:<12.4f} "
          f"{xgb_results['f1_macro'].values[0]:<12.4f} {xgb_gap:<10.4f}")
    
    # SVM RBF
    rbf_gap = train_acc_rbf - test_acc_rbf
    print(f"{'SVM (RBF)':<20} {train_acc_rbf:<12.4f} {test_acc_rbf:<12.4f} "
          f"{test_f1_rbf:<12.4f} {rbf_gap:<10.4f}")
    
    # SVM Linear
    linear_gap = train_acc_linear - test_acc_linear
    print(f"{'SVM (Linear)':<20} {train_acc_linear:<12.4f} {test_acc_linear:<12.4f} "
          f"{test_f1_linear:<12.4f} {linear_gap:<10.4f}")
    
    # Find best model
    models = {
        'Random Forest': rf_results['test_accuracy'].values[0],
        'XGBoost': xgb_results['test_accuracy'].values[0],
        'SVM (RBF)': test_acc_rbf,
        'SVM (Linear)': test_acc_linear
    }
    
    best_model = max(models, key=models.get)
    best_acc = models[best_model]
    
    print("\nWINNER:")
    print(f"  {best_model} with {best_acc:.4f} test accuracy")

    # Performance ranking
    print("\nRanking by Test Accuracy:")
    sorted_models = sorted(models.items(), key=lambda x: x[1], reverse=True)
    for i, (model, acc) in enumerate(sorted_models, 1):
        print(f"  {i}. {model:<20} {acc:.4f}")
    
    # Generalization ranking (smallest gap)
    gaps = {
        'Random Forest': rf_gap,
        'XGBoost': xgb_gap,
        'SVM (RBF)': rbf_gap,
        'SVM (Linear)': linear_gap
    }

    print("\nRanking by Generalization (smallest train-test gap):")
    sorted_gaps = sorted(gaps.items(), key=lambda x: abs(x[1]))
    for i, (model, gap) in enumerate(sorted_gaps, 1):
        print(f"  {i}. {model:<20} {abs(gap):.4f} gap")
    
    # Final recommendation
    if best_model.startswith('SVM'):
        print(f"Use {best_model} for deployment")
        print("Reasons:")
        print("- Highest test accuracy")
        print("- Good generalization")
        if best_model == 'SVM (Linear)':
            print("- Fast training and prediction")
        print("Trade-offs:")
        print("- No feature importance (less interpretable)")
        if best_model == 'SVM (RBF)':
            print("- Slower training time")
    else:
        print(f"Use {best_model} for deployment")
        print("Reasons:")
        print("- Highest test accuracy")
        print("- Provides feature importance (interpretable)")
        print("- Fast prediction speed")

    print("="*70)
    
except FileNotFoundError as e:
    print("\nCould not load tree model results.")
    print("Please run Task 2.2 (Random Forest) and Task 2.3 (XGBoost) first.")
    print(f"Missing file: {e.filename}")

## Step 8: Save Results and Models

Save all SVM results, predictions, and trained models for future use and comparison.

### Files Saved:

**1. Models:**
- `svm_rbf_model.pkl` - Trained RBF SVM model
- `svm_linear_model.pkl` - Trained Linear SVM model
- Can be loaded later for predictions without retraining

**2. Results:**
- `svm_rbf_results.csv` - RBF SVM performance metrics
- `svm_linear_results.csv` - Linear SVM performance metrics
- Easy comparison with other models

**3. Predictions:**
- `svm_rbf_predictions.csv` - RBF SVM predictions on test set
- `svm_linear_predictions.csv` - Linear SVM predictions on test set
- Useful for error analysis and ensemble methods

**4. Comparison:**
- `svm_kernel_comparison.csv` - Direct comparison of both kernels
- Helps decide which kernel to use

### Why Save These Files:

1. **Reproducibility:** Can recreate results without retraining
2. **Model Selection:** Easy to compare all models (LR, RF, XGB, SVM)
3. **Deployment:** Saved models ready for production use
4. **Documentation:** Complete record for final report
5. **Ensemble Methods:** Can combine predictions from multiple models
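
As a hedged sketch of how a saved model could be reused later, the example below pickles a toy model together with its label encoder and restores both. It serializes to memory with `pickle.dumps` so that it is self-contained; with files, `pickle.dump` and `pickle.load` work the same way. The category names and data are made up for illustration.

```python
import pickle
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Toy stand-ins for the trained model and fitted encoder (illustrative only)
toy_labels = np.array(['low', 'low', 'high', 'high', 'mid', 'mid'])
X_small = np.array([[0.0, 0.0], [0.1, 0.2],
                    [3.0, 3.0], [3.1, 2.9],
                    [6.0, 0.0], [6.1, 0.2]])

toy_encoder = LabelEncoder()
y_small = toy_encoder.fit_transform(toy_labels)
toy_model = SVC(kernel='linear', C=1.0).fit(X_small, y_small)

# Serialize the model and the encoder together so that predicted
# class codes can be mapped back to category names after loading
blob = pickle.dumps({'model': toy_model, 'label_encoder': toy_encoder})

restored = pickle.loads(blob)
pred_codes = restored['model'].predict(X_small)
pred_labels = restored['label_encoder'].inverse_transform(pred_codes)

print("Predicted codes: ", pred_codes)
print("Predicted labels:", pred_labels)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.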

In [None]:
# Save RBF SVM results
rbf_results_df = pd.DataFrame({
    'model': ['SVM_RBF'],
    'kernel': ['rbf'],
    'C': [10],
    'gamma': ['scale'],
    'train_accuracy': [train_acc_rbf],
    'test_accuracy': [test_acc_rbf],
    'precision_macro': [test_precision_rbf],
    'recall_macro': [test_recall_rbf],
    'f1_macro': [test_f1_rbf],
    'training_time': [rbf_training_time],
    'n_support_vectors': [len(svm_rbf.support_)]
})
rbf_results_df.to_csv('../../data/processed/svm_rbf_results.csv', index=False)

# Save Linear SVM results
linear_results_df = pd.DataFrame({
    'model': ['SVM_Linear'],
    'kernel': ['linear'],
    'C': [1.0],
    'gamma': ['N/A'],
    'train_accuracy': [train_acc_linear],
    'test_accuracy': [test_acc_linear],
    'precision_macro': [test_precision_linear],
    'recall_macro': [test_recall_linear],
    'f1_macro': [test_f1_linear],
    'training_time': [linear_training_time],
    'n_support_vectors': [len(svm_linear.support_)]
})
linear_results_df.to_csv('../../data/processed/svm_linear_results.csv', index=False)

# Save kernel comparison
comparison_df = pd.DataFrame({
    'metric': ['Test Accuracy', 'F1-Score', 'Training Time (s)', 'Support Vectors'],
    'RBF': [test_acc_rbf, test_f1_rbf, rbf_training_time, len(svm_rbf.support_)],
    'Linear': [test_acc_linear, test_f1_linear, linear_training_time, len(svm_linear.support_)],
    'Difference': [
        test_acc_rbf - test_acc_linear,
        test_f1_rbf - test_f1_linear,
        rbf_training_time - linear_training_time,
        len(svm_rbf.support_) - len(svm_linear.support_)
    ]
})
comparison_df.to_csv('../../data/processed/svm_kernel_comparison.csv', index=False)

# Save predictions
rbf_predictions_df = pd.DataFrame({
    'y_true': y_test_encoded,
    'y_pred': y_test_pred_rbf
})
rbf_predictions_df.to_csv('../../data/processed/svm_rbf_predictions.csv', index=False)

linear_predictions_df = pd.DataFrame({
    'y_true': y_test_encoded,
    'y_pred': y_test_pred_linear
})
linear_predictions_df.to_csv('../../data/processed/svm_linear_predictions.csv', index=False)

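# Save models (create the target directory first if it does not exist)
import os
os.makedirs('../../models', exist_ok=True)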
with open('../../models/svm_rbf_model.pkl', 'wb') as f:
    pickle.dump(svm_rbf, f)

with open('../../models/svm_linear_model.pkl', 'wb') as f:
    pickle.dump(svm_linear, f)


print("\nModels:")
print("  - ../../models/svm_rbf_model.pkl")
print("  - ../../models/svm_linear_model.pkl")
print("\nResults:")
print("  - ../../data/processed/svm_rbf_results.csv")
print("  - ../../data/processed/svm_linear_results.csv")
print("  - ../../data/processed/svm_kernel_comparison.csv")
print("\nPredictions:")
print("  - ../../data/processed/svm_rbf_predictions.csv")
print("  - ../../data/processed/svm_linear_predictions.csv")

## Conclusion and Key Takeaways

### Summary of SVM Implementation:

We successfully implemented and compared two SVM variants:
1. **SVM with RBF Kernel** - For non-linear patterns
2. **SVM with Linear Kernel** - For linearly separable data

### Key Learnings:

**1. Feature Scaling is Critical:**
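- SVM kernels depend on distances and dot products between samples, so feature scaling is essential in practice
- Tree-based models don't need scaling
- Apply a scaler such as StandardScaler or MinMaxScaler before training, fitted on the training data only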

**2. Kernel Selection Matters:**
- RBF kernel: More flexible, handles non-linear patterns
- Linear kernel: Faster, simpler, works for linear data
- Compare both to find the best fit

**3. Hyperparameter Tuning:**
- **C parameter:** Controls regularization (margin vs accuracy)
- **gamma parameter:** Controls kernel influence (RBF only)
- Start with defaults, then tune if needed

**4. SVM vs Tree Models:**
- SVM: Better for high-dimensional data, smooth boundaries
- Trees: Better for feature interactions, provide interpretability
- No guaranteed winner - depends on the data

### Hyperparameter Tuning Tips:

If you want to improve SVM performance further:

**For RBF Kernel:**
- Try C values: [0.1, 1, 10, 100]
- Try gamma values: ['scale', 'auto', 0.001, 0.01, 0.1]
- Use GridSearchCV for systematic tuning

**For Linear Kernel:**
- Try C values: [0.01, 0.1, 1, 10, 100]
- Linear is less sensitive to hyperparameters
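
A minimal sketch of the RBF grid above with `GridSearchCV`, run on synthetic data as a stand-in for the project dataset. Wrapping the scaler and the classifier in a `Pipeline`, using 3-fold cross-validation and scoring by macro F1 are assumptions made for this demo, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data (stand-in for the project dataset)
X_tune, y_tune = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=42
)

# Scaling inside the pipeline means the scaler is refit on each training fold
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC(kernel='rbf')),
])

# The C and gamma values suggested in the tips above
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 'auto', 0.001, 0.01, 0.1],
}

search = GridSearchCV(pipeline, param_grid, scoring='f1_macro', cv=3)
search.fit(X_tune, y_tune)

best_params = search.best_params_
best_score = search.best_score_

print(f"Best parameters: {best_params}")
print(f"Best cross-validated macro F1: {best_score:.4f}")
```

The grid has 4 x 5 = 20 combinations, each fitted once per fold, so the cost grows quickly on larger datasets; a coarser grid or a subsample is a common first pass.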
