# Support Vector Machine (SVM) - Regression (SVR)

### What is SVR?
Support Vector Regression (SVR) is an extension of SVM for **regression problems**.  
Instead of finding a hyperplane to separate classes, SVR tries to fit the data within a **tube (epsilon margin)** around the regression function.  

- **Objective:** Keep the model as simple (flat) as possible while penalizing only the errors that fall outside the margin; deviations smaller than epsilon are ignored.  
- **Kernels:** Like SVM, SVR can use kernels (RBF, polynomial, etc.) to handle non-linear relationships.  
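
To make the tube concrete, here is a minimal NumPy sketch of the epsilon-insensitive loss that SVR penalizes. The function name `epsilon_insensitive_loss` and the sample values are illustrative assumptions, not part of this notebook's pipeline.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the tube, growing linearly outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical targets and predictions, chosen only for illustration
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # the first prediction lies inside the tube, so its loss is zero
```

The first prediction misses by 0.05, which is inside the tube, so it contributes nothing. The other two are charged only for the part of the error beyond epsilon.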

### Why California Housing Dataset?
- A popular dataset where the goal is to predict **median house value** from features such as:  
  - Median income  
  - House age  
  - Average rooms, population, etc.  
- Regression task → output is a continuous variable (median house value, expressed in units of $100,000).  

### What We Do in This Notebook:
1. Load the **California Housing dataset**.  
2. Split into **training and test sets**.  
3. Standardize features using **StandardScaler**, fitted on the training set only (scaling is important for SVR).  
4. Use **GridSearchCV** to tune the hyperparameters `C` and `gamma` of an RBF-kernel SVR.  
5. Train the best SVR model.  
6. Evaluate performance using:  
   - **Root Mean Squared Error (RMSE)**  
   - **Actual vs Predicted scatter plot**  
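
The steps above can also be chained in a scikit-learn `Pipeline`, so that the scaler is refit on each cross-validation training fold rather than once on the whole training set. The sketch below is one possible arrangement, not the notebook's own code. It assumes synthetic data from `make_regression` as a stand-in for the housing data to stay fast and offline, and the step names `scaler` and `svr` and the grid values are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 8 features, like the housing dataset
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler lives inside the pipeline, so each CV fold fits its own scaler
pipe = Pipeline([("scaler", StandardScaler()), ("svr", SVR(kernel="rbf"))])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid = {"svr__C": [1, 10, 100], "svr__gamma": [0.01, 0.1, "scale"]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

# GridSearchCV refits the best pipeline on the full training set by default
y_pred = search.predict(X_test)
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print("Best parameters:", search.best_params_)
print("Test RMSE:", rmse)
```

Because the test split passes through the fitted pipeline only at prediction time, it never influences the scaling statistics.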

### Expected Output
- Best hyperparameters chosen by GridSearchCV.  
- RMSE on the test set, reported in the same units as the target ($100,000).  
- Scatter plot showing how closely predictions match actual house values.  


### SVM Regression on California Housing Dataset

In [1]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
# root_mean_squared_error requires scikit-learn >= 1.4
from sklearn.metrics import root_mean_squared_error

In [2]:
# Load dataset
# California Housing dataset: predict median house value (in units of $100,000)
data = fetch_california_housing()
X, y = data.data, data.target

# Train-test split (80/20), done before scaling so the test set never influences the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for SVR performance): fit on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [3]:
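# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10],              # Penalty on errors outside the tube (larger C = weaker regularization)
    'gamma': [0.01, 0.1, 'scale'],  # RBF kernel width (larger gamma = more local fit)
    'kernel': ['rbf']               # RBF kernel captures non-linear relationships
}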
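
For reference, `gamma='scale'` is not a fixed number: scikit-learn resolves it to `1 / (n_features * X.var())` for the data passed to `fit`. The sketch below checks this on synthetic data. The random features, the target and the variable names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))  # synthetic features (assumption)
y_demo = np.sin(X_demo[:, 0]) + 0.1 * rng.normal(size=100)

# The value that gamma='scale' resolves to for this feature matrix
gamma_scale = 1.0 / (X_demo.shape[1] * X_demo.var())

# Fit once with the keyword and once with the explicit value, then compare
pred_named = SVR(kernel="rbf", gamma="scale").fit(X_demo, y_demo).predict(X_demo)
pred_value = SVR(kernel="rbf", gamma=gamma_scale).fit(X_demo, y_demo).predict(X_demo)

print("gamma for 'scale':", gamma_scale)
print("Predictions match:", np.allclose(pred_named, pred_value))
```

With standardized features the overall variance is close to 1, so for the 8 housing features `'scale'` lands at roughly 1/8, slightly above the largest explicit value in the grid.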

In [None]:
# Train with GridSearchCV (3-fold CV, scored by negated RMSE so that higher is better)
gcv = GridSearchCV(SVR(), param_grid, cv=3, scoring='neg_root_mean_squared_error')
gcv.fit(X_train, y_train)

best_model = gcv.best_estimator_
print("Best Parameters:", gcv.best_params_)

In [None]:
# Evaluate on test set
y_pred = best_model.predict(X_test)

rmse = root_mean_squared_error(y_test, y_pred)
print("Test RMSE:", rmse)
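
As a cross-check, or as a fallback on scikit-learn versions older than 1.4 that lack `root_mean_squared_error`, RMSE can be computed directly with NumPy. This is a minimal sketch. The helper name `rmse_manual` and the demo arrays are hypothetical and unrelated to the model's actual predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse_manual(y_true, y_pred):
    """Square root of the mean squared difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical values in the dataset's units of $100,000
y_true_demo = np.array([2.0, 1.5, 3.0, 4.5])
y_pred_demo = np.array([2.5, 1.0, 3.0, 4.0])

rmse_demo = rmse_manual(y_true_demo, y_pred_demo)
rmse_sklearn = float(np.sqrt(mean_squared_error(y_true_demo, y_pred_demo)))

print("RMSE (target units):", rmse_demo)
print("RMSE (dollars):", rmse_demo * 100_000)
```

Multiplying by 100,000 converts the error into dollars, which is often easier to interpret than the raw target units.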

In [None]:
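# Scatter plot of actual vs. predicted values
plt.scatter(y_test, y_pred, alpha=0.5)
# Diagonal reference line: points on it are perfect predictions
lims = [y_test.min(), y_test.max()]
plt.plot(lims, lims, 'r--')
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("SVR - Actual vs Predicted (California Housing)")
plt.show()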