# Support Vector Regression (SVR) Demo
This notebook shows how to train, evaluate, and visualize Support Vector Regression (SVR) models using scikit-learn.

## What is SVR?
SVR adapts Support Vector Machines to regression. Instead of a decision boundary between classes, SVR fits a function with an **epsilon-insensitive tube** around it. Points whose residual is smaller than epsilon lie inside the tube and contribute no loss; points on or outside the tube boundary become **support vectors**, and only these determine the fitted function.
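
The tube idea can be written down as a loss function. The following is a minimal sketch, not scikit-learn's internal code: the helper name `epsilon_insensitive_loss` and the toy values are made up for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss max(0, |y - f(x)| - epsilon) (hypothetical helper)."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # Residuals smaller than epsilon fall inside the tube and cost nothing.
    return np.maximum(0.0, residual - epsilon)

# Toy values, chosen only to show one point inside and two outside the tube.
y_true_toy = np.array([1.0, 2.0, 3.0])
y_pred_toy = np.array([1.05, 2.5, 2.0])
losses = epsilon_insensitive_loss(y_true_toy, y_pred_toy, epsilon=0.1)
print(losses)
```

The first point has a residual of 0.05, which is below epsilon, so its loss is zero. The other two are penalized only for the part of the residual that exceeds epsilon.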

- **Linear kernel**: assumes a mostly linear relationship.
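- **Polynomial kernel**: models curved relationships through implicit polynomial combinations of the features (scikit-learn's `SVR` defaults to degree 3).
- **RBF kernel (Gaussian)**: similarity between two points decays with their squared distance; a flexible default for non-linear patterns.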
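
To make the RBF bullet concrete, here is a small sketch that computes the kernel by hand and compares it with `sklearn.metrics.pairwise.rbf_kernel`. The helper `rbf_by_hand`, the random data, and the value of `gamma` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_by_hand(a, b, gamma):
    """K(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2) for every pair of rows."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # 4 made-up points with 3 features
gamma = 0.5                   # arbitrary value for the demo

K_manual = rbf_by_hand(A, A, gamma)
K_sklearn = rbf_kernel(A, A, gamma=gamma)

# Both computations should agree, and each point has similarity 1 with itself.
print(np.allclose(K_manual, K_sklearn))
print(np.diag(K_manual))
```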

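Scaling is essential because kernels are computed from distances or dot products between feature vectors, so a feature with a large numeric range would otherwise dominate the others. We use `StandardScaler` to give each feature zero mean and unit variance.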
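
As a quick illustration of what `StandardScaler` does, the sketch below standardizes a toy array whose two columns differ in scale by three orders of magnitude. The array `X_toy` is made up for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (made-up values).
X_toy = np.array([[1.0, 1000.0],
                  [2.0, 3000.0],
                  [3.0, 5000.0]])

toy_scaler = StandardScaler()
X_toy_scaled = toy_scaler.fit_transform(X_toy)

# After scaling, each column has mean ~0 and standard deviation ~1.
print(X_toy_scaled.mean(axis=0))
print(X_toy_scaled.std(axis=0))
```

After scaling, both columns contribute comparably to any distance or dot product.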

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# set_theme is the current name; sns.set is kept as an alias
sns.set_theme(style='whitegrid')


## Load and inspect the data
We use the California Housing dataset. It has 20,640 rows, eight numeric features, and a continuous target (`MedHouseVal`), the median house value of a block group in units of $100,000.

In [None]:
data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target
X.head()

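## Train/test split and scaling
SVR is sensitive to feature scales, so we wrap scaling and the model in one pipeline. Calling `fit` on the pipeline fits the scaler on the training split only, which keeps test-set statistics out of the model. We then score the fitted pipeline on the held-out split.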

In [None]:
# Hold out 20% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling and the RBF-kernel SVR are fit together as one estimator
rbf_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', SVR(kernel='rbf', C=10.0, epsilon=0.1))
])

rbf_pipeline.fit(X_train, y_train)
rbf_preds = rbf_pipeline.predict(X_test)

# Error metrics are in target units ($100,000); R2 is unitless
mse = mean_squared_error(y_test, rbf_preds)
mae = mean_absolute_error(y_test, rbf_preds)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, rbf_preds)

print({'MSE': mse, 'MAE': mae, 'RMSE': rmse, 'R2': r2})
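
To check that a pipeline fits its scaler on the training split only, the sketch below repeats the same pipeline on small synthetic data so that it runs in well under a second. All `*_demo` and `Xd_*` names and the data-generating formula are assumptions made for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (made up): 200 rows, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', SVR(kernel='rbf', C=10.0, epsilon=0.1)),
])
demo_pipeline.fit(Xd_train, yd_train)

# The scaler's stored means match the training split, not the full data.
fitted_scaler = demo_pipeline.named_steps['scaler']
print(np.allclose(fitted_scaler.mean_, Xd_train.mean(axis=0)))

# predict() reuses those training statistics to scale the test rows.
demo_preds = demo_pipeline.predict(Xd_test)
print(demo_preds.shape)
```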

## Kernel comparison
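We fit one pipeline per kernel, leaving the other hyperparameters at their defaults, and compare R² on the test split. Kernel SVR training time grows quickly with the number of rows, so this cell can be slow on the roughly 16,500 training rows.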

In [None]:
kernels = ['linear', 'poly', 'rbf']
scores = {}
for kernel in kernels:
    pipeline = Pipeline([('scaler', StandardScaler()), ('svr', SVR(kernel=kernel))])
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    scores[kernel] = r2_score(y_test, preds)

scores
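
For a faster look at how the kernel choice changes the fit, the sketch below repeats the same loop on a synthetic sine curve, where a straight line is clearly a poor model. The data, the noise level, and the `*_sine` and `sine_scores` names are illustrative assumptions; the numbers say nothing about the housing data.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# One noisy period of a sine wave (made-up data).
rng = np.random.default_rng(0)
X_sine = rng.uniform(0.0, 2.0 * np.pi, size=(300, 1))
y_sine = np.sin(X_sine[:, 0]) + rng.normal(scale=0.05, size=300)

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_sine, y_sine, test_size=0.2, random_state=42
)

sine_scores = {}
for kernel in ['linear', 'poly', 'rbf']:
    model = Pipeline([('scaler', StandardScaler()), ('svr', SVR(kernel=kernel))])
    model.fit(Xs_train, ys_train)
    sine_scores[kernel] = r2_score(ys_test, model.predict(Xs_test))

# On this curved target the RBF kernel should score above the linear one.
print(sine_scores)
```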

## Visualize predictions
A scatter plot of actual vs. predicted values helps spot under/over-prediction.

In [None]:
plt.figure(figsize=(6,6))
sns.scatterplot(x=y_test, y=rbf_preds, alpha=0.6)
lims = [min(y_test.min(), rbf_preds.min()), max(y_test.max(), rbf_preds.max())]
plt.plot(lims, lims, 'r--', label='Ideal')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('SVR Predictions vs Actual (RBF kernel)')
plt.legend()
plt.show()