# Support Vector Regression (SVR) From Scratch

SVR applies the principles of support vector machines (SVMs) to regression. It looks for a function that is as flat as possible while deviating from the training targets by at most $\epsilon$. In the soft-margin form used here, deviations larger than $\epsilon$ are allowed but penalized.

## Key Concepts:
- **Epsilon-insensitive Loss**: Errors smaller than $\epsilon$ are ignored; larger errors are penalized linearly
- **Tolerance ($\epsilon$)**: The half-width of the tube around the regression line
- **Regularization**: Penalizing large weights to keep the function flat
- **Primal optimization**: (Sub)gradient descent directly on the regularized loss in $w$ and $b$, rather than solving the dual problem

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR as SklearnSVR

## 1. Mathematical Foundation

### Cost Function
Minimize:
$$J(w, b) = \frac{1}{2}||w||^2 + C \sum_{i=1}^n L_\epsilon(y_i, f(x_i))$$

Where $L_\epsilon$ is the $\epsilon$-insensitive loss function:
$$L_\epsilon(y, f(x)) = \max(0, |y - f(x)| - \epsilon)$$
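
The two formulas above translate almost directly into NumPy. The following is a minimal sketch, assuming a linear model $f(x) = w^T x + b$; the helper names `epsilon_insensitive_loss` and `svr_objective` and the toy values are illustrative and are not used elsewhere in this notebook.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Element-wise max(0, |y - f(x)| - epsilon)."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

def svr_objective(w, b, X, y, C, epsilon):
    """J(w, b) = 0.5 * ||w||^2 + C * sum_i L_eps(y_i, w.x_i + b)."""
    y_pred = X @ w + b  # assumed linear model
    loss = epsilon_insensitive_loss(y, y_pred, epsilon)
    return 0.5 * np.dot(w, w) + C * np.sum(loss)

# Toy values chosen only for illustration.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 1.0])
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # the first error (0.05) lies inside the tube, so its loss is 0
```

Only the part of each error that exceeds $\epsilon$ is counted, which is why the first toy point contributes nothing to the objective.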

### Gradients (for linear SVR)
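For the linear model $f(x) = w^T x + b$, the regularization term always contributes $w$ to the gradient with respect to $w$. Each sample then adds a loss term that depends on where it lies relative to the tube:

- If $|y_i - f(x_i)| \leq \epsilon$, the loss term is zero, so the sample contributes nothing beyond regularization.
- If $|y_i - f(x_i)| > \epsilon$, the sample adds $C \, \text{sign}(f(x_i) - y_i) \, x_i$ to the gradient with respect to $w$ and $C \, \text{sign}(f(x_i) - y_i)$ to the gradient with respect to $b$.

The loss is not differentiable exactly at $|y_i - f(x_i)| = \epsilon$, so strictly speaking these are subgradients.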
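
One way to sanity-check these gradient rules is to compare them with a finite-difference approximation of the cost function. The sketch below assumes the linear model $f(x) = w^T x + b$ and random toy data; `svr_subgradient` and `objective` are hypothetical helper names introduced only for this check.

```python
import numpy as np

def svr_subgradient(w, b, X, y, C, epsilon):
    """Full-batch subgradient of J(w, b) for a linear model."""
    residual = X @ w + b - y                    # f(x_i) - y_i
    outside = np.abs(residual) > epsilon        # samples outside the tube
    signs = np.sign(residual) * outside         # zero for samples inside
    grad_w = w + C * (X.T @ signs)
    grad_b = C * np.sum(signs)
    return grad_w, grad_b

def objective(w, b, X, y, C, epsilon):
    residual = X @ w + b - y
    return 0.5 * w @ w + C * np.sum(np.maximum(0.0, np.abs(residual) - epsilon))

# Random toy problem (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)
b, C, epsilon, h = 0.3, 2.0, 0.1, 1e-6

grad_w, grad_b = svr_subgradient(w, b, X, y, C, epsilon)

# Central differences; valid as long as no residual sits exactly on the kink.
num_grad_w = np.array([
    (objective(w + h * e, b, X, y, C, epsilon)
     - objective(w - h * e, b, X, y, C, epsilon)) / (2 * h)
    for e in np.eye(3)
])
num_grad_b = (objective(w, b + h, X, y, C, epsilon)
              - objective(w, b - h, X, y, C, epsilon)) / (2 * h)

print(np.max(np.abs(grad_w - num_grad_w)), abs(grad_b - num_grad_b))
```

Both printed differences should be close to zero, since the analytic subgradient and the numerical gradient agree wherever the loss is differentiable.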

In [None]:
class SVR:
    def __init__(self, learning_rate=0.001, C=1.0, epsilon=0.1, n_iters=1000):
        self.lr = learning_rate
        self.C = C
        self.epsilon = epsilon
        self.n_iters = n_iters
        self.w = None
        self.b = None

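    def fit(self, X, y):
        """Fit w and b with per-sample (stochastic) subgradient descent."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0

        # The regularization term is applied on every sample visit, so one
        # pass over the data shrinks w n_samples times, not once as a single
        # full-batch step on J would.
        for _ in range(self.n_iters):
            for i in range(n_samples):
                prediction = np.dot(X[i], self.w) + self.b
                error = prediction - y[i]  # signed residual f(x_i) - y_i

                if abs(error) > self.epsilon:
                    # Outside the tube: regularization plus loss subgradient.
                    # Gradient for weight: w + C * sign(error) * x_i
                    # Gradient for bias: C * sign(error)
                    error_sign = np.sign(error)
                    self.w -= self.lr * (self.w + self.C * error_sign * X[i])
                    self.b -= self.lr * (self.C * error_sign)
                else:
                    # Inside the tube: only the regularization term contributes.
                    self.w -= self.lr * self.w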
    def predict(self, X):
        return np.dot(X, self.w) + self.b

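    def score(self, X, y):
        """Return the coefficient of determination R^2 on (X, y)."""
        y_pred = self.predict(X)
        ss_res = np.sum((y - y_pred)**2)       # residual sum of squares
        ss_tot = np.sum((y - np.mean(y))**2)   # total sum of squares
        return 1 - ss_res / ss_tot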
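
For comparison, the same update rules can be written as a vectorized full-batch loop that takes one step on the whole cost function $J(w, b)$ per iteration. This is a sketch of an alternative, not the class above: the function name `fit_linear_svr_batch`, its default hyperparameters, and the noiseless toy line $y = 3x + 2$ are assumptions made for illustration.

```python
import numpy as np

def fit_linear_svr_batch(X, y, lr=0.001, C=1.0, epsilon=0.1, n_iters=5000):
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum(L_eps)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y                              # f(x_i) - y_i
        # sign of the residual, zeroed for samples inside the tube
        signs = np.sign(residual) * (np.abs(residual) > epsilon)
        w -= lr * (w + C * (X.T @ signs))                     # regularization applied once per step
        b -= lr * C * np.sum(signs)
    return w, b

# Noiseless toy line, chosen only so the expected fit is easy to eyeball.
X_demo = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y_demo = 3.0 * X_demo[:, 0] + 2.0
w_hat, b_hat = fit_linear_svr_batch(X_demo, y_demo)
print(w_hat, b_hat)
```

The fitted slope is expected to land slightly below 3: flattening the line costs nothing while every point stays inside the tube, so the regularizer pulls the slope down until points start to leave it.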

## 2. Testing and Visualization

In [None]:
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svr = SVR(epsilon=10, C=10)
svr.fit(X_train, y_train)
print(f"Our SVR R2: {svr.score(X_test, y_test):.4f}")

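# Evaluate the fitted line on an evenly spaced grid for plotting
X_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_line = svr.predict(X_line)
plt.scatter(X, y, label="Data")
plt.plot(X_line, y_line, color='red', label="SVR prediction")
# Dashed lines mark the epsilon tube: prediction +/- epsilon
plt.plot(X_line, y_line + svr.epsilon, color='grey', linestyle='--', label="Epsilon tube")
plt.plot(X_line, y_line - svr.epsilon, color='grey', linestyle='--')
plt.xlabel("X")
plt.ylabel("y")
plt.title("SVR with Epsilon Tube")
plt.legend()
plt.show()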
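
The plot can be complemented with a number: the fraction of points that fall inside the epsilon tube, which are the points that incur no loss. The helper below is a minimal sketch; `tube_coverage` is a hypothetical name and the arrays are made-up values for illustration.

```python
import numpy as np

def tube_coverage(y_true, y_pred, epsilon):
    """Fraction of points with |y - f(x)| <= epsilon."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    inside = np.abs(residual) <= epsilon
    return inside.mean()

# Made-up targets and predictions; absolute errors are 1, 1, 12 and 11.
y_true = np.array([0.0, 5.0, 12.0, -3.0])
y_pred = np.array([1.0, 4.0, 0.0, 8.0])
coverage = tube_coverage(y_true, y_pred, epsilon=10)
print(coverage)  # 2 of the 4 points lie inside the tube
```

With the objects defined in this notebook, the equivalent call would be `tube_coverage(y_train, svr.predict(X_train), svr.epsilon)`.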