# **SVR Model Theory**


## Theory
Support Vector Regression (SVR) is a regression model based on the Support Vector Machine (SVM) algorithm. It seeks a function that deviates from each training target by at most a predefined tolerance (epsilon) while remaining as simple as possible; deviations larger than epsilon are penalized. By controlling model complexity in this way, SVR aims for both a low training error and good generalization.

The model function for linear SVR is:

$$ f(x) = w^T x + b $$

Where:
- $f(x)$ is the predicted output.
- $w$ is the weight vector (the model coefficients).
- $x$ is the input feature vector.
- $b$ is the bias term.

Unlike SVM classification, which separates classes with a maximum-margin boundary, SVR fits a continuous target and ignores errors smaller than epsilon. The goal is to minimize the complexity of the model while keeping deviations beyond the epsilon margin small.

## Model Training

### Forward Pass

In the forward pass of SVR, the model makes predictions using the learned weights and bias. The predictions are based on the learned decision function:

$$ f(x) = w^T x + b $$

The decision function approximates the target variable. Predictions that fall within epsilon of the actual target are treated as correct and incur no loss.
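
A minimal sketch of this forward pass with NumPy follows. The function name `svr_predict` and the numeric values of `w`, `b`, and `X` are hypothetical and chosen only for illustration.

```python
import numpy as np

def svr_predict(X, w, b):
    """Linear SVR decision function f(x) = w^T x + b, applied to each row of X."""
    X = np.asarray(X, dtype=float)
    return X @ np.asarray(w, dtype=float) + b

# Hypothetical parameters and inputs (not learned from data).
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 1.0],
              [0.0, 2.0]])

predictions = svr_predict(X, w, b)
print(predictions)  # [ 1.5 -1.5]
```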

### Cost Function

The cost function in SVR uses the epsilon-insensitive loss: predictions within epsilon of the actual target value incur no penalty, and predictions farther away are penalized linearly by the amount that exceeds epsilon. The cost function is:

$$ J(w,b,\epsilon) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \max(0, |f(x^{(i)}) - y^{(i)}| - \epsilon) $$

Where:
- $J(w,b,\epsilon)$ is the cost function.
- $w$ is the weight vector.
- $b$ is the bias.
- $m$ is the number of training examples.
- $x^{(i)}$ and $y^{(i)}$ are the input and actual output for the $i$-th example.
- $\epsilon$ is the margin of tolerance for errors.
- $C$ is the regularization parameter that controls the trade-off between model complexity (small $\|w\|$) and the penalty on errors larger than $\epsilon$. A larger $C$ penalizes such errors more heavily.

The first term, $\frac{1}{2} \|w\|^2$, represents the regularization term that aims to minimize the model complexity (i.e., finding a simpler model with smaller weights). The second term penalizes data points that fall outside the epsilon margin.
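
As an illustration, the cost function above can be evaluated directly with NumPy. This is a sketch for the linear model only; the helper name `svr_cost` and the toy data are assumptions made for the example.

```python
import numpy as np

def svr_cost(w, b, X, y, C=1.0, epsilon=0.1):
    """J = 0.5 * ||w||^2 + C * sum(max(0, |f(x) - y| - epsilon))."""
    residuals = X @ w + b - y
    eps_loss = np.maximum(0.0, np.abs(residuals) - epsilon)
    return 0.5 * np.dot(w, w) + C * eps_loss.sum()

# Toy data: the first two points are predicted exactly, the third is off by 1.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 4.0])
w = np.array([1.0])
b = 0.0

cost = svr_cost(w, b, X, y, C=1.0, epsilon=0.1)
# Regularization term: 0.5 * 1^2 = 0.5; loss term: max(0, 1 - 0.1) = 0.9
print(round(cost, 6))  # 1.4
```

Only the third point contributes to the loss term, because the other two lie inside the epsilon margin.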

### Gradient Computation

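When the cost function is minimized with a gradient-based method, its gradient with respect to the weights and bias is used to update the model parameters during training. Because the epsilon-insensitive loss is not differentiable at the points where $|f(x^{(i)}) - y^{(i)}| = \epsilon$, a subgradient is used instead: examples inside the epsilon margin contribute only through the regularization term, while examples outside it push the prediction toward the target.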
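
As a worked sketch for the cost function above, assuming the linear model $f(x) = w^T x + b$, one valid choice of subgradient is:

$$ \frac{\partial J}{\partial w} = w + C \sum_{i=1}^{m} s_i \, x^{(i)}, \qquad \frac{\partial J}{\partial b} = C \sum_{i=1}^{m} s_i $$

Here $s_i = \operatorname{sign}\left(f(x^{(i)}) - y^{(i)}\right)$ if $|f(x^{(i)}) - y^{(i)}| > \epsilon$, and $s_i = 0$ otherwise. The symbol $s_i$ is introduced only for this illustration.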

## Training Process

The training process of an SVR involves finding the optimal weights ($w$) and bias ($b$) that minimize the cost function. This is a convex optimization problem, typically posed as a Quadratic Program (QP) and solved with specialized algorithms such as Sequential Minimal Optimization (SMO).

For an iterative solver, the steps are as follows:
1. **Initialize**: Start with an initial guess for the model parameters, often set to zero.
2. **Compute loss**: Evaluate the cost function for the current parameters, including both the epsilon-insensitive loss and the regularization term.
3. **Update parameters**: Update the parameters to reduce the cost, for example with a (sub)gradient step on $w$ and $b$, or with an SMO step on the dual variables.
4. **Repeat**: Repeat the process until the model parameters converge or a stopping criterion is met.
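
The steps above can be sketched with plain subgradient descent on the linear model. This is an illustrative simplification, not how QP or SMO solvers work internally; the function names, the learning rate `lr`, and the iteration count `n_iters` are hypothetical choices for this example.

```python
import numpy as np

def svr_cost(w, b, X, y, C, epsilon):
    """Cost J: 0.5 * ||w||^2 + C * sum of epsilon-insensitive losses."""
    residuals = X @ w + b - y
    return 0.5 * np.dot(w, w) + C * np.maximum(0.0, np.abs(residuals) - epsilon).sum()

def fit_linear_svr(X, y, C=1.0, epsilon=0.1, lr=0.001, n_iters=5000):
    """Subgradient descent sketch for linear SVR (fixed step size, fixed budget)."""
    w = np.zeros(X.shape[1])  # step 1: initialize weights and bias to zero
    b = 0.0
    for _ in range(n_iters):  # step 4: repeat for a fixed number of iterations
        residuals = X @ w + b - y  # step 2: errors under the current parameters
        # Sign of the error for points outside the epsilon margin, 0 inside it.
        s = np.where(np.abs(residuals) > epsilon, np.sign(residuals), 0.0)
        w = w - lr * (w + C * (X.T @ s))  # step 3: subgradient step on w
        b = b - lr * (C * s.sum())        # step 3: subgradient step on b
    return w, b

# Toy data on a noiseless line, used only to exercise the loop.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

w, b = fit_linear_svr(X, y)
initial_cost = svr_cost(np.zeros(1), 0.0, X, y, C=1.0, epsilon=0.1)
final_cost = svr_cost(w, b, X, y, C=1.0, epsilon=0.1)
print(initial_cost, final_cost)
```

With a fixed step size the iterates hover around the minimizer rather than converging exactly, which is one reason dedicated solvers are preferred in practice.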

SVR also supports tuning of hyperparameters such as the regularization parameter $C$, the margin parameter $\epsilon$, and the kernel used in the transformation of the input space (e.g., linear, polynomial, or RBF kernels).

By adjusting the regularization and margin parameters, SVR can be optimized to make accurate predictions while avoiding overfitting, providing a robust model for regression tasks.
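
One common way to tune these hyperparameters is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV`; the synthetic data and the grid values are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic data: a noisy sine curve (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# Hypothetical search grid over C, epsilon, and the kernel.
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "epsilon": [0.01, 0.1, 0.5],
    "kernel": ["linear", "rbf"],
}

search = GridSearchCV(SVR(), param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```

The selected combination depends on the data, so a grid should be adapted to the scale of the target and the amount of noise.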


## **Model Evaluation**

### 1. Mean Squared Error (MSE)

**Formula:**
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2
$$

**Description:**
- **Mean Squared Error (MSE)** is a widely used metric for evaluating the accuracy of regression models.
- It measures the average squared difference between the predicted values ($y_{\text{pred}}$) and the actual target values ($y_{\text{true}}$).
- The squared differences are averaged across all data points in the dataset.

**Interpretation:**
- A lower MSE indicates a better fit of the model to the data, as it means the model's predictions are closer to the actual values.
- MSE is sensitive to outliers because the squared differences magnify the impact of large errors.
- **Limitations:**
  - MSE can be hard to interpret because it is in squared units of the target variable.
  - It disproportionately penalizes larger errors due to the squaring process.
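
A short NumPy sketch of the formula, using made-up values for `y_true` and `y_pred`:

```python
import numpy as np

# Hypothetical targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Squared errors: 0.25, 0.25, 0.0, 1.0
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375
```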

---

### 2. Root Mean Squared Error (RMSE)

**Formula:**
$$
\text{RMSE} = \sqrt{\text{MSE}}
$$

**Description:**
- **Root Mean Squared Error (RMSE)** is a variant of MSE that provides the square root of the average squared difference between predicted and actual values.
- It is often preferred because it is in the same unit as the target variable, making it more interpretable.

**Interpretation:**
- Like MSE, a lower RMSE indicates a better fit of the model to the data.
- RMSE is also sensitive to outliers, because the errors are squared before the square root is taken.
- **Advantages over MSE:**
  - RMSE provides a more intuitive interpretation since it is in the same scale as the target variable.
  - It can be more directly compared to the values of the actual data.
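
As a sketch, RMSE can be computed by taking the square root of the MSE. The values below are made up for illustration.

```python
import numpy as np

# Hypothetical targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)  # in squared units of the target
rmse = np.sqrt(mse)                    # back in the units of the target
print(round(rmse, 4))  # 0.6124
```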

---

### 3. R-squared ($R^2$)

**Formula:**
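$$
R^2 = 1 - \frac{\text{SSR}}{\text{SST}}
$$
where $\text{SSR} = \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2$ is the sum of squared residuals and $\text{SST} = \sum_{i=1}^{n} (y_{\text{true}_i} - \bar{y}_{\text{true}})^2$ is the total sum of squares.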

**Description:**
- **R-squared ($R^2$)**, also known as the coefficient of determination, measures the proportion of the variance in the dependent variable ($y_{\text{true}}$) that is explained by the model's predictions ($y_{\text{pred}}$).
- A value of 1 indicates a perfect fit, and 0 indicates that the model does no better than always predicting the mean of $y_{\text{true}}$. On held-out data, $R^2$ can also be negative when the model performs worse than that baseline.

**Interpretation:**
- A higher $R^2$ value suggests that the model explains a larger proportion of the variance in the target variable.
- However, $R^2$ does not provide information about the goodness of individual predictions or whether the model is overfitting or underfitting.
- **Limitations:**
  - $R^2$ can be misleading in cases of overfitting, especially with polynomial regression models. Even if $R^2$ is high, the model may not generalize well to unseen data.
  - It doesn’t penalize for adding irrelevant predictors, so adjusted $R^2$ is often preferred for models with multiple predictors.
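
The formula can be sketched in NumPy as follows. The helper name `r2_manual` and the data are hypothetical. The second call illustrates that the formula itself yields a negative value when the predictions are worse than predicting the mean.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

good = r2_manual(y_true, np.array([1.1, 1.9, 3.2, 3.8]))  # close predictions
bad = r2_manual(y_true, np.array([4.0, 3.0, 2.0, 1.0]))   # reversed predictions
print(round(good, 2), round(bad, 2))  # 0.98 -3.0
```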

---

### 4. Adjusted R-squared

**Formula:**
$$
\text{Adjusted } R^2 = 1 - \left(1 - R^2\right) \frac{n-1}{n-p-1}
$$
where $n$ is the number of data points and $p$ is the number of predictors.

**Description:**
- **Adjusted R-squared** adjusts the R-squared value to account for the number of predictors in the model, helping to prevent overfitting when adding more terms to the model.
- Unlike $R^2$, it can decrease if the additional predictors do not improve the model significantly.

**Interpretation:**
- A higher adjusted $R^2$ suggests that the model is not just overfitting, but has genuine explanatory power with the number of predictors taken into account.
- It is especially useful when comparing models with different numbers of predictors.
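
A small sketch of the adjustment, assuming an $R^2$ value has already been computed. The function name `adjusted_r2` and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 data points")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R^2 and sample size, different numbers of predictors.
few_predictors = adjusted_r2(0.90, n=50, p=5)
many_predictors = adjusted_r2(0.90, n=50, p=20)
print(round(few_predictors, 4), round(many_predictors, 4))  # 0.8886 0.831
```

For a fixed $R^2$ and $n$, the adjusted value drops as the number of predictors grows.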

---

### 5. Mean Absolute Error (MAE)

**Formula:**
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{\text{true}_i} - y_{\text{pred}_i}|
$$

**Description:**
- **Mean Absolute Error (MAE)** measures the average of the absolute errors between the predicted and actual values.
- Compared with MSE and RMSE, MAE is less sensitive to outliers because it does not square the errors; each error contributes in proportion to its magnitude.

**Interpretation:**
- MAE provides a straightforward understanding of the average error magnitude.
- A lower MAE suggests better model accuracy, but it may not highlight the impact of large errors as much as MSE or RMSE.
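
The sketch below compares MAE and MSE on made-up predictions with and without a single large error, to show how differently the two metrics react.

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0])    # every error has magnitude 1
y_pred_outlier = np.array([11.0, 11.0, 12.0, 23.0])  # last error has magnitude 10

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

print(mae(y_true, y_pred_clean), mse(y_true, y_pred_clean))      # 1.0 1.0
print(mae(y_true, y_pred_outlier), mse(y_true, y_pred_outlier))  # 3.25 25.75
```

The single large error roughly triples the MAE but multiplies the MSE by more than 25.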

## sklearn template [scikit-learn: SVR](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html)

### `class sklearn.svm.SVR(*, C=1.0, epsilon=0.1, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, tol=1e-3, cache_size=200, verbose=False, max_iter=-1)`

| **Parameter**        | **Description**                                                                                          | **Default**  |
|----------------------|----------------------------------------------------------------------------------------------------------|--------------|
| `C`                  | Regularization parameter. The strength of the regularization is inversely proportional to `C` (higher means less regularization). Must be strictly positive. | `1.0`        |
| `epsilon`            | Epsilon parameter for the epsilon-SVR model. Specifies the epsilon-tube within which no penalty is associated. | `0.1`        |
| `kernel`             | Specifies the kernel type ('linear', 'poly', 'rbf', etc.).                                                | `'rbf'`      |
| `degree`             | Degree of the polynomial kernel function ('poly'). Ignored by other kernels.                             | `3`          |
| `gamma`              | Kernel coefficient for 'rbf', 'poly', and 'sigmoid'.                                                     | `'scale'`    |
| `coef0`              | Independent term in the kernel function. Only significant for 'poly' and 'sigmoid'.                      | `0.0`        |
| `shrinking`          | Whether to use the shrinking heuristic.                                                                   | `True`       |
| `tol`                | Tolerance for stopping criteria.                                                                          | `1e-3`       |
| `cache_size`         | Size of the kernel cache in MB.                                                                          | `200`        |
| `verbose`            | Whether to print progress messages during fitting.                                                       | `False`      |
| `max_iter`           | The maximum number of iterations. -1 means no limit.                                                     | `-1`         |

---

| **Attribute**        | **Description**                                                                                           |
|----------------------|-----------------------------------------------------------------------------------------------------------|
| `coef_`              | Weights assigned to the features (only for 'linear' kernel).                                                |
| `dual_coef_`         | Coefficients of the support vector in the decision function.                                                |
| `intercept_`         | Constants in decision function.                                                                            |
| `n_features_in_`     | Number of features seen during fit.                                                                        |
| `feature_names_in_`  | Names of features seen during fit (only if `X` contains feature names).                                    |
| `n_iter_`            | Number of iterations run by the optimization routine.                                                      |
| `n_support_`         | Number of support vectors.                                                                                 |
| `support_`           | Indices of support vectors.                                                                                |
| `support_vectors_`   | Support vectors.                                                                                           |

---

| **Method**           | **Description**                                                                                           |
|----------------------|-----------------------------------------------------------------------------------------------------------|
| `fit(X, y)`          | Fit the model to the training data `X` and target values `y`.                                              |
| `predict(X)`         | Predict regression target for the input data `X`.                                                         |
| `score(X, y)`        | Return the coefficient of determination (R² score).                                                       |
| `get_params()`       | Get the parameters of the SVR model.                                                                       |
| `set_params(**params)`| Set the parameters of the SVR model.                                                                      |
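
A minimal usage sketch of the methods and attributes listed above. The synthetic data and the parameter values are assumptions for illustration, not recommended settings.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: a noisy sine curve (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)

y_pred = model.predict(X)   # predictions for the training inputs
r2 = model.score(X, y)      # coefficient of determination on the same data
n_sv = model.support_vectors_.shape[0]

print(f"R^2 on training data: {r2:.3f}")
print(f"Support vectors: {n_sv} of {len(X)} samples")
print(model.get_params()["epsilon"])  # 0.1
```

Scoring on the training data, as done here for brevity, tends to be optimistic; a held-out split gives a more honest estimate.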




# XXXXXXXX regression - Example

## Data loading

## Data processing

## Plotting data

## Model definition

## Model evaluation