# Regression Model Evaluation & Selection

---

Residual Sum of Squares:

$$ SS_{res} = \sum (y_{i} - \hat{y}_i)^2 $$

Total Sum of Squares:

$$ SS_{tot} = \sum ( y_{i} - y_{avg} )^2 $$
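As a minimal sketch in plain Python, the two sums can be computed directly from their definitions. The data below is made up for illustration, and the helper names `ss_res` and `ss_tot` are hypothetical.

```python
def ss_res(y, y_hat):
    """Residual sum of squares: sum of (y_i - y_hat_i)^2."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


def ss_tot(y):
    """Total sum of squares: sum of (y_i - y_avg)^2."""
    y_avg = sum(y) / len(y)
    return sum((yi - y_avg) ** 2 for yi in y)


# Hypothetical toy data: observed values and one model's predictions.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 6.5, 9.5]

print(ss_res(y, y_hat))  # each residual is +-0.5, so 4 * 0.25 = 1.0
print(ss_tot(y))         # y_avg is 6.0, so 9 + 1 + 1 + 9 = 20.0
```

Note that $ SS_{tot} $ only depends on the observed values, while $ SS_{res} $ depends on the model's predictions.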

---

## R Squared

$$ R^2 = 1 - { SS_{res} \over SS_{tot} } $$

The better our model fits our data, the smaller $ SS_{res} $ will be, meaning that $ R^2 $ will be closer to 1.
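A short sketch of the formula above, using the same kind of hypothetical toy data. The function name `r_squared` is illustrative only; for a single target, `sklearn.metrics.r2_score(y_true, y_pred)` computes the same quantity.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_avg = sum(y) / len(y)
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    ss_tot = sum((yi - y_avg) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


y = [3.0, 5.0, 7.0, 9.0]          # hypothetical observed values
close_fit = [2.5, 5.5, 6.5, 9.5]  # residuals of +-0.5
exact_fit = [3.0, 5.0, 7.0, 9.0]  # no residuals at all

print(r_squared(y, close_fit))  # 1 - 1/20 = 0.95
print(r_squared(y, exact_fit))  # 1.0
```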

| $ R^2 $ value | Quality of our model                               |
| :---:         | :---:                                              |
| 1             | Practically impossible on real data, suspicious    |
| ~ 0.9         | Very good                                          |
| < 0.7         | Not great                                          |
| < 0.4         | Terrible                                           |
| < 0           | Makes no sense: worse than predicting $ y_{avg} $  |

*These thresholds are rough guidelines and are highly dependent on the context.*
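The last row of the table follows from the formula: a model that always predicts $ y_{avg} $ has $ SS_{res} = SS_{tot} $ and scores exactly 0, so any model that does worse than that gets a negative score. A small illustration with hypothetical predictions:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_avg = sum(y) / len(y)
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    ss_tot = sum((yi - y_avg) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


y = [3.0, 5.0, 7.0, 9.0]
mean_model = [6.0, 6.0, 6.0, 6.0]      # always predicts y_avg
reversed_model = [9.0, 7.0, 5.0, 3.0]  # gets the trend backwards

print(r_squared(y, mean_model))      # 0.0, since SS_res equals SS_tot
print(r_squared(y, reversed_model))  # 1 - 80/20 = -3.0
```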

---

## Adjusted R Squared

When we add a new variable to our regression, $ SS_{tot} $ stays the same (it only depends on the observed $ y_i $), while $ SS_{res} $ can only stay the same or decrease. This means that if we used ***R Squared*** to evaluate our model, the model with more predictors would always score at least as well as the previous model with fewer predictors, even when the new variable is useless. This happens because ***Ordinary Least Squares*** always minimizes $ SS_{res} $, and the larger model can always reproduce the previous fit by setting the new coefficient to $ 0 $.

When we add a new variable $ x_{k+1} $ to a model with $ k $ predictors, the ***Ordinary Least Squares*** method looks for a coefficient $ b_{k+1} $ that improves the predictions $ \hat{y}_i $. If no value of $ b_{k+1} $ makes $ SS_{res} $ smaller, OLS gives the coefficient a value of $ 0 $: the variable $ x_{k+1} $ is ignored completely when making predictions, and $ R^{2} $ stays the same. In practice, however, even a useless variable almost always has some small random correlation with $ y $, so OLS finds a small nonzero $ b_{k+1} $ that lowers $ SS_{res} $ slightly and inflates $ R^{2} $ without making the model any better.
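A minimal sketch of this effect, under simplifying assumptions: the "previous" model has no predictors (OLS then just predicts $ y_{avg} $, so $ R^2 = 0 $), and the new variable is pure random noise that is unrelated to $ y $ by construction. The helper names are hypothetical; `fit_simple_ols` uses the standard closed-form OLS solution for a single predictor.

```python
import random


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_avg = sum(y) / len(y)
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    ss_tot = sum((yi - y_avg) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def fit_simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    x_avg = sum(x) / len(x)
    y_avg = sum(y) / len(y)
    sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
    sxx = sum((xi - x_avg) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_avg - b1 * x_avg
    return b0, b1


random.seed(0)
n = 30
y = [random.gauss(50, 10) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y

# Previous model: no predictors, so OLS predicts the mean and R^2 is 0.
y_avg = sum(y) / n
r2_before = r_squared(y, [y_avg] * n)

# New model: add the useless noise variable as a predictor.
b0, b1 = fit_simple_ols(noise, y)
r2_after = r_squared(y, [b0 + b1 * xi for xi in noise])

print(f"R^2 without x:    {r2_before:.4f}")
print(f"R^2 with noise x: {r2_after:.4f} (b1 = {b1:.3f})")
```

Rerunning with other seeds changes the numbers, but `r2_after` never drops below `r2_before`, which is exactly why plain $ R^2 $ cannot tell whether the new variable is worth keeping.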

The solution to this problem is the ***Adjusted R Squared***, which is calculated with this formula.

$$ \text{Adj } R^{2} = 1 - ( 1 - R^{2} ) \times { n-1 \over n-k-1 } $$

$ k $ - Number of independent variables.

$ n $ - Sample size.

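This formula penalizes the score for every variable added: as $ k $ grows, the factor $ { n-1 \over n-k-1 } $ grows too, so a new variable only increases the ***Adjusted R Squared*** if it reduces $ SS_{res} $ enough to outweigh the penalty. This fixes the problem with the original ***R Squared***.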
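A sketch of the formula as a function. The sample size and the two $ R^2 $ values below are hypothetical, chosen to show that a model with seven extra predictors and only a tiny gain in $ R^2 $ ends up with a lower adjusted score.

```python
def adjusted_r_squared(r2, n, k):
    """Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    r2: plain R^2 of the fitted model
    n:  sample size
    k:  number of independent variables
    """
    if n - k - 1 <= 0:
        raise ValueError("n must be larger than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 50  # hypothetical sample size shared by both models
small = adjusted_r_squared(0.80, n, k=3)    # 3 predictors
large = adjusted_r_squared(0.805, n, k=10)  # 10 predictors, R^2 only 0.005 higher

print(f"3 predictors:  {small:.4f}")  # 0.7870
print(f"10 predictors: {large:.4f}")  # 0.7550
```

Plain $ R^2 $ would prefer the 10-predictor model (0.805 > 0.80); the adjusted score reverses that ranking.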

---

## Selection

To select the model that best fits your data, train every candidate model, evaluate each one on the same test data using the `r2_score()` function from the `sklearn.metrics` module, and keep the model with the highest score.
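A minimal, dependency-free sketch of that selection loop. It assumes a held-out test set and two hypothetical candidate models (a straight line in $ x $ and a straight line in $ x^2 $); `r_squared` below re-implements the same formula as `r2_score()` so the example runs without scikit-learn. With scikit-learn installed you would instead write `from sklearn.metrics import r2_score` and call `r2_score(y_true, y_pred)`.

```python
import random


def r_squared(y_true, y_pred):
    """Same formula as sklearn.metrics.r2_score for a single target."""
    y_avg = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_avg) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Closed-form simple OLS fit; returns (intercept, slope)."""
    x_avg = sum(x) / len(x)
    y_avg = sum(y) / len(y)
    sxy = sum((a - x_avg) * (b - y_avg) for a, b in zip(x, y))
    sxx = sum((a - x_avg) ** 2 for a in x)
    slope = sxy / sxx
    return y_avg - slope * x_avg, slope


# Hypothetical data with a quadratic trend plus noise.
random.seed(42)
xs = [i / 2 for i in range(40)]
ys = [2 * x ** 2 + random.gauss(0, 5) for x in xs]

# Every 4th point is held out for testing; the rest is used for training.
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 4 != 0]
test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 4 == 0]

# Candidate models: a feature transform followed by a straight-line fit.
candidates = {
    "linear in x": lambda x: x,
    "linear in x^2": lambda x: x ** 2,
}

scores = {}
for name, feature in candidates.items():
    b0, b1 = fit_line([feature(x) for x, _ in train], [y for _, y in train])
    y_pred = [b0 + b1 * feature(x) for x, _ in test]
    scores[name] = r_squared([y for _, y in test], y_pred)

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.4f}")
print("selected:", best)
```

Scoring on data the models were not trained on keeps the comparison honest; scoring on the training data would reward whichever model memorizes it best.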