# Evaluating Regression Model Performance
There are multiple methods to evaluate regression model performance. These notes cover two of them: R Squared and Adjusted R Squared.

# R Squared
<img src="images/evaluation/r_squared_example.png" height="75%" width="75%"></img>

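- SSres is the residual sum of squares: the sum of the squared differences between each observed value and the value predicted by the regression line. The Ordinary Least Squares method picks the line that minimizes SSres
- SStot is the total sum of squares: the sum of the squared differences between each observed value and the average (yAvg)
- yAvg is the average salary based on the values of all salaries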

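R^2 tells us how "good" our regression line is compared to the average line (a horizontal line at yAvg), using R^2 = 1 - SSres / SStot. It is the proportion of the variance in the dependent variable that the model explains; whatever is left over is the "noise" (unexplained variance) the model does not capture.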
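The definitions above can be checked by hand. The following is a minimal sketch that computes SSres, SStot, and R^2 with NumPy, assuming the definition R^2 = 1 - SSres / SStot. It reuses the five sample points from the notebook cells later in these notes and fits the line with `np.polyfit` instead of scikit-learn; the variable names are illustrative only.

```python
import numpy as np

# the same five sample points used in the scikit-learn example in these notes
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.2, 2.7, 3.3, 4.5, 5.2])

# fit a straight line by least squares; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# SSres: squared distances between the observations and the regression line
ss_res = np.sum((y - y_pred) ** 2)

# SStot: squared distances between the observations and the average line
ss_tot = np.sum((y - y.mean()) ** 2)

r_squared = 1 - ss_res / ss_tot
print(ss_res, ss_tot, r_squared)  # roughly 0.184, 9.788, 0.9812
```

The result should agree with the `r2_score` output shown in the notebook cells, since both follow the same definition.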

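In order to get the best-fitting line, we run a linear regression, which picks the line with the smallest SSres. The closer R^2 is to 1, the better the regression line models the data set.
- Sometimes R^2 can become negative: whenever SSres is greater than SStot, meaning the model fits worse than the average line. This can happen when a model is evaluated on data it was not trained on, or is fitted without an intercept
- R^2 cannot be greater than 1 under the definition 1 - SSres / SStot: SSres is a sum of squares, so it is never negative. A value above 1 points to a calculation error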
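The negative case can be reproduced with a few lines of NumPy. This is a sketch only: `r_squared` is a hypothetical helper written for this example (it is not a library function), and the "reversed" predictions are made-up values chosen to be deliberately worse than the average line.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SSres / SStot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([1.2, 2.7, 3.3, 4.5, 5.2])

# predicting the average for every sample gives SSres == SStot, so R^2 is 0
r2_mean = r_squared(y, np.full_like(y, y.mean()))

# made-up predictions that trend the wrong way: SSres > SStot, so R^2 < 0
r2_reversed = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])

print(r2_mean, r2_reversed)
```

The average line sits at R^2 = 0, so any model that scores below 0 is doing worse than simply predicting yAvg.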

### When To Use R Squared?
R Squared is usually best used for linear regression models (simple, multiple, or polynomial). These models are typically fitted with the Ordinary Least Squares method, which minimizes SSres and therefore maximizes R^2 on the training data.

Non-linear models, such as SVR with an RBF kernel, are usually better evaluated with other error metrics like Root Mean Squared Error.
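For reference, Root Mean Squared Error is the square root of the average squared residual, expressed in the same units as the dependent variable. Below is a minimal sketch with NumPy; `rmse` is a hypothetical helper name and the predicted values are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Hypothetical helper: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# observed values and made-up predictions from some model
y_true = [1.2, 2.7, 3.3, 4.5, 5.2]
y_pred = [1.0, 3.0, 3.0, 4.0, 5.0]

error = rmse(y_true, y_pred)
print(error)  # about 0.32, in the same units as y
```

Unlike R^2, a lower RMSE is better, and 0 means the predictions match the observations exactly.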

# Adjusted R Squared
R Squared encounters a problem if we keep adding new independent variables to a multiple linear regression model.

The problem is that R^2, measured on the training data, will never decrease when a variable is added. The regression minimizes the sum of squares of the residuals, so if a new independent variable would only make the fit worse, the model can assign it a coefficient of 0 and keep the previous fit. In practice, a new variable almost always correlates with the dependent variable slightly by chance, so SSres shrinks a little and R^2 creeps up.

R^2 is therefore biased: it rewards every added variable, whether or not that variable is useful.
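This behaviour can be observed on synthetic data. The sketch below fits a least-squares model with one useful feature, then adds a second feature that is pure random noise, and compares the training R^2 of both fits. Everything here is an assumption made for illustration: the data is generated, `training_r_squared` is a hypothetical helper, and the fit uses `np.linalg.lstsq` rather than scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# one informative feature, a target that depends on it, and one pure-noise feature
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)
noise_feature = rng.normal(size=n)  # unrelated to y by construction

def training_r_squared(features, target):
    """Hypothetical helper: least-squares fit with intercept, then R^2 on the same data."""
    X = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_one = training_r_squared([x1], y)
r2_two = training_r_squared([x1, noise_feature], y)

# the second value is never lower than the first, even though the new feature is noise
print(r2_one, r2_two)
```

The increase is typically tiny, but it is never a decrease, which is why training R^2 alone cannot tell whether a new variable is worth keeping.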

### Adjusted R Squared Formula
Therefore, we need a modified version of R^2, called Adjusted R^2.
- Unlike R^2, Adjusted R^2 penalizes the model for every added independent variable, so a variable that doesn't help the model lowers the score

<img src="images/evaluation/adjusted_r_squared_formula.png" height="50%" width="50%"></img>
- p = the number of independent variables (regressors): the number of columns in the 2D Array
- n = number of samples: the number of rows in the 2D Array

If we increase the number of independent variables while R^2 and n stay constant, the ratio ```(n - 1) / (n - p - 1)``` increases, which causes Adjusted R^2 to decrease.

However, if adding independent variables raises R^2 substantially while n stays constant, the value of Adjusted R^2 increases.

Therefore, Adjusted R^2 has a balancing mechanism between the value of p and the value of R^2.
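The balancing mechanism can be sketched in a few lines of plain Python, assuming the standard formula Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), which matches the ratio discussed above. The function name `adjusted_r_squared` and the example values of R^2, n, and p are hypothetical, chosen only to show both directions of the trade-off.

```python
def adjusted_r_squared(r_squared, n_samples, n_features):
    """Hypothetical helper: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r_squared) * (n_samples - 1) / denominator

# made-up scenario: 50 samples, R^2 of 0.90 with 5 variables
base = adjusted_r_squared(0.90, n_samples=50, n_features=5)

# doubling the variables without improving R^2 lowers the adjusted score
more_vars = adjusted_r_squared(0.90, n_samples=50, n_features=10)

# doubling the variables while R^2 rises to 0.95 raises the adjusted score
better_fit = adjusted_r_squared(0.95, n_samples=50, n_features=10)

print(round(base, 4), round(more_vars, 4), round(better_fit, 4))
# 0.8886 0.8744 0.9372
```

Applied to a model with one variable, five samples, and R^2 of about 0.981, the same helper gives an adjusted value of about 0.975.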

```python
# import libraries
import numpy as np
import pandas as pd
```

```python
# import the linear regression class
from sklearn.linear_model import LinearRegression

# create a small, roughly linear, data set
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 2.7, 3.3, 4.5, 5.2])

# create a linear regressor object, then fit it to the training data
regressor = LinearRegression()
regressor.fit(x, y)
```

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
```

```python
# import the r^2 metric
from sklearn.metrics import r2_score

# predict the y-values for x
y_pred = regressor.predict(x)

# compare the actual and predicted values using R^2
r2_score(y, y_pred)
```

```
0.9812014711892113
```

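# Evaluating R^2
As seen above, the R^2 value is very good: it is 0.98, a number very close to 1, which means the fitted line explains about 98% of the variance in y.

Therefore, we can state that the simple linear regression model fits this data set very well. Keep in mind that R^2 was computed on the same five points the model was trained on, so it measures quality of fit rather than accuracy on new data.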