# SLU12 - Validation metrics for regression: Example Notebook

This notebook presents [some of the regression validation metrics offered by scikit-learn](http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values).

In [1]:
import numpy as np
import pandas as pd
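# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires scikit-learn < 1.2.
from sklearn.datasets import load_boston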
from sklearn.linear_model import LinearRegression

# some scikit-learn regression validation metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

np.random.seed(60)

## Load Data
Load the Boston house-prices dataset, fit a linear regression, and make predictions on the same data used to fit the model. The metrics below are therefore computed on the training data.

In [2]:
data = load_boston()

x = pd.DataFrame(data['data'], columns=data['feature_names'])
y = pd.Series(data['target'])

lr = LinearRegression()
lr.fit(x, y)

y_hat = lr.predict(x)
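
Since `load_boston` is not available in scikit-learn 1.2 and later, the cell above fails on recent versions. The following is a minimal sketch of the same fit-and-predict workflow on a different bundled dataset, `load_diabetes`. That dataset and the `*_alt` names are assumptions swapped in for illustration, not the notebook's data, so the resulting numbers will differ from the outputs shown in this notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Assumption: load_diabetes stands in for the Boston data used in this notebook.
x_alt, y_alt = load_diabetes(return_X_y=True)

# Fit on the full dataset and predict on the same data, as in the cell above.
lr_alt = LinearRegression()
lr_alt.fit(x_alt, y_alt)
y_hat_alt = lr_alt.predict(x_alt)

# The same three scikit-learn metrics, plus RMSE derived from MSE.
mae_alt = mean_absolute_error(y_alt, y_hat_alt)
mse_alt = mean_squared_error(y_alt, y_hat_alt)
rmse_alt = np.sqrt(mse_alt)
r2_alt = r2_score(y_alt, y_hat_alt)
print(mae_alt, mse_alt, rmse_alt, r2_alt)
```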

## Mean Absolute Error

$$MAE = \frac{1}{N} \sum_{n=1}^N \left| y_n - \hat{y}_n \right|$$

In [3]:
mean_absolute_error(y, y_hat)

3.272944637996938
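
To connect the formula to the function call, here is a small sketch that computes MAE by hand with NumPy. The arrays `y_true` and `y_pred` are hypothetical toy values, not taken from the dataset; `mean_absolute_error` should give the same result on them.

```python
import numpy as np

# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Absolute errors per sample: [0.5, 0.0, 2.0, 1.0]
abs_errors = np.abs(y_true - y_pred)

# MAE is their mean: (0.5 + 0.0 + 2.0 + 1.0) / 4 = 0.875
mae_manual = abs_errors.mean()
print(mae_manual)
```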

## Mean Squared Error

$$MSE = \frac{1}{N} \sum_{n=1}^N (y_n - \hat{y}_n)^2$$

In [4]:
mean_squared_error(y, y_hat)

21.8977792176875
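
The same kind of hand computation works for MSE. This sketch uses hypothetical toy arrays (`y_true`, `y_pred`), not the dataset. Note how the single error of 2.0 contributes 4.0 to the sum, since squaring weights large errors more heavily than MAE does.

```python
import numpy as np

# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Squared errors per sample: [0.25, 0.0, 4.0, 1.0]
sq_errors = (y_true - y_pred) ** 2

# MSE is their mean: (0.25 + 0.0 + 4.0 + 1.0) / 4 = 1.3125
mse_manual = sq_errors.mean()
print(mse_manual)
```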

## Root Mean Squared Error

$$RMSE = \sqrt{MSE}$$

In [5]:
np.sqrt(mean_squared_error(y, y_hat))

4.679506300635516
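
Taking the square root brings the error back to the units of the target, which makes RMSE directly comparable to MAE. A brief sketch on hypothetical toy arrays (not the dataset) shows both side by side; RMSE is never smaller than MAE on the same data.

```python
import numpy as np

# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred

mae_manual = np.abs(errors).mean()   # 0.875
mse_manual = (errors ** 2).mean()    # 1.3125
rmse_manual = np.sqrt(mse_manual)    # sqrt(1.3125), about 1.146

print(mae_manual, rmse_manual)
```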

## R² score

$$\bar{y} = \frac{1}{N} \sum_{n=1}^N y_n$$

$$R^2 = 1 - \frac{MSE(y, \hat{y})}{MSE(y, \bar{y})}
= 1 - \frac{\frac{1}{N} \sum_{n=1}^N (y_n - \hat{y}_n)^2}{\frac{1}{N} \sum_{n=1}^N (y_n - \bar{y})^2}
= 1 - \frac{\sum_{n=1}^N (y_n - \hat{y}_n)^2}{\sum_{n=1}^N (y_n - \bar{y})^2}$$

In [6]:
r2_score(y, y_hat)

0.7406077428649427
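
The last form of the formula can be sketched directly in NumPy. The helper `r2_manual` and the toy arrays are hypothetical names and values introduced for illustration; on the same inputs, `r2_score` should agree with it. The second call shows the baseline built into the definition: always predicting the mean of `y` gives a score of 0.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 as 1 - SS_res / SS_tot, following the formula above."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared deviations from the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# SS_res = 5.25, SS_tot = 14.75, so R^2 = 1 - 5.25 / 14.75, about 0.644
r2_model = r2_manual(y_true, y_pred)

# Baseline: predicting the mean for every sample gives R^2 = 0
r2_baseline = r2_manual(y_true, np.full_like(y_true, y_true.mean()))

print(r2_model, r2_baseline)
```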