## Model evaluation and validation

* How to create a test set for your models.
* How to use confusion matrices to evaluate false positives and false negatives.
* How to measure accuracy and other model metrics.
* How to evaluate regression models.
* How to detect whether you are overfitting or underfitting based on the complexity of your model.
* How to use cross-validation to check that your model generalizes.

These topics apply to both regression and classification models.

**Regression** models predict a numeric value.

**Classification** models predict a state (a category or label).

To test the validity of a model, we compare its predictions against a test dataset that was set aside from the full dataset before training.

*You shall never use testing data for training.*

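Example of splitting a dataset into training and testing sets:

```
from sklearn.model_selection import train_test_split

# Toy data for illustration: 8 samples with 1 feature each
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# train_test_split returns both X splits first, then both y splits
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25
)
```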
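
As a minimal sketch of why the split matters, the snippet below fits a model on the training portion only and scores it on the held-out portion. The synthetic dataset from `make_classification`, the `LogisticRegression` model, and the `random_state` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 200 samples, 5 features, 2 classes
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the samples; they are never passed to fit()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # training uses the training set only

# The score on unseen samples estimates how well the model generalizes
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(len(X_train), len(X_test), test_accuracy)
```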

## How good is the model? Accuracy

### For Classification

#### Confusion matrix
\begin{matrix}
  \text{observed / modeled} & \text{True} & \text{False} \\
  \text{True} & \text{True Positive} & \text{False Negative} \\
  \text{False} & \text{False Positive} & \text{True Negative}
\end{matrix}
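
A small sketch of building such a matrix with scikit-learn's `confusion_matrix`. The label lists are hypothetical. By default scikit-learn sorts the labels, which would put the False class first; passing `labels=[1, 0]` is a choice made here so that observed values are rows, modeled values are columns, and the True class comes first.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 means True, 0 means False
y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # observed
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]  # modeled

# labels=[1, 0] puts the True class first, so the result is laid out as
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm
print(cm)
```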
 
#### Practical example
##### Matrix

\begin{matrix}
  \text{observed / modeled} & \text{True} & \text{False} \\
  \text{True} & 1000 & 200 \\
  \text{False} & 800 & 8000
\end{matrix}
 
##### Accuracy
$$ \text{accuracy} = \frac{\text{number of correctly modeled observations}}{\text{total number of observations}} $$

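```
from sklearn.metrics import accuracy_score
y_true, y_pred = [1, 0, 1, 1], [1, 0, 0, 1]  # toy observed and modeled labels
accuracy_score(y_true, y_pred)  # fraction of labels that match
```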
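
The accuracy of the practical example can be worked out directly from its four counts. The sketch below does the arithmetic by hand and then cross-checks it with `accuracy_score`; the label arrays are rebuilt with `np.repeat` purely for this check and are not part of the original example.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Counts from the practical example matrix
tp, fn, fp, tn = 1000, 200, 800, 8000

# Accuracy by hand: correctly modeled observations over all observations
accuracy = (tp + tn) / (tp + fn + fp + tn)

# Rebuild label arrays that produce the same four counts, to cross-check
y_true = np.repeat([1, 1, 0, 0], [tp, fn, fp, tn])
y_pred = np.repeat([1, 0, 1, 0], [tp, fn, fp, tn])

print(accuracy)  # 0.9
print(accuracy_score(y_true, y_pred))
```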

### For Regression

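**Mean absolute error (MAE):** the average of the absolute distances between the observed points and the model's predictions. The absolute value is not differentiable at zero.
```
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression

# Toy data for illustration
X = [[1], [2], [3], [4]]
y = [1.2, 1.9, 3.2, 3.8]

# LinearRegression is a regressor, so it is named "model" rather than "classifier"
model = LinearRegression()
model.fit(X, y)

# Predictions on the training data; use a held-out test set in practice
guesses = model.predict(X)

error = mean_absolute_error(y, guesses)
```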
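**Mean squared error (MSE):** the average of the squared distances between the observed points and the model's predictions. It is differentiable everywhere.
```
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

# Toy data for illustration
X = [[1], [2], [3], [4]]
y = [1.2, 1.9, 3.2, 3.8]

# LinearRegression is a regressor, so it is named "model" rather than "classifier"
model = LinearRegression()
model.fit(X, y)

# Predictions on the training data; use a held-out test set in practice
guesses = model.predict(X)

error = mean_squared_error(y, guesses)
```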
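
To make the two definitions concrete, here is a sketch that computes both errors by hand with NumPy and compares them with the scikit-learn functions. The observed values and predictions are hypothetical numbers chosen so that the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical observed values and model predictions
y = np.array([3.0, 5.0, 7.0, 9.0])
guesses = np.array([2.5, 5.0, 8.0, 9.5])

residuals = y - guesses  # 0.5, 0.0, -1.0, -0.5

mae = np.mean(np.abs(residuals))  # average absolute distance
mse = np.mean(residuals ** 2)     # average squared distance

print(mae, mean_absolute_error(y, guesses))
print(mse, mean_squared_error(y, guesses))
```

Squaring penalizes the largest residual more heavily than the absolute value does, which is why the two metrics can rank models differently.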
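**R2 score:** one minus the ratio between the sum of the squared distances of the points to the model and the sum of the squared distances of the points to their average:
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

Here $y_i$ are the observed values, $\hat{y}_i$ the model's predictions, and $\bar{y}$ the average of the observed values.

For a BAD model, $R^2$ is close to $0$ (it is negative when the model does worse than always predicting the average).

For a GOOD model, $R^2$ is close to $1$.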

```
from sklearn.metrics import r2_score

y_true = [1, 2, 4]
y_pred = [1.3, 2.5, 3.7]

r2_score(y_true, y_pred)

```
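
The same score can be reproduced by hand, which shows what `r2_score` compares: the model's squared error against the squared error of a baseline that always predicts the average. This is a sketch using the same three values as the example above; the variable names `ss_res` and `ss_tot` are chosen here for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1, 2, 4])
y_pred = np.array([1.3, 2.5, 3.7])

# Squared error of the model's predictions
ss_res = np.sum((y_true - y_pred) ** 2)
# Squared error of a baseline that always predicts the average of y_true
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
print(r2, r2_score(y_true, y_pred))  # both approximately 0.908
```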

## Model complexity graph

For a given dataset, the model complexity graph shows whether models of increasing complexity (linear, quadratic, cubic, and so on, along the x-axis) are underfitting or overfitting. For each model it plots the *training error* and the *cross-validation error* (a pair of numbers on the y-axis):

```
model      |  (training error, cross-validation error)
linear     |  (3, 2) -> underfitting (high-bias error)
quadratic  |  (1, 1) -> fitting the dataset
cubic      |  (1, 3) -> overfitting (high-variance error)
degree 4   |  (0, 5) -> overfitting (high-variance error)
```

Quadratic is the choice for this dataset because it has the lowest cross-validation error. To make this kind of choice we use a **cross-validation** set. The dataset is therefore split into three parts: a training set, a cross-validation set, and a testing set, so that the testing set is not used until the final evaluation.
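
The selection procedure can be sketched as follows. The synthetic quadratic dataset, the 60/20/20 split, and the use of `PolynomialFeatures` with `LinearRegression` are assumptions made for illustration; the numbers it prints will differ from the table above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): a noisy quadratic relationship
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.5, size=120)

# Three-way split: 60% training, 20% cross-validation, 20% testing
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

# Training and cross-validation error for each level of complexity
errors = {}
for degree in (1, 2, 3, 4):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_error = mean_squared_error(y_train, model.predict(X_train))
    val_error = mean_squared_error(y_val, model.predict(X_val))
    errors[degree] = (train_error, val_error)
    print(degree, round(train_error, 3), round(val_error, 3))

# Pick the degree with the lowest cross-validation error
best_degree = min(errors, key=lambda d: errors[d][1])

# Only now use the testing set, once, with the chosen model
best_model = make_pipeline(PolynomialFeatures(best_degree), LinearRegression())
best_model.fit(X_train, y_train)
test_error = mean_squared_error(y_test, best_model.predict(X_test))
print(best_degree, round(test_error, 3))
```

The training error can only go down as the degree grows, so it cannot be used to choose the model; the cross-validation error is what reveals overfitting.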

## K-Fold cross-validation

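To avoid wasting data, K-fold cross-validation runs several rounds of training and testing on different partitions of the same dataset. The dataset is split into K buckets (folds). In each round one bucket is used for testing and the remaining K-1 buckets for training, so every sample is used for both purposes. The data can be shuffled before it is split, and the results of the K rounds are averaged to get a more reliable picture of whether the model is overfitting or underfitting.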


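```
from sklearn.model_selection import KFold
import numpy as np

# Three samples split into 3 folds: each sample is the test fold exactly once
kf = KFold(n_splits=3, shuffle=True)
for train_indices, test_indices in kf.split(np.array([[1, 2, 3], [1, 2, 3], [3, 4, 5]])):
    print(train_indices, test_indices)
```

Example output (the order of the folds varies between runs because of `shuffle=True`):

```
[0 2] [1]
[0 1] [2]
[1 2] [0]
```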
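
To show the averaging step, here is a sketch that scores a model on every fold with `cross_val_score` and takes the mean. The synthetic linear dataset, the choice of 5 folds, and the `r2` scoring are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumption): a noisy linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=1.0, size=60)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# One R2 score per fold; each fold serves as the test set exactly once
scores = cross_val_score(LinearRegression(), X, y, cv=kf, scoring="r2")

print(scores)         # five scores, one per fold
print(scores.mean())  # the averaged estimate
```

Fixing `random_state` makes the folds reproducible, which helps when comparing several models on the same splits.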
