### Chapter 11
## Model Evaluation

### 11.0 Introduction
### 11.1 Cross-Validating Models
#### Problem
You want to evaluate how well your model will work in the real world.

#### Solution
Create a pipeline that preprocesses the data, trains the model, and then evaluates it using cross-validation:

In [1]:
# load libraries
from sklearn import datasets, metrics
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# load digits dataset
digits = datasets.load_digits()

# create features matrix
features = digits.data

# create target vector
target = digits.target

# create standardizer
standardizer = StandardScaler()

# create logistic regression object
logit = LogisticRegression()

# create a pipeline that standardizes, then runs logistic regression
pipeline = make_pipeline(standardizer, logit)

# create k-fold cross-validation
kf = KFold(n_splits=10, shuffle=True, random_state=1)

# conduct k-fold cross-validation
cv_results = cross_val_score(pipeline, # pipeline
                             features, # feature matrix
                             target, # target vector
                             cv=kf, # cross-validation technique
                             scoring="accuracy", # performance metric
                             n_jobs=-1) # use all CPU cores

# calculate mean
cv_results.mean()

0.964931719428926

#### Discussion
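As a rough illustration of what `cross_val_score` does in the solution above, the sketch below runs the same 10 folds by hand. On each fold a fresh copy of the pipeline is fit on the training portion only, so the standardizer never sees the held-out observations. The names `manual_scores` and `fold_model` are illustrative, and `max_iter=1000` is an assumption added here to help the solver converge; it is not part of the original recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# load the digits data used in the solution
digits = datasets.load_digits()
features, target = digits.data, digits.target

# same pipeline and folds as the solution (max_iter is an added assumption)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kf = KFold(n_splits=10, shuffle=True, random_state=1)

# run each fold by hand: fit on the training indices, score on the held-out indices
manual_scores = []
for train_idx, test_idx in kf.split(features):
    fold_model = clone(pipeline)  # unfitted copy, so no state carries over between folds
    fold_model.fit(features[train_idx], target[train_idx])
    manual_scores.append(fold_model.score(features[test_idx], target[test_idx]))
manual_scores = np.array(manual_scores)

# the library helper should produce the same per-fold accuracies
cv_scores = cross_val_score(pipeline, features, target, cv=kf, scoring="accuracy")
print(manual_scores.mean(), cv_scores.mean())
```

If the standardizer were instead fit on the full feature matrix before splitting, information from the held-out folds would leak into training, which is why the preprocessing step lives inside the pipeline.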
#### See Also
* Why every statistician should know about cross-validation (https://robjhyndman.com/hyndsight/crossvalidation/)
* Cross-Validation Gone Wrong (http://betatim.github.io/posts/cross-validation-gone-wrong/)

### 11.2 Creating a Baseline Regression Model
#### Problem
You want a simple baseline regression model to compare against your model.
#### Solution
Use scikit-learn's DummyRegressor to create a simple model to use as a baseline:

In [2]:
# load libraries
# note: load_boston was removed in scikit-learn 1.2, so this recipe needs an earlier release
from sklearn.datasets import load_boston
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split

# load data
boston = load_boston()

# create features
features, target = boston.data, boston.target

# make test and training split
features_train, features_test, target_train, target_test = train_test_split(
    features, target, random_state=0)

# create a dummy regressor
dummy = DummyRegressor(strategy='mean')

# "train" dummy regressor
dummy.fit(features_train, target_train)

# get R-squared score
dummy.score(features_test, target_test)

-0.001119359203955339

To compare, we train our model and evaluate the performance score:

In [4]:
# load library
from sklearn.linear_model import LinearRegression

# train simple linear regression model
ols = LinearRegression()
ols.fit(features_train, target_train)

# get R-squared score
ols.score(features_test, target_test)

0.6353620786674623

#### Discussion
DummyRegressor allows us to create a very simple model that we can use as a baseline to compare against our actual model. This is often useful for simulating a "naive" existing prediction process in a product or system. For example, a product might have been originally hardcoded to assume that all new users will spend $100 in the first month, regardless of their features. If we encode that assumption into a baseline model, we can state concretely how much a machine learning approach improves on it.

DummyRegressor uses the strategy parameter to set the method of making predictions, including the mean or median value in the training set. Furthermore, if we set strategy to constant and use the constant parameter, we can set the dummy regressor to predict some constant value for every observation:

In [5]:
# create dummy regressor that predicts 20 for every observation
clf = DummyRegressor(strategy='constant', constant=20)
clf.fit(features_train, target_train)

# evaluate score
clf.score(features_test, target_test)

-0.06510502029325727
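To make the strategies concrete, here is a minimal sketch on a made-up four-value target. The toy arrays and the variable names are assumptions for illustration only; the constant of 100 mirrors the hardcoded spending example above.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# toy data, invented for illustration; DummyRegressor ignores the feature values
toy_features = np.zeros((4, 1))
toy_target = np.array([1.0, 2.0, 3.0, 10.0])

# each strategy learns a single number from the training target
mean_pred = DummyRegressor(strategy="mean").fit(toy_features, toy_target).predict(toy_features)
median_pred = DummyRegressor(strategy="median").fit(toy_features, toy_target).predict(toy_features)
constant_pred = DummyRegressor(strategy="constant", constant=100).fit(toy_features, toy_target).predict(toy_features)

print(mean_pred)      # every prediction is the training mean
print(median_pred)    # every prediction is the training median
print(constant_pred)  # every prediction is the supplied constant
```

Because of the outlier in the toy target, the mean and median baselines differ, which is one reason to try both before choosing a baseline.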

One small note regarding score: by default, it returns the coefficient of determination ($R^2$):

$$R^2 = 1 - \frac{\sum(y_i - \hat y_i)^2}{\sum(y_i - \bar y)^2}$$

where $y_i$ is the true target value of observation $i$, $\hat y_i$ is its predicted value, and $\bar y$ is the mean of the target vector.

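The closer $R^2$ is to 1, the more of the variance in the target vector is explained by the features. $R^2$ can also be negative: a model that predicts worse than the mean of the evaluated targets scores below 0, which is why the dummy regressors above have slightly negative scores on the test set.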
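The formula above can be checked by hand with a few lines of plain Python. This is only a sketch: the helper name `r_squared` and the four target values are made up for illustration and are not part of scikit-learn.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# illustrative target values with mean 6
y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r_squared(y_true, y_true)          # predictions equal the targets
mean_only = r_squared(y_true, [6.0] * 4)     # always predict the mean of y_true
constant_20 = r_squared(y_true, [20.0] * 4)  # always predict 20, far from every target

print(perfect, mean_only, constant_20)
```

Perfect predictions give a score of 1, predicting the mean of the evaluated targets gives 0, and a constant far from the data gives a strongly negative score.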

### 11.3 Creating a Baseline Classification Model