# Chapter 6 - Learning Best Practices for Model Evaluation and Hyperparameter Tuning

### Overview

- [Streamlining workflows with pipelines](#Streamlining-workflows-with-pipelines)
  - [Loading the Breast Cancer Wisconsin dataset](#Loading-the-Breast-Cancer-Wisconsin-dataset)
  - [Combining transformers and estimators in a pipeline](#Combining-transformers-and-estimators-in-a-pipeline)
- [Using k-fold cross-validation to assess model performance](#Using-k-fold-cross-validation-to-assess-model-performance)
  - [The holdout method](#The-holdout-method)
  - [K-fold cross-validation](#K-fold-cross-validation)
- [Debugging algorithms with learning and validation curves](#Debugging-algorithms-with-learning-and-validation-curves)
  - [Diagnosing bias and variance problems with learning curves](#Diagnosing-bias-and-variance-problems-with-learning-curves)
  - [Addressing overfitting and underfitting with validation curves](#Addressing-overfitting-and-underfitting-with-validation-curves)
- [Fine-tuning machine learning models via grid search](#Fine-tuning-machine-learning-models-via-grid-search)
  - [Tuning hyperparameters via grid search](#Tuning-hyperparameters-via-grid-search)
  - [Exploring hyperparameter configurations more widely with randomized search](#Exploring-hyperparameter-configurations-more-widely-with-randomized-search)
  - [More resource-efficient hyperparameter search with successive halving](#More-resource-efficient-hyperparameter-search-with-successive-halving)
  - [Algorithm selection with nested cross-validation](#Algorithm-selection-with-nested-cross-validation)
- [Looking at different performance evaluation metrics](#Looking-at-different-performance-evaluation-metrics)
  - [Reading a confusion matrix](#Reading-a-confusion-matrix)
  - [Optimizing the precision and recall of a classification model](#Optimizing-the-precision-and-recall-of-a-classification-model)
  - [Plotting a receiver operating characteristic](#Plotting-a-receiver-operating-characteristic)
  - [The scoring metrics for multiclass classification](#The-scoring-metrics-for-multiclass-classification)
- [Dealing with class imbalance](#Dealing-with-class-imbalance)
- [Summary](#Summary)

# Streamlining workflows with pipelines

## Loading the Breast Cancer Wisconsin dataset

In [1]:
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/'
                 'machine-learning-databases'
                 '/breast-cancer-wisconsin/wdbc.data', header=None)

df.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


In [2]:
df.shape

(569, 32)

In [3]:
from sklearn.preprocessing import LabelEncoder

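# Columns 2 onward hold the 30 features; column 1 holds the diagnosis.
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
# Encode the class labels as integers (B -> 0, M -> 1).
le = LabelEncoder()
y = le.fit_transform(y)
le.classes_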

array(['B', 'M'], dtype=object)

In [4]:
le.transform(['M', 'B'])

array([1, 0])

In [5]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as a test set, preserving the class proportions.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=1)

## Combining transformers and estimators in a pipeline

In [6]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

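# Chain standardization, PCA (2 components), and logistic regression
# so that a single fit/predict call runs all three steps in order.
pipe_lr = make_pipeline(StandardScaler(),
                        PCA(n_components=2),
                        LogisticRegression())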

pipe_lr.fit(X_train, y_train)
y_pred = pipe_lr.predict(X_test)
test_acc = pipe_lr.score(X_test, y_test)
print(f'Test accuracy: {test_acc:.3f}')

Test accuracy: 0.956
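
As a rough sketch of what the pipeline's `fit` and `predict` calls do internally, the snippet below chains the same three steps by hand and checks that the manual predictions match the pipeline's. It assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) instead of the UCI download so that it runs offline. That copy encodes the labels the other way around (0 = malignant, 1 = benign), which does not affect accuracy. The names `sc`, `pca`, `lr`, and `same` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset copy instead of the UCI CSV used above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Pipeline version: one object, one fit call.
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=2),
                     LogisticRegression())
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual version: fit each transformer on the training data only,
# then apply the already fitted transformers to the test data.
sc = StandardScaler()
pca = PCA(n_components=2)
lr = LogisticRegression()
X_train_pca = pca.fit_transform(sc.fit_transform(X_train))
lr.fit(X_train_pca, y_train)
manual_pred = lr.predict(pca.transform(sc.transform(X_test)))

# Both routes should produce identical predictions.
same = np.array_equal(pipe_pred, manual_pred)
print(f'Pipeline and manual predictions identical: {same}')
```

The point of the comparison is that the pipeline never calls `fit` on the test data; it only reuses the scaling and projection parameters learned from the training set.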


In [7]:
from sklearn.ensemble import RandomForestClassifier

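# PCA (2 components) followed by a small random forest of 10 trees;
# note that this pipeline has no scaling step.
pipe_tr = make_pipeline(PCA(n_components=2),
                        RandomForestClassifier(n_estimators=10,
                                               random_state=1,
                                               n_jobs=4))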
pipe_tr.fit(X_train, y_train)
y_pred_tr = pipe_tr.predict(X_test)
test_acc_tr = pipe_tr.score(X_test, y_test)
print(f'Test accuracy: {test_acc_tr:.3f}')

Test accuracy: 0.939


In [8]:
# Random forest of 76 trees trained directly on the raw features (no PCA).
forest = RandomForestClassifier(n_estimators=76,
                                criterion='gini',
                                random_state=1,
                                n_jobs=4)
forest.fit(X_train, y_train)
test_acc_fr = forest.score(X_test, y_test)
print(f'Test accuracy: {test_acc_fr:.3f}')

Test accuracy: 0.956


# Using k-fold cross-validation to assess model performance

## The holdout method
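
One common form of the holdout method splits the data three ways: a training set for fitting, a validation set for comparing models or hyperparameter settings, and a test set that is touched only once for the final estimate. The snippet below is a minimal sketch of that idea using two calls to `train_test_split`. It assumes scikit-learn's bundled `load_breast_cancer` copy of the dataset so that it runs offline, and the 0.25 validation fraction and the names `X_rest`, `X_val`, and `y_val` are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: bundled dataset copy instead of the UCI CSV used above.
X, y = load_breast_cancer(return_X_y=True)

# First split: set aside 20% of all samples as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Second split: carve a validation set out of the remaining samples.
# The 0.25 fraction is an arbitrary illustrative value.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=1)

print(f'train: {len(X_train)}, validation: {len(X_val)}, test: {len(X_test)}')
```

A drawback of a single holdout split is that the performance estimate depends on how the data happened to be partitioned, which is the motivation for k-fold cross-validation.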

## K-fold cross-validation

In [9]:
import numpy as np
from sklearn.model_selection import StratifiedKFold

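# Stratified 10-fold split: each fold roughly keeps the class proportions of y_train.
kfold = StratifiedKFold(n_splits=10).split(X_train, y_train)

scores = []
for k, (train, test) in enumerate(kfold):
    # Fit on nine folds, then score on the held-out fold.
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)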
    
    print(f'Fold: {k+1:02d}, '
          f'Class distr.: {np.bincount(y_train[train])}, '
          f'Acc.: {score:.3f}')
    
mean_acc = np.mean(scores)
std_acc = np.std(scores)
print(f'\nCV accuracy: {mean_acc:.3f} +/- {std_acc:.3f}')

Fold: 01, Class distr.: [256 153], Acc.: 0.935
Fold: 02, Class distr.: [256 153], Acc.: 0.935
Fold: 03, Class distr.: [256 153], Acc.: 0.957
Fold: 04, Class distr.: [256 153], Acc.: 0.957
Fold: 05, Class distr.: [256 153], Acc.: 0.935
Fold: 06, Class distr.: [257 153], Acc.: 0.956
Fold: 07, Class distr.: [257 153], Acc.: 0.978
Fold: 08, Class distr.: [257 153], Acc.: 0.933
Fold: 09, Class distr.: [257 153], Acc.: 0.956
Fold: 10, Class distr.: [257 153], Acc.: 0.956

CV accuracy: 0.950 +/- 0.014
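
The nearly constant class distribution in the output above comes from stratification. The following pure-Python sketch shows the idea by dealing the indices of each class round-robin into the test folds. It is a simplified illustration, not scikit-learn's actual `StratifiedKFold` implementation, and `stratified_folds` is a hypothetical helper. The class totals (285 and 170) are the ones implied by the per-fold counts printed above.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits):
    """Deal the indices of each class round-robin into n_splits test folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds


# 285 samples of class 0 and 170 of class 1 (inferred from the output above).
labels = [0] * 285 + [1] * 170
folds = stratified_folds(labels, n_splits=10)

# For each fold, count the class labels of the samples left for training.
train_counts = []
for test_fold in folds:
    held_out = set(test_fold)
    counts = Counter(label for idx, label in enumerate(labels)
                     if idx not in held_out)
    train_counts.append((counts[0], counts[1]))

print(train_counts[0], train_counts[-1])  # (256, 153) (257, 153)
```

Because 285 is not divisible by 10, five folds hold out 29 samples of class 0 and five hold out 28, which is why the training counts alternate between 256 and 257.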


In [10]:
from sklearn.model_selection import cross_val_score

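# cross_val_score runs the same stratified 10-fold evaluation in one call;
# n_jobs=-1 distributes the folds across all available CPU cores.
scores = cross_val_score(estimator=pipe_lr,
                         X=X_train,
                         y=y_train,
                         cv=10,
                         n_jobs=-1)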
print(f'CV accuracy scores: {scores}')
print(f'CV accuracy: {np.mean(scores):.3f} '
      f'+/- {np.std(scores):.3f}')

CV accuracy scores: [0.93478261 0.93478261 0.95652174 0.95652174 0.93478261 0.95555556
 0.97777778 0.93333333 0.95555556 0.95555556]
CV accuracy: 0.950 +/- 0.014


### With the Gini criterion, 76 or 77 estimators seem to be about optimal for the RandomForestClassifier under k-fold cross-validation. With the entropy criterion, the optimum shifts to 101 or 102.
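
One way such a comparison could be run is to loop over candidate values of `n_estimators` and compare the mean cross-validation accuracy. The snippet below is a sketch under assumptions: it uses scikit-learn's bundled `load_breast_cancer` copy of the dataset, and the candidate list is illustrative rather than the search that produced the figures in this note, so the best value it reports may differ.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: bundled dataset copy instead of the UCI CSV used above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Illustrative candidate forest sizes (hypothetical grid).
candidates = [10, 76, 101]

results = {}
for n in candidates:
    forest = RandomForestClassifier(n_estimators=n,
                                    criterion='gini',
                                    random_state=1)
    scores = cross_val_score(forest, X_train, y_train, cv=10)
    results[n] = (np.mean(scores), np.std(scores))
    print(f'n_estimators={n:3d}: '
          f'{results[n][0]:.3f} +/- {results[n][1]:.3f}')

# Pick the candidate with the highest mean CV accuracy.
best_n = max(results, key=lambda n: results[n][0])
print(f'Best candidate: {best_n}')
```

Differences between candidates are often smaller than the fold-to-fold standard deviation, so a single "optimal" value should be read with caution.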


In [11]:
scores_f = cross_val_score(estimator=forest,
                           X=X_train,
                           y=y_train,
                           cv=10,
                           n_jobs=-1)
print(f'CV accuracy scores: {scores_f}')
print(f'CV accuracy: {np.mean(scores_f):.3f} '
      f'+/- {np.std(scores_f):.3f}')

CV accuracy scores: [0.95652174 0.97826087 0.97826087 0.97826087 0.95652174 0.95555556
 0.95555556 0.93333333 1.         0.95555556]
CV accuracy: 0.965 +/- 0.018


In [12]:
# Manual stratified 10-fold loop again, this time with the random forest.
kfold_f = StratifiedKFold(n_splits=10).split(X_train, y_train)

scores = []
for k, (train, test) in enumerate(kfold_f):
    forest.fit(X_train[train], y_train[train])
    score = forest.score(X_train[test], y_train[test])
    scores.append(score)
    
    print(f'Fold: {k+1:02d}, '
          f'Class distr.: {np.bincount(y_train[train])}, '
          f'Acc.: {score:.3f}')
    
mean_acc = np.mean(scores)
std_acc = np.std(scores)
print(f'\nCV accuracy: {mean_acc:.3f} +/- {std_acc:.3f}')

Fold: 01, Class distr.: [256 153], Acc.: 0.957
Fold: 02, Class distr.: [256 153], Acc.: 0.978
Fold: 03, Class distr.: [256 153], Acc.: 0.978
Fold: 04, Class distr.: [256 153], Acc.: 0.978
Fold: 05, Class distr.: [256 153], Acc.: 0.957
Fold: 06, Class distr.: [257 153], Acc.: 0.956
Fold: 07, Class distr.: [257 153], Acc.: 0.956
Fold: 08, Class distr.: [257 153], Acc.: 0.933
Fold: 09, Class distr.: [257 153], Acc.: 1.000
Fold: 10, Class distr.: [257 153], Acc.: 0.956

CV accuracy: 0.965 +/- 0.018


# Debugging algorithms with learning and validation curves

## Diagnosing bias and variance problems with learning curves
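
A learning curve plots training and validation accuracy against the number of training samples. The snippet below is a minimal sketch using scikit-learn's `learning_curve` function. It assumes the bundled `load_breast_cancer` copy of the dataset and a scaler plus logistic regression pipeline (`pipe`), and it prints the mean scores instead of plotting them.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset copy instead of the UCI CSV used above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(max_iter=10000))

# Evaluate the pipeline on 10 increasing training-set sizes,
# each with stratified 10-fold cross-validation.
train_sizes, train_scores, valid_scores = learning_curve(
    estimator=pipe,
    X=X_train,
    y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=10)

# Average over the folds (axis 1) for each training-set size.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
for n, tr, va in zip(train_sizes, train_mean, valid_mean):
    print(f'{n:4d} samples: train {tr:.3f}, validation {va:.3f}')
```

As a rule of thumb, low accuracy on both curves suggests high bias (underfitting), while a large, persistent gap between training and validation accuracy suggests high variance (overfitting).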

## Addressing overfitting and underfitting with validation curves
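
A validation curve varies one hyperparameter instead of the sample size. The snippet below is a minimal sketch using scikit-learn's `validation_curve` to sweep the inverse regularization strength `C` of a logistic regression. It assumes the bundled `load_breast_cancer` copy of the dataset, and the parameter range is an illustrative choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset copy instead of the UCI CSV used above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(max_iter=10000))

# Illustrative range for C, spanning strong to weak regularization.
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

# make_pipeline names the step after its lowercased class name,
# so the parameter is addressed as 'logisticregression__C'.
train_scores, valid_scores = validation_curve(
    estimator=pipe,
    X=X_train,
    y=y_train,
    param_name='logisticregression__C',
    param_range=param_range,
    cv=10)

for c, tr, va in zip(param_range,
                     train_scores.mean(axis=1),
                     valid_scores.mean(axis=1)):
    print(f'C={c:7.3f}: train {tr:.3f}, validation {va:.3f}')
```

Small values of `C` mean strong regularization and tend toward underfitting, while large values weaken the regularization and can lead to overfitting.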