> This is a self-correcting activity generated by [nbgrader](https://nbgrader.readthedocs.io). Fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE`. Run subsequent cells to check your code.

---

# Classify handwritten digits

In this activity, you'll try several classifiers on the [UCI handwritten digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits), either separately or combined into an ensemble.

![UCI digits](images/uci_digits.png)

## Environment setup

In [24]:
# Import base packages
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

In [25]:
# Setup plots
%matplotlib inline
plt.rcParams['figure.figsize'] = 10, 8
%config InlineBackend.figure_format = 'retina'
sns.set()

In [26]:
# Import ML packages
import sklearn
print(f'scikit-learn version: {sklearn.__version__}')

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split

scikit-learn version: 0.23.2


## Step 1: Loading and preparing the data

In [27]:
# Load the UCI handwritten digits dataset (8x8 images; this is not MNIST)
digits = load_digits()

# To apply a classifier to this data, we need to flatten each 8x8 image,
# turning the data into a (samples, features) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
print(f'data: {data.shape}. targets: {digits.target.shape}')

data: (1797, 64). targets: (1797,)
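
As a minimal sketch of what the flattening step does, the snippet below reshapes a small stand-in array instead of the real dataset. The `images` array here is a hypothetical placeholder for `digits.images`, not actual digit data.

```python
import numpy as np

# Hypothetical stand-in for digits.images: 3 "images" of 8x8 pixels
images = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)

# -1 lets NumPy infer the second dimension (8 * 8 = 64)
flat = images.reshape((len(images), -1))
print(flat.shape)  # (3, 64)

# Pixels are laid out in row-major order:
# pixel (r, c) of an image lands at column r * 8 + c of its row
same_pixel = flat[1, 2 * 8 + 5] == images[1, 2, 5]
print(same_pixel)  # True
```

Each row of the resulting matrix is one sample and each column is one pixel intensity, which is the layout scikit-learn estimators expect.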


### Question

Split the data into training, validation and test sets.

In [28]:
# YOUR CODE HERE
x_train, x_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2)

# Set apart the first 200 images as validation data
x_val, x_train = x_train[:200], x_train[200:]
y_val, y_train = y_train[:200], y_train[200:]

In [29]:
print(f'x_train: {x_train.shape}. x_val: {x_val.shape}. x_test: {x_test.shape}')
print(f'y_train: {y_train.shape}. y_val: {y_val.shape}. y_test: {y_test.shape}')

assert x_train.shape == (1237, 64)
assert x_val.shape == (200, 64)
assert x_test.shape == (360, 64)
assert y_train.shape == (1237,)
assert y_val.shape == (200,)
assert y_test.shape == (360,)

x_train: (1237, 64). x_val: (200, 64). x_test: (360, 64)
y_train: (1237,). y_val: (200,). y_test: (360,)
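
The split above is random, so the exact samples in each set change from run to run. One possible variant, sketched here under assumptions, calls `train_test_split` twice with a fixed `random_state` and `stratify` so that the split is reproducible and class proportions stay similar across sets. The arrays `X` and `y` are hypothetical stand-ins with the same shape as the flattened digits, and the seed value is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same shape as the flattened digits
X = np.zeros((1797, 64))
y = np.arange(1797) % 10  # ten roughly balanced fake classes

# First split: hold out 20% of the samples as test data
x_rest, x_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Second split: hold out 200 of the remaining samples as validation data
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=200, random_state=42, stratify=y_rest)

print(x_train.shape, x_val.shape, x_test.shape)
```

Passing an integer as `test_size` requests an absolute number of samples, which is why the validation set has exactly 200 rows.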


## Step 2: Train several models

### Question

Create and train various models, such as a linear classifier, a multilayer perceptron, a random forest...

In [30]:
# YOUR CODE HERE
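# loss='log' makes this a logistic regression, which supports predict_proba
# (needed later for soft voting). scikit-learn 1.1+ renames it to 'log_loss'.
sgd_clf = SGDClassifier(loss='log', random_state=42)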
mlp_clf = MLPClassifier(random_state=42)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)

models = [sgd_clf, mlp_clf, rf_clf]

for model in models:
    print("Training the", model)
    model.fit(x_train, y_train)

Training the SGDClassifier(loss='log', random_state=42)
Training the MLPClassifier(random_state=42)
Training the RandomForestClassifier(random_state=42)


### Question

Display the default score (mean accuracy) of each model on the validation set.

In [31]:
# YOUR CODE HERE
for model in models:
    print(model.score(x_val, y_val))

0.94
0.98
0.96
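
For classifiers, `score` returns the mean accuracy, i.e. the fraction of samples whose predicted label matches the true label. The sketch below reproduces that computation by hand on made-up labels; `y_true` and `y_pred` are hypothetical toy arrays, not results from the models above.

```python
import numpy as np

# Hypothetical toy labels: 4 of the 5 predictions are correct
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

# Mean of a boolean array = fraction of True values = accuracy
accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.8
```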


### Question

Create a `VotingClassifier` including all your models. Fit it on the training data.

In [32]:
# YOUR CODE HERE
voting_clf = VotingClassifier([
    ("sgd_clf", sgd_clf),
    ("mlp_clf", mlp_clf),
    ("rf_clf", rf_clf),
])
voting_clf.fit(x_train, y_train)

VotingClassifier(estimators=[('sgd_clf',
                              SGDClassifier(loss='log', random_state=42)),
                             ('mlp_clf', MLPClassifier(random_state=42)),
                             ('rf_clf',
                              RandomForestClassifier(random_state=42))])

### Question

Show the `VotingClassifier` score on the validation data and compare it to each model's individual score.

In [33]:
# YOUR CODE HERE
voting_clf.score(x_val, y_val)

0.98
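
By default, `VotingClassifier` uses hard voting: each model predicts a class and the most frequent class wins. The following is a minimal sketch of that idea, not scikit-learn's actual implementation. The helper `hard_vote` and the `preds` array are hypothetical names introduced for illustration, and in this sketch ties go to the smallest class label.

```python
import numpy as np

def hard_vote(predictions, n_classes):
    """Majority vote over an array of shape (n_models, n_samples)."""
    predictions = np.asarray(predictions)
    # For each sample (column), count votes per class and keep the winner
    return np.array([
        np.bincount(votes, minlength=n_classes).argmax()
        for votes in predictions.T
    ])

# Hypothetical predictions from three models on four samples
preds = np.array([
    [3, 5, 8, 1],
    [3, 6, 8, 7],
    [2, 5, 9, 1],
])
majority = hard_vote(preds, n_classes=10)
print(majority)  # [3 5 8 1]
```

An individual model can be wrong on a sample and still be outvoted, which is why the ensemble can match or beat its best member.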

### Question

Show the score for a soft voting classifier.

In [34]:
# YOUR CODE HERE
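# Switch to soft voting; the underlying estimators are already trained,
# and all of them support predict_proba
voting_clf.voting = "soft"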
voting_clf.score(x_val, y_val)

0.96
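
Soft voting averages the class probabilities predicted by each model and picks the class with the highest mean, so a very confident model can outweigh two hesitant ones. The sketch below illustrates this with made-up probabilities, assuming unweighted averaging; the `probas` array is hypothetical and the code is not scikit-learn's implementation.

```python
import numpy as np

# Hypothetical probabilities, shape (3 models, 1 sample, 3 classes)
probas = np.array([
    [[0.90, 0.10, 0.00]],  # model A: confident in class 0
    [[0.40, 0.60, 0.00]],  # model B: leans towards class 1
    [[0.45, 0.55, 0.00]],  # model C: leans towards class 1
])

# Soft voting: average probabilities over models, then take the best class
soft_pred = probas.mean(axis=0).argmax(axis=1)

# Hard voting on the same models: each model votes for its top class
votes = probas.argmax(axis=2)  # shape (3 models, 1 sample)
hard_pred = np.array([np.bincount(v, minlength=3).argmax() for v in votes.T])

print(soft_pred, hard_pred)  # [0] [1]
```

The two schemes disagree on this sample, which shows why soft and hard voting scores can differ on the same fitted models.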

### Question

Compute the `VotingClassifier` score on the test data. Compare it to each model's individual score.

In [35]:
# YOUR CODE HERE
# voting_clf is still set to soft voting from the previous cell
print(voting_clf.score(x_test, y_test), "\n")
for model in models:
    print(model.score(x_test, y_test))

0.9861111111111112 

0.9666666666666667
0.9805555555555555
0.9777777777777777
