In [None]:
# Import some basic libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_context('paper')

# Hands-on Activity 16.5: Multi-class logistic regression

## Objectives

+ To demonstrate multi-class logistic regression

## Handwritten Digits

We will demonstrate multi-class logistic regression using a handwritten digits dataset.
The data are in scikit-learn and our example follows very closely [this example](https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html).

First, let's load the dataset.

In [None]:
from sklearn import datasets
# Get the data
digits = datasets.load_digits()
# Here is the description of the dataset:
print(digits.DESCR)

The images are in a 3D array:

In [None]:
print(digits.images.shape)

Each entry along the first axis of this array is an $8\times 8$ image (which is just a matrix).
Here is the first image as just numbers:

In [None]:
print(digits.images[0])

These numbers correspond to the darkness of each pixel. The greater the value, the darker the pixel.
Here is how we can visualize the first image:

In [None]:
fig, ax = plt.subplots(dpi=150)
ax.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation='nearest');

That's clearly a 0. Each image comes with a predetermined label that we can use to train models.
Here is where you can find the labels:

In [None]:
print(digits.target)

Notice that the first label is a 0, as expected.
Let's now plot several images just to gain some intuition about them:

In [None]:
_, axes = plt.subplots(4, 4)
images_and_labels = list(zip(digits.images, digits.target))
for ax, (image, label) in zip(axes.flatten(), images_and_labels[:16]):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Training: %i' % label)

We are going to apply the multi-class logistic regression classifier with 64 linear features, one per pixel.
This assumes that the images are vectorized.
That is, we turn them from $8\times 8$ matrices to $64$-dimensional arrays.
Here is how we can do this:

In [None]:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
print(data.shape)
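To see what the reshape does, here is a minimal sketch on a toy stack of two $2\times 3$ "images" instead of the digits. The names `toy_images`, `toy_data`, and `recovered` are made up for this illustration. It shows that each image is flattened row by row and that reshaping back recovers the original:

```python
import numpy as np

# A toy stack of two 2x3 "images" (stand-ins for the 8x8 digits)
toy_images = np.arange(12).reshape((2, 2, 3))

# Flatten each image into one row; -1 lets NumPy infer the row length (2 * 3 = 6)
toy_data = toy_images.reshape((len(toy_images), -1))
print(toy_data.shape)

# Row-major order: the first image row comes first, then the second
print(toy_data[0])

# Reshaping back recovers the original images exactly
recovered = toy_data.reshape(toy_images.shape)
print(np.array_equal(recovered, toy_images))
```

The same logic is why we can later call `reshape((8, 8))` on a row of `data` to plot it as an image again.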

Let's split the dataset into training and test sets.
We will use the functionality of scikit-learn for this:

In [None]:
from sklearn.model_selection import train_test_split
# Here is how the dataset can be split:
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=True)
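The split above is random, so the numbers you get will change from run to run. If you want a reproducible split with roughly the same class proportions in both halves, one option is to pass `random_state` and `stratify`. This is a sketch of a variation, not what the rest of this activity uses; the names ending in `_demo`, `_tr`, and `_te` are chosen here only to avoid overwriting the variables above:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Reload and vectorize the digits so that this sketch is self-contained
digits_demo = datasets.load_digits()
X_demo = digits_demo.images.reshape((len(digits_demo.images), -1))
y_demo = digits_demo.target

# random_state fixes the shuffle; stratify keeps the label proportions similar
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.5, shuffle=True,
    random_state=0, stratify=y_demo)

print(X_tr.shape, X_te.shape)
# Number of examples per digit in each half
print(np.bincount(y_tr))
print(np.bincount(y_te))
```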

The model we are going to fit is:
$$
p(y=k|\mathbf{x}, \mathbf{W}) = \operatorname{softmax}_k\left(\mathbf{w}_1^T\mathbf{x},\dots,\mathbf{w}_K^T\mathbf{x}\right),
$$
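where $\mathbf{x}$ is the vectorized version of the image and $\mathbf{w}_k$ is the weight vector of class $k$, for $k=1,\dots,K$ with $K=10$ digits.
Because we fit with `fit_intercept=True`, scikit-learn also adds a bias term to each linear score $\mathbf{w}_k^T\mathbf{x}$.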
Let's do it:

In [None]:
from sklearn.linear_model import LogisticRegression

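# The classifier object (no regularization).
# Note: scikit-learn >= 1.2 expects penalty=None; older versions used penalty='none'.
model = LogisticRegression(max_iter=2000, penalty=None, fit_intercept=True)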

# Fit with the training data:
model.fit(X_train, y_train);

Here is how you can get the matrix of all weights $\mathbf{W}$:

In [None]:
print(model.coef_.shape)
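To connect the weight matrix to the softmax model, here is a minimal sketch that recomputes the class probabilities by hand from `coef_` and `intercept_` and compares them with `predict_proba`. It makes a few assumptions that differ from the cells above: it fits a separate `demo_model` on all the digits with scikit-learn's default settings (which include an L2 penalty), it rescales the pixels to $[0, 1]$ to help convergence, and it relies on recent scikit-learn versions fitting a multinomial (softmax) model by default. The helper `softmax` is written here and is not part of the activity:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Reload and vectorize the digits so that this sketch is self-contained
digits_demo = datasets.load_digits()
X_demo = digits_demo.images.reshape((len(digits_demo.images), -1)) / 16.0
y_demo = digits_demo.target

# Default settings (L2 penalty); an assumption made for this illustration
demo_model = LogisticRegression(max_iter=2000).fit(X_demo, y_demo)

W = demo_model.coef_        # one row of 64 weights per class
b = demo_model.intercept_   # one bias per class
scores = X_demo[:5] @ W.T + b

manual_proba = softmax(scores)
sklearn_proba = demo_model.predict_proba(X_demo[:5])
print(np.allclose(manual_proba, sklearn_proba))
```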

Here are point predictions for the first ten test images (picking the label with the highest probability):

In [None]:
predicted = model.predict(X_test)
print('#\tTrue label\tPrediction')
print('-' * 20)
for i in range(10):
    print('{0:d}\t{1:d}\t\t{2:d}'.format(i, y_test[i], predicted[i]))
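The phrase "picking the label with the highest probability" can be checked directly. The sketch below assumes a separate `demo_model` fitted with default settings on the first 1000 digits (an arbitrary choice for this illustration) and compares `predict` against the argmax of `predict_proba` on the remaining images:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Reload and vectorize the digits so that this sketch is self-contained
digits_demo = datasets.load_digits()
X_demo = digits_demo.images.reshape((len(digits_demo.images), -1)) / 16.0
y_demo = digits_demo.target

# Fit on the first 1000 images, keep the rest for checking
demo_model = LogisticRegression(max_iter=2000).fit(X_demo[:1000], y_demo[:1000])

# Column j of predict_proba corresponds to the label demo_model.classes_[j]
proba = demo_model.predict_proba(X_demo[1000:])
argmax_labels = demo_model.classes_[np.argmax(proba, axis=1)]
point_labels = demo_model.predict(X_demo[1000:])

print(np.array_equal(argmax_labels, point_labels))
# Fraction of held-out images labeled correctly
print(np.mean(point_labels == y_demo[1000:]))
```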

But we can also make probabilistic predictions:

In [None]:
prob_predict = model.predict_proba(X_test)
# These can be visualized as bars
fig, axes = plt.subplots(10, 2, dpi=150)
for i in range(10):
    axes[i, 0].imshow(X_test[i].reshape((8, 8)), cmap=plt.cm.gray_r, interpolation='nearest')
    axes[i, 0].set_yticks([])
    axes[i, 0].set_xticks([])
    axes[i, 1].set_xticks([])
    axes[i, 1].bar(np.arange(10), prob_predict[i, :])
    axes[i, 1].set_yticks([])
axes[-1, 1].set_xticks(np.arange(10))
axes[-1, 1].set_xticklabels(model.classes_);

Scikit-learn can compute many accuracy metrics at once for you.
Here is everything, including the confusion matrix:

In [None]:
from sklearn import metrics
print("Classification report for model %s:\n%s\n"
      % (model, metrics.classification_report(y_test, predicted)))
fig, ax = plt.subplots(dpi=150)
# plot_confusion_matrix was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator is its replacement.
disp = metrics.ConfusionMatrixDisplay.from_estimator(model, X_test, y_test, ax=ax)
disp.figure_.suptitle("Confusion Matrix")
print("Confusion matrix:\n%s" % disp.confusion_matrix)
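To read a confusion matrix, it helps to look at a tiny example first. The sketch below uses made-up labels for a three-class problem (the arrays `y_true_toy` and `y_pred_toy` are hypothetical). In scikit-learn's convention, rows are true labels and columns are predicted labels, so off-diagonal entries are mistakes:

```python
import numpy as np
from sklearn import metrics

# Made-up labels: one example of class 1 is misclassified as class 2
y_true_toy = np.array([0, 1, 2, 2, 1])
y_pred_toy = np.array([0, 2, 2, 2, 1])

# Entry (i, j) counts examples with true label i and predicted label j
cm = metrics.confusion_matrix(y_true_toy, y_pred_toy)
print(cm)

# Accuracy is the sum of the diagonal divided by the total count
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```

Here the only nonzero off-diagonal entry is in row 1, column 2, which is the single class-1 example predicted as class 2.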

### Questions

+ Look at the confusion matrix carefully and identify the digits for which the most mistakes are made. Why does this happen? Write code below to visualize some of the wrong predictions.

In [None]:
# Your code below this point