- In the MNIST database, pick 5K random data points for training and 1K random data points for testing. Note that, as we did in class, in MNIST the first 60,000 points are the training sample and the last 10,000 points are the test sample. So your 5K random training sample has to come from the first 60,000, and your 1K random test sample has to come from the last 10,000 points.

(Hint: you can use any approach to select random data, but one approach could be: do the permutation as usual, and then pick the "first 5k" (:5000) as train. You will also have to carefully permute the test data, picking the first 1k of the permuted last 10,000 points as the test sample.)

- Build two binary models using any classifier on the "training data": one to predict 5 or not-5, and the other to predict 6 or not-6.
- Compare these two models based on their respective confusion matrices on the "test data".
- Which of the two models is better based on the confusion matrix?
(Note: if it is easier for you to do steps 1-4 above on the entire dataset (60k training, and 10k test), then you will get 75% of the credit for this question. Use this option only if you are unable to create a random subset of 5k train and 1k test).

In [28]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

In [13]:
try:
    from sklearn.datasets import fetch_openml
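    # as_frame=False (scikit-learn >= 0.22) returns NumPy arrays, which the integer-array indexing below relies on
    mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)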
    mnist.target = mnist.target.astype(np.int8) # fetch_openml() returns targets as strings
    #sort_by_target(mnist) # fetch_openml() returns an unsorted dataset
except ImportError:
    from sklearn.datasets import fetch_mldata
    mnist = fetch_mldata('MNIST original') 

In [14]:
X, y = mnist["data"], mnist["target"]

In [15]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

In [16]:
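# Sample 5,000 distinct row indices from the 60,000 training rows and
# 1,000 distinct row indices from the 10,000 test rows.
shuffle_index_train = np.random.permutation(60000)[:5000]
shuffle_index_test = np.random.permutation(10000)[:1000]

In [19]:
X_train, y_train = X_train[shuffle_index_train], y_train[shuffle_index_train]

In [24]:
# Draw the test subset from the held-out test rows, not from the training subset.
X_test, y_test = X_test[shuffle_index_test], y_test[shuffle_index_test]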

In [25]:
X_test.shape

(1000, 784)

In [27]:
y_test.shape

(1000,)
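The shapes alone do not show where the sampled rows came from. The following is a minimal sketch of the same index-sampling idea on a stand-in array in which row `i` holds the value `i`, so the origin of each sampled row can be checked. The names `X_train_full`, `X_test_full`, `X_train_small`, `X_test_small` and the seed are assumptions made for illustration, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only to make the sketch repeatable

n_train_full, n_test_full = 60000, 10000

# Stand-in for the MNIST feature matrix: row i contains the single value i.
X = np.arange(n_train_full + n_test_full).reshape(-1, 1)
X_train_full, X_test_full = X[:n_train_full], X[n_train_full:]

# Permute each block separately, then keep the first 5,000 / 1,000 indices.
train_idx = rng.permutation(n_train_full)[:5000]
test_idx = rng.permutation(n_test_full)[:1000]

X_train_small = X_train_full[train_idx]
X_test_small = X_test_full[test_idx]

print(X_train_small.shape, X_test_small.shape)
# Every training row comes from the first 60,000 rows, every test row from the last 10,000.
print(X_train_small.max() < n_train_full, X_test_small.min() >= n_train_full)
```

Because a permutation contains each index exactly once, slicing it also guarantees that no row is sampled twice.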

**SGD Classifier Model for predicting if the number is 5 or not 5**

In [30]:
sgd = SGDClassifier(random_state=42)
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
sgd.fit(X_train, y_train_5)
sgd.predict(X_test)

array([False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False,  True, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
       False, False, False, False,  True, False, False,  True, False,
       False, False, False, False, False, False, False, False,  True,
       False, False, False, False, False,  True, False, False, False,
       False,  True, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,

In [38]:
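# cross_val_predict refits clones of sgd on 3 folds of the data it is given, so the second call
# cross-validates within the test subset instead of scoring the model fitted on the training data.
y_train_5_pred = cross_val_predict(sgd, X_train, y_train_5, cv=3)
y_test_5_pred = cross_val_predict(sgd, X_test, y_test_5, cv=3)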

**Confusion Matrix for test data**

In [39]:
confusion_matrix(y_test_5, y_test_5_pred)

array([[886,  26],
       [ 29,  59]])
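The matrix above is built from `cross_val_predict` applied to the test subset, which trains fresh copies of the classifier on folds of that subset. An alternative that matches the "train on training data, compare on test data" wording more directly is to call `predict` on the already fitted model. The sketch below illustrates that pattern on synthetic data from `make_classification`; the data, the names `X_demo`, `clf`, `X_tr`, `X_te`, and the roughly 10% positive rate are assumptions for illustration, and its numbers say nothing about MNIST.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix

# Synthetic binary problem with about 10% positives (stand-in for "one digit vs. the rest").
X_demo, y_demo = make_classification(
    n_samples=1200, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_tr, y_tr = X_demo[:1000], y_demo[:1000]  # training portion
X_te, y_te = X_demo[1000:], y_demo[1000:]  # held-out portion

clf = SGDClassifier(random_state=42)
clf.fit(X_tr, y_tr)            # fit once, on the training portion only
y_te_pred = clf.predict(X_te)  # score the fitted model on unseen rows

# labels=[0, 1] fixes the layout to [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_te, y_te_pred, labels=[0, 1])
print(cm)
```

With this pattern the held-out rows never influence training, so the matrix describes the model that was actually fitted.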

**SGD Classifier Model for predicting if the number is 6 or not 6**

In [34]:
sgd_6 = SGDClassifier(random_state=42)
y_train_6 = (y_train == 6)
y_test_6 = (y_test == 6)
sgd_6.fit(X_train, y_train_6)
sgd_6.predict(X_test)

array([ True, False, False,  True,  True, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
       False, False,  True,  True, False, False, False,  True, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True, False, False, False,  True, False, False, False,
        True, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False,  True, False, False,
       False, False,  True, False, False, False, False, False,  True,
       False,  True, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True,

In [41]:
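# As before, cross_val_predict refits clones of the estimator on 3 folds of the data it is given,
# so the second call cross-validates within the test subset rather than reusing the fitted sgd_6.
y_train_6_pred = cross_val_predict(sgd_6, X_train, y_train_6, cv=3)
y_test_6_pred = cross_val_predict(sgd_6, X_test, y_test_6, cv=3)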

**Confusion Matrix for test data**

In [42]:
confusion_matrix(y_test_6, y_test_6_pred)

array([[874,  13],
       [ 11, 102]])

## Comparing the two confusion matrices

**Confusion Matrix for Predicting 5**
- The first row covers the non-5 images: 886 were correctly classified as non-5s (true negatives) and 26 were wrongly classified as 5s (false positives).
- The second row covers the images of 5s: 29 were wrongly classified as non-5s (false negatives) and 59 were correctly classified as 5s (true positives).

**Confusion Matrix for Predicting 6**
- The first row covers the non-6 images: 874 were correctly classified as non-6s (true negatives) and 13 were wrongly classified as 6s (false positives).
- The second row covers the images of 6s: 11 were wrongly classified as non-6s (false negatives) and 102 were correctly classified as 6s (true positives).

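The accuracy of the first model is (886 + 59) / 1000 = 0.945, and the accuracy of the second model is (874 + 102) / 1000 = 0.976. The 6-or-not-6 model also makes fewer false positives (13 vs. 26) and fewer false negatives (11 vs. 29), so it is the better of the two models on this test sample.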
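Since only about one image in ten belongs to the positive class, accuracy can look high even when many positives are missed, so precision and recall are a useful cross-check. The sketch below recomputes all three measures from the two matrices reported above; the helper name `summarize` is hypothetical, and the matrices are assumed to follow scikit-learn's `[[TN, FP], [FN, TP]]` layout.

```python
import numpy as np

# Confusion matrices copied from the outputs above.
cm_5 = np.array([[886, 26], [29, 59]])
cm_6 = np.array([[874, 13], [11, 102]])

def summarize(cm):
    """Return (accuracy, precision, recall) for a 2x2 matrix [[TN, FP], [FN, TP]]."""
    tn, fp, fn, tp = cm.ravel()
    accuracy = (tn + tp) / cm.sum()
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return accuracy, precision, recall

for name, cm in [("5 vs. not-5", cm_5), ("6 vs. not-6", cm_6)]:
    acc, prec, rec = summarize(cm)
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# 5 vs. not-5: accuracy=0.945 precision=0.694 recall=0.670
# 6 vs. not-6: accuracy=0.976 precision=0.887 recall=0.903
```

On these numbers the 6-or-not-6 model is ahead on all three measures, and the gap in precision and recall is much larger than the gap in accuracy.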