# MNIST classification
### Example adapted from Chapter 3 of _Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow_ (2nd edition) [Text (early release)](https://icenamor.github.io/files/books/Hands-on-Machine-Learning-with-Scikit-2E.pdf) [GitHub](https://github.com/ageron/handson-ml2)

First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (Python 2.x may work, but it is deprecated, so we strongly recommend Python 3), as well as Scikit-Learn ≥0.20.

In [None]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "MNIST-classification"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

## MNIST

MNIST is a dataset of 70,000 small images of handwritten digits. Each image has 28×28 pixels, for a total of 784 features. Each feature is a grey-level value from 0 to 255.
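
To make the 784-feature layout concrete, here is a minimal sketch that uses a randomly generated row as a stand-in for one MNIST image (the names `flat_image` and `grid` are illustrative, not part of the dataset API). It assumes the usual row-major pixel ordering.

```python
import numpy as np

# Hypothetical stand-in for one MNIST row: 784 grey levels in [0, 255].
rng = np.random.default_rng(0)
flat_image = rng.integers(0, 256, size=784, dtype=np.uint8)

# Reshaping recovers the 28x28 pixel grid; flattening returns the 784 features.
grid = flat_image.reshape(28, 28)
restored = grid.reshape(-1)

# With row-major ordering, feature index k maps to pixel (k // 28, k % 28).
k = 100
same_pixel = grid[k // 28, k % 28] == flat_image[k]
print(grid.shape, restored.shape, same_pixel)
```
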

In [None]:
from sklearn.datasets import fetch_openml
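# as_frame=False (Scikit-Learn ≥0.22) returns NumPy arrays, so X[0] below indexes a row;
# recent versions return a pandas DataFrame by default.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)  # load dataset from https://openml.org/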
mnist.keys()

In [None]:
X, y = mnist["data"], mnist["target"]
X.shape

In [None]:
y.shape

In [None]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt

some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap=mpl.cm.binary)
plt.axis("off")

#save_fig("some_digit_plot")
plt.show()

In [None]:
y[0]

The labels are stored as strings; convert them to integers.

In [None]:
y = y.astype(np.uint8)
y[0]

In [None]:
def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = mpl.cm.binary,
               interpolation="nearest")
    plt.axis("off")

In [None]:
# EXTRA
def plot_digits(instances, images_per_row=10, **options):
    size = 28
    images_per_row = min(len(instances), images_per_row)
    images = [instance.reshape(size,size) for instance in instances]
    n_rows = (len(instances) - 1) // images_per_row + 1
    row_images = []
    n_empty = n_rows * images_per_row - len(instances)
    images.append(np.zeros((size, size * n_empty)))
    for row in range(n_rows):
        rimages = images[row * images_per_row : (row + 1) * images_per_row]
        row_images.append(np.concatenate(rimages, axis=1))
    image = np.concatenate(row_images, axis=0)
    plt.imshow(image, cmap = mpl.cm.binary, **options)
    plt.axis("off")

In [None]:
plt.figure(figsize=(9,9))
example_images = X[:100]
plot_digits(example_images, images_per_row=10)
save_fig("more_digits_plot")
plt.show()

The MNIST dataset is already split into a training set (the first 60,000 images) and a test set (the last 10,000 images)

In [None]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

## Binary classifier
Implement a _5 detector_

Prepare a data set for binary classification: 5 or not 5

In [None]:
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

## Use the Perceptron model

In [None]:
from sklearn.linear_model import Perceptron

p_clf = Perceptron(tol=1e-3, eta0=0.1, random_state=42)  
p_clf.fit(X_train, y_train_5) 

Hyperparameters: `tol` is the stopping criterion, `eta0` is the learning rate, and `max_iter` is the optional maximum number of epochs (default 1000), among others.

**Note**: some hyperparameters, such as `max_iter` and `tol`, will have a different default value in future versions of Scikit-Learn.
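
As a rough illustration of how these hyperparameters interact, the sketch below trains a `Perceptron` on a small synthetic dataset (an assumption made to keep the example fast; it is not MNIST) and reads back `n_iter_`, the number of epochs actually run before the `tol` criterion or `max_iter` stopped training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Synthetic stand-in data; X_demo and y_demo are illustrative names.
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=42)

# Same settings as above, with max_iter written out explicitly.
demo_clf = Perceptron(tol=1e-3, eta0=0.1, max_iter=1000, random_state=42)
demo_clf.fit(X_demo, y_demo)

# n_iter_ reports how many epochs ran before a stopping rule triggered.
print("epochs run:", demo_clf.n_iter_)
```
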

Try predicting the first two images

In [None]:
p_clf.predict([X[0], X[1]])

Evaluate the model with cross-validation, using accuracy as the score

In [None]:
from sklearn.model_selection import cross_val_score
cross_val_score(p_clf, X_train, y_train_5, cv=3, scoring="accuracy")
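
To show what `cross_val_score` does under the hood, here is a sketch of the same procedure written as an explicit loop. It runs on a small synthetic dataset rather than MNIST (an assumption made for speed), and it relies on the documented behaviour that an integer `cv` with a classifier uses unshuffled `StratifiedKFold`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Small synthetic stand-in for (X_train, y_train_5).
X_demo, y_demo = make_classification(n_samples=600, n_features=20, random_state=42)

clf = Perceptron(tol=1e-3, eta0=0.1, random_state=42)
skf = StratifiedKFold(n_splits=3)

manual_scores = []
for train_idx, val_idx in skf.split(X_demo, y_demo):
    fold_clf = clone(clf)                      # fresh, unfitted copy per fold
    fold_clf.fit(X_demo[train_idx], y_demo[train_idx])
    y_pred = fold_clf.predict(X_demo[val_idx])
    manual_scores.append(np.mean(y_pred == y_demo[val_idx]))  # fold accuracy

auto_scores = cross_val_score(clf, X_demo, y_demo, cv=3, scoring="accuracy")
print(manual_scores)
print(auto_scores)
```

Each fold trains on two thirds of the data and scores on the remaining third, so every instance is used for validation exactly once.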

### Let's try a very dumb classifier

In [None]:
from sklearn.base import BaseEstimator
class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)

In [None]:
never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
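
The reason this classifier scores so well on accuracy is class imbalance: only about one image in ten is a 5. The sketch below reproduces the effect with randomly generated labels (a hypothetical stand-in for `y_train_5`, assuming a 10% positive rate).

```python
import numpy as np

# Hypothetical labels with roughly MNIST's balance: about 1 in 10 is positive.
rng = np.random.default_rng(42)
y_true = rng.random(10_000) < 0.10

# "Never 5": always predict the negative class.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
recall = np.sum(y_pred & y_true) / np.sum(y_true)
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Accuracy lands near 90% even though not a single positive instance is found, which is why precision and recall are examined next.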

### Model Performance Evaluation with precision and recall

In [None]:
cross_val_score(p_clf, X_train, y_train_5, cv=3, scoring="precision")

In [None]:
cross_val_score(p_clf, X_train, y_train_5, cv=3, scoring="recall")

### Confusion Matrix

In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_train_pred = cross_val_predict(p_clf, X_train, y_train_5, cv=3)
confusion_matrix(y_train_5, y_train_pred)
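
To read the matrix, it helps to know its layout: rows are actual classes and columns are predicted classes, with the negative class first. The following sketch uses a tiny hand-made example (hypothetical labels, not MNIST output) to unpack the four counts and derive precision and recall from them.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Tiny hand-made example: 3 actual positives, 5 actual negatives.
y_true = np.array([True, True, True, False, False, False, False, False])
y_pred = np.array([True, True, False, True, False, False, False, False])

# Layout for labels ordered [False, True]:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
print(tn, fp, fn, tp, precision, recall)
```
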

### Evaluate the model on the hold-out test data

In [None]:
y_test_pred = p_clf.predict(X_test)

In [None]:
confusion_matrix(y_test_5, y_test_pred)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

print("Accuracy score: ", (accuracy_score(y_test_5, y_test_pred)))
print("Precision score: ", (precision_score(y_test_5, y_test_pred)))
print("Recall score: ", (recall_score(y_test_5, y_test_pred)))

In [None]:
# Our work so far (most of it; not re-tested, since every re-run takes at least ten minutes)
import numpy as np

# fetch dataset:
from sklearn.datasets import fetch_openml

# as_frame=False (Scikit-Learn ≥0.22) returns NumPy arrays, so X[0] below indexes a row
mnist = fetch_openml('mnist_784', cache=True, version=1, as_frame=False)
mnist.keys()

X, y = mnist["data"], mnist["target"]


import matplotlib as mpl
import matplotlib.pyplot as plt

some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap = mpl.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()


y = y.astype(np.uint8)


# Split into the predefined training set (first 60,000 images) and test set (last 10,000 images):
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# TRAINING BINARY CLASSIFIER:
y_train_5 = (y_train == 5)
# 5 vs not-5: True for 5, False for all other digits.
y_test_5 = (y_test == 5)


# Pick and train a classifier:
# Stochastic Gradient Descent (SGD) classifier

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

print(sgd_clf.predict([some_digit]))

# -------------------------------------------
# performance eval (cross-validation and accuracy):

from sklearn.model_selection import cross_val_score

print('accuracy determined from cross validation: ' + str(cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")))
print('precision determined from cross validation: ' + str(cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="precision")))
print('recall determined from cross validation: ' + str(cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="recall")))

# Accuracy alone is misleading on skewed classes; a confusion matrix built from
# cross-validated predictions is more informative:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
print(confusion_matrix(y_train_5, y_train_pred))


# -------------------------------------------
# Precision/recall trade-off: choose the decision threshold that separates the two classes

from sklearn.metrics import precision_score, recall_score

print('precision: ' + str(precision_score(y_train_5, y_train_pred)))
print('recall: ' + str(recall_score(y_train_5, y_train_pred)))

# Raising the threshold above the default of 0 trades recall for precision
y_scores = sgd_clf.decision_function([some_digit])

threshold = 8000
y_some_digit_pred = (y_scores > threshold)
print(y_some_digit_pred)



# Get decision scores for every training instance so that precision and recall
# can be computed for all possible thresholds
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")

# precision_recall_curve() returns one more precision/recall value than thresholds
from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # Drop the final precision/recall pair, which has no matching threshold
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.xlabel("Threshold")
    plt.legend(loc="center right")
    plt.grid(True)


plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
plt.show()



# -------------------------------------------

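# np.argmax on a boolean array returns the first index where the condition holds,
# i.e. the lowest threshold that reaches at least 90% precision
threshold_90_precision = thresholds[np.argmax(precisions >= 0.90)]
y_train_pred_90 = (y_scores >= threshold_90_precision)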

print('precision score calculated using 90 percent precision classifier: ' + str(precision_score(y_train_5, y_train_pred_90)))
print('recall score calculated using 90 percent precision classifier: ' + str(recall_score(y_train_5, y_train_pred_90)))

#...We have now finished construction of a 90% precision classifier.

# -------------------------------------------
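
The threshold selection above can be checked by hand on a toy example. The sketch below uses made-up decision scores and labels (hypothetical values, not MNIST output) and an 80% precision target to keep the numbers easy to follow.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

# Hypothetical labels and decision scores, sorted by score for readability.
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([-3.0, -2.0, -1.5, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0])

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)

target = 0.80
idx = np.argmax(precisions >= target)   # first index that reaches the target
chosen = thresholds[idx]

y_pred = scores >= chosen
print("threshold:", chosen)
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
```

Here the lowest threshold meeting the target is 0.0, which keeps six instances of which five are positive. Note that `precisions` has one more entry than `thresholds`, so if the target were only reached at that final entry, indexing `thresholds` would fail; the toy data avoids that case.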






In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
c_matrix1 = confusion_matrix(y_train_5, y_train_pred)
print('Confusion Matrix for stochastic gradient descent:\n' + str(c_matrix1))

# Scikit-Learn orders rows by actual class and columns by predicted class,
# with the negative class (not-5) first:
# TN FP
# FN TP
TN1, FP1, FN1, TP1 = c_matrix1.ravel()

matrix_accuracy = (TP1 + TN1) / (TP1 + FP1 + FN1 + TN1)
print('accuracy calculated from confusion matrix was: ' + str(matrix_accuracy))



### Exercise 1: Use a Stochastic Gradient Descent classifier and evaluate the model performance. Evaluate the accuracy, precision and recall scores with cross-validation. Print the confusion matrix. Try different hyperparameters: what is the best model you can get?

In [None]:
# resulting output from previous tests was as follows:
#accuracy determined from cross validation: [0.95035 0.96035 0.9604 ]
#precision determined from cross validation: [0.95936795 0.89060092 0.74963109]
#recall determined from cross validation: [0.47039292 0.63973437 0.84338683]
#precision score calculated using 90 percent precision classifier: 0.9000345901072293
#recall score calculated using 90 percent precision classifier: 0.4799852425751706

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix



### Exercise 2: Apply feature standardization to see if such feature transformation can improve the performance of the Stochastic Gradient Descent on the test data. Explain your findings. 

In [1]:
from sklearn.preprocessing import StandardScaler

# Standardize each feature to zero mean and unit standard deviation.
# Fit the scaler on the training set only, then reuse it for the test set.
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
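
One possible way to compare the classifier with and without standardization is sketched below. To keep it quick, it uses Scikit-Learn's bundled 8×8 digits dataset as a stand-in for MNIST (an assumption; results on the full dataset may differ), and it wraps the scaler in a pipeline so that each cross-validation fold fits the scaler on its own training portion only.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: the bundled 8x8 digits, not the 28x28 MNIST images.
X_small, y_small = load_digits(return_X_y=True)
y_small_5 = (y_small == 5)

raw_clf = SGDClassifier(random_state=42)
scaled_clf = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))

raw_scores = cross_val_score(raw_clf, X_small, y_small_5, cv=3, scoring="accuracy")
scaled_scores = cross_val_score(scaled_clf, X_small, y_small_5, cv=3, scoring="accuracy")

print("raw features:   ", raw_scores)
print("standardized:   ", scaled_scores)
```

The same pattern should carry over to the exercise by fitting `SGDClassifier` on `X_train_std` and scoring it on `X_test_std`.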
