# TP5: Multi-Class Perceptron

---


## Predicting digits from images of handwritten digits

**Dataset**: We will use the MNIST database of handwritten digits which has a training set of 60,000 examples, and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.



In [None]:
# Common imports
import numpy as np
import os
import io
import warnings

# Specific imports
from sklearn.preprocessing import MinMaxScaler

#from sklearn.pipeline import Pipeline

#from sklearn.dummy import DummyClassifier

from sklearn.linear_model import Perceptron

from sklearn.metrics import hinge_loss
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, precision_recall_curve
from sklearn.metrics import precision_score, recall_score, classification_report
from sklearn.metrics import make_scorer

#from sklearn.model_selection import cross_validate, cross_val_predict, GridSearchCV

#from pprint import pprint
import matplotlib

import matplotlib.pyplot as plt
import seaborn as sns

# Basic Importing and Plotting

* The MNIST set is a large collection of handwritten digits. It is a very popular dataset in the field of image processing and is often used for benchmarking machine learning algorithms.

* MNIST is short for Modified National Institute of Standards and Technology database.

* MNIST contains a collection of 70,000 **grayscale** images, each 28 x 28 pixels, of handwritten digits from 0 to 9.

* The dataset is already divided into training and testing sets. We will see this later in the tutorial.

* We are going to import the dataset from Keras.


In [None]:
from keras.datasets import mnist

The statement `from keras.datasets import mnist` imports the MNIST dataset loader from the Keras library.

In [None]:
(train_X, train_y), (test_X, test_y) = mnist.load_data()

The statement `(train_X, train_y), (test_X, test_y) = mnist.load_data()` loads the MNIST dataset and assigns the training set and test set to the variables `train_X`, `train_y`, `test_X`, and `test_y`.

Let’s find out how many images there are in the training and testing sets. In other words, let’s find out the split ratio of this dataset.

In [None]:
print('X_train: ' + str(train_X.shape))
print('Y_train: ' + str(train_y.shape))
print('X_test:  '  + str(test_X.shape))
print('Y_test:  '  + str(test_y.shape))

We can see that there are 60k images in the training set and 10k images in the testing set.

The shape of our training array is (60000, 28, 28) because there are 60,000 grayscale images, each of dimension 28 x 28.
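Linear models such as the perceptron expect one feature vector per sample, so image-shaped arrays like this are usually flattened to 784 values per image. The following is a minimal sketch of that step; the `images` array is a synthetic stand-in (an assumption for illustration), not the real MNIST data.

```python
import numpy as np

# Synthetic stand-in for a small batch of MNIST-like images (values 0-255).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into a single row of 784 pixel values.
flat = images.reshape(len(images), -1)

print(images.shape)  # (5, 28, 28)
print(flat.shape)    # (5, 784)
```

Reshaping does not change any pixel value; `flat[0].reshape(28, 28)` recovers the first image.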

# Plotting the MNIST dataset using matplotlib

In [None]:
import matplotlib.pyplot as plt
for i in range(9):
 plt.subplot(330 + 1 + i)
 plt.imshow(train_X[i], cmap=plt.get_cmap('gray'))
plt.show()

The code snippet imports the `pyplot` module from the `matplotlib` library and then plots the first 9 images from the MNIST training set.

The for loop iterates over the numbers 0 to 8 (i.e., 9 images). The three-digit argument to `plt.subplot()` is shorthand for (rows, columns, index), so `330 + 1 + i` selects position `i + 1` in a 3 x 3 grid of subplots.

The `plt.imshow()` function is used to display each image on its corresponding subplot, with the `cmap` parameter set to '`gray`' to display the image in grayscale.

Finally, the `plt.show()` function is used to display the entire plot on the screen.

# Yet another way to import the dataset!

In [None]:
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

This code imports the `fetch_openml` function from the `sklearn.datasets` module and then uses it to load the MNIST dataset.

The `fetch_openml()` function is a scikit-learn utility function that can be used to load datasets from the OpenML platform.

Here the function is called with three arguments:

* The first argument (`'mnist_784'`) is the name of the dataset to load. In this case, we are loading the MNIST dataset, which contains 784 features (28 x 28 pixels) for each image.
* The second argument (`version=1`) specifies the version of the dataset to load.
* The third argument (`return_X_y=True`) specifies that we want to return both the input data (`X`) and the target labels (`y`) as separate variables.

`fetch_openml()` returns two variables: `X` and `y`. `X` holds a table of shape (70000, 784), where each row represents an image and each column represents a pixel value. `y` holds 70,000 labels, one digit per image in `X`, in the same order. Note that in recent scikit-learn versions the default `as_frame='auto'` may return `X` as a pandas `DataFrame` and `y` as a pandas `Series` rather than NumPy arrays; passing `as_frame=False` requests NumPy arrays.


In [None]:
target_names = np.unique(y)
print('Number of samples: {0}, type:{1}'.format(X.shape[0], type(X)))
print('Number of features: {0}'.format(X.shape[1]))
print("Minimum:{0}, Maximum:{1}".format(np.min(X), np.max(X)))
print('Number of classes: {0}, type:{1}'.format(len(target_names), y.dtype))
print('Labels: {0}'.format(target_names))

This code snippet performs several operations to explore and understand the MNIST dataset loaded earlier.

* The first line uses the `numpy` module to extract the unique labels from the `y` array and assign them to the variable `target_names`.

* The next line prints out the number of samples in the dataset (`X.shape[0]`) and the data type of `X`.

* The following line prints out the number of features in the dataset (`X.shape[1]`), which corresponds to the number of pixels in each image.

* The fourth line prints out the minimum and maximum pixel values in the dataset using the `np.min()` and `np.max()` functions.

* The fifth line prints out the number of unique classes in the dataset (`len(target_names)`) and the data type of the labels (`y.dtype`).

* Finally, the last line prints out the unique labels in the dataset (i.e., the digits from 0 to 9).

## Scaling the data

In [None]:
X = MinMaxScaler().fit_transform(X)
print("Minimum:{0}, Maximum:{1}".format(np.min(X), np.max(X)))

This code snippet applies feature scaling to the input data `X` using the `MinMaxScaler` class from the `sklearn.preprocessing` module.

The `fit_transform()` method of `MinMaxScaler` fits the scaler to the data and then applies a linear transformation to scale the data to the range [0, 1]. The resulting scaled data is assigned back to the variable `X`.

The second line prints out the minimum and maximum values in the scaled data to confirm that the values now fall within the desired range.
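To make the transformation concrete, here is a minimal NumPy sketch of per-feature min-max scaling. The helper `min_max_scale` and the toy matrix are hypothetical names introduced for illustration; the sketch follows the usual formula `(x - min) / (max - min)` and is not a copy of scikit-learn's implementation.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for columns whose values never change
    # (e.g. border pixels that are always black).
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# Toy "pixel" matrix: 3 samples, 3 features; the last feature is constant.
X_toy = np.array([[0, 128, 7],
                  [255, 64, 7],
                  [51, 0, 7]])
X_scaled = min_max_scale(X_toy)
print(X_scaled)
print(X_scaled.min(), X_scaled.max())
```

Note that the scaling is done column by column, so each pixel position is scaled by its own minimum and maximum over the dataset.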

## Splitting the data

In [None]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

This code snippet splits the MNIST dataset into training and testing sets.

This single line slices both the input data array `X` and the label array `y`. `X[:60000]` contains the first 60,000 rows of `X`, which become the training data `X_train`, and `X[60000:]` contains the remaining 10,000 rows, which become the testing data `X_test`.

Similarly, `y[:60000]` contains the first 60,000 elements of `y`, which become the training labels `y_train`, and `y[60000:]` contains the remaining elements, which become the testing labels `y_test`.

In [None]:
plt.figure(figsize=(10,4))
sns.histplot(data=np.int8(y_train), binwidth=0.45, bins=11)
plt.xticks(ticks=[0,1,2, 3, 4, 5, 6, 7, 8, 9], labels=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
plt.xlabel('Class')
plt.title('Distribution of Samples')
plt.show()

This code generates a histogram plot using the `Seaborn` library to show the distribution of samples in the training dataset. Here's what each line of the code does:

* Create a new figure with a specified width and height using `plt.figure()`.
* Generate the histogram plot using `sns.histplot()`, specifying the input data as a `NumPy` array of integer values (`np.int8(y_train)`), and setting the bin width and number of bins.
* Set the x-axis tick locations and labels using `plt.xticks()`.
* Set the x-axis label using `plt.xlabel()`.
* Set the plot title using `plt.title()`.
* Show the plot using `plt.show()`.
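The same class distribution can also be read off numerically. The sketch below counts samples per class with `np.unique`; the `labels` array is a small hypothetical stand-in for `y_train`, using string labels as in the dataset loaded above.

```python
import numpy as np

# Hypothetical string labels standing in for y_train.
labels = np.array(['5', '0', '4', '1', '9', '2', '1', '3', '1', '4'])

# return_counts=True yields the number of occurrences of each unique label.
classes, counts = np.unique(labels, return_counts=True)
for cls, count in zip(classes, counts):
    print(f'{cls}: {count}')
```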

# Transformation into Multiple Binary Classification Problems

* A perceptron is a binary classifier, but the original label vector contains 10 classes, so we first reduce the problem to 2 classes.
* For the 0-detector, the label 0 is changed to 1 and all other labels (1-9) are changed to -1.
* We name the label vectors as `y_train_0` and `y_test_0`.

In [None]:
# Initialize new variable names with all -1
y_train_0 = -1*np.ones((len(y_train)))
y_test_0 = -1*np.ones((len(y_test)))

# find indices of digit 0 image
indx_0 = np.where(y_train == '0') # remember original labels are of type str not int
# use those indices to modify y_train_0 & y_test_0
y_train_0[indx_0] =1
indx_0 = np.where(y_test=='0')
y_test_0[indx_0] = 1

This code snippet performs the following tasks:

* Initializes new variables `y_train_0` and `y_test_0` as `NumPy` arrays with the same length as `y_train` and `y_test`, respectively. Each element of these arrays is set to -1.
* Finds the indices of the digit 0 images in `y_train` and `y_test` using `np.where()`, which returns the indices of the elements that meet a condition. Because the original labels are strings, the comparison is made against the string `'0'`, not the integer 0.
* Sets the elements of `y_train_0` and `y_test_0` at those indices to 1 using array indexing, so that only the entries whose original label is `'0'` are updated.

In [None]:
plt.figure(figsize=(10,4))
sns.histplot(data=np.int8(y_train_0), binwidth=0.45, bins=2)
plt.xticks(ticks=[-1, 1], labels=["Non-Zero Digit", "Zero Digit"])  # labels are -1 (non-zero) and 1 (zero)
plt.xlabel('Class')
plt.title('Distribution of Samples after Binary Transformation')
plt.show()

This code generates a histogram plot using the Seaborn library to show the distribution of samples in the binary classification problem of identifying digit 0 vs. non-zero digits. Here's what each line of the code does:

* Create a new figure with a specified width and height using `plt.figure()`.
* Generate the histogram plot using `sns.histplot()`, specifying the input data as a `NumPy` array of integer values (`np.int8(y_train_0)`), and setting the bin width and number of bins. Since there are only two classes now (0 and non-zero), the number of bins is set to 2.
* Set the x-axis tick locations and labels using `plt.xticks()`.
* Set the x-axis label using `plt.xlabel()`.
* Set the plot title using `plt.title()`.
* Show the plot using `plt.show()`.

## Creating 10 Perceptrons Automatically

Instead of doing 10 transformations manually, we will use an API that **automatically** creates 10 binary classifiers, converts the labels to a binary matrix, and trains the classifiers with the binarized labels!

In [None]:
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import LabelBinarizer

The first line of this code imports the `Perceptron` class from the `linear_model` module of the `scikit-learn` library, which can be used to train a binary classification model using the perceptron algorithm.

The second line imports the `LabelBinarizer` class from the `preprocessing` module of `scikit-learn`. This class transforms a multi-class label vector into a one-vs-rest encoding: a binary matrix in which each column represents a unique class and each row represents a sample in the data set. Each entry is either 0 or 1, indicating whether the sample belongs to that class or not. We use it here only to inspect the encoding; `Perceptron` accepts the original labels directly.

In [None]:
clf = Perceptron(random_state=1729)

This line of code creates a new instance of the `Perceptron` class from `scikit-learn`, and assigns it to a variable called `clf`.

The `random_state` parameter is set to `1729`, an arbitrary integer that seeds the random number generator the `Perceptron` uses to shuffle the training data between epochs. Fixing it makes the results reproducible from run to run.

In [None]:
# Let's use label binarizer just to see the encoding
y_train_ovr = LabelBinarizer().fit_transform(y_train) # setting sparse_output=True in LabelBinarizer() improves efficiency
for i in range(10):
    print('{0}:{1}'.format(y_train[i], y_train_ovr[i]))

In this code, `LabelBinarizer()` is used to transform the `y_train` labels from strings to binary format. The resulting binary labels are assigned to a new variable `y_train_ovr`.

The `fit_transform()` method of `LabelBinarizer()` is called on `y_train`, which fits the binarizer to the labels and transforms them to binary format. The resulting binary labels are a matrix of shape (n_samples, n_classes), where n_samples is the number of samples in the dataset, and n_classes is the number of unique classes in the labels.

The for loop iterates over the first 10 samples in `y_train`, and prints the original label value, followed by its corresponding binary representation as given by `y_train_ovr`.

* `y_train_ovr` will be of size 60000 x 10.
* The first column will be a (binary) label vector for 0-detector and the next one for 1-detector and so on.
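The encoding can be reproduced by hand with NumPy broadcasting, which may help clarify what each column means. This is a sketch on a hypothetical `labels` array; the `classes` array is fixed to all ten digits here as an assumption, whereas `LabelBinarizer` derives the classes from the labels it is fitted on.

```python
import numpy as np

# Hypothetical string labels standing in for y_train.
labels = np.array(['5', '0', '4', '1'])

# Assumed class order: the ten digits as strings, '0' through '9'.
classes = np.array([str(d) for d in range(10)])

# Compare every label with every class: shape (n_samples, n_classes).
one_hot = (labels[:, None] == classes[None, :]).astype(int)

for label, row in zip(labels, one_hot):
    print('{0}:{1}'.format(label, row))
```

Column `k` of `one_hot` is the binary label vector for the `k`-detector: it is 1 for samples of digit `k` and 0 for all others.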

In [None]:
clf.fit(X_train, y_train)

In this code, the `fit()` method of the `Perceptron` classifier (`clf`) is used to train the model on the training data (`X_train` and `y_train`).

The `fit()` method fits the perceptron to the training data by adjusting the model parameters to minimize the training error. During training, the perceptron updates its weights based on the error between the predicted outputs and the true outputs.

Once training is complete, the perceptron will have learned a set of weights that can be used to make predictions on new, unseen data.
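For intuition, the weight update can be sketched for a single binary perceptron. The function `train_perceptron` below is a hypothetical textbook-style implementation on a toy problem; scikit-learn's `Perceptron` differs in details (for example, it shuffles the data), so this is an illustration of the idea rather than the library's code.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Train a binary perceptron; y must contain labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # A sample counts as misclassified when its signed score is not positive.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += lr * y_i * x_i   # move the weights toward the correct side
                b += lr * y_i
                mistakes += 1
        if mistakes == 0:  # no errors in a full pass: stop early
            break
    return w, b

# Tiny linearly separable toy problem (logical AND with -1/+1 labels).
X_toy = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_toy = np.array([-1, -1, -1, 1])

w, b = train_perceptron(X_toy, y_toy, epochs=50)
pred = np.where(X_toy @ w + b > 0, 1, -1)
print(pred)
```

Weights change only on misclassified samples, so once every training sample is classified correctly the loop stops updating.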

* What actually happened internally is that the API automatically created 10 binary classifiers, converted the labels to a binary matrix, and trained the classifiers with the binarized labels!
* At inference time, the input is passed through all 10 classifiers, and the class of the classifier with the highest score is taken as the predicted class.
* To see it in action, let us execute the following lines of code.

In [None]:
print('Shape of Weight matrix:{0} and bias vector:{1}'.format(clf.coef_.shape, clf.intercept_.shape))

* So the weight matrix has size 10 x 784, where each row holds the weights of a single binary classifier, and the bias vector has one entry per classifier.
* An important difference from the binary case is that no signum function is applied to the scores at prediction time.
* Instead, the class whose perceptron outputs the maximum score for the input sample is taken as the predicted class.

In [None]:
scores = clf.decision_function(X_train[100].reshape(1, -1))
print(scores)
print('The predicted class: ', np.argmax(scores))

In this code, we are using the `decision_function()` method of the trained `Perceptron` classifier (`clf`) to predict the class label of a single instance of the training data (`X_train[100]`).

The `decision_function()` method computes the confidence scores for each class based on the learned model parameters. The score for each class is computed by taking the dot product of the input features and the learned weights, plus the bias term.

The output of `decision_function()` is a 2D array of shape (n_samples, n_classes), here (1, 10), with one score per class. The highest score indicates the predicted class label.

In this case, `X_train[100]` is a single 784-dimensional instance of an image in the training set, so we need to reshape it to a 2D array of shape (1, 784) to be able to pass it to the `decision_function()` method.

The output of `decision_function()` is printed, as well as the predicted class label (the index with the highest score) using `np.argmax()`.
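The score computation described above can be sketched directly in NumPy. The arrays `W`, `b`, and `classes` below are randomly generated stand-ins for `clf.coef_`, `clf.intercept_`, and `clf.classes_` (assumptions for illustration, not trained values), so only the shapes and the argmax logic are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 10, 784

# Hypothetical parameters standing in for clf.coef_ and clf.intercept_.
W = rng.normal(size=(n_classes, n_features))
b = rng.normal(size=n_classes)
# Stand-in for clf.classes_: the ten digit labels as strings.
classes = np.array([str(d) for d in range(n_classes)])

# One synthetic sample, already shaped (1, 784).
x = rng.random(n_features).reshape(1, -1)

# One score per binary classifier: dot product with each weight row plus bias.
scores = x @ W.T + b                             # shape (1, 10)
predicted = classes[np.argmax(scores, axis=1)]   # map winning index to its label

print(scores.shape)
print('The predicted class: ', predicted[0])
```

Indexing `classes` with the argmax makes the mapping from score position to label explicit; for MNIST the index and the digit happen to coincide because the classes are sorted from '0' to '9'.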

In [None]:
# get the prediction for all test samples
y_pred = clf.predict(X_test)

This line of code is using the trained `Perceptron` classifier `clf` to make predictions on the test dataset `X_test`. The `predict()` method of the Perceptron classifier is called on `X_test`, which returns an array of predicted class labels for all the samples in the test dataset. This array of predicted class labels is then assigned to the variable `y_pred`.

In [None]:
print(classification_report(y_test, y_pred))

`classification_report()` is a function from the `sklearn.metrics` module that is used to generate a report of the precision, recall, F1-score, and support for each class in a classification problem.
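To see what the reported numbers mean, the per-class metrics can be computed by hand. The helper `per_class_metrics` and the two small label arrays are hypothetical, introduced only to illustrate the definitions of precision, recall, F1-score, and support.

```python
import numpy as np

# Hypothetical true and predicted labels for a 3-class toy problem.
y_true = np.array(['0', '0', '1', '1', '1', '2'])
y_hat = np.array(['0', '1', '1', '1', '2', '2'])

def per_class_metrics(y_true, y_hat, cls):
    """Return (precision, recall, f1, support) for one class."""
    tp = np.sum((y_hat == cls) & (y_true == cls))  # correctly predicted as cls
    fp = np.sum((y_hat == cls) & (y_true != cls))  # wrongly predicted as cls
    fn = np.sum((y_hat != cls) & (y_true == cls))  # cls samples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = int(np.sum(y_true == cls))           # number of true cls samples
    return precision, recall, f1, support

for cls in np.unique(y_true):
    p, r, f1, s = per_class_metrics(y_true, y_hat, cls)
    print(f'{cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={s}')
```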


In [None]:
cm_display = ConfusionMatrixDisplay.from_predictions(y_test, y_pred,values_format='.5g')

The `ConfusionMatrixDisplay` class is part of the `sklearn.metrics` module and provides a way to visualize confusion matrices. Its `from_predictions` method takes the true labels `y_test`, the predicted labels `y_pred`, and an optional `values_format` argument that specifies the format string for the values in the matrix. Here the `.5g` format displays values with up to 5 significant digits. The method computes the confusion matrix, draws it with `matplotlib`, and returns the display object, which is assigned to `cm_display`.
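The matrix behind the plot is simple to build by hand, which can help when reading it. The sketch below uses small hypothetical label arrays and follows the convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Hypothetical true and predicted labels for a 3-class toy problem.
y_true = np.array(['0', '0', '1', '1', '1', '2'])
y_hat = np.array(['0', '1', '1', '1', '2', '2'])

# Map each class label to a row/column index.
classes = np.unique(np.concatenate([y_true, y_hat]))
index = {c: i for i, c in enumerate(classes)}

# cm[i, j] counts samples whose true class is i and predicted class is j.
cm = np.zeros((len(classes), len(classes)), dtype=int)
for t, p in zip(y_true, y_hat):
    cm[index[t], index[p]] += 1

print(cm)
print('Accuracy:', np.trace(cm) / cm.sum())
```

Correct predictions lie on the diagonal, so off-diagonal entries show which digits are confused with which.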