*AI/ML Society 2nd October 2019*
# Introduction to Machine Learning:
## *The MNIST handwritten digits Dataset*

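The MNIST data set was made available by Yann LeCun. The code below uses the smaller 8x8 handwritten digits set that ships with scikit-learn (`datasets.load_digits`). This tutorial is based on resources from scikit-learn.org.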

### Libraries required to import, visualize and process the data

In [0]:
import matplotlib.pyplot as plt 
import numpy as np
from sklearn import datasets

## 1) Import the data

#### *Import the data and store it in two variables: the image matrices and the corresponding digit labels*

In [0]:
from sklearn import datasets
Input_images = datasets.load_digits()
X = Input_images.images
y = Input_images.target

#### *Visualize how the data is stored*

In [0]:
i = 80  # index of the sample to inspect
print(X[i])
print('The corresponding value is: \n', y[i])

#### *Show what this corresponds to*

> To store an image on a grey scale, all you have to do is associate a value with each of its pixels.
Here we use matplotlib to display this matrix of numbers as the image it represents.

In [0]:
plt.imshow(X[i], cmap=plt.cm.gray_r)
print("\n",y[i])
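
> To make the idea concrete, here is a minimal sketch that builds a tiny grey-scale image by hand. The 5x5 size and the name `tiny_image` are made up for illustration; the 0 to 16 intensity range mirrors the one used by the digits data.

```python
import numpy as np

# A hypothetical 5x5 grey-scale image: 0 = background, 16 = strongest ink
tiny_image = np.zeros((5, 5))
tiny_image[:, 2] = 16  # draw a vertical stroke in the middle column

print(tiny_image)
print("Shape:", tiny_image.shape)
```

> Passing `tiny_image` to `plt.imshow(tiny_image, cmap=plt.cm.gray_r)` should show a dark vertical bar on a white background, because the reversed grey colormap draws high values as dark.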

## 2) Process the data

#### *We need to flatten each matrix: instead of keeping the image in an 8x8 format, we change it to an array of size 64*

In [0]:
number_of_samples = len(X)
X_reshaped = np.reshape(X,(number_of_samples,64))
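
> The sketch below checks what flattening does to an individual pixel and that nothing is lost. The names `images`, `flat` and `restored` are chosen for this example only.

```python
import numpy as np
from sklearn import datasets

images = datasets.load_digits().images   # one 8x8 matrix per sample
flat = images.reshape(len(images), -1)    # -1 lets NumPy infer 8 * 8 = 64

print(images.shape, "->", flat.shape)

# Flattening is row by row: pixel (row r, column c) lands at index r * 8 + c
r, c = 3, 5
print(flat[0, r * 8 + c] == images[0, r, c])

# Reshaping back to 8x8 recovers the original images exactly
restored = flat.reshape(-1, 8, 8)
print(np.array_equal(restored, images))
```

> scikit-learn also exposes the flattened version directly as the `data` attribute of the object returned by `load_digits()`.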

#### *Split between train and test*

In [0]:
from sklearn.model_selection import train_test_split
# test_size=0.7 keeps 70% of the samples for testing and trains on the remaining 30%
x_train, x_test, y_train, y_test = train_test_split(X_reshaped, y, test_size=0.7, random_state=100)
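
> A quick way to check the split is to look at the resulting shapes. This sketch also illustrates, under the same settings as above, that fixing `random_state` makes the split reproducible. The names `X_flat`, `x_test_again` and `y_test_again` are made up for the example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_flat = digits.images.reshape(len(digits.images), -1)
y = digits.target

x_train, x_test, y_train, y_test = train_test_split(
    X_flat, y, test_size=0.7, random_state=100
)
print("train:", x_train.shape, "test:", x_test.shape)

# Splitting again with the same random_state gives the same test set
_, x_test_again, _, y_test_again = train_test_split(
    X_flat, y, test_size=0.7, random_state=100
)
print("Same split both times:", (x_test == x_test_again).all())
```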

## 3) Training our model


> Here we will use a simple Support Vector Machine classifier to recognise the digits. In later workshops we will discuss other models and compare them.




#### *Import the model chosen: Support Vector Machine*

In [0]:
from sklearn import svm

#### *Create the model*

In [0]:
SVM = svm.SVC(gamma=0.001)

#### *Train the model*

In [0]:
SVM.fit(x_train, y_train)
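
> The `gamma` argument controls how far the influence of a single training example reaches in the default RBF kernel of `svm.SVC`. As a rough sketch of why the value matters, the loop below trains one model per value and reports its accuracy on the test data. Only `0.001` comes from this tutorial; the other two values are arbitrary picks for comparison.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_flat = digits.images.reshape(len(digits.images), -1)
x_train, x_test, y_train, y_test = train_test_split(
    X_flat, digits.target, test_size=0.7, random_state=100
)

# Illustrative gamma values; only 0.001 is used in the tutorial itself
scores = {}
for gamma in (0.0001, 0.001, 1.0):
    model = svm.SVC(gamma=gamma)
    model.fit(x_train, y_train)
    # score() returns the fraction of correctly classified test samples
    scores[gamma] = model.score(x_test, y_test)
    print(f"gamma={gamma}: test accuracy = {scores[gamma]:.3f}")
```

> A very large `gamma` makes each training image matter only in its immediate neighbourhood, so the model tends to memorise the training set instead of generalising.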

#### *Predict values with the already trained model*


> For the test data, let's now see what the machine will predict.

In [0]:
y_predicted = SVM.predict(x_test)

#### *Show what these predictions correspond to*

> Let's visualize what predictions were made for our images. However, we have already flattened our input data, so we need to put it back into its original form, i.e. 8x8.

In [0]:
x_test

In [0]:
x_test_reshaped = np.reshape(x_test,(len(x_test),8,8))
i = 20
plt.imshow(x_test_reshaped[i], cmap=plt.cm.gray_r)
print("The algorithm predicted: \n",y_predicted[i])
print("The true value is: \n",y_test[i])
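
> Looking at one image at a time can hide the mistakes. As a sketch, the snippet below repeats the steps above in one place and lists the test samples where the prediction and the true value disagree. The name `wrong` is made up for this example.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_flat = digits.images.reshape(len(digits.images), -1)
x_train, x_test, y_train, y_test = train_test_split(
    X_flat, digits.target, test_size=0.7, random_state=100
)
y_predicted = svm.SVC(gamma=0.001).fit(x_train, y_train).predict(x_test)

# Positions in the test set where the prediction differs from the true value
wrong = np.flatnonzero(y_predicted != y_test)
print(len(wrong), "of", len(y_test), "test images were misclassified")

# Show the first few mistakes
for idx in wrong[:5]:
    print(f"index {idx}: predicted {y_predicted[idx]}, true value {y_test[idx]}")
```

> Any of the printed indices can be used as `i` in the cell above to see the image the model got wrong.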


## 4) Get a sense of how well our algorithm did

> This section is vital to evaluate our model and see whether it is viable. It compares the predicted values with the true values. The metrics shown here will be discussed later on in the semester.

#### *Import the accuracy score, mean squared error and confusion matrix to get a sense of how well the algorithm did*


In [0]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

#### *Display these metrics*

In [0]:
# accuracy_score expects the true values first, then the predictions
percentage_accuracy = accuracy_score(y_test, y_predicted) * 100
print("Percentage accuracy of the Support Vector Machine is: %s" % percentage_accuracy)

# Store the results under new names so the imported functions are not overwritten
# (otherwise re-running this cell raises a TypeError)
mse = mean_squared_error(y_test, y_predicted)
print("Mean squared error of the Support Vector Machine is: %s" % mse)

conf_matrix = confusion_matrix(y_test, y_predicted)
print("The Confusion matrix is: \n", conf_matrix)
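
> To get a feel for how these metrics relate, the sketch below recomputes the accuracy from the confusion matrix. Rows of the matrix are true digits and columns are predicted digits, so correct predictions sit on the diagonal. The names `cm`, `acc_from_cm` and `per_digit_recall` are made up for this example.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_flat = digits.images.reshape(len(digits.images), -1)
x_train, x_test, y_train, y_test = train_test_split(
    X_flat, digits.target, test_size=0.7, random_state=100
)
y_pred = svm.SVC(gamma=0.001).fit(x_train, y_train).predict(x_test)

cm = confusion_matrix(y_test, y_pred)

# Accuracy = correct predictions (diagonal) divided by all predictions
acc_from_cm = np.trace(cm) / cm.sum()
print("Accuracy from the confusion matrix:", acc_from_cm)
print("Accuracy from accuracy_score:     ", accuracy_score(y_test, y_pred))

# Fraction of each true digit that was recognised correctly
per_digit_recall = np.diag(cm) / cm.sum(axis=1)
for digit, recall in enumerate(per_digit_recall):
    print(f"digit {digit}: {recall:.3f}")
```

> The per-digit values show which digits the model confuses most often, something a single accuracy figure cannot reveal.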
