## Python-MLearning: Digits recognition using Neural Network (NN) and Sklearn Library

## Model: Digits 0-9 approach using K fold cross-validation

By: Hector Alvaro Rojas &nbsp;&nbsp;|&nbsp;&nbsp; Data Science, Visualizations and Applied Statistics &nbsp;&nbsp;|&nbsp;&nbsp; April 17, 2018<br>
    Url: [http://www.arqmain.net]   &nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;   GitHub: [https://github.com/arqmain]
    <hr>

## I IMPORT REQUIRED PACKAGES

In [16]:
%matplotlib inline
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn import metrics
from datetime import datetime


## II LOADING DATA

In [17]:
#Checking working directory
# import os
os.getcwd()

'C:\\Users\\Alvaro\\Documents\\R-Python-Projects_April042018\\Python_Projects\\Machine-Learning\\NNetwork\\NN2\\Backup-Python'

In [3]:
#List files in a directory
os.listdir()

['.ipynb_checkpoints',
 'For  RandomizedSearchCV July172018.txt',
 'mnist_My.csv',
 'Nueva carpeta',
 'PYTHON-MLearning_NN2.ipynb',
 'PYTHON-MLearning_NN2_GridSearchCV.ipynb',
 'PYTHON-MLearning_NN2_KFold.ipynb',
 'PYTHON-MLearning_NN2_RandomizedSearchCV.ipynb']

In [18]:
# read csv (comma separated value) into data
startTime = datetime.now()
data=pd.read_csv('mnist_My.csv')
df=data
df.columns
print ('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')

Total running time (H: M: S. ThS) 0:00:13.550775 seconds.


Next, we fit an NN model to the data and assess it with k-fold cross-validation. The NN classifier object has several tunable options, such as the activation function ("activation") and the number of hidden layers and of neurons per layer (both set through "hidden_layer_sizes"). All of these are part of the tuning possibilities of the model. You can view the full list of tunable parameters [here](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier).
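As a quick way to see these options, the estimator's parameters can be listed with get_params(). The snippet below is only a sketch and not part of the original run; the variable names `default_params`, `mlp_sketch` and `sketch_params` are made up for illustration, and the second estimator simply repeats the settings used later in this notebook.

```python
from sklearn.neural_network import MLPClassifier

# Default estimator: inspect a few tunable parameters and their default values
default_params = MLPClassifier().get_params()
for name in ("hidden_layer_sizes", "activation", "solver"):
    print(name, "=", default_params[name])

# Same settings as the model fitted later: two hidden layers of 16 neurons each
mlp_sketch = MLPClassifier(solver="lbfgs", activation="logistic",
                           hidden_layer_sizes=(16, 16))
sketch_params = mlp_sketch.get_params()
print(sketch_params["hidden_layer_sizes"])
```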

To present the k-fold cross-validation method, we set values for only a few of the more important parameters of the model.
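The idea behind k-fold partitioning can be sketched in plain Python: the sample indices are cut into k folds, and each fold serves once as the held-out part while the rest is used for training. The helper `k_fold_indices` is hypothetical and written only for this illustration. It uses contiguous, unshuffled folds, whereas cross_val_score with an integer cv uses stratified folds for classifiers, so the actual splits in this notebook differ.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # The first n_samples % k folds get one extra sample,
    # so fold sizes differ by at most one
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

# Toy example: 10 samples, 3 folds
folds = list(k_fold_indices(10, 3))
for i, (train_idx, test_idx) in enumerate(folds):
    print("fold", i, "train:", train_idx, "test:", test_idx)
```

Every index appears in exactly one test fold, which is why each sample ends up with exactly one held-out prediction.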


## III NN MODELING

## Train and Validation Datasets

In [19]:
# train test split
X=data.iloc[:,1:]
y=data.iloc[:,0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Train and Test dataset size details
print("X Shape :: ", X.shape)
print("y :: ", y.shape)
print("X_train Shape :: ", X_train.shape)
print("y_train Shape :: ", y_train.shape)
print("X_test Shape :: ", X_test.shape)
print("y_test Shape :: ", y_test.shape)


X Shape ::  (70000, 784)
y ::  (70000,)
X_train Shape ::  (56000, 784)
y_train Shape ::  (56000,)
X_test Shape ::  (14000, 784)
y_test Shape ::  (14000,)


## Build Model

### Fit the model and evaluate it

#### Fitting the Model

In [20]:
# Fitting NN model
from sklearn.neural_network import MLPClassifier
startTime = datetime.now()
mlp = MLPClassifier(solver='lbfgs', activation = 'logistic', hidden_layer_sizes=(16,16))
mlp.fit(X_train, y_train)
print ('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')


Total running time (H: M: S. ThS) 0:05:26.518676 seconds.


#### Evaluating the Model

In [21]:
# Perform 3-fold cross-validation (cv=3)
startTime = datetime.now()
scores = cross_val_score(mlp, X_train, y_train, cv=3)
print ("Cross-validated scores:", scores)
print ('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')


Cross-validated scores: [0.85360758 0.81469974 0.86530219]
Total running time (H: M: S. ThS) 0:10:27.429887 seconds.


In [24]:
# Mean accuracy across the folds, with one and two standard deviations
print("Accuracy: %0.4f   StDev: %0.4f  2StDev: (+/- %0.4f)" % (scores.mean(), scores.std(),scores.std() * 2))


Accuracy: 0.8445   StDev: 0.0216  2StDev: (+/- 0.0433)
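The summary line above can be reproduced by hand from the three fold scores. The sketch below uses only the standard library, with the scores copied from the cross-validation output. It assumes the population standard deviation (dividing by n), which is what NumPy's std() computes by default.

```python
from statistics import mean, pstdev

# Fold accuracies copied from the cross_val_score output above
scores = [0.85360758, 0.81469974, 0.86530219]

avg = mean(scores)
# pstdev divides by n (population standard deviation), like NumPy's default std()
sd = pstdev(scores)

print("Accuracy: %0.4f   StDev: %0.4f  2StDev: (+/- %0.4f)" % (avg, sd, 2 * sd))
```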


In [25]:
predictions = mlp.predict(X_test)
print('Accuracy based on X_test, Y_test: ',accuracy_score(y_test, predictions))
print('')
print('Confusion Matrix:\n ',confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

Accuracy based on X_test, Y_test:  0.8580714285714286

Confusion Matrix:
  [[1282    0    6   11    1   35   26    3   14    0]
 [   0 1530   22    8    3    8    2    4   15    0]
 [  30   11 1143   46   21   12   18   32   54    0]
 [  18    4   65 1146    2   87    7   20   62   13]
 [   7   11   16    0 1081    4   18    4   20  201]
 [  47   10    8   43    3 1038   28   12   63   12]
 [  29    4   26    0   17   22 1305    0    9    1]
 [   4   16   25    5   14    9    1 1265   15   37]
 [  14   23   57   38   17   67   11   15 1068   37]
 [  13    4   14   15  135   17    0   71   38 1155]]
             precision    recall  f1-score   support

          0       0.89      0.93      0.91      1378
          1       0.95      0.96      0.95      1592
          2       0.83      0.84      0.83      1367
          3       0.87      0.80      0.84      1424
          4       0.84      0.79      0.81      1362
          5       0.80      0.82      0.81      1264
          6       0.92

Precision, recall, and F1-score are metrics for assessing the quality of classification models. A general explanation is available on [Wikipedia](https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers).

We misclassified 1987 of the 14000 digit images in the test set, which leaves us with an 85.81% accuracy rate (with 86% precision and 86% recall).
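These figures follow directly from the confusion matrix. As a check, the sketch below recomputes them in plain Python from the matrix printed above (rows are true digits, columns are predicted digits); the variable names are chosen for this illustration only.

```python
# Confusion matrix copied from the output above
# (rows = true digit, columns = predicted digit)
cm = [
    [1282, 0, 6, 11, 1, 35, 26, 3, 14, 0],
    [0, 1530, 22, 8, 3, 8, 2, 4, 15, 0],
    [30, 11, 1143, 46, 21, 12, 18, 32, 54, 0],
    [18, 4, 65, 1146, 2, 87, 7, 20, 62, 13],
    [7, 11, 16, 0, 1081, 4, 18, 4, 20, 201],
    [47, 10, 8, 43, 3, 1038, 28, 12, 63, 12],
    [29, 4, 26, 0, 17, 22, 1305, 0, 9, 1],
    [4, 16, 25, 5, 14, 9, 1, 1265, 15, 37],
    [14, 23, 57, 38, 17, 67, 11, 15, 1068, 37],
    [13, 4, 14, 15, 135, 17, 0, 71, 38, 1155],
]

n = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n))  # diagonal = correctly classified
misclassified = total - correct
accuracy = correct / total

# Recall divides by the row total (true count), precision by the column total
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(n)]

print("misclassified:", misclassified, "of", total)
print("accuracy: %0.4f" % accuracy)
print("digit 0: precision %0.2f, recall %0.2f" % (precision[0], recall[0]))
```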

To extract the MLP weights and biases after training the model, use its public attributes coefs_ and intercepts_.

<b>coefs_</b> is a list of weight matrices, where the weight matrix at index i represents the weights between layer i and layer i+1.

<b>intercepts_</b> is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.
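For the network used here, the expected shapes can be worked out from the layer sizes alone. The sketch below assumes 784 input pixels, the two hidden layers of 16 neurons set in hidden_layer_sizes, and 10 output units (one per digit); it does not touch the fitted model, and the variable names are illustrative.

```python
# Layer sizes: 784 input pixels, two hidden layers of 16, 10 output classes
layer_sizes = [784, 16, 16, 10]

# coefs_[i] has shape (units in layer i, units in layer i+1)
coef_shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))
# intercepts_[i] has one bias per unit in layer i+1
intercept_shapes = [(units,) for units in layer_sizes[1:]]

# Total number of weights plus biases
n_params = sum(a * b for a, b in coef_shapes) + sum(layer_sizes[1:])

print("coefs_ shapes:", coef_shapes)
print("intercepts_ shapes:", intercept_shapes)
print("trainable parameters:", n_params)
```

This agrees with the cells that follow: three weight matrices, the first of which has 784 rows.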

In [26]:
len(mlp.coefs_)

3

In [27]:
mlp.coefs_

[array([[ 0.03244925, -0.00831409, -0.03154424, ...,  0.04295491,
         -0.01714564,  0.00537021],
        [-0.03426557,  0.01500194, -0.02067342, ..., -0.00354121,
         -0.0210884 , -0.00973014],
        [ 0.00945931,  0.00598189,  0.0456612 , ..., -0.00076483,
         -0.01948036,  0.02889062],
        ...,
        [ 0.01956118,  0.04396216,  0.03116761, ...,  0.04457343,
         -0.0202273 ,  0.03332531],
        [ 0.02083884, -0.0217364 ,  0.04130112, ..., -0.04610477,
         -0.01940277,  0.01405071],
        [ 0.0088554 , -0.01302614, -0.04121046, ..., -0.00144927,
          0.03681263,  0.02309513]]),
 array([[-1.45326529e+00, -1.48109285e+00, -5.16223918e-01,
         -6.07205950e-01,  1.56829362e+00, -8.88700218e-01,
          2.06141152e+00,  6.34843742e-01, -2.57519179e+00,
          1.65067471e+00, -1.06858274e+00, -2.98175951e+00,
         -5.06229308e-01,  2.46127643e-01, -7.59990757e-01,
         -1.06837855e+00],
        [ 2.60320813e-01, -5.00963822e-01,  3.

In [28]:
len(mlp.coefs_[0])

784

In [29]:
mlp.coefs_[0]

array([[ 0.03244925, -0.00831409, -0.03154424, ...,  0.04295491,
        -0.01714564,  0.00537021],
       [-0.03426557,  0.01500194, -0.02067342, ..., -0.00354121,
        -0.0210884 , -0.00973014],
       [ 0.00945931,  0.00598189,  0.0456612 , ..., -0.00076483,
        -0.01948036,  0.02889062],
       ...,
       [ 0.01956118,  0.04396216,  0.03116761, ...,  0.04457343,
        -0.0202273 ,  0.03332531],
       [ 0.02083884, -0.0217364 ,  0.04130112, ..., -0.04610477,
        -0.01940277,  0.01405071],
       [ 0.0088554 , -0.01302614, -0.04121046, ..., -0.00144927,
         0.03681263,  0.02309513]])

## Make Predictions

### Based on the training dataset

The function cross_val_predict has an interface similar to cross_val_score, but instead of scores it returns, for each element of the input, the prediction obtained for that element when it was in the held-out fold. In our case, the folds are partitions of X_train.
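A small, self-contained sketch of this behaviour is shown below. It uses scikit-learn's bundled 8x8 digits dataset as a stand-in for mnist_My.csv so that it runs quickly; the names `X_small`, `y_small`, `clf` and `oof_pred`, as well as the max_iter and random_state values, are assumptions made for the example.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Small digits dataset bundled with scikit-learn (a stand-in, NOT mnist_My.csv)
X_small, y_small = load_digits(return_X_y=True)

clf = MLPClassifier(solver="lbfgs", activation="logistic",
                    hidden_layer_sizes=(16, 16), max_iter=200, random_state=0)

# Each sample is predicted by a model trained on the folds that did not contain it
oof_pred = cross_val_predict(clf, X_small, y_small, cv=3)

print("one prediction per sample:", oof_pred.shape == y_small.shape)
print("out-of-fold accuracy: %0.4f" % accuracy_score(y_small, oof_pred))
```

Because every prediction comes from a model that never saw that sample, the resulting accuracy is an out-of-fold estimate rather than a training-set score.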

In [30]:
predictions = cross_val_predict(mlp, X_train, y_train, cv=3)
print('Prediction:', predictions)

Prediction: [0 6 4 ... 6 6 7]


### Based on the test dataset

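Now we use the predict function, which is the usual way to get predictions on a new dataset. In our case, the new dataset is X_test. Note that the model in the next cell is refitted on all the data (X, y) with default parameters, so X_test is not unseen data for that fit.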
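If the goal is to predict on data the model has not seen, one option is to fit on the training split only. The sketch below shows that variant on scikit-learn's bundled digits dataset, used here as a stand-in for mnist_My.csv; the names `X_tr`, `X_te`, `holdout_clf` and `holdout_pred`, and the max_iter and random_state values, are assumptions for the example rather than the notebook's own settings.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small digits dataset bundled with scikit-learn (a stand-in, NOT mnist_My.csv)
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_small, y_small,
                                          test_size=0.2, random_state=4)

holdout_clf = MLPClassifier(solver="lbfgs", activation="logistic",
                            hidden_layer_sizes=(16, 16), max_iter=200,
                            random_state=0)
# Fit on the training split only: the test rows are never seen during fitting
holdout_clf.fit(X_tr, y_tr)

holdout_pred = holdout_clf.predict(X_te)
holdout_acc = accuracy_score(y_te, holdout_pred)
print("held-out accuracy: %0.4f" % holdout_acc)
```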

In [31]:
# train your model using all data.
startTime = datetime.now()
mlp = MLPClassifier()
mlp.fit(X, y) 
print ('Total running time (H: M: S: ThS)', datetime.now()-startTime, 'seconds.')


Total running time (H: M: S: ThS) 0:03:45.092875 seconds.


In [32]:
predictions = mlp.predict(X_test)
print('Prediction:', predictions)

Prediction: [0 6 5 ... 7 4 8]

