# Python-MLearning: Digit Recognition Using a Neural Network (NN) and the Sklearn Library

## Model: Digits 0-9 approach using RandomizedSearchCV


By: Hector Alvaro Rojas &nbsp;&nbsp;|&nbsp;&nbsp; Data Science, Visualizations and Applied Statistics &nbsp;&nbsp;|&nbsp;&nbsp; April 29, 2018<br>
    Url: [http://www.arqmain.net]   &nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;   GitHub: [https://github.com/arqmain]
    <hr>

## I IMPORT REQUIRED PACKAGES

In [1]:
```python
%matplotlib inline
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn import metrics
from datetime import datetime
```


## II LOADING DATA

In [2]:
```python
# Checking working directory
# import os
os.getcwd()
```

```
'C:\\Users\\Alvaro\\Documents\\R-Python-Projects_April042018\\Python_Projects\\Machine-Learning\\NNetwork\\NN2\\Backup-Python'
```

In [3]:
```python
# List files in a directory
os.listdir()
```

```
['.ipynb_checkpoints',
 'For  RandomizedSearchCV July172018.txt',
 'mnist_My.csv',
 'Nueva carpeta',
 'PYTHON-MLearning_NN2.ipynb',
 'PYTHON-MLearning_NN2_GridSearchCV.ipynb',
 'PYTHON-MLearning_NN2_KFold.ipynb',
 'PYTHON-MLearning_NN2_RandomizedSearchCV.ipynb']
```

In [4]:
```python
# read csv (comma separated value) into data
data = pd.read_csv('mnist_My.csv')
df = data
df.columns
```

```
Index(['label', 'pixel0', 'pixel1', 'pixel2', 'pixel3', 'pixel4', 'pixel5',
       'pixel6', 'pixel7', 'pixel8',
       ...
       'pixel774', 'pixel775', 'pixel776', 'pixel777', 'pixel778', 'pixel779',
       'pixel780', 'pixel781', 'pixel782', 'pixel783'],
      dtype='object', length=785)
```

Now we will run a randomized hyperparameter search to find an adequate NN model for the data. The NN classification object (MLPClassifier) has various tunable options, such as "activation", the number of layers, and the number of neurons in each layer (the last two are set together through "hidden_layer_sizes"). All of these are part of the model's tuning possibilities. You can view the full list of tunable parameters [here](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier).

To illustrate the RandomizedSearchCV method, we set candidate values for only a few of the more important parameters of the model.
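The search space defined in this notebook is small enough to enumerate. A minimal sketch, assuming the same list-valued candidates as the search cell further down: when every entry is a list, scikit-learn's `ParameterSampler` (the helper `RandomizedSearchCV` uses internally) draws `n_iter` distinct combinations from the full grid, and `RandomizedSearchCV` uses `n_iter=10` by default. The `random_state=0` is an illustrative choice added for reproducibility; the original search set none, so the combinations it actually tried are not necessarily these.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same candidate values as the RandomizedSearchCV cell in this notebook
param_dist = {'alpha': [10, 1, 0.01],
              'solver': ['lbfgs', 'adam'],
              'activation': ['logistic', 'relu']}

# Every possible combination (what an exhaustive grid search would try)
full_grid = list(ParameterGrid(param_dist))

# What RandomizedSearchCV would try with its default n_iter=10
# (random_state is fixed here only so the sketch is repeatable)
sampled = list(ParameterSampler(param_dist, n_iter=10, random_state=0))

print(len(full_grid), "combinations in the full grid")
print(len(sampled), "combinations sampled")
for params in sampled:
    print(params)

# With cv=3 each sampled combination is fitted 3 times, plus one final refit
print(len(sampled) * 3, "cross-validation fits")
```

Because only 10 of the 12 combinations are tried, a randomized search can skip the best grid point; its advantage appears when the grid is large or when continuous distributions (for example from scipy.stats) are supplied instead of lists.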

## III NN MODELING

## Train and Test Datasets

In [5]:
```python
# train test split
X = data.iloc[:, 1:]
y = data.iloc[:, 0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Train and Test dataset size details
print("X Shape :: ", X.shape)
print("y :: ", y.shape)
print("X_train Shape :: ", X_train.shape)
print("y_train Shape :: ", y_train.shape)
print("X_test Shape :: ", X_test.shape)
print("y_test Shape :: ", y_test.shape)
```

```
X Shape ::  (70000, 784)
y ::  (70000,)
X_train Shape ::  (56000, 784)
y_train Shape ::  (56000,)
X_test Shape ::  (14000, 784)
y_test Shape ::  (14000,)
```


## Build Model

### Which values should be selected for the model's hyperparameters?

In [6]:
```python
# Import necessary modules
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import RandomizedSearchCV

startTime = datetime.now()

# Setup the parameters and distributions to sample from: param_dist
param_dist = {'alpha': [10, 1, 0.01],
              'solver': ['lbfgs', 'adam'],
              'activation': ['logistic', 'relu']
              }

# Instantiate MLP Classifier
mlp = MLPClassifier(hidden_layer_sizes=(16, 16))

# Instantiate the RandomizedSearchCV object: mlp_cv
mlp_cv = RandomizedSearchCV(mlp, param_dist, cv=3)

# Fit it to the data
mlp_cv.fit(X, y)

print('Total running time (H: M: S. ThS)', datetime.now() - startTime, 'seconds.')
```

```
Total running time (H: M: S. ThS) 1:25:59.597112 seconds.
```


In [7]:
```python
# Print the tuned parameters and score
print("Tuned Mlp Parameters: {}".format(mlp_cv.best_params_))
print("Best score is {}".format(mlp_cv.best_score_))
```

```
Tuned Mlp Parameters: {'solver': 'adam', 'alpha': 10, 'activation': 'relu'}
Best score is 0.9475285714285714
```
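best_params_ and best_score_ summarize only the winning candidate. A sketch of how one might inspect every sampled candidate through the `cv_results_` attribute, assuming scikit-learn's small built-in `load_digits` dataset as a fast stand-in for mnist_My.csv, with `n_iter=4`, `max_iter=50` and `random_state=0` chosen only to keep the run short and repeatable:

```python
import warnings

import pandas as pd
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Assumption: the 8x8 digits dataset stands in for the 28x28 MNIST CSV
X_small, y_small = load_digits(return_X_y=True)

param_dist = {'alpha': [10, 1, 0.01],
              'solver': ['lbfgs', 'adam'],
              'activation': ['logistic', 'relu']}

# max_iter is kept low for speed, so silence the resulting convergence warnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)
search = RandomizedSearchCV(
    MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=50, random_state=0),
    param_dist, n_iter=4, cv=3, random_state=0)
search.fit(X_small, y_small)

# One row per sampled candidate, best-ranked first
results = pd.DataFrame(search.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))
```

The std_test_score column is worth a glance: when two candidates differ by less than their fold-to-fold spread, the "best" one is not clearly better.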


## Using the best parameters to make predictions

### Fitting the Model

In [8]:
```python
# Fit the final model with the best parameters found by the search
from sklearn.neural_network import MLPClassifier

# train your model using all data and the best known parameters (Kevin Dataschool)
startTime = datetime.now()
mlp = MLPClassifier(solver='adam', activation='relu', hidden_layer_sizes=(16, 16), alpha=10)
mlp.fit(X, y)
print('Total running time (H: M: S. ThS)', datetime.now() - startTime, 'seconds.')
```

```
Total running time (H: M: S. ThS) 0:01:13.447201 seconds.
```


### Evaluating the model

In [9]:
```python
predictions = mlp.predict(X_test)
print('Accuracy based on X_test, Y_test: ', accuracy_score(y_test, predictions))
print('')
print('Confusion Matrix:\n ', confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
```

```
Accuracy based on X_test, Y_test:  0.9557142857142857

Confusion Matrix:
  [[1316    0   19    5    4    7   13    2   12    0]
 [   0 1570    9    5    1    3    1    1    1    1]
 [   2    8 1324    3    3    1    9    7    9    1]
 [   1    3   24 1347    2   25    1    7   11    3]
 [   2    6    4    0 1326    1   10    2    1   10]
 [   5    1    6   21    0 1209   13    2    7    0]
 [   3    1    1    0    2   13 1391    0    2    0]
 [   3   25   16    4   10    3    6 1312    1   11]
 [   2   27   12   15    3   20   19    2 1245    2]
 [   6    7    4   19   37   16    1   25    7 1340]]
             precision    recall  f1-score   support

          0       0.98      0.96      0.97      1378
          1       0.95      0.99      0.97      1592
          2       0.93      0.97      0.95      1367
          3       0.95      0.95      0.95      1424
          4       0.96      0.97      0.96      1362
          5       0.93      0.96      0.94      1264
          6       0.95
```

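Precision, recall and F1-score are metrics for evaluating classification models. For each class, precision is the fraction of the images predicted as that digit that really are that digit, recall is the fraction of the images of that digit that the model found, and the F1-score is the harmonic mean of the two. A general explanation can be found on [Wikipedia](https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers).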
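The per-class scores can be recomputed directly from the confusion matrix above, where rows are true digits and columns are predicted digits (scikit-learn's convention). A minimal NumPy sketch: precision divides each diagonal count by its column total, recall divides it by its row total, and the F1-score is their harmonic mean. The same arithmetic also gives the scores for the digits whose rows are cut off in the printed report.

```python
import numpy as np

# Confusion matrix printed above: rows = true digit, columns = predicted digit
cm = np.array([
    [1316,    0,   19,    5,    4,    7,   13,    2,   12,    0],
    [   0, 1570,    9,    5,    1,    3,    1,    1,    1,    1],
    [   2,    8, 1324,    3,    3,    1,    9,    7,    9,    1],
    [   1,    3,   24, 1347,    2,   25,    1,    7,   11,    3],
    [   2,    6,    4,    0, 1326,    1,   10,    2,    1,   10],
    [   5,    1,    6,   21,    0, 1209,   13,    2,    7,    0],
    [   3,    1,    1,    0,    2,   13, 1391,    0,    2,    0],
    [   3,   25,   16,    4,   10,    3,    6, 1312,    1,   11],
    [   2,   27,   12,   15,    3,   20,   19,    2, 1245,    2],
    [   6,    7,    4,   19,   37,   16,    1,   25,    7, 1340],
])

true_pos = np.diag(cm)                 # correctly classified images per digit
support = cm.sum(axis=1)               # number of test images of each digit
precision = true_pos / cm.sum(axis=0)  # TP / (TP + FP), using column totals
recall = true_pos / support            # TP / (TP + FN), using row totals
f1 = 2 * precision * recall / (precision + recall)

for digit in range(10):
    print(digit, round(precision[digit], 2), round(recall[digit], 2),
          round(f1[digit], 2), support[digit])

# Off-diagonal entries are the misclassified images
print("misclassified:", cm.sum() - true_pos.sum(), "of", cm.sum())

# Support-weighted recall always equals overall accuracy
print("weighted recall:", np.average(recall, weights=support))
```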

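It looks like we misclassified 620 of the 14,000 test images, leaving us with a 95.57% accuracy rate (with 96% precision and 96% recall). Keep in mind that mlp was fitted on all of X, which includes X_test, so these test-set figures are optimistic rather than a true held-out estimate.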
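A minimal sketch of scoring on images the model never saw during fitting, assuming scikit-learn's `load_digits` as a small stand-in for mnist_My.csv and an illustrative `max_iter=300`, `random_state=0`: the tuned hyperparameters are reused, but the model is fitted on the training split only. For a fully clean estimate, the hyperparameter search itself would also be run on the training split rather than on all of X.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: load_digits (8x8 images) stands in for mnist_My.csv to keep this fast
X_small, y_small = load_digits(return_X_y=True)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_small, y_small, test_size=0.2, random_state=4)

# Silence the warning in case adam stops at max_iter before converging
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Tuned hyperparameters from the search, but fitted on the training split only
held_out_mlp = MLPClassifier(solver='adam', activation='relu', hidden_layer_sizes=(16, 16),
                             alpha=10, max_iter=300, random_state=0)
held_out_mlp.fit(Xs_train, ys_train)

held_out_pred = held_out_mlp.predict(Xs_test)
held_out_acc = accuracy_score(ys_test, held_out_pred)
n_errors = int((held_out_pred != ys_test).sum())
print('Held-out accuracy:', held_out_acc)
print('Misclassified test images:', n_errors, 'of', len(ys_test))
```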

If you want to extract the MLP weights and biases after training the model, use its public attributes coefs_ and intercepts_.

<b>coefs_</b> is a list of weight matrices, where the weight matrix at index i represents the weights between layer i and layer i+1.

<b>intercepts_</b> is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.
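To make these definitions concrete, the sketch below (assuming `load_digits`, with 64 pixel features, as a stand-in dataset) prints the shape of each weight matrix and bias vector, then rebuilds the network's output by hand: ReLU in the hidden layers and softmax at the output, which is what MLPClassifier uses for a multiclass problem with activation='relu'. With the notebook's 784-pixel images the first matrix would be 784 × 16 instead of 64 × 16, consistent with the len(mlp.coefs_[0]) output of 784 in this notebook.

```python
import warnings

import numpy as np
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X_small, y_small = load_digits(return_X_y=True)   # 64 pixel features, 10 classes

warnings.filterwarnings("ignore", category=ConvergenceWarning)
net = MLPClassifier(hidden_layer_sizes=(16, 16), activation='relu',
                    max_iter=100, random_state=0).fit(X_small, y_small)

# coefs_[i] connects layer i to layer i+1; intercepts_[i] is added to layer i+1
for i, (W, b) in enumerate(zip(net.coefs_, net.intercepts_)):
    print(f"layer {i} -> {i + 1}: weights {W.shape}, biases {b.shape}")


def forward(X, coefs, intercepts):
    """Manual forward pass: ReLU hidden layers, softmax output layer."""
    a = X
    for W, b in zip(coefs[:-1], intercepts[:-1]):
        a = np.maximum(a @ W + b, 0.0)            # ReLU hidden layer
    z = a @ coefs[-1] + intercepts[-1]            # output-layer scores
    z = z - z.max(axis=1, keepdims=True)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)       # softmax probabilities


proba = forward(X_small, net.coefs_, net.intercepts_)
manual_pred = net.classes_[proba.argmax(axis=1)]
print("matches predict():", np.array_equal(manual_pred, net.predict(X_small)))
```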

In [10]:
```python
len(mlp.coefs_)
```

```
3
```

In [11]:
```python
mlp.coefs_
```

```
[array([[-9.08752335e-249, -5.26084361e-249, -8.72903484e-005, ...,
         -9.28662133e-021, -1.16935138e-015,  1.13598303e-247],
        [ 2.84319967e-008,  3.63073471e-249, -1.24542610e-005, ...,
         -1.44378693e-019,  4.64087047e-010, -3.64939771e-249],
        [-2.16420994e-029,  2.77159611e-249,  1.88731946e-248, ...,
          6.06866692e-021, -1.09514825e-033, -8.70232449e-029],
        ...,
        [-2.18537634e-024,  2.91301693e-249,  9.00884408e-010, ...,
         -5.50714888e-248, -3.80985603e-007,  1.02833752e-248],
        [ 1.34990470e-249,  3.37306984e-249, -1.01415231e-010, ...,
         -2.21126954e-043, -5.49637985e-248,  4.85013155e-020],
        [ 6.96885737e-017, -1.29172517e-207,  4.86624217e-248, ...,
         -8.26413618e-248, -7.84578139e-250,  2.48325439e-009]]),
 array([[-1.65736492e-002,  1.82541301e-002, -5.40974122e-002,
          1.91904624e-002,  6.14480970e-002,  5.86367173e-002,
          9.09237450e-003, -9.35432268e-003, -4.95687702e-002,
```

In [12]:
```python
len(mlp.coefs_[0])
```

```
784
```

In [13]:
```python
mlp.coefs_[0]
```

```
array([[-9.08752335e-249, -5.26084361e-249, -8.72903484e-005, ...,
        -9.28662133e-021, -1.16935138e-015,  1.13598303e-247],
       [ 2.84319967e-008,  3.63073471e-249, -1.24542610e-005, ...,
        -1.44378693e-019,  4.64087047e-010, -3.64939771e-249],
       [-2.16420994e-029,  2.77159611e-249,  1.88731946e-248, ...,
         6.06866692e-021, -1.09514825e-033, -8.70232449e-029],
       ...,
       [-2.18537634e-024,  2.91301693e-249,  9.00884408e-010, ...,
        -5.50714888e-248, -3.80985603e-007,  1.02833752e-248],
       [ 1.34990470e-249,  3.37306984e-249, -1.01415231e-010, ...,
        -2.21126954e-043, -5.49637985e-248,  4.85013155e-020],
       [ 6.96885737e-017, -1.29172517e-207,  4.86624217e-248, ...,
        -8.26413618e-248, -7.84578139e-250,  2.48325439e-009]])
```

### Making Predictions

#### Based on the training dataset

The function cross_val_predict has an interface similar to cross_val_score, but it returns, for each element in the input, the prediction obtained for that element when it was in the test fold. In our case, the folds are built by partitioning X_train.
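The sketch below checks that description directly, assuming `load_digits` as a stand-in dataset and a deliberately small MLPClassifier. cross_val_predict fits an unfitted clone of the estimator on each training fold (so an already-fitted model's weights are not reused), and for a classifier an integer cv=3 means a StratifiedKFold with 3 splits and no shuffling. A hand-written loop over the same folds should therefore reproduce its output exactly.

```python
import warnings

import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier

X_small, y_small = load_digits(return_X_y=True)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Small network with a fixed seed so every fit is repeatable
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=50, random_state=0)
folds = StratifiedKFold(n_splits=3)   # what cv=3 resolves to for a classifier

auto = cross_val_predict(model, X_small, y_small, cv=folds)

# The same thing by hand: each sample is predicted by a model that did not see it
manual = np.empty_like(y_small)
for train_idx, test_idx in folds.split(X_small, y_small):
    fold_model = clone(model).fit(X_small[train_idx], y_small[train_idx])
    manual[test_idx] = fold_model.predict(X_small[test_idx])

print("identical:", np.array_equal(auto, manual))
```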

In [15]:
```python
predictions = cross_val_predict(mlp, X_train, y_train, cv=3)
print('Prediction: {}'.format(predictions))
```

```
Prediction: [0 6 9 ... 6 6 3]
```


#### Based on the test dataset

Now we use the predict method, which is the usual way to get predictions for a new dataset. In our case, the new dataset is X_test.
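predict also works for a single new image, as long as the input stays two-dimensional (one row per sample). A minimal sketch, assuming `load_digits` as a stand-in dataset; with the notebook's DataFrame, `X_test.iloc[[0]]` (double brackets) keeps that 2-D shape. predict_proba is shown alongside to expose the class probabilities behind the predicted label.

```python
import warnings

import numpy as np
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_new, y_tr, y_new = train_test_split(X_small, y_small, test_size=0.2, random_state=4)

warnings.filterwarnings("ignore", category=ConvergenceWarning)
model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=200, random_state=0).fit(X_tr, y_tr)

# predict() expects shape (n_samples, n_features), even for a single image
one_image = X_new[0].reshape(1, -1)
label = model.predict(one_image)[0]
print("predicted digit:", label, "true digit:", y_new[0])

# predict_proba() returns one probability per class; predict() picks the largest
proba = model.predict_proba(one_image)
print("class probabilities:", np.round(proba, 3))
```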

In [16]:
```python
startTime = datetime.now()
predictions = mlp.predict(X_test)
print('Prediction: {}'.format(predictions))
print('Total running time (H: M: S. ThS)', datetime.now() - startTime, 'seconds.')
```

```
Prediction: [0 6 5 ... 7 4 8]
Total running time (H: M: S. ThS) 0:00:00.179010 seconds.
```

