# Python-MLearning: Digits recognition using Neural Network (NN) and Sklearn Library

## Model: Digits 0-9 approach using GridSearchCV


By: Hector Alvaro Rojas &nbsp;&nbsp;|&nbsp;&nbsp; Data Science, Visualizations and Applied Statistics &nbsp;&nbsp;|&nbsp;&nbsp; April 27, 2018<br>
    Url: [http://www.arqmain.net]   &nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;   GitHub: [https://github.com/arqmain]
    <hr>

## I IMPORT REQUIRED PACKAGES

In [1]:
%matplotlib inline
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn import metrics
from datetime import datetime


## II LOADING DATA

In [2]:
#Checking working directory
# import os
os.getcwd()

'C:\\Users\\Alvaro\\Documents\\R-Python-Projects_April042018\\Python_Projects\\Machine-Learning\\NNetwork\\NN2\\DigitSklearn\\Python'

In [3]:
#List files in a directory
os.listdir()

['.ipynb_checkpoints',
 'mnist_My.csv',
 'PYTHON-MLearning_NN2.ipynb',
 'PYTHON-MLearning_NN2_GridSearchCV.ipynb',
 'PYTHON-MLearning_NN2_KFold.ipynb',
 'PYTHON-MLearning_NN2_RandomizedSearchCV.ipynb']

In [4]:
# Read the CSV (comma-separated values) file into a DataFrame
data = pd.read_csv('mnist_My.csv')
df = data  # alias: df and data refer to the same DataFrame
df.columns

Index(['label', 'pixel0', 'pixel1', 'pixel2', 'pixel3', 'pixel4', 'pixel5',
       'pixel6', 'pixel7', 'pixel8',
       ...
       'pixel774', 'pixel775', 'pixel776', 'pixel777', 'pixel778', 'pixel779',
       'pixel780', 'pixel781', 'pixel782', 'pixel783'],
      dtype='object', length=785)

Next, we run a grid search to find an adequate NN model for the data. The NN classifier object exposes many options, such as the activation function, the number of layers, and the number of neurons per layer. All of these are hyperparameters that can be tuned. You can view the full list of tunable parameters [here](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier).

To demonstrate the GridSearchCV method, we set up values for only a few of the more important parameters of the model.

## III NN MODELING

## Train and Validation Datasets

In [5]:
# train test split
X=data.iloc[:,1:]
y=data.iloc[:,0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Train and Test dataset size details
print("X Shape :: ", X.shape)
print("y :: ", y.shape)
print("X_train Shape :: ", X_train.shape)
print("y_train Shape :: ", y_train.shape)
print("X_test Shape :: ", X_test.shape)
print("y_test Shape :: ", y_test.shape)


X Shape ::  (70000, 784)
y ::  (70000,)
X_train Shape ::  (56000, 784)
y_train Shape ::  (56000,)
X_test Shape ::  (14000, 784)
y_test Shape ::  (14000,)


## Build Model

### Fit the model and evaluate it

### Fitting the Model

#### What values for the model's hyperparameters would be selected?

In [7]:
# Grid search with cross-validation
from sklearn.model_selection import GridSearchCV

startTime = datetime.now()
mlp = MLPClassifier(hidden_layer_sizes=(16,16))

# Use a grid over parameters of interest
parameters = {
    'alpha': [10, 1, 0.01],
    'solver': ['lbfgs', 'adam'],
    'activation': ['logistic', 'relu'],
}

model = GridSearchCV(estimator=mlp, param_grid=parameters, n_jobs=-1, verbose=2, cv=3)
model.fit(X_train, y_train)
print('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')

Fitting 3 folds for each of 6 candidates, totalling 18 fits


[Parallel(n_jobs=-1)]: Done  18 out of  18 | elapsed: 17.7min finished


Total running time (H: M: S. ThS) 0:18:55.751962 seconds.
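The log above reports 6 candidates and 18 fits. The number of candidates in a grid is the product of the lengths of its value lists, and the number of fits is that count times `cv`. The following is a minimal sketch in plain Python; the helper `count_candidates` is hypothetical and not part of scikit-learn. With the grid exactly as written in the cell above (three `alpha` values, two solvers, two activations) it gives 12 candidates and 36 fits, while a grid over `alpha` and `activation` only gives the 6 candidates and 18 fits shown in the log. This suggests the logged run may have used a grid without `solver`.

```python
from itertools import product

def count_candidates(param_grid):
    """Number of parameter combinations in a dict-of-lists grid (hypothetical helper)."""
    return len(list(product(*param_grid.values())))

# Grid exactly as written in the cell above
grid_as_written = {
    'alpha': [10, 1, 0.01],
    'solver': ['lbfgs', 'adam'],
    'activation': ['logistic', 'relu'],
}
# Same grid with the 'solver' entry dropped (assumption used for comparison only)
grid_without_solver = {k: v for k, v in grid_as_written.items() if k != 'solver'}

cv = 3  # number of folds, as in the GridSearchCV call above
for name, grid in [('as written', grid_as_written), ('without solver', grid_without_solver)]:
    n = count_candidates(grid)
    print(name, '->', n, 'candidates,', n * cv, 'fits')
```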


In [9]:
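print("Tuned hyperparameters: {}".format(model.best_params_))
# best_score_ is the mean cross-validated accuracy of the best candidate
print('Best score for train data:',model.best_score_)
# Scores for every candidate; grid_scores_ was removed in scikit-learn 0.20 (use cv_results_ there)
model.grid_scores_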


Tuned hyperparameters: {'activation': 'relu', 'alpha': 10}
Best score for train data: 0.9512857142857143




[mean: 0.20991, std: 0.00318, params: {'activation': 'logistic', 'alpha': 10},
 mean: 0.88305, std: 0.00905, params: {'activation': 'logistic', 'alpha': 1},
 mean: 0.89277, std: 0.00651, params: {'activation': 'logistic', 'alpha': 0.01},
 mean: 0.95129, std: 0.00334, params: {'activation': 'relu', 'alpha': 10},
 mean: 0.93963, std: 0.00806, params: {'activation': 'relu', 'alpha': 1},
 mean: 0.90302, std: 0.01881, params: {'activation': 'relu', 'alpha': 0.01}]
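
On scikit-learn 0.20 and later, `grid_scores_` no longer exists and the same per-candidate information is available in `cv_results_`. The sketch below is only an illustration under stated assumptions: it uses the small built-in `load_digits` dataset (8x8 images, not the MNIST CSV used above), a reduced grid, and a fixed `random_state`, so that it runs in seconds. Its scores are not comparable to the results above.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Short training runs may stop before convergence; silence that warning for the demo
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Small stand-in dataset (assumption: used only to keep the example fast)
X_small, y_small = load_digits(return_X_y=True)

search = GridSearchCV(
    estimator=MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=200, random_state=0),
    param_grid={'alpha': [1, 0.01], 'activation': ['logistic', 'relu']},
    cv=3,
)
search.fit(X_small, y_small)

# cv_results_ is a dict of arrays with one entry per candidate
results = search.cv_results_
for mean, std, params in zip(results['mean_test_score'],
                             results['std_test_score'],
                             results['params']):
    print("mean: {:.5f}, std: {:.5f}, params: {}".format(mean, std, params))
```

If pandas is available, `pd.DataFrame(search.cv_results_)` gives the same data as a table.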

#### Evaluating the Model

In [7]:
# Train the model on X_train, y_train with the best parameters found by the grid search
startTime = datetime.now()
mlp = MLPClassifier(solver='adam', activation='relu', hidden_layer_sizes=(16,16), alpha=10)
mlp.fit(X_train, y_train)
print('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')

Total running time (H: M: S. ThS) 0:00:44.182527 seconds.


In [8]:
# Evaluating the NN model: accuracy on the training set
print('With Neural Network () accuracy is: ',round(mlp.score(X_train,y_train),4))

With Neural Network () accuracy is:  0.9561


In [9]:
predictions = mlp.predict(X_test)
print('Accuracy based on X_test, Y_test: ',accuracy_score(y_test, predictions))
print('')
print('Confusion Matrix:\n ',confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

Accuracy based on X_test, Y_test:  0.9453571428571429

Confusion Matrix:
  [[1342    0    7    1    1    1    8   10    7    1]
 [   0 1552    6    9    2    4    3    5   11    0]
 [   8   16 1298    1    6    0    4   19   15    0]
 [   8    2   26 1296    0   31    1   34    9   17]
 [   2    4   12    0 1272    1   12    6    6   47]
 [   5    4    7   22    7 1176   20    5   11    7]
 [  10    1    6    0    4   11 1379    1    1    0]
 [   4   11    9    2    5    1    0 1344    4   11]
 [   6   10   20   13   10   19   13    3 1225   28]
 [   7    3    3   11   28   13    1   34   11 1351]]
             precision    recall  f1-score   support

          0       0.96      0.97      0.97      1378
          1       0.97      0.97      0.97      1592
          2       0.93      0.95      0.94      1367
          3       0.96      0.91      0.93      1424
          4       0.95      0.93      0.94      1362
          5       0.94      0.93      0.93      1264
          6       0.96

Precision, recall, and f1-score are per-class metrics of classification quality. A general explanation is available on [Wikipedia](https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers).

The model misclassified 765 of the 14,000 test images, which gives an accuracy of 94.54% (with 95% precision and 95% recall).
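
The count of 765 misclassified images can be checked directly from the confusion matrix printed above: the diagonal holds the correct predictions, each row sums to the support of a true digit, and each column sums to the number of times a digit was predicted. A small sketch in plain Python, with the matrix copied from the output above:

```python
# Confusion matrix copied from the output above
# (rows: true digit, columns: predicted digit)
cm = [
    [1342, 0, 7, 1, 1, 1, 8, 10, 7, 1],
    [0, 1552, 6, 9, 2, 4, 3, 5, 11, 0],
    [8, 16, 1298, 1, 6, 0, 4, 19, 15, 0],
    [8, 2, 26, 1296, 0, 31, 1, 34, 9, 17],
    [2, 4, 12, 0, 1272, 1, 12, 6, 6, 47],
    [5, 4, 7, 22, 7, 1176, 20, 5, 11, 7],
    [10, 1, 6, 0, 4, 11, 1379, 1, 1, 0],
    [4, 11, 9, 2, 5, 1, 0, 1344, 4, 11],
    [6, 10, 20, 13, 10, 19, 13, 3, 1225, 28],
    [7, 3, 3, 11, 28, 13, 1, 34, 11, 1351],
]

total = sum(sum(row) for row in cm)          # all test images
correct = sum(cm[i][i] for i in range(10))   # diagonal entries
misclassified = total - correct
accuracy = correct / total

# Per-class recall (row-wise) and precision (column-wise)
recall = [cm[i][i] / sum(cm[i]) for i in range(10)]
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(10)]

print(total, correct, misclassified)                # 14000 13235 765
print(round(accuracy, 4))                           # 0.9454
print(round(precision[3], 2), round(recall[3], 2))  # 0.96 0.91
```

The values for digit 3 agree with the corresponding row of the classification report.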

To extract the MLP weights and biases after training the model, use its public attributes coefs_ and intercepts_.

<b>coefs_</b> is a list of weight matrices, where the weight matrix at index i holds the weights between layer i and layer i+1.

<b>intercepts_</b> is a list of bias vectors, where the vector at index i holds the bias values added to layer i+1.
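
To make the role of these two lists concrete, the sketch below replays a forward pass by hand with NumPy. It assumes ReLU hidden layers and a softmax output, which matches a multi-class MLPClassifier with `activation='relu'`. The function `forward` and the random stand-in weights are hypothetical and for illustration only. The layer sizes 784, 16, 16, 10 follow the model above, which is why `len(mlp.coefs_)` is 3.

```python
import numpy as np

def forward(X, coefs, intercepts):
    """Hand-written forward pass: ReLU hidden layers, softmax output (illustrative helper)."""
    a = np.asarray(X, dtype=float)
    # Hidden layers: affine transform followed by ReLU
    for W, b in zip(coefs[:-1], intercepts[:-1]):
        a = np.maximum(a @ W + b, 0.0)
    # Output layer: affine transform followed by a numerically stable softmax
    z = a @ coefs[-1] + intercepts[-1]
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Random stand-ins with the same shapes as mlp.coefs_ and mlp.intercepts_ (not trained weights)
rng = np.random.default_rng(0)
layer_sizes = [784, 16, 16, 10]
coefs = [rng.normal(scale=0.05, size=(m, n))
         for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
intercepts = [np.zeros(n) for n in layer_sizes[1:]]

X_demo = rng.integers(0, 256, size=(5, 784))  # five fake images with pixel values 0-255
proba = forward(X_demo, coefs, intercepts)

print([W.shape for W in coefs])  # one matrix per pair of adjacent layers
print(proba.shape)               # one row of class probabilities per image
```

With the fitted model, `forward(X_test.values, mlp.coefs_, mlp.intercepts_)` should agree with `mlp.predict_proba(X_test)` up to floating-point error, under the same activation assumption.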

In [10]:
len(mlp.coefs_)

3

In [11]:
mlp.coefs_

[array([[-1.04251560e-147,  1.10417635e-147,  1.79024404e-149, ...,
          1.43022117e-148, -2.61933875e-149, -1.06146390e-148],
        [-1.23959388e-149,  4.61462566e-148,  8.60108624e-139, ...,
         -4.69080784e-149,  1.80304896e-149,  2.70714311e-147],
        [-3.09361015e-148, -4.59201582e-148, -1.15899210e-147, ...,
          9.65940648e-149,  1.93889696e-149,  4.17547372e-149],
        ...,
        [ 7.32895013e-148,  5.11922094e-149, -6.44773492e-149, ...,
          1.42869333e-149,  1.64742931e-149,  1.12008304e-149],
        [-1.65498014e-148, -3.33294827e-148, -2.16231425e-149, ...,
         -1.98616613e-149,  2.78718364e-148, -2.05016608e-148],
        [ 2.91374602e-147, -3.15709750e-150,  3.79736080e-019, ...,
         -2.56138179e-149, -3.64357849e-149, -4.85179769e-149]]),
 array([[-5.34602933e-02,  8.22567040e-03,  1.32514985e-02,
         -1.31048717e-02, -4.44119819e-02,  7.52181041e-02,
          5.64961908e-02, -2.80691256e-08, -3.96395632e-02,
          7.5

In [12]:
len(mlp.coefs_[0])

784

In [13]:
mlp.coefs_[0]

array([[-1.04251560e-147,  1.10417635e-147,  1.79024404e-149, ...,
         1.43022117e-148, -2.61933875e-149, -1.06146390e-148],
       [-1.23959388e-149,  4.61462566e-148,  8.60108624e-139, ...,
        -4.69080784e-149,  1.80304896e-149,  2.70714311e-147],
       [-3.09361015e-148, -4.59201582e-148, -1.15899210e-147, ...,
         9.65940648e-149,  1.93889696e-149,  4.17547372e-149],
       ...,
       [ 7.32895013e-148,  5.11922094e-149, -6.44773492e-149, ...,
         1.42869333e-149,  1.64742931e-149,  1.12008304e-149],
       [-1.65498014e-148, -3.33294827e-148, -2.16231425e-149, ...,
        -1.98616613e-149,  2.78718364e-148, -2.05016608e-148],
       [ 2.91374602e-147, -3.15709750e-150,  3.79736080e-019, ...,
        -2.56138179e-149, -3.64357849e-149, -4.85179769e-149]])

### Making Predictions

#### Based on the training dataset

The function cross_val_predict has a similar interface to cross_val_score, but it returns, for each element in the input, the prediction obtained for that element when it was in the held-out fold. Here the folds are partitions of X_train.

In [16]:
predictions = cross_val_predict(mlp, X_train, y_train, cv=3)
print('Prediction:', predictions)

Prediction: [0 6 4 ... 6 6 3]
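
Because every training sample receives exactly one out-of-fold prediction, the array returned by cross_val_predict has the same length as the target and can be scored against it. A minimal sketch, assuming a recent scikit-learn and again using the small built-in `load_digits` dataset as a stand-in for the MNIST CSV so that it runs quickly:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Short training runs may stop before convergence; silence that warning for the demo
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Small stand-in dataset (assumption: used only to keep the example fast)
X_small, y_small = load_digits(return_X_y=True)
clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=200, random_state=0)

# Each sample is predicted by a model that did not see it during fitting
oof_pred = cross_val_predict(clf, X_small, y_small, cv=3)

print(oof_pred.shape == y_small.shape)
print('Out-of-fold accuracy:', accuracy_score(y_small, oof_pred))
```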


#### Based on the test dataset

Now we use the predict method, which is the usual way to get predictions for a new dataset. Here the dataset is X_test. Note that the model is first refit on all of X, which includes the rows of X_test, so these predictions are not made on unseen data.

In [14]:
# train your model using all data.
startTime = datetime.now()
mlp.fit(X, y) 
print ('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')

Total running time (H: M: S. ThS) 0:02:18.231907 seconds.


In [15]:
startTime = datetime.now()
predictions = mlp.predict(X_test)
print('Prediction:', predictions)
print('Total running time (H: M: S. ThS)', datetime.now()-startTime, 'seconds.')

Prediction: [0 6 5 ... 7 4 8]
Total running time (H: M: S. ThS) 0:00:00.226013 seconds.

