<font size=4 color='blue'>
    
# <center>Class 8, July 7, 2021 </center>

<font size=4 color='blue'>

# <center> Topic that the Machine will learn: Handwritten Digit Recognition </center>

<font size=5 color='blue'>
From Learning Machines to Smart Machines

<font size=4 >
    
[Smart machines:](./Literatura/What-is-smart-machines.pdf)

<font size=5 color='blue'>
Classification Predictive Modeling

<font size=4 color='black'>
Classification predictive modeling is the task of approximating a mapping function F from input variables (X) to <font color='red' > $\bf discrete$ <font color='black' > target variables (Y). In statistics, a variable that can take on one of a limited number of possible values is called a $\bf categorical$ $\bf variable$.

<font size=5 color='blue'>
Regression Predictive Modeling

<font size=4 color='black'>
    
Regression predictive modeling is the task of approximating a mapping function F from input variables (X) to a <font color='red' > $\bf continuous$ <font color='black' > target variable (Y).

<font size=5 color='blue'>
Classification

<font size=4 color='black'>
    
A classification problem requires that samples be classified into one of two or more classes.

A classification problem can have real-valued or discrete input variables.

A problem with two classes is often called a two-class or binary classification.

A problem with more than two classes is often called a multi-class classification.

A problem where a sample can be assigned several classes at once is called a multi-label classification.
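To make the three problem types concrete, the sketch below shows how their targets are typically laid out as NumPy arrays. The arrays are made-up examples, not MNIST data; the multi-label case uses one 0/1 indicator per class (a "multi-hot" vector), which is one common convention among others.

```python
import numpy as np

# Made-up targets for four samples, one array per problem type.

# Binary classification: each sample belongs to one of two classes (0 or 1).
y_binary = np.array([0, 1, 1, 0])

# Multi-class classification: each sample belongs to exactly one of 10 classes (e.g. digits).
y_multiclass = np.array([7, 2, 1, 0])

# Multi-label classification: each sample may belong to several of 4 classes at once,
# so its target is a vector with one 0/1 indicator per class.
y_multilabel = np.array([
    [1, 0, 1, 0],   # sample 0: classes 0 and 2
    [0, 1, 0, 0],   # sample 1: class 1 only
    [1, 1, 1, 0],   # sample 2: classes 0, 1 and 2
    [0, 0, 0, 0],   # sample 3: no class
])

print(np.unique(y_binary))        # the two class values
print(y_multilabel.sum(axis=1))   # number of classes assigned to each sample
```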
    


<font size=5 color='blue'>
Information about the topic: Handwritten Digit Recognition using the MNIST database

<font size=4 color='black'>

[Handwritten ZIP code recognition](./Literatura/Back-propag-hand-written-cnn-lecun-1989.pdf)

<font size=4 color='black'>
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits (samples) that is commonly used for training various image processing systems.

In [None]:
import numpy as np
import time

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Activation, Flatten
from tensorflow.keras.models import Model
import pydot
import IPython
from IPython.display import SVG
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import SGD

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

import pickle
import gzip

np.random.seed(1)
%matplotlib inline

In [None]:
print("Numpy version", np.__version__)
print("TensorFlow version", tf.__version__)
print("Keras version", keras.__version__)
print("Pydot version", pydot.__version__)
print("Ipython version", IPython.__version__)
print("Matplotlib version", matplotlib.__version__)
print("Pickle version", pickle.format_version)
from platform import python_version
print("Python version", python_version())

<font size=5 color='blue'>
Sample preparation



 <font size=4 color='black'>   
The MNIST samples can be downloaded from the following URL: 
    
[MNIST data download](https://github.com/mnielsen/neural-networks-and-deep-learning/blob/master/data/mnist.pkl.gz)

<font size=4 color='black'>
The samples used to train and test the neural network are stored in the file 'mnist.pkl.gz'.

`gzip.open(filename, mode='rb')` opens the compressed file 'filename' for reading in binary mode.

Documentation: [gzip.open(...)](https://docs.python.org/3/library/gzip.html#gzip.open)

`pickle.load(file, encoding='latin1')` reads the pickled object stored in 'file'; `encoding='latin1'` tells Python how to decode 8-bit strings written by Python 2, which is needed to load NumPy arrays pickled under Python 2.

Documentation: [pickle.load(...)](https://docs.python.org/3/library/pickle.html#pickle.load)
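The two calls above can be tried without the real file. This minimal sketch, assuming NumPy is installed, writes a small made-up stand-in (`fake_mnist.pkl.gz`, a hypothetical name) with the same layout as `mnist.pkl.gz` (three `(inputs, targets)` tuples) and reads it back with `gzip.open` and `pickle.load`. For a pickle written by Python 3, as here, `encoding='latin1'` is harmless; it only matters for pickles written by Python 2.

```python
import gzip
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for mnist.pkl.gz: three (inputs, targets) tuples
# with made-up values and sizes 5, 2 and 2.
rng = np.random.default_rng(0)
fake_sets = tuple(
    (rng.random((n, 784), dtype=np.float32), rng.integers(0, 10, size=n))
    for n in (5, 2, 2)
)

path = os.path.join(tempfile.mkdtemp(), "fake_mnist.pkl.gz")

# Write: pickle the tuple into a gzip-compressed binary file.
with gzip.open(path, "wb") as f:
    pickle.dump(fake_sets, f)

# Read back with the same two calls used by load_samples().
with gzip.open(path, "rb") as f:
    learn, val, test = pickle.load(f, encoding="latin1")

print(learn[0].shape, val[0].shape, test[0].shape)
```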

The function 'load_samples()' returns three sample sets: 

    learn_samples  # Samples for training
    val_samples    # Samples for validation
    test_samples   # Samples for testing

In [None]:
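# The database file mnist.pkl.gz must be in the working directory.
    
def load_samples():
    # Open the gzip-compressed pickle and unpack its three (inputs, targets) tuples;
    # the with-statement closes the file automatically.
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        learn_samples, val_samples, test_samples = pickle.load(f, encoding="latin1")
    
    return (learn_samples, val_samples, test_samples)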


In [None]:
# The samples are loaded in three sets: learn_samples, val_samples and test_samples

learn_samples, val_samples, test_samples = load_samples()
 

<font size=4 color='black'>
    
Each of these sets is a tuple with two entries:

In [None]:
print("The type of learn_samples: ", type(learn_samples), "with length: ", len(learn_samples) )
print("The type of val_samples: ", type(val_samples), "with length: ", len(val_samples) )
print("The type of test_samples: ", type(test_samples), "with length: ", len(test_samples) )

In [None]:
print("Shape of the first element of the learn_samples tuple: ", learn_samples[0].shape)
print("Shape of the second element of the learn_samples tuple: ", learn_samples[1].shape)
print("Shape of the first element of the val_samples tuple: ", val_samples[0].shape)
print("Shape of the second element of the val_samples tuple: ", val_samples[1].shape)
print("Shape of the first element of the test_samples tuple: ", test_samples[0].shape)
print("Shape of the second element of the test_samples tuple: ", test_samples[1].shape)

<font size=5 color="blue">

Analyzing the samples extracted from MNIST

<font size=4 color='black'>
The first entry of each set holds the network inputs: the pixel values, which are the image features. The second entry holds the target variable: the digit associated with each image. Note that the pixel values have been rescaled to values between 0.0 and 1.0.
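As an illustration of the rescaling, the sketch below assumes the original images used 8-bit grayscale values (0 to 255) and maps them to the range 0.0 to 1.0 by dividing by 255. The notebook does not state exactly how `mnist.pkl.gz` was produced, so treat this as an assumption.

```python
import numpy as np

# Hypothetical raw 8-bit grayscale pixels (assumed 0 = black, 255 = white).
raw_pixels = np.array([0, 51, 128, 255], dtype=np.uint8)

# Assumed rescaling: divide by the largest 8-bit value so results lie in [0.0, 1.0].
scaled = raw_pixels.astype(np.float32) / 255.0

print(scaled)
print(scaled.min(), scaled.max())
```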

In [None]:
print("features 150 to 199 of the first training sample \n \n", learn_samples[0][0][150:200])
print("\n y value of the first training sample =",learn_samples[1][0])

<font size=5 color="blue">
    
Viewing one sample from the data sets

<font size=4 color='black'>
    
The digits in the MNIST dataset are images of 28x28 pixels. 
    
In the loaded data sets, each image is stored as a vector of dimension 28x28 = 784. 
    
To display the image of the sample with a given index, its vector is reshaped into a 28x28 matrix.
    
This is done with the following call:
    
`plt.imshow(sets[0][index].reshape((28, 28)), cmap='gray')` (the images are in shades of gray, hence `cmap='gray'`)

Documentation: [matplotlib.pyplot.imshow(...)](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib-pyplot-imshow)
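The reshape relies on NumPy's default row-major (C) order: entry `k` of the 784-vector becomes row `k // 28`, column `k % 28` of the image. The short check below uses a synthetic vector whose values equal their own indices, so the mapping can be verified directly.

```python
import numpy as np

# A synthetic flattened "image": each entry holds its own index 0..783.
vector = np.arange(28 * 28)

# Row-major (C-order) reshape: entry k goes to row k // 28, column k % 28.
image = vector.reshape((28, 28))

row, col = 5, 13
print(image[row, col] == vector[28 * row + col])   # True
print(image.reshape(-1).shape)                     # (784,)
```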



In [None]:
index = 0

plt.imshow(learn_samples[0][index].reshape((28, 28)),cmap='gray')

print(learn_samples[1][index], "is the digit corresponding to the sample", index)
print("\n This is its image")

<font size=5 color='blue'>

Separation of the samples into features (inputs) and targets:

In [None]:
x_learn = learn_samples[0]   # input (features) in the training data set
y_learn = learn_samples[1]   # target (the digit) in the training data set

x_val = val_samples[0]   # input (features) in the validation data set
y_val = val_samples[1]   # target (the digit) in the validation data set

x_test = test_samples[0]     # input (features) in the testing data set
y_test = test_samples[1]     # target (the digit) in the testing data set


In [None]:
print(type(x_learn))
print(x_learn.shape)
print(y_learn.shape)
print(x_val.shape)
print(y_val.shape)
print(x_test.shape)
print(y_test.shape)

In [None]:
y_learn

<font size=5 color='blue'>
One-hot encoding of the target variable Y

<font size=4 color='black'>
The target can take one of ten values (classes): the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. 

The y_learn, y_val and y_test sets are arrays in which each entry contains a digit, stored as a 64-bit integer.
    
We convert each digit to a vector using one-hot encoding
([One-hot encoding](https://en.wikipedia.org/wiki/One-hot)).
    
In one-hot encoding, a digit is represented by a vector of dimension 10 (one entry per class), with 1.0 at the index corresponding to the digit and 0.0 elsewhere. 


<font size=5 color='blue' >
    
Digit |     One-hot representation 
--- | --- 
 0  | [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 1  | [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 2  | [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 3  | [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 4  | [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 5  | [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 6  | [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 7  | [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 8  | [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 9  | [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
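The table can also be produced without building an identity matrix. The hypothetical helper `one_hot` below is a minimal sketch that writes a single 1.0 per row using NumPy fancy indexing; it gives the same result as `np.eye(10)[digits]`, and `np.argmax` reverses the encoding.

```python
import numpy as np

def one_hot(digits, num_classes=10):
    """Hypothetical helper: (n,) integer labels -> (n, num_classes) array with one 1.0 per row."""
    digits = np.asarray(digits)
    encoded = np.zeros((digits.size, num_classes))
    encoded[np.arange(digits.size), digits] = 1.0   # row i gets 1.0 in column digits[i]
    return encoded

digits = np.array([5, 0, 4, 1, 9])
encoded = one_hot(digits)

# Same as selecting rows of the 10x10 identity matrix.
print(np.array_equal(encoded, np.eye(10)[digits]))   # True

# Decoding: the index of the 1.0 in each row recovers the digit.
print(np.argmax(encoded, axis=1))                     # [5 0 4 1 9]
```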

<font size=5 color='purple'>
Demo using numpy.eye

In [None]:
np.eye(10)

In [None]:
np.eye(10)[0]

In [None]:
np.eye(10)[1]

In [None]:
print(y_learn[0:5])

np.eye(10)[y_learn[0:5].reshape(-1)]

In [None]:
np.eye(10)[y_learn[0:5]]

<font size=5 color='purple'>
End of demo using numpy.eye

In [None]:
learn_y = np.eye(10)[y_learn]

val_y = np.eye(10)[y_val]

test_y = np.eye(10)[y_test]

In [None]:
print("Y: Digit representation for the first learning sample \n", y_learn[0])
print("Y: One-hot representation for the first learning sample \n",learn_y[0])

<font size=4 color="black">
    
For convenience, the input sets are reshaped to the format:

(number of samples, image height, image width).
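A small sketch of this reshape on a made-up batch: passing `-1` lets NumPy infer the number of samples, so the same line would work for the learning, validation and test sets. Reshaping each image back to 784 values recovers the original rows, which is what the `Flatten` layer used later does for each sample.

```python
import numpy as np

# Made-up batch of 3 flattened images, shape (samples, 784).
flat = np.random.default_rng(1).random((3, 784))

# -1 lets NumPy infer the number of samples (3 here).
images = flat.reshape(-1, 28, 28)
print(images.shape)    # (3, 28, 28)

# Flattening each image back gives the original rows.
restored = images.reshape(images.shape[0], -1)
print(np.array_equal(restored, flat))   # True
```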

In [None]:
learn_x = x_learn.reshape(50000, 28, 28)
val_x  = x_val.reshape(10000, 28, 28)
test_x = x_test.reshape(10000, 28, 28)

<font size=4 color="black">
    
In summary, the learning, validation and test sets have the following dimensions:

In [None]:
print ("number of learning samples = " + str(learn_x.shape[0]))
print ("number of validation samples = " + str(val_x.shape[0]))
print ("number of test samples = " + str(test_x.shape[0]))
print ("learn_x shape: " + str(learn_x.shape))
print ("learn_y shape: " + str(learn_y.shape))

print ("val_x shape: " + str(val_x.shape))
print ("val_y shape: " + str(val_y.shape))

print ("test_x shape: " + str(test_x.shape))
print ("test_y shape: " + str(test_y.shape))

<font size=5 color="blue">

Constructing the Learning Machine

<font size=5 color='blue'>
    
Model of the Machine: Fully Connected Feed-Forward Network (FF) with one hidden layer of twenty neurons activated with the sigmoid function. The nodes in the output layer are activated with the softmax function.
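To show what this architecture computes, here is a minimal NumPy sketch of one forward pass through a 784-20-10 network: flatten, sigmoid hidden layer, softmax output. The weights are random placeholders drawn from [-0.05, 0.05], roughly imitating Keras' `'uniform'` initializer; this illustrates the computation and is not the Keras implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Placeholder parameters, stored as (inputs, units) like Keras Dense kernels.
W1, b1 = rng.uniform(-0.05, 0.05, (784, 20)), np.zeros(20)
W2, b2 = rng.uniform(-0.05, 0.05, (20, 10)), np.zeros(10)

def forward(x):
    """x: (batch, 28, 28) images -> (batch, 10) class probabilities."""
    h = sigmoid(x.reshape(x.shape[0], -1) @ W1 + b1)   # flatten, then hidden layer
    return softmax(h @ W2 + b2)                         # output layer

x = rng.random((4, 28, 28))
probs = forward(x)
print(probs.shape)                         # (4, 10)
print(np.allclose(probs.sum(axis=1), 1))   # True
```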

<font size=5 color="black">


The following helper class draws the architecture of the neural network.

In [None]:
import networkx as nx

def sigmoid(z):
    """Logistic sigmoid, applied element-wise (used by Network.feedforward)."""
    return 1.0 / (1.0 + np.exp(-z))

class Network(object):
    
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        print("It has", self.num_layers, "layers,")
        self.sizes = sizes
        print("with the following number of nodes per layer", self.sizes)
        # One bias column vector per non-input layer
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # weights[l] has shape (nodes in layer l+1, nodes in layer l)
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
        
    def feedforward(self, x_of_sample):
        """Return the output of the network F(x_of_sample) """        
        for b, w in zip(self.biases, self.weights):
            x_of_sample = sigmoid(np.dot(w, x_of_sample)+b)
        return x_of_sample
    
    def graph(self,sizes):
        a=[]
        ps={}
        Q = nx.Graph()
        for i in range(len(sizes)):
            Qi=nx.Graph()    
            n=sizes[i]
            nodos=np.arange(n)
            Qi.add_nodes_from(nodos)
            l_i=Qi.nodes
            Q = nx.union(Q, Qi, rename = (None, 'Q%i-'%i))
            if len(l_i)==1:
                ps['Q%i-0'%i]=[i/(len(sizes)), 1/2]
            else:
                for j in range(len(l_i)):
                    ps['Q%i-%i'%(i,j)]=[i/(len(sizes)),(1/(len(l_i)*len(l_i)))+(j/(len(l_i)))]
            a.insert(i,Qi)
        for i in range(len(a)-1):
            for j in range(len(a[i])):
                for k in range(len(a[i+1])):
                    Q.add_edge('Q%i-%i' %(i,j),'Q%i-%i' %(i+1,k))
        nx.draw(Q, pos = ps)
                

In [None]:
# Architecture of the neural network we want to implement in the present notebook

layers = [784,20,10]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>

Definition of the neural network architecture
  

<font size=5 color='black'> 
    
Keras offers two common ways to define the architecture:

<font size=4 color='black'> 
    
1.- The Sequential model: a linear stack of layers.
    
2.- The functional API: the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.  

Here we use the functional API to build the network.
    

Documentation: [Keras Functional API](https://keras.io/getting-started/functional-api-guide/)

In [None]:
def architecture(input_shape, num_clases):
    
    # Defining the input as a tensor with shape input_shape. 
    inputs = Input(input_shape, name='input-layer')
    
    # Flattening each input of shape (28, 28) into a vector of 784 values
    x = Flatten()(inputs)
    
    # Defining the first hidden layer with 20 nodes and sigmoid as activation function
    x = Dense(20, kernel_initializer='uniform', bias_initializer='zeros', name='hidden-layer')(x)
    x = Activation('sigmoid')(x)
    
    # Defining the output layer with num_clases nodes 
    x = Dense(num_clases, kernel_initializer='uniform', bias_initializer='zeros')(x)
    
    # For the output layer we use the activation function 'softmax'
    outputs = Activation('softmax', name='output-layer')(x)

    # Create model. This creates your Keras model instance, you'll use this instance to train/test the model.    
    arch_model = Model(inputs = inputs, outputs = outputs, name='MnistModel')

    return arch_model

<font size=4 color='black'> 

    
   *The softmax activation function is the standard choice for the output layer when each sample belongs to exactly one of K > 2 classes: it converts the K output values into probabilities that sum to 1.* 

![image.png](attachment:image.png)
    
[Activation functions](./Literatura/activation_functions_2018.pdf)
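A minimal NumPy sketch of softmax, assuming the scores arrive as an array with classes on the last axis. Subtracting the maximum score before `exp` avoids overflow without changing the result. The last lines check that for K = 2 softmax reduces to a sigmoid of the score difference, which is why two-class problems commonly use a single sigmoid output instead.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis: exp(z_k) / sum_j exp(z_j)."""
    z = z - np.max(z, axis=-1, keepdims=True)   # stability shift; does not change the result
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# K = 10 scores -> 10 probabilities that sum to 1; the largest score (index 5) wins.
scores = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 3.0, 0.5, -0.5, 1.5, 0.2])
p = softmax(scores)
print(p.sum())
print(np.argmax(p))   # 5

# Large scores would overflow a naive exp(), but the shifted version stays finite.
big = softmax(np.array([1000.0, 1001.0, 1002.0]))
print(big)

# K = 2: softmax([a, b])[1] equals sigmoid(b - a).
a, b = 0.3, 1.7
print(np.isclose(softmax(np.array([a, b]))[1], sigmoid(b - a)))   # True
```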


<font size=5 color="blue">

Constructing the neural network model for the Learning Machine

In [None]:
one_image = (28,28)
num_classes= 10

# Generating a model using the architecture defined for the neural network
mnist_model = architecture(one_image, num_classes)

<font size=5 color="blue">
    
Model plot and summary

<font size=4 color='black'> 
The function 'plot_model()' draws a diagram of the model's layers; with show_shapes=True it also shows the input and output shapes of each layer.
Documentation: [Model visualization](https://keras.io/visualization/#training-history-visualization)

In [None]:
plot_model(mnist_model, to_file='FF_mnist_model.png', show_shapes=True, show_layer_names=True)

In [None]:

mnist_model.summary()
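The parameter counts reported by `summary()` can be checked by hand: a `Dense` layer with `n` inputs and `u` units has `n * u` weights plus `u` biases, while the `Input`, `Flatten` and `Activation` layers have no trainable parameters. The helper `dense_params` is hypothetical, written only for this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """Hypothetical helper: trainable parameters of a Dense layer (weights plus one bias per unit)."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 20)   # 784 * 20 + 20
output = dense_params(20, 10)        # 20 * 10 + 10
print(hidden, output, hidden + output)   # 15700 210 15910
```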

<font size=5 color='blue'>
Optimization method

<font size=4 color="black">
Compiling the model requires defining the optimization algorithm, the loss function and the metric.
    
Here we use stochastic gradient descent (SGD) with learning rate `learning_rate` and momentum `momentum`, without Nesterov acceleration.


[An overview of gradient descent optimization algorithms](./Literatura/SGD_overview_2016-17.pdf)

This paper also discusses other variants of the algorithm: Adagrad, Adadelta, RMSprop and Adam.
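The update performed by SGD can be written out explicitly. The sketch below follows the rule given in the Keras `SGD` documentation for `nesterov=False` (`velocity = momentum * velocity - learning_rate * g`, then `w = w + velocity`); with `momentum=0.0`, as in this notebook, it reduces to `w = w - learning_rate * g`. The quadratic objective and the helper names `sgd_momentum_step` and `run` are hypothetical, chosen only so the result is easy to check.

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.0):
    """Hypothetical helper: one SGD update without Nesterov acceleration."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Hypothetical objective: f(w) = ||w - target||^2, with gradient 2 * (w - target).
target = np.array([3.0, -2.0])

def run(momentum, steps, learning_rate=0.01):
    """Start at w = 0 and apply `steps` updates; return the final weights."""
    w = np.zeros_like(target)
    velocity = np.zeros_like(target)
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w, velocity = sgd_momentum_step(w, velocity, grad, learning_rate, momentum)
    return w

w_plain = run(momentum=0.0, steps=500)   # plain SGD, as configured in this notebook
print(w_plain)                           # close to [3, -2]

# After the same number of steps, momentum ends closer to the minimum on this problem.
dist_plain = np.linalg.norm(run(0.0, 100) - target)
dist_momentum = np.linalg.norm(run(0.9, 100) - target)
print(dist_plain, dist_momentum)
```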

<font size=5 color='blue'>
Optimizer

In [None]:
learning_rate = 0.01

optimizer = keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.0, nesterov=False)

<font size=5 color='blue'>
The cost (loss) and Metric functions

<font size=4 color="black">
    
The cost function *J* is the one Keras calls "categorical_crossentropy":
    
$$ J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=0}^{K-1} y_k^{(i)} \log F_k(x^{(i)}) $$
    
 where $F_k(x^{(i)})$ is the predicted probability of class *k* for sample *i*, $y_k^{(i)}$ is the *k*-th component of the one-hot target for sample *i*, *K* is the number of classes and *m* is the number of samples. Because $y^{(i)}$ is one-hot, only the term of the true class contributes to the inner sum.
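A direct NumPy transcription of the formula for J, assuming one-hot targets of shape (m, K) and predicted probabilities of the same shape. The clipping constant `eps` is an assumption added to avoid `log(0)`. The two inputs show the expected behavior: a uniform guess over 10 classes costs log 10, and putting probability 0.9 on the true class costs -log 0.9.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """J = -(1/m) * sum_i sum_k y_k^(i) * log F_k(x^(i)), for y_true and y_pred of shape (m, K)."""
    y_pred = np.clip(y_pred, eps, 1.0)   # assumption: clip to avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.eye(10)[[3, 7]]   # two samples with digits 3 and 7

uniform = np.full((2, 10), 0.1)   # no information: every class gets probability 0.1
print(categorical_crossentropy(y_true, uniform))     # log(10), about 2.3026

confident = np.where(y_true == 1, 0.9, 0.1 / 9)     # 0.9 on the true class, rest shared
print(categorical_crossentropy(y_true, confident))   # -log(0.9), about 0.1054
```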
    
[Cross entropy](https://en.wikipedia.org/wiki/Cross_entropy)
    
[Categorical cross entropy](https://www.deeplearningbook.org/)
    

A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. Any of the loss functions can also be used as a metric. In this example, we use "accuracy" as the metric:
    
*Accuracy = Number of correct predictions / Total number of predictions made*
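For one-hot targets, a prediction counts as correct when the index of the largest predicted probability matches the index of the 1 in the target. The sketch below applies that rule to made-up outputs for four samples and three classes.

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes, with one-hot targets.
y_pred = np.array([
    [0.7, 0.2, 0.1],   # predicted class 0
    [0.1, 0.8, 0.1],   # predicted class 1
    [0.3, 0.3, 0.4],   # predicted class 2
    [0.5, 0.4, 0.1],   # predicted class 0
])
y_true = np.eye(3)[[0, 1, 1, 0]]

correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
accuracy = correct.sum() / len(correct)
print(accuracy)   # 3 correct out of 4: 0.75
```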
    

Categorical crossentropy will compare the distribution of the predictions (the activations in the output layer, one for each class) with the true distribution, where the probability of the true class is set to 1, and 0 for the other classes.

In other words, the true class is represented as a one-hot encoded vector, and the closer the model’s outputs are to that vector, the lower the loss.
    
Documentation: [keras.compile(...)](https://keras.io/models/model/#compile)

In [None]:
loss_function = 'categorical_crossentropy'
metric_function = 'accuracy'

<font size=5 color='blue'>
Compiling the model

In [None]:
mnist_model.compile(optimizer = optimizer, loss = loss_function, metrics = [metric_function])

<font size=5 color='blue'>
    
The Machine is learning

<font size=4 color="black">
    
Documentation: [keras.fit(...)](https://keras.io/models/model/#fit)


In [None]:
start_time = time.time()

num_epochs = 100

history = mnist_model.fit(x=learn_x, y=learn_y, epochs=num_epochs, batch_size=100,
                          validation_data=(val_x, val_y), shuffle=False, verbose=2)

end_time = time.time()
print("Time for learning: {:10.4f}s".format(end_time - start_time))

<font size=4 color='black'>

* Note: if you run `fit()` again, `mnist_model` will continue learning from the parameters it has already learnt instead of reinitializing them.
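To make `batch_size=100` and `shuffle=False` concrete, here is a simplified sketch of how the learning set is cut into consecutive mini-batches; it ignores Keras' internal data pipeline, and the generator `minibatches` is a hypothetical helper. With 50000 learning samples, each epoch performs 500 weight updates, and because `shuffle=False` the batches come in the same order every epoch (Keras' default is `shuffle=True`).

```python
import numpy as np

def minibatches(x, y, batch_size):
    """Hypothetical helper: yield consecutive (x, y) slices in the original order."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Stand-ins with the same number of samples as learn_x/learn_y (contents are irrelevant here).
n_samples, batch_size, num_epochs = 50000, 100, 100
x = np.zeros((n_samples, 1))
y = np.zeros((n_samples, 1))

steps_per_epoch = sum(1 for _ in minibatches(x, y, batch_size))
print(steps_per_epoch)                 # 500 weight updates per epoch
print(steps_per_epoch * num_epochs)    # 50000 updates over the whole training run
```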


<font size=5 color="blue">

Plotting the cost function of the learning and validation data sets as a function of the epoch

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Cost function', size=16)
plt.ylabel('Cost', size=16)
plt.xlabel('Epoch', size=16)
plt.legend(['Train', 'Validation'], loc='upper right', prop={'size': 16})
plt.show()

<font size=5 color="blue">
Plotting the accuracy function of the learning and validation sets as a function of the epoch

In [None]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy', size=16)
plt.ylabel('Accuracy', size=16)
plt.xlabel('Epoch', size=16)
plt.legend(['Train', 'Validation'], loc='lower right', prop={'size': 16})
plt.show()

<font size = 5 color='blue'>

Loss and accuracy evaluation using the Smart Machine and the test samples

<font size= 4 color='black'>    
After training, the loss and accuracy are evaluated on the test samples (test_x, test_y). This is done with the Keras method evaluate(x=None, y=None, ...), which returns the loss followed by the value of each metric.
    

    
[Method evaluate in Keras](https://keras.io/models/model/)

In [None]:
# Evaluation using all the samples of the test set
evaluations = mnist_model.evaluate(x = test_x, y = test_y)

print ("Loss = " + str(evaluations[0]))
print ("Test Accuracy = " + str(evaluations[1]))


In [None]:
# Evaluation using the first 100 samples of the test set

evaluations = mnist_model.evaluate(x = test_x[:100], y = test_y[:100])

print ("Loss = " + str(evaluations[0]))
print ("Test Accuracy = " + str(evaluations[1]))


<font size = 5 color='blue'>
Digit prediction with the Smart Machine

<font size= 4 color='black'>    
The smart machine predicts the digits associated with new samples, for example those in the test set (test_x, test_y). This is done with the Keras method "predict(x, ...)", which returns the 10 softmax probabilities for each sample.
      
[Method predict in Keras](https://keras.io/models/model/)

In [None]:
# Predicting the digits associated to each sample in the test set (test_x)
predictions = mnist_model.predict(test_x)

In [None]:
sample = 34

# Predicting the digit associated to the sample 
# np.argmax returns the index of the maximum value

prediction = np.argmax(predictions[sample])

print('For the sample number', sample, 'the prediction is the digit:', prediction)

<font size=4 color="black"> 
Displaying the image of this sample and its associated digit (the true label, not the prediction!).

In [None]:
plt.imshow(test_samples[0][sample].reshape((28, 28)), cmap='gray')

print ('For the sample number', sample, 'the associated digit is:', np.squeeze(test_samples[1][sample]))

<font size = 5 color='blue'>
Resetting all state generated by Keras

<font size=4 >
Keras manages a global state, which it uses to implement the Functional model-building API and to uniquify autogenerated layer names.  
If you are creating many models in a loop, this global state will consume an increasing amount of memory over time, and you may want to clear it. Calling clear_session() releases the global state: this helps avoid clutter from old models and layers, especially when memory is limited.

In [None]:
tf.keras.backend.clear_session()
