<font size=4 color='blue'>

# <center> Class 12, January 6, 2021 </center>


<font size=5 color="blue">

Deep Learning using LeNet and AlexNet

<font size=4 color="black">
    
[Comment: LeNet and AlexNet](https://d2l.ai/chapter_convolutional-modern/alexnet.html#fig-alexnet)

<font size=5 color="black">

LeNet

<img src="https://drive.google.com/uc?id=196d5qXzLXms_9-bW40KJETPy4Wmz4Yef" align = "center" >

<font size=4 color="black">
    
LeNet: these networks have a large number of weights and biases, so overfitting must be kept under control.
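
To get a feel for where the weights come from, the sketch below counts the trainable parameters of a small LeNet-style stack. The layer sizes (a 28×28×1 input, 5×5 kernels with 32 and 64 filters, 2×2 pooling, dense layers of 1024 and 10 units) are illustrative assumptions, and the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

# Spatial size along one axis: 28 -> 24 -> 12 -> 8 -> 4
# (two unpadded 5x5 convolutions, each followed by 2x2 pooling)
flat = 4 * 4 * 64   # flattened features entering the first dense layer

counts = {
    "conv1": conv_params(5, 1, 32),
    "conv2": conv_params(5, 32, 64),
    "dense1": dense_params(flat, 1024),
    "dense2": dense_params(1024, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", total)   # 1111946
```

Under these assumptions almost all of the weights sit in the first dense layer, which is why regularization matters even for a small network.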

<font size=4 color="black">
    
[Article: LeNet](./literature/LeNet_lecun-1999.pdf)
$$ $$
[Comment: LeNet](http://deeplearning.net/tutorial/lenet.html)


<font size=4 color="black">
    
[Comment: Convolutional Neural Networks](https://engmrk.com/convolutional-neural-network-3/)

<font size=3 color="black">

## AlexNet



<img src="https://drive.google.com/uc?id=1Zl1oGuVvaHt5bdQIBsvXr9CrWTjIhrLz" width=820 height=600 align = "center" >




<font size=4 color="black">
    
[Article: AlexNet](./literature/alexnet-paper.pdf)

[Comment: AlexNet](https://www.mydatahack.com/building-alexnet-with-keras/)

<font size=2 color="black">

## A method to reduce overfitting: Data Augmentation

<img src="https://drive.google.com/uc?id=1eC_hIGOwIyDHMq3TNecEyqdxVfCartR_" width=300 height=300 align = "left" > 
 <img src="https://drive.google.com/uc?id=10tJzrCBocvBSXp35B0Bt4rgM-nS2-PiC" width=520 height=320 align = "center" > 



<font size=2 color="black">

## Data Augmentation
    
<font size=4 color="black"> 
    
[Paper: Augmentation overview](./literature/SurvayData-Augm-DL_2019.pdf)    

$$ $$   
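Deep networks rely heavily on large datasets to avoid overfitting. Data augmentation enlarges the training set artificially by transforming the samples that are already available: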
    
Transforming an image
    
    
<img src="https://drive.google.com/uc?id=1Fzg9O3SFAr7Xr8zdh8FQDds8oZNAoZ5b" width=520 height=420 align = "center" > 


Transforming a curve
    
<img src="https://drive.google.com/uc?id=1A_gGsL-j6LfTHscwkw_2TN1zfnlTCp7U" width=520 height=320 align = "center" > 
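
The two kinds of transformation shown above can be sketched with plain NumPy. The arrays below are made up for illustration (a random 8×8 "image" and a sampled sine "curve"), and the particular transformations are only examples of what an augmentation pipeline might apply.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 grayscale "image" and a sampled sine "curve"
image = rng.random((8, 8))
t = np.linspace(0.0, 2.0 * np.pi, 100)
curve = np.sin(t)

# Image transformations: mirror left-right and shift two pixels to the right
flipped = image[:, ::-1]
shifted = np.roll(image, shift=2, axis=1)   # pixels wrap around the border

# Curve transformations: small additive noise and amplitude scaling
noisy = curve + rng.normal(0.0, 0.05, size=curve.shape)
scaled = 1.1 * curve

# One original sample has become three training samples with the same label
augmented_images = np.stack([image, flipped, shifted])
print(augmented_images.shape)   # (3, 8, 8)
```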
  



<font size=4 color="black">
    
[Keras: Image Preprocessing](https://keras.io/preprocessing/image/)

<font size = 4 color="black">

[Comment: About data augmentation for Deep Learning](https://towardsdatascience.com/data-augmentation-for-deep-learning-4fe21d1a4eb9)

<font size=2 color="black">

## Another way of reducing overfitting is using batch normalization
    
<font size=4 color="black">


<font size=4 color="black">
$$ $$
    
[Paper: Batch normalization](./literature/Batch-normalization_2015.pdf)
    
<img src="https://drive.google.com/uc?id=1MP7J2KZipsEr5QlBLWtFWJbwYBzpYhxv" width=620 height=420 align = "center" >     



<font size=4 color="black">
    
Batch normalization helps reduce overfitting and accelerates the convergence of the network during training.
<img src="https://drive.google.com/uc?id=1jW5rYNpHjq7eLYTHIXUMB8CtJtHYL6TA" width=520 height=520 align = "center" > 
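
A minimal NumPy sketch of the training-time computation: each feature is normalized with the mean and variance of the current batch, then scaled by `gamma` and shifted by `beta`. The function name, the batch size, and the value of `eps` are assumptions made for illustration.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))   # 256 samples, 4 features
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

print(y.mean(axis=0).round(3))   # close to 0 for every feature
print(y.std(axis=0).round(3))    # close to 1 for every feature
```

This covers the training-time computation only. At inference, frameworks such as Keras use running averages of the mean and variance instead of the statistics of the current batch.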


<font size=4 color="black">

[Comment: batch normalization](https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c)

<font size=5 color="blue">

Deep Learning: LeNet 

<font size=4 color='red'>
If you use tensorflow-GPU, run the following cell

In [2]:
import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("physical_devices-------------", len(physical_devices))
# Enable memory growth only when a GPU is actually present;
# indexing an empty device list raises an IndexError
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

physical_devices------------- 0

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Dense, Conv2D, Activation, Dropout, Flatten, MaxPooling2D
from keras.layers import BatchNormalization 
from keras.utils import plot_model
from keras import optimizers

import time

np.random.seed(10)

<font size=2 color="black">

## Data of the System to be analyzed: mnist

<font size=4 color="black"> 
    
[The MNIST database](http://yann.lecun.com/exdb/mnist/)

<img src="https://drive.google.com/uc?id=1jTGs17Bnx-hpmVwUYYQXGhH1ja9r-seY" width=300 height=300 align = "center" >



<font size=2 color='black'>

##  Generation or extraction of the raw data
  

In [None]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
print("x_train, y_train type", type(x_train), type(y_train))
print("x_test, y_test type", type(x_test), type(y_test))

In [None]:
print("x_train shape", x_train.shape)
print("y_train shape", y_train.shape)
print("x_test shape", x_test.shape)
print("y_test shape", y_test.shape)

<font size=2 color='black'>

##  Analysis of the raw data
  

In [None]:
image_index = 7777 # Any index from 0 to 59,999 is valid
print(y_train[image_index]) # The label is 8
plt.imshow(x_train[image_index], cmap='Greys')
plt.show()

<font size=2 color='black'>

##  Transformation of the raw data 
  

In [None]:
# Reshaping the array to 4-dims so that it can work with the Keras API
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
#input_shape = (28, 28, 1)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalizing the grayscale pixel values to [0, 1] by dividing by the maximum value (255)
x_train /= 255.0
x_test /= 255.0
print('x_train shape:', x_train.shape)
print('y_train shape:', y_train.shape)
print('x_test shape:', x_test.shape)
print('y_test shape:', y_test.shape)

In [None]:
y_train[0:15]

<font size=2 color='black'>

##  Definition of the neural network architecture
  

In [None]:
# Creating a Sequential Model and adding the layers

def architecture(batch_normalization, dropout, input_shape, activation):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(5,5), input_shape=input_shape))
    
    model.add(MaxPooling2D(pool_size=(2, 2)))
    if batch_normalization:
        model.add(BatchNormalization())    # The recommendation is to perform batch normalization before the activation
    
    model.add(Conv2D(64, kernel_size=(5,5)))  # input_shape is only needed on the first layer
    
    model.add(MaxPooling2D(pool_size=(2, 2)))
    
    model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
    
    model.add(Dense(1024))
    if dropout:
        model.add(Dropout(0.2))
    if batch_normalization:
        model.add(BatchNormalization())  # The recommendation is to perform batch normalization before the activation
    model.add(Activation(activation))

    model.add(Dense(10,activation='softmax'))
    
    return model

<font size=2 color='black'>

##  Generating a model of deep neural network 
  

<font size=4 color='black'>

By toggling batch normalization and dropout, you will see that batch normalization improves the network's performance more than dropout does. Remember that batch normalization is applied before the activation.
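
For reference, a minimal NumPy sketch of what a dropout layer does during training, in the "inverted dropout" formulation: a fraction `rate` of the units is zeroed and the survivors are rescaled so that the expected activation is unchanged. The function name and array sizes are illustrative assumptions; the rate 0.2 matches the `Dropout(0.2)` layer in the architecture above.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                      # dropout is inactive at inference time
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.2, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0.0))   # close to 0.2
print("mean activation:", dropped.mean())            # close to 1.0
```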
    
[Paper: Batch Normalization](./literature/Batch-normalization_2015.pdf)

In [None]:
batch_normalization=True
dropout=False
input_shape = (28, 28, 1)
activation = 'relu'

LeNet_model = architecture(batch_normalization, dropout, input_shape, activation)


In [None]:
# Plotting the architecture

plot_model(LeNet_model, to_file='LeNet.png', show_shapes=True, show_layer_names=True)

In [None]:
LeNet_model.summary()

<font size=4 color='blue'>
    
[Keras: compiling methods](https://keras.io/models/model/#compile)

<font size=2 color='black'>

##  Compiling the model 
  

In [None]:
#Compiling the model

lr = 0.001

LeNet_model.compile(optimizer=optimizers.Adam(learning_rate=lr,beta_1=0.9, beta_2=0.999, amsgrad=False),
              loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
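
The loss `sparse_categorical_crossentropy` is used here because the MNIST labels are plain integers; with one-hot labels the matching loss is `categorical_crossentropy`. The NumPy sketch below, with made-up probabilities and hypothetical re-implementations of both losses, suggests that the two compute the same quantity and differ only in the label format.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability of the true class, labels given as integers."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def categorical_crossentropy(one_hot, probs):
    """Same loss, with the labels given as one-hot rows."""
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

# Made-up softmax outputs for 3 samples and 4 classes
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
labels = np.array([0, 1, 3])    # integer labels, as in y_train
one_hot = np.eye(4)[labels]     # one-hot version of the same labels

loss_sparse = sparse_categorical_crossentropy(labels, probs)
loss_one_hot = categorical_crossentropy(one_hot, probs)
print(loss_sparse, loss_one_hot)
```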

<font size=2 color='black'>

##  Running the model 
  

In [None]:
start_time = time.time()

num_epochs=20

history = LeNet_model.fit(x_train, y_train, batch_size=256, epochs=num_epochs, validation_split = 0.16)

end_time = time.time()
print("Time for training: {:10.4f}s".format(end_time - start_time))

<font size=2 color='black'>

##  Plotting the loss function 
  

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
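# Title built from the current run, so it always matches the actual settings and final losses
plt.title('Lr = {}, loss_train: {:.4f}, \n loss_val: {:.4f}, BatchNorm={} \n Dropout = {}'.format(
    lr, history.history['loss'][-1], history.history['val_loss'][-1], batch_normalization, dropout))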
plt.ylabel('Cost')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
#plt.ylim(top=13)
#plt.ylim(bottom=0)
plt.show()

<font size=2 color='black'>

##  Plotting the accuracy 
  

In [None]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
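# Title built from the current run, so it always matches the actual settings and final accuracies
plt.title('Lr = {}, Acc_train: {:.4f}, \n Acc_val: {:.4f} BatchNorm={} \n Dropout = {}'.format(
    lr, history.history['accuracy'][-1], history.history['val_accuracy'][-1], batch_normalization, dropout))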
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='lower right')
plt.show()

In [None]:
# Predicting the class probabilities for each sample in the test set (x_test)
predictions = LeNet_model.predict(x_test)

In [None]:
print(type(predictions))
print(predictions.shape)

In [None]:
sample = 91
print(predictions[sample])
print("\nPredicted digit:", np.argmax(predictions[sample]))
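
A single sample says little about overall quality. A short sketch of how the accuracy over a whole set could be computed from the predicted probabilities, using small made-up arrays in place of `predictions` and `y_test`:

```python
import numpy as np

# Made-up class probabilities for 4 samples and 3 classes
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
true_labels = np.array([0, 2, 1, 1])

predicted_labels = np.argmax(probs, axis=1)          # most probable class per row
accuracy = np.mean(predicted_labels == true_labels)  # fraction of correct predictions

print(predicted_labels)   # [0 2 1 0]
print(accuracy)           # 0.75
```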

<font size=4 color="black"> 
Displaying the image associated with this sample.

In [None]:
x_test=x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2])
plt.imshow(x_test[sample], cmap='Greys')
plt.show()

<font size=3 color="black">

## Deep Learning: AlexNet 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import time

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dense, Flatten
from keras.layers import Dropout, BatchNormalization
from keras.utils import plot_model
from keras import optimizers

np.random.seed(10)

<font size=2 color="black">

## Data of the System to be analyzed: oxflowers17

<font size=4 color="black"> 
    
[The oxflowers17 database](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/)


    
<font size=4 color="black">
$$ $$
 
<img src="https://drive.google.com/uc?id=19VcKMGRrm71CP1f2Yv4SrDm0PiDJ1_q1" width=400 height=800 align = "center" >     



<font size=2 color='black'>

##  Generation or extraction of the raw data

 <font size=4 color='black'>  
    
Install the library tflearn to get the data. 
$$ $$
[TFLearn library](http://tflearn.org/)

In [None]:
! pip install tflearn

In [None]:
import tflearn.datasets.oxflower17 as oxflower17
train_x, train_y = oxflower17.load_data(one_hot=True)

<font size=2 color='black'>

##  Analysis of the raw data  

 <font size=4 color='black'>   

The oxflower17 dataset consists of 1360 colour images (224 pixels high and 224 pixels wide) of flowers in 17 classes, with 80 images per class. All images are loaded together; the fraction of samples held out for validation is set before running the model.


The 17 classes are:
 
| index | class name |
| --- | --- |
| 0 | Daffodil |
| 1 | Snowdrop |
| 2 | Daisy |
| 3 | ColtsFoot |
| 4 | Dandelion |
| 5 | Cowslip |
| 6 | Buttercup |
| 7 | Windflower |
| 8 | Pansy |
| 9 | LilyValley |
| 10 | Bluebell |
| 11 | Crocus |
| 12 | Iris |
| 13 | Tigerlily |
| 14 | Tulip |
| 15 | Fritillary |
| 16 | Sunflower |

<font size=2 color="black">
    
## Viewing one sample from the data sets

<font size=4 color='black'>
    
We define a dictionary that maps each class number to its class name.



In [None]:
dic = {0: 'Daffodil', 1: 'Snowdrop', 2: 'Daisy', 3: 'ColtsFoot', 4: 'Dandelion', \
       5: 'Cowslip', 6: 'Buttercup', 7: 'Windflower', 8: 'Pansy', 9:'LilyValley', \
       10: 'Bluebell', 11: 'Crocus', 12: 'Iris', 13: 'Tigerlily', 14:'Tulip', \
       15: 'Fritillary', 16: 'Sunflower'}

<font size=4 color="black">
    
Next, we show a sample: its target and image.

In [None]:
# Plotting the content of a sample

sample = 72

plt.imshow(train_x[sample]);
print('y =',  np.squeeze(train_y[sample]))

# The target is one-hot encoded, so argmax recovers the class index
i = int(np.argmax(train_y[sample]))

print('y =',  i, ';', 'the sample', sample, 'corresponds to a(an)', dic[i])

<font size=2 color='black'>

##  Transformation of the raw data 

In [None]:
print('the shape is', train_x.shape)

In [None]:
print(train_x[0][0:5][0:2])

In [None]:
print(train_y[0])

<font size=2 color='black'>

##  Transformation of the raw data 
    
<font size=4 color='black'>
$$ $$    
The raw data are already normalized, so no further transformation is needed.

In [None]:
print('train_x shape:', train_x.shape)
print('train_y shape:', train_y.shape)

In [None]:
from sklearn.model_selection import train_test_split

# Choose your test size to split between training and testing sets:
train_x, test_x, train_y, test_y = train_test_split(train_x,train_y, test_size=0.1, random_state=42)

In [None]:
print(train_x.shape)
print(test_x.shape)
print(train_y.shape)
print(test_y.shape)

<font size=2 color='black'>

##  Definition of the neural network architecture
  

<font size=5 color='black'> 
    
Keras has two different modes to define the architecture:

<font size=4 color='black'>     

1. The sequential model. It is a sequential stack of layers.
$$ $$    
2. The functional API. It is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.  
$$ $$

In the present case, we will use the sequential mode for constructing the architecture of the network.
    
[Keras: Sequential model API](https://keras.io/models/sequential/)

<font size=4 color="black">
    
[Keras: Convolutional layers](https://keras.io/layers/convolutional/)
$$ $$
[Keras: Pooling layers](https://keras.io/layers/pooling/)
$$ $$
[Keras: Batch Normalization](https://keras.io/layers/normalization/)

In [None]:
# Creating a Sequential Model and adding the layers

def architecture(batch_normalization, dropout, input_shape, activation):
    
    # Creating a sequential model
    model = Sequential()
    
    # 1st Convolutional layer
    model.add(Conv2D(filters=96, activation=activation, input_shape=input_shape,\
      kernel_size=(11,11), strides=(4,4), padding='valid', kernel_initializer='he_uniform'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid')) 
    if batch_normalization: 
        model.add(BatchNormalization())  

    # 2nd Convolutional Layer
    model.add(Conv2D(filters=256, activation=activation, kernel_size=(5,5), \
                     strides=(1,1), padding='valid'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    if batch_normalization: 
        model.add(BatchNormalization())  
    
    # 3rd Convolutional Layer
    model.add(Conv2D(filters=384, activation=activation, kernel_size=(3,3), strides=(1,1), padding='valid'))
     
    # 4th Convolutional Layer
    model.add(Conv2D(filters=384, activation=activation, kernel_size=(3,3), strides=(1,1), padding='valid'))
    
    # 5th Convolutional Layer
    model.add(Conv2D(filters=256, activation=activation, kernel_size=(3,3), strides=(1,1), padding='valid'))    
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    if batch_normalization: 
        model.add(BatchNormalization())   

    # Passing it to a dense layer
    model.add(Flatten())
    if dropout:
        model.add(Dropout(0.4))
    
    # 1st Dense Layer
    model.add(Dense(512, activation=activation, kernel_initializer='he_uniform'))  # input size is inferred from Flatten
    # Add Dropout to prevent overfitting
    if dropout:
        model.add(Dropout(0.4))
    if batch_normalization: 
        model.add(BatchNormalization())   
    
    # 2nd Dense Layer
    # The activation is applied by the Dense layer itself, so no separate Activation layer is needed
    model.add(Dense(512, activation=activation, kernel_initializer='he_uniform'))
    # Add Dropout
    if dropout:
        model.add(Dropout(0.4))
    if batch_normalization: 
        model.add(BatchNormalization())   

    # Output Layer
    model.add(Dense(17, activation='softmax'))
              
    return model
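
The spatial sizes produced by this stack can be checked by hand. With `padding='valid'`, a layer with kernel size `k` and stride `s` maps an axis of length `n` to `floor((n - k) / s) + 1`. The sketch below applies that formula to the kernel sizes and strides used above; the helper name is hypothetical.

```python
def valid_output(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution or pooling along one axis."""
    return (size - kernel) // stride + 1

# (name, kernel size, stride) for each spatial layer of the stack above
layers = [
    ("conv1", 11, 4), ("pool1", 2, 2),
    ("conv2", 5, 1),  ("pool2", 2, 2),
    ("conv3", 3, 1),  ("conv4", 3, 1),
    ("conv5", 3, 1),  ("pool3", 2, 2),
]

size = 224
sizes = []
for name, kernel, stride in layers:
    size = valid_output(size, kernel, stride)
    sizes.append(size)
    print(f"{name}: {size} x {size}")

flattened = size * size * 256   # 256 filters in the last convolution
print("flattened features:", flattened)   # 1024
```

The 224×224 input shrinks to 2×2 before `Flatten`, so the first dense layer receives 1024 features.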
            
    

<font size=2 color='black'>

##  Generating a model of deep neural network 


In [None]:
# Generating the model using the defined architecture

batch_normalization=True
dropout=True
one_image = (224, 224, 3)
activation = 'relu'

oxflower17_model = architecture(batch_normalization, dropout, one_image, activation)


In [None]:
plot_model(oxflower17_model, to_file='oxflower17_model.png', show_shapes=True, show_layer_names=True)

In [None]:

oxflower17_model.summary()

<font size=2 color='black'>

##  Compiling the model 

In [None]:
#Compiling the model using Adam as optimizer

lr = 0.001  # Learning rate

oxflower17_model.compile(loss='categorical_crossentropy', metrics=['accuracy'], \
optimizer=optimizers.Adam(learning_rate=lr,beta_1=0.9, beta_2=0.999, amsgrad=False))
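
To make the roles of `learning_rate`, `beta_1`, and `beta_2` concrete, here is a minimal NumPy sketch of one Adam update (without AMSGrad) applied to a toy one-parameter problem. The function name, the value of `eps`, and the toy objective are assumptions made for illustration; this is not the Keras implementation.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update; returns the new parameter and the two moment estimates."""
    m = beta_1 * m + (1.0 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction for early steps
    v_hat = v / (1.0 - beta_2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 1.0, 0.0, 0.0
trajectory = [theta]
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
    trajectory.append(theta)

print("after 1 step:   ", trajectory[1])
print("after 500 steps:", trajectory[-1])
```

On the first step the update has magnitude close to `lr` regardless of the gradient scale, which is one reason the learning rate is the main knob to tune.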


<font size=2 color='black'>

##  Running the model    

In [None]:
start_time = time.time()

batch_size=32
num_epochs = 50

history = oxflower17_model.fit(train_x, train_y, batch_size=batch_size,\
       epochs=num_epochs, validation_data=(test_x,test_y),verbose=1, shuffle=True)


end_time = time.time()
print("Time for training: {:10.4f}s".format(end_time - start_time))

<font size=4 color="black">
    
* Note: if you run `fit()` again, the `model` will continue training, starting with the parameters it has already learnt, instead of reinitializing them.


<font size=2 color='black'>

##  Plotting the loss function 

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Lr = 0.001, BatchNorm=True \n Dropout = 0.4')
plt.ylabel('Cost')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.ylim(top=13)    # The instruction is used to limit the upper value of the loss function 
plt.ylim(bottom=0)  # The instruction is used to limit the lower value of the loss function
plt.show()

<font size=2 color='black'>

##  Plotting the accuracy 

In [None]:
# Depending on the TensorFlow version, the accuracy is stored in the history
# under 'accuracy'/'val_accuracy' or under 'acc'/'val_acc'
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])
plt.title('Lr = 0.001, BatchNorm=True \n Dropout = 0.4')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='lower right')
plt.show()

<font size=2 color="black">
    
## Data augmentation

<font size=4 color="black">
$$ $$
`shear_range`, `zoom_range`, and `horizontal_flip` are some of the parameters available in Keras that define how the images are transformed.

[Keras: Data augmentation](https://keras.io/preprocessing/image/)


<font size=2 color='black'>

##  Generating a model of deep neural network 


In [None]:
# Generating the model using the defined architecture

batch_normalization=True
dropout=True
one_image = (224, 224, 3)
activation = 'relu'

oxflower17_model = architecture(batch_normalization, dropout, one_image, activation)


<font size=2 color='black'>

##  Compiling the model 

In [None]:
#Compiling the model using Adam as optimizer

lr = 0.001  # Learning rate

oxflower17_model.compile(loss='categorical_crossentropy', metrics=['accuracy'], \
optimizer=optimizers.Adam(learning_rate=lr,beta_1=0.9, beta_2=0.999, amsgrad=False))


In [None]:
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True
)

<font size=2 color='black'>

##  Running the model    

<font size=4 color="black">

[Comment: Keras flow method](https://theailearner.com/2019/07/06/imagedatagenerator-flow-method/)

<font size=4 color="black">
This process can take a long time, depending on the number of steps per epoch, the number of epochs, and the number of augmented images generated per step (`batch_size`).
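
The arithmetic behind that cost can be sketched as follows. The numbers are assumptions taken from this notebook's settings: 1224 training images (90% of 1360), a generator batch size of 32, 19 steps per epoch, and 100 epochs.

```python
import math

n_train = 1224        # assumed: 90% of the 1360 images after a 0.1 test split
batch_size = 32       # images produced by the generator per step
steps_per_epoch = 19  # assumed: int(1224 / 64)
num_epochs = 100

images_per_epoch = steps_per_epoch * batch_size
total_batches = steps_per_epoch * num_epochs
full_pass_steps = math.ceil(n_train / batch_size)   # steps needed to see every image once

print("augmented images per epoch:", images_per_epoch)   # 608
print("batches over the whole run:", total_batches)      # 1900
print("steps for one full pass:   ", full_pass_steps)    # 39
```

With these settings each epoch draws about half of the training images, so raising `steps_per_epoch` toward 39 would roughly double the time per epoch.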

In [None]:
# fit() computes the statistics required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
# None of those options is enabled above, so this call is optional here.
train_datagen.fit(train_x)

train_generator = train_datagen.flow(
    train_x,
    train_y,
    batch_size = 32,
    shuffle=True
)

In [None]:
get_steps_augment = 64 

print("Number of training samples: " + str(train_x.shape[0]))
steps = int(train_x.shape[0]/get_steps_augment)
print("Augmentation steps = {}".format(steps))


In [None]:
start_time = time.time()

num_epochs = 100

history = oxflower17_model.fit(train_generator, steps_per_epoch=steps,\
        epochs=num_epochs, validation_data=(test_x, test_y), verbose=1, shuffle=True)


end_time = time.time()
print("Time for training: {:10.4f}s".format(end_time - start_time))

<font size=4 color="black">
    
* Note: if you run `fit()` again, the `model` will continue training, starting with the parameters it has already learnt, instead of reinitializing them.


In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Lr = 0.001, BatchNorm=True \n Dropout = 0.4')
plt.ylabel('Cost')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.ylim(top=13)    # The instruction is used to limit the upper value of the loss function 
plt.ylim(bottom=0)  # The instruction is used to limit the lower value of the loss function
plt.show()

In [None]:
# Depending on the TensorFlow version, the accuracy is stored in the history
# under 'accuracy'/'val_accuracy' or under 'acc'/'val_acc'
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])
plt.title('Lr = 0.001, BatchNorm=True \n Dropout = 0.4')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='lower right')
plt.show()

In [None]:
# Predicting the class probabilities for each sample in the test set (test_x)
predictions = oxflower17_model.predict(test_x)

In [None]:
print(type(predictions))
print(predictions.shape)

In [None]:
# Predicted class for one sample of the test set
# np.argmax returns the index of the maximum value
sample = 17
prediction = np.argmax(predictions[sample])
print("Prediction number=", prediction, ', it corresponds to a', dic[prediction])


# Plotting the same sample from the test set, since the predictions were made on test_x

plt.imshow(test_x[sample]);
print('\ny =',  np.squeeze(test_y[sample]))

# The target is one-hot encoded, so argmax recovers the class index
i = int(np.argmax(test_y[sample]))

print('y =',  i, ';', 'the sample', sample, 'corresponds to a(an)', dic[i])