# Course:  Convolutional Neural Networks for Image Classification

## Section-5
### Construct deep architectures for CNN models
#### How much Dropout?

**Description:**  
*Analyze percentage of dropout after every layer  
Interpret notation*

**File:** *dropout.ipynb*

### Algorithm:

**--> Step 1:** Open preprocessed dataset  
**--> Step 2:** Convert class vectors to binary matrices  
**--> Step 3:** Choose **percentage of dropout**  
**--> Step 4:** Visualize built CNN models  
**--> Step 5:** Set up learning rate & epochs  
**--> Step 6:** Train built CNN models  
**--> Step 7:** Show and plot accuracies  
**--> Step 8:** Make a conclusion  


**Result:**  
- Chosen architecture for every preprocessed dataset  


## Importing libraries

In [None]:
# Importing needed libraries
import matplotlib.pyplot as plt
import numpy as np
import h5py


from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from keras.callbacks import LearningRateScheduler
from keras.utils import plot_model


## Setting up full path to preprocessed datasets

In [None]:
# Full or absolute path to 'Section4' with preprocessed datasets
# (!) On Windows, the path should look like following:
# r'C:\Users\your_name\PycharmProjects\CNNCourse\Section4'
# or:
# 'C:\\Users\\your_name\\PycharmProjects\\CNNCourse\\Section4'
full_path_to_Section4 = \
    '/home/valentyn/PycharmProjects/CNNCourse/Section4'
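
The paths to the datasets are later built by joining strings with `'/'`. A minimal sketch of a separator-aware alternative is given below. The helper `dataset_path` and its `windows` flag are hypothetical names used only for illustration, they are not part of the course code.

```python
from pathlib import PurePosixPath, PureWindowsPath


def dataset_path(root, folder, file_name, windows=False):
    """Join root, dataset folder and file name with the matching separator."""
    # Pure*Path classes only manipulate strings, they never touch the disk
    path_class = PureWindowsPath if windows else PurePosixPath
    return str(path_class(root) / folder / file_name)


# Linux / macOS style
print(dataset_path('/home/valentyn/PycharmProjects/CNNCourse/Section4',
                   'custom', 'dataset_custom_rgb_255_mean_std.hdf5'))

# Windows style (raw string keeps the backslashes literal)
print(dataset_path(r'C:\Users\your_name\PycharmProjects\CNNCourse\Section4',
                   'custom', 'dataset_custom_rgb_255_mean_std.hdf5',
                   windows=True))
```

Inside the notebook itself, `os.path.join(full_path_to_Section4, 'custom', file_name)` should give the same effect, since it picks the separator of the current operating system.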


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 1: Opening preprocessed dataset

In [None]:
# Opening saved custom dataset from HDF5 binary file
# Initiating File object
# Opening file in reading mode by 'r'
# (!) On Windows, it might be needed to change
# this: + '/' +
# to this: + '\\' +
# (a single backslash, + '\' +, is not a valid Python string)
with h5py.File(full_path_to_Section4 + '/' + 'custom' + '/' + 
               'dataset_custom_rgb_255_mean_std.hdf5', 'r') as f:
    
    # Showing all keys in the HDF5 binary file
    print(list(f.keys()))
    
    # Extracting saved arrays for training by appropriate keys
    # Saving them into new variables
    x_train = f['x_train']  # HDF5 dataset
    y_train = f['y_train']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_train = np.array(x_train)  # Numpy arrays
    y_train = np.array(y_train)  # Numpy arrays
    
    
    # Extracting saved arrays for validation by appropriate keys
    # Saving them into new variables
    x_validation = f['x_validation']  # HDF5 dataset
    y_validation = f['y_validation']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_validation = np.array(x_validation)  # Numpy arrays
    y_validation = np.array(y_validation)  # Numpy arrays
    
    
    # Extracting saved arrays for testing by appropriate keys
    # Saving them into new variables
    x_test = f['x_test']  # HDF5 dataset
    y_test = f['y_test']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_test = np.array(x_test)  # Numpy arrays
    y_test = np.array(y_test)  # Numpy arrays


In [None]:
# Showing types of loaded arrays
print(type(x_train))
print(type(y_train))
print(type(x_validation))
print(type(y_validation))
print(type(x_test))
print(type(y_test))
print()


# Showing shapes of loaded arrays
print(x_train.shape)
print(y_train.shape)
print(x_validation.shape)
print(y_validation.shape)
print(x_test.shape)
print(y_test.shape)


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 2: Converting class vectors to binary matrices

In [None]:
# Showing class index from the vector
print('Class index from vector:', y_train[3])
print()

# Preparing classes to be passed into the model
# Transforming them from vectors to binary matrices
# One-hot rows are the target format expected by 'categorical_crossentropy' loss
# Such format is commonly used in training and predicting
y_train = to_categorical(y_train, num_classes = 5)
y_validation = to_categorical(y_validation, num_classes = 5)


# Showing shapes of converted vectors into matrices
print(y_train.shape)
print(y_validation.shape)
print()


# Showing class index from the matrix
print('Class index from matrix:', y_train[3])
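
To see what the conversion above does to the labels, here is a pure-Python sketch of the same idea. It is an illustration only, not the Keras implementation of `to_categorical`. The function name `one_hot` and the sample labels are assumptions.

```python
def one_hot(labels, num_classes):
    """Turn a list of integer class indexes into rows of 0.0 / 1.0."""
    matrix = []
    for label in labels:
        # Row of zeros with a single 1.0 at the position of the class index
        row = [0.0] * num_classes
        row[label] = 1.0
        matrix.append(row)
    return matrix


# Three sample labels for a dataset with 5 classes
labels = [0, 3, 4]
encoded = one_hot(labels, num_classes=5)

for label, row in zip(labels, encoded):
    print(label, '-->', row)
```

Every row has exactly one non-zero element, so a vector of shape `(N,)` becomes a matrix of shape `(N, num_classes)`.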


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 3: Choosing percentage of dropout

### Notation

**C** - convolutional layer  
**P** - pooling  
**D** - dropout  
  
Examples:
* **8C5** - convolutional layer with 8 feature maps and kernels of spatial size 5x5  
* **P2** - pooling operation with 2x2 window and stride 2  
*  **128** - fully connected layer (dense layer) with 128 neurons  
*  **D15** - 15% of dropout  
  
Definitions:
* **filters** (also called kernels) are trainable parameters  
* **weights** are values of filters that network learns during training  
* **strides** are steps by which window of filter size goes through the input  
* **padding** is a 0-valued frame used to process edges of the input  
* **Dropout** is a regularization technique that helps to prevent overfitting  
  
Some keywords values:
* **kernel_size=5** sets the filter size to be 5x5
* **strides=1** is a default value
* **padding='valid'** is a default value, meaning no padding: with stride 1, output is smaller than input by kernel_size - 1  
* **padding='same'** means that output will be of the same spatial size as input  
* **activation='relu'** sets ReLU (Rectified Linear Unit) as activation function  
  
Calculations of spatial size for feature maps after convolutional layer:  
* **height_output = 1 + (height_input + 2 * pad - kernel_size) / stride**
* **width_output = 1 + (width_input + 2 * pad - kernel_size) / stride**
  
Example without pad frame:
* **height_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
* **width_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
  
Example with pad frame:
* **height_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
* **width_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
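
The calculations above can be checked with a few lines of code. This is a minimal sketch: the function names are assumptions, and floor division is used so that the result stays an integer when the stride does not divide evenly.

```python
def conv_output_size(input_size, kernel_size, pad=0, stride=1):
    """Spatial size of feature maps after a convolutional layer."""
    return 1 + (input_size + 2 * pad - kernel_size) // stride


def same_pad(kernel_size):
    """Pad frame per side that keeps the size (stride 1, odd kernel size)."""
    return (kernel_size - 1) // 2


# Example without pad frame, as with padding='valid'
print(conv_output_size(64, kernel_size=5))

# Example with pad frame, as with padding='same'
print(conv_output_size(64, kernel_size=5, pad=same_pad(5)))
```

For a 5x5 kernel the pad frame is 2 pixels wide on every side, which is the value used in the second example.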
  

In [None]:
# Building 5 models
# RGB --> {64C5-P2} --> {128C5-P2} --> {256C5-P2} --> {512C5-P2} --> 2048 --> 5
# RGB --> {64C5-P2-D10} --> {128C5-P2-D10} --> {256C5-P2-D10} --> {512C5-P2-D10} --> 2048-D10 --> 5
# RGB --> {64C5-P2-D20} --> {128C5-P2-D20} --> {256C5-P2-D20} --> {512C5-P2-D20} --> 2048-D20 --> 5
# RGB --> {64C5-P2-D30} --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> 2048-D30 --> 5
# RGB --> {64C5-P2-D40} --> {128C5-P2-D40} --> {256C5-P2-D40} --> {512C5-P2-D40} --> 2048-D40 --> 5


# Defining list to collect models in
model = []


# Building models in a loop
for i in range(5):
    # Initializing model as a linear stack of layers
    temp = Sequential()

    # Adding first convolutional-pooling pair
    temp.add(Conv2D(64, kernel_size=5, padding='same', activation='relu', input_shape=(64, 64, 3)))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding second convolutional-pooling pair
    temp.add(Conv2D(128, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding third convolutional-pooling pair
    temp.add(Conv2D(256, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding fourth convolutional-pooling pair
    temp.add(Conv2D(512, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding fully connected layers
    temp.add(Flatten())
    temp.add(Dense(2048, activation='relu'))
    temp.add(Dropout(0.1 * i))
    temp.add(Dense(5, activation='softmax'))

    # Compiling created model
    temp.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # Adding current model in the list
    model.append(temp)


# Check point
print('5 models are compiled successfully')
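
The `Dropout` layers used above are active only during training: Keras sets a fraction `rate` of the inputs to 0 and scales the remaining ones by `1 / (1 - rate)`. Below is a pure-Python sketch of this behaviour on a flat list. The function `dropout`, its `seed` argument and the sample activations are assumptions made for illustration.

```python
import random


def dropout(values, rate, training=True, seed=0):
    """Inverted dropout on a flat list of activations."""
    # At inference time, or with rate 0, values pass through unchanged
    if not training or rate == 0.0:
        return list(values)

    rng = random.Random(seed)

    # Kept values are scaled up, so the expected sum stays the same
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]


# Same rates as in the loop that builds 5 models: 0.0, 0.1, 0.2, 0.3, 0.4
activations = [1.0] * 10
for i in range(5):
    rate = 0.1 * i
    print('rate={0:.1f}'.format(rate), dropout(activations, rate, seed=i))
```

With `i = 0` the rate is 0, so the first model works as a baseline without dropout.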


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 4: Visualizing built CNN models

In [None]:
# Plotting model's layers in form of flowchart
plot_model(model[4],
           to_file='model.png',
           show_shapes=True,
           show_layer_names=False,
           rankdir='TB',
           dpi=500)


In [None]:
# Showing model's summary in form of table
model[4].summary()


In [None]:
# Showing dropout rate
model[4].layers[2].rate


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 5: Setting up learning rate & epochs

In [None]:
# Defining number of epochs
epochs = 20

# Defining schedule to update learning rate
learning_rate = LearningRateScheduler(lambda x: 1e-3 * 0.95 ** (x + epochs), verbose=1)

# Check point
print('Number of epochs and schedule for learning rate are set successfully')
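
The scheduled values can be inspected before training. The sketch below evaluates the same expression as the lambda above for every epoch, assuming the zero-based epoch index that Keras passes to the schedule. The name `schedule` is an assumption.

```python
# Number of epochs, as defined above
epochs = 20


def schedule(epoch):
    """Same expression as the lambda passed to LearningRateScheduler."""
    return 1e-3 * 0.95 ** (epoch + epochs)


# Learning rate for every epoch
rates = [schedule(epoch) for epoch in range(epochs)]

print('First epoch: {0:.6f}'.format(rates[0]))
print('Last epoch:  {0:.6f}'.format(rates[-1]))
```

Because the exponent is shifted by `epochs`, the first epoch already starts at `1e-3 * 0.95 ** 20` (roughly 3.6e-4), not at 1e-3. After that, every epoch multiplies the rate by 0.95.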


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 6: Training built CNN models

In [None]:
# If you're using Nvidia GPU and 'cnngpu' environment, there might be an issue like:
# 'Failed to get convolution algorithm. This is probably because cuDNN failed to initialize'
# In this case, close all Jupyter Notebooks, close Terminal Window or Anaconda Prompt
# Open again just this one Jupyter Notebook and run it


# Defining list to collect results in
h = []


# Training models in a loop
for i in range(5):
    # Current model
    temp = model[i].fit(x_train, y_train,
                        batch_size=50,
                        epochs=epochs,
                        validation_data=(x_validation, y_validation),
                        callbacks=[learning_rate],
                        verbose=1)
    
    # Adding results of current model in the list
    h.append(temp)


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 7: Showing and plotting accuracies

In [None]:
# Accuracies of the models
for i in range(5):
    print('Model {0}: Training accuracy={1:.5f}, Validation accuracy={2:.5f}'.
                                                         format(i + 1,
                                                                max(h[i].history['accuracy']),
                                                                max(h[i].history['val_accuracy'])))
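
Choosing the model with the highest validation accuracy can also be done in code. A minimal sketch is given below. The numbers in `val_accuracy` are made up for illustration and are not results of this notebook. With real results, the lists would come from `h[i].history['val_accuracy']`.

```python
# Hypothetical validation accuracies per epoch, one list per model
# (made-up numbers for illustration only)
val_accuracy = [
    [0.60, 0.64, 0.63],
    [0.61, 0.66, 0.65],
    [0.62, 0.68, 0.67],
]

# Best epoch of every model
best_per_model = [max(history) for history in val_accuracy]

# Index of the model with the highest best value
best_index = max(range(len(best_per_model)), key=best_per_model.__getitem__)

print('Best model: {0}, validation accuracy={1:.5f}'.
      format(best_index + 1, best_per_model[best_index]))
```

Taking the maximum over epochs is a slightly optimistic estimate, so close results between models are better confirmed on the plot.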


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Setting default size of the plot
plt.rcParams['figure.figsize'] = (12.0, 6.0)


# Plotting accuracies for every model
plt.plot(h[0].history['val_accuracy'], '-o')
plt.plot(h[1].history['val_accuracy'], '-o')
plt.plot(h[2].history['val_accuracy'], '-o')
plt.plot(h[3].history['val_accuracy'], '-o')
plt.plot(h[4].history['val_accuracy'], '-o')


# Setting limit along Y axis
plt.ylim(0.62, 0.72)


# Showing legend
plt.legend(['model_1', 'model_2', 'model_3', 'model_4', 'model_5'],
           loc='lower right',
           fontsize='xx-large')


# Giving name to axes
plt.xlabel('Epoch', fontsize=16)
plt.ylabel('Accuracy', fontsize=16)


# Giving name to the plot
plt.title('Models accuracies: Custom Dataset', fontsize=16)


# Showing the plot
plt.show()


In [None]:
# Showing list of scheduled learning rate for every epoch
print(h[0].history['lr'])


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Plotting scheduled learning rate
plt.plot(h[0].history['lr'], '-mo')


# Showing the plot
plt.show()


### RGB custom dataset (255.0 ==> mean ==> std)

## Step 8: Making a conclusion

In [None]:
# According to validation accuracy, the 3rd model has the highest value

# The choice for custom dataset is 3rd model
# RGB input --> {64C5-P2-D20} --> {128C5-P2-D20} --> {256C5-P2-D20} --> {512C5-P2-D20} --> 2048-D20 --> 5
# GRAY input --> {64C5-P2-D20} --> {128C5-P2-D20} --> {256C5-P2-D20} --> {512C5-P2-D20} --> 2048-D20 --> 5

# RGB input: (64, 64, 3)
# GRAY input: (64, 64, 1)


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 1: Opening preprocessed dataset

In [None]:
# Opening saved CIFAR-10 dataset from HDF5 binary file
# Initiating File object
# Opening file in reading mode by 'r'
# (!) On Windows, it might be needed to change
# this: + '/' +
# to this: + '\\' +
# (a single backslash, + '\' +, is not a valid Python string)
with h5py.File(full_path_to_Section4 + '/' + 'cifar10' + '/' + 
               'dataset_cifar10_rgb_255_mean_std.hdf5', 'r') as f:
    
    # Showing all keys in the HDF5 binary file
    print(list(f.keys()))
    
    # Extracting saved arrays for training by appropriate keys
    # Saving them into new variables    
    x_train = f['x_train']  # HDF5 dataset
    y_train = f['y_train']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_train = np.array(x_train)  # Numpy arrays
    y_train = np.array(y_train)  # Numpy arrays
    
    
    # Extracting saved arrays for validation by appropriate keys
    # Saving them into new variables 
    x_validation = f['x_validation']  # HDF5 dataset
    y_validation = f['y_validation']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_validation = np.array(x_validation)  # Numpy arrays
    y_validation = np.array(y_validation)  # Numpy arrays
    
    
    # Extracting saved arrays for testing by appropriate keys
    # Saving them into new variables 
    x_test = f['x_test']  # HDF5 dataset
    y_test = f['y_test']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_test = np.array(x_test)  # Numpy arrays
    y_test = np.array(y_test)  # Numpy arrays


In [None]:
# Showing types of loaded arrays
print(type(x_train))
print(type(y_train))
print(type(x_validation))
print(type(y_validation))
print(type(x_test))
print(type(y_test))
print()


# Showing shapes of loaded arrays
print(x_train.shape)
print(y_train.shape)
print(x_validation.shape)
print(y_validation.shape)
print(x_test.shape)
print(y_test.shape)


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 2: Converting class vectors to binary matrices

In [None]:
# Showing class index from the vector
print('Class index from vector:', y_train[3])
print()

# Preparing classes to be passed into the model
# Transforming them from vectors to binary matrices
# One-hot rows are the target format expected by 'categorical_crossentropy' loss
# Such format is commonly used in training and predicting
y_train = to_categorical(y_train, num_classes = 10)
y_validation = to_categorical(y_validation, num_classes = 10)


# Showing shapes of converted vectors into matrices
print(y_train.shape)
print(y_validation.shape)
print()


# Showing class index from the matrix
print('Class index from matrix:', y_train[3])
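
Going back from a row of the matrix to the class index is done by taking the position of the largest value. The sketch below shows this in pure Python. The function `class_index` and the sample rows are assumptions. The same operation also applies to softmax outputs of the model.

```python
def class_index(row):
    """Return the position of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


# One-hot row for class 7 out of 10 classes
one_hot_row = [0.0] * 10
one_hot_row[7] = 1.0
print('Class index from one-hot row:', class_index(one_hot_row))

# Hypothetical softmax output of a model with 10 classes
probabilities = [0.01, 0.02, 0.70, 0.05, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
print('Predicted class index:', class_index(probabilities))
```

With Numpy arrays, `np.argmax(y_train[3])` performs the same lookup.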


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 3: Choosing percentage of dropout

### Notation

**C** - convolutional layer  
**P** - pooling  
**D** - dropout  
  
Examples:
* **8C5** - convolutional layer with 8 feature maps and kernels of spatial size 5x5  
* **P2** - pooling operation with 2x2 window and stride 2  
*  **128** - fully connected layer (dense layer) with 128 neurons  
*  **D15** - 15% of dropout  
  
Definitions:
* **filters** (also called kernels) are trainable parameters  
* **weights** are values of filters that network learns during training  
* **strides** are steps by which window of filter size goes through the input  
* **padding** is a 0-valued frame used to process edges of the input  
* **Dropout** is a regularization technique that helps to prevent overfitting  
  
Some keywords values:
* **kernel_size=5** sets the filter size to be 5x5
* **strides=1** is a default value
* **padding='valid'** is a default value, meaning no padding: with stride 1, output is smaller than input by kernel_size - 1  
* **padding='same'** means that output will be of the same spatial size as input  
* **activation='relu'** sets ReLU (Rectified Linear Unit) as activation function  
  
Calculations of spatial size for feature maps after convolutional layer:  
* **height_output = 1 + (height_input + 2 * pad - kernel_size) / stride**
* **width_output = 1 + (width_input + 2 * pad - kernel_size) / stride**
  
Example without pad frame:
* **height_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
* **width_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
  
Example with pad frame:
* **height_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
* **width_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
  

In [None]:
# Building 5 models
# RGB --> {128C5-P2} --> {256C5-P2} --> {512C5-P2} --> 256 --> 10
# RGB --> {128C5-P2-D10} --> {256C5-P2-D10} --> {512C5-P2-D10} --> 256-D10 --> 10
# RGB --> {128C5-P2-D20} --> {256C5-P2-D20} --> {512C5-P2-D20} --> 256-D20 --> 10
# RGB --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> 256-D30 --> 10
# RGB --> {128C5-P2-D40} --> {256C5-P2-D40} --> {512C5-P2-D40} --> 256-D40 --> 10


# Defining list to collect models in
model = []


# Building models in a loop
for i in range(5):
    # Initializing model as a linear stack of layers
    temp = Sequential()

    # Adding first convolutional-pooling pair
    temp.add(Conv2D(128, kernel_size=5, padding='same', activation='relu', input_shape=(32, 32, 3)))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding second convolutional-pooling pair
    temp.add(Conv2D(256, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding third convolutional-pooling pair
    temp.add(Conv2D(512, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding fully connected layers
    temp.add(Flatten())
    temp.add(Dense(256, activation='relu'))
    temp.add(Dropout(0.1 * i))
    temp.add(Dense(10, activation='softmax'))

    # Compiling created model
    temp.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    
    # Adding current model in the list
    model.append(temp)


# Check point
print('5 models are compiled successfully')
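
The notation strings from the comments above can be generated from the same numbers that define the models. This is a small sketch to practise reading the notation. The function `notation` and its parameters are hypothetical names.

```python
def notation(input_name, conv_filters, dense_units, classes, dropout_percent):
    """Describe architecture in the C-P-D notation of this notebook."""
    # Dropout suffix is omitted for the model without dropout
    suffix = '-D{0}'.format(dropout_percent) if dropout_percent else ''

    # One {convolution-pooling-dropout} block per number of filters
    blocks = ['{{{0}C5-P2{1}}}'.format(f, suffix) for f in conv_filters]

    parts = [input_name] + blocks
    parts.append('{0}{1}'.format(dense_units, suffix))
    parts.append(str(classes))
    return ' --> '.join(parts)


# 5 models with 0%, 10%, 20%, 30% and 40% of dropout
for i in range(5):
    print(notation('RGB', (128, 256, 512), 256, 10, 10 * i))
```

Using integer percentages `10 * i` for the labels avoids the floating point noise of `0.1 * i`.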


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 4: Visualizing built CNN models

In [None]:
# Plotting model's layers in form of flowchart
plot_model(model[4],
           to_file='model.png',
           show_shapes=True,
           show_layer_names=False,
           rankdir='TB',
           dpi=500)


In [None]:
# Showing model's summary in form of table
model[4].summary()
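
The number of parameters shown in the summary can be estimated by hand. The sketch below counts weights and biases for the CIFAR-10 architecture built above, assuming 5x5 kernels, `padding='same'` and 2x2 pooling that halves the spatial size. The helper names are assumptions. Pooling, dropout and flatten layers have no trainable parameters.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights of all kernels plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return (inputs + 1) * units


# Input of CIFAR-10 model: 32x32 pixels, 3 channels
size, channels, total = 32, 3, 0

# Convolutional-pooling pairs
for filters in (128, 256, 512):
    params = conv_params(5, channels, filters)
    total += params
    channels = filters
    size //= 2
    print('Conv2D {0:>3} filters: {1:>9} parameters, maps {2}x{2} after pooling'.
          format(filters, params, size))

# Fully connected layers after flattening
inputs = size * size * channels
for units in (256, 10):
    params = dense_params(inputs, units)
    total += params
    print('Dense {0:>4} units:   {1:>9} parameters'.format(units, params))
    inputs = units

print('Total trainable parameters:', total)
```

The dropout rate does not change these numbers, so all 5 models in the list should report the same total.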


In [None]:
# Showing dropout rate
model[4].layers[2].rate


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 5: Setting up learning rate & epochs

In [None]:
# Defining number of epochs
epochs = 20

# Defining schedule to update learning rate
learning_rate = LearningRateScheduler(lambda x: 1e-3 * 0.95 ** (x + epochs), verbose=1)

# Check point
print('Number of epochs and schedule for learning rate are set successfully')


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 6: Training built CNN models

In [None]:
# If you're using Nvidia GPU and 'cnngpu' environment, there might be an issue like:
# 'Failed to get convolution algorithm. This is probably because cuDNN failed to initialize'
# In this case, close all Jupyter Notebooks, close Terminal Window or Anaconda Prompt
# Open again just this one Jupyter Notebook and run it


# Defining list to collect results in
h = []


# Training models in a loop
for i in range(5):
    # Current model
    temp = model[i].fit(x_train, y_train,
                        batch_size=50,
                        epochs=epochs,
                        validation_data=(x_validation, y_validation),
                        callbacks=[learning_rate],
                        verbose=1)
    
    # Adding results of current model in the list
    h.append(temp)


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 7: Showing and plotting accuracies

In [None]:
# Accuracies of the models
for i in range(5):
    print('Model {0}: Training accuracy={1:.5f}, Validation accuracy={2:.5f}'.
                                                         format(i + 1,
                                                                max(h[i].history['accuracy']),
                                                                max(h[i].history['val_accuracy'])))


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Setting default size of the plot
plt.rcParams['figure.figsize'] = (12.0, 6.0)


# Plotting accuracies for every model
plt.plot(h[0].history['val_accuracy'], '-o')
plt.plot(h[1].history['val_accuracy'], '-o')
plt.plot(h[2].history['val_accuracy'], '-o')
plt.plot(h[3].history['val_accuracy'], '-o')
plt.plot(h[4].history['val_accuracy'], '-o')


# Setting limit along Y axis
plt.ylim(0.69, 0.832)


# Showing legend
plt.legend(['model_1', 'model_2', 'model_3', 'model_4', 'model_5'],
           loc='lower right',
           fontsize='xx-large')


# Giving name to axes
plt.xlabel('Epoch', fontsize=16)
plt.ylabel('Accuracy', fontsize=16)


# Giving name to the plot
plt.title('Models accuracies: CIFAR-10 dataset', fontsize=16)


# Showing the plot
plt.show()


In [None]:
# Showing list of scheduled learning rate for every epoch
print(h[0].history['lr'])


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Plotting scheduled learning rate
plt.plot(h[0].history['lr'], '-mo')


# Showing the plot
plt.show()


### RGB CIFAR-10 dataset (255.0 ==> mean ==> std)

## Step 8: Making a conclusion

In [None]:
# According to validation accuracy, the 4th and 5th models have the highest values

# The choice for CIFAR-10 dataset is 5th model
# RGB input --> {128C5-P2-D40} --> {256C5-P2-D40} --> {512C5-P2-D40} --> 256-D40 --> 10
# GRAY input --> {128C5-P2-D40} --> {256C5-P2-D40} --> {512C5-P2-D40} --> 256-D40 --> 10

# RGB input: (32, 32, 3)
# GRAY input: (32, 32, 1)


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 1: Opening preprocessed dataset

In [None]:
# Opening saved MNIST dataset from HDF5 binary file
# Initiating File object
# Opening file in reading mode by 'r'
# (!) On Windows, it might be needed to change
# this: + '/' +
# to this: + '\\' +
# (a single backslash, + '\' +, is not a valid Python string)
with h5py.File(full_path_to_Section4 + '/' + 'mnist' + '/' + 
               'dataset_mnist_gray_255_mean_std.hdf5', 'r') as f:
    
    # Showing all keys in the HDF5 binary file
    print(list(f.keys()))
    
    # Extracting saved arrays for training by appropriate keys
    # Saving them into new variables    
    x_train = f['x_train']  # HDF5 dataset
    y_train = f['y_train']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_train = np.array(x_train)  # Numpy arrays
    y_train = np.array(y_train)  # Numpy arrays
    
    
    # Extracting saved arrays for validation by appropriate keys
    # Saving them into new variables 
    x_validation = f['x_validation']  # HDF5 dataset
    y_validation = f['y_validation']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_validation = np.array(x_validation)  # Numpy arrays
    y_validation = np.array(y_validation)  # Numpy arrays
    
    
    # Extracting saved arrays for testing by appropriate keys
    # Saving them into new variables 
    x_test = f['x_test']  # HDF5 dataset
    y_test = f['y_test']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_test = np.array(x_test)  # Numpy arrays
    y_test = np.array(y_test)  # Numpy arrays


In [None]:
# Showing types of loaded arrays
print(type(x_train))
print(type(y_train))
print(type(x_validation))
print(type(y_validation))
print(type(x_test))
print(type(y_test))
print()


# Showing shapes of loaded arrays
print(x_train.shape)
print(y_train.shape)
print(x_validation.shape)
print(y_validation.shape)
print(x_test.shape)
print(y_test.shape)


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 2: Converting class vectors to binary matrices

In [None]:
# Showing class index from the vector
print('Class index from vector:', y_train[3])
print()

# Preparing classes to be passed into the model
# Transforming them from vectors to binary matrices
# One-hot rows are the target format expected by 'categorical_crossentropy' loss
# Such format is commonly used in training and predicting
y_train = to_categorical(y_train, num_classes = 10)
y_validation = to_categorical(y_validation, num_classes = 10)


# Showing shapes of converted vectors into matrices
print(y_train.shape)
print(y_validation.shape)
print()


# Showing class index from the matrix
print('Class index from matrix:', y_train[3])


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 3: Choosing percentage of dropout

### Notation

**C** - convolutional layer  
**P** - pooling  
**D** - dropout  
  
Examples:
* **8C5** - convolutional layer with 8 feature maps and kernels of spatial size 5x5  
* **P2** - pooling operation with 2x2 window and stride 2  
*  **128** - fully connected layer (dense layer) with 128 neurons  
*  **D15** - 15% of dropout  
  
Definitions:
* **filters** (also called kernels) are trainable parameters  
* **weights** are values of filters that network learns during training  
* **strides** are steps by which window of filter size goes through the input  
* **padding** is a 0-valued frame used to process edges of the input  
* **Dropout** is a regularization technique that helps to prevent overfitting  
  
Some keywords values:
* **kernel_size=5** sets the filter size to be 5x5
* **strides=1** is a default value
* **padding='valid'** is a default value, meaning no padding: with stride 1, output is smaller than input by kernel_size - 1  
* **padding='same'** means that output will be of the same spatial size as input  
* **activation='relu'** sets ReLU (Rectified Linear Unit) as activation function  
  
Calculations of spatial size for feature maps after convolutional layer:  
* **height_output = 1 + (height_input + 2 * pad - kernel_size) / stride**
* **width_output = 1 + (width_input + 2 * pad - kernel_size) / stride**
  
Example without pad frame:
* **height_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
* **width_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
  
Example with pad frame:
* **height_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
* **width_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
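
For MNIST images of 28x28 pixels, the spatial size does not halve evenly at every pooling step. The sketch below traces the size through three convolutional-pooling pairs, assuming `padding='same'` for convolutions (size is kept) and a 2x2 pooling window with stride 2 and no padding. The function name `max_pool_size` is an assumption.

```python
def max_pool_size(size, window=2, stride=2):
    """Spatial size after pooling without padding (incomplete windows are dropped)."""
    return (size - window) // stride + 1


# Input of MNIST model: 28x28 pixels
size = 28
sizes = [size]

# Convolution with padding='same' keeps the size, pooling reduces it
for filters in (128, 256, 512):
    size = max_pool_size(size)
    sizes.append(size)

print('Spatial sizes:', sizes)

# Number of values passed to the first dense layer after flattening
flat = size * size * 512
print('Flattened size:', flat)
```

The last pooling step turns 7x7 maps into 3x3 maps, because the seventh row and column do not fill a complete 2x2 window.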
  

In [None]:
# Building 5 models
# GRAY --> {128C5-P2} --> {256C5-P2} --> {512C5-P2} --> 256 --> 10
# GRAY --> {128C5-P2-D10} --> {256C5-P2-D10} --> {512C5-P2-D10} --> 256-D10 --> 10
# GRAY --> {128C5-P2-D20} --> {256C5-P2-D20} --> {512C5-P2-D20} --> 256-D20 --> 10
# GRAY --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> 256-D30 --> 10
# GRAY --> {128C5-P2-D40} --> {256C5-P2-D40} --> {512C5-P2-D40} --> 256-D40 --> 10


# Defining list to collect models in
model = []


# Building models in a loop
for i in range(5):
    # Initializing model as a linear stack of layers
    temp = Sequential()

    # Adding first convolutional-pooling pair
    temp.add(Conv2D(128, kernel_size=5, padding='same', activation='relu', input_shape=(28, 28, 1)))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding second convolutional-pooling pair
    temp.add(Conv2D(256, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding third convolutional-pooling pair
    temp.add(Conv2D(512, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding fully connected layers
    temp.add(Flatten())
    temp.add(Dense(256, activation='relu'))
    temp.add(Dropout(0.1 * i))
    temp.add(Dense(10, activation='softmax'))

    # Compiling created model
    temp.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    
    # Adding current model in the list
    model.append(temp)


# Check point
print('5 models are compiled successfully')


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 4: Visualizing built CNN models

In [None]:
# Plotting model's layers in form of flowchart
plot_model(model[4],
           to_file='model.png',
           show_shapes=True,
           show_layer_names=False,
           rankdir='TB',
           dpi=500)


In [None]:
# Showing model's summary in form of table
model[4].summary()


In [None]:
# Showing dropout rate
model[1].layers[2].rate


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 5: Setting up learning rate & epochs

In [None]:
# Defining number of epochs
epochs = 20

# Defining schedule to update learning rate
learning_rate = LearningRateScheduler(lambda x: 1e-3 * 0.95 ** (x + epochs), verbose=1)

# Check point
print('Number of epochs and schedule for learning rate are set successfully')


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 6: Training built CNN models

In [None]:
# If you're using Nvidia GPU and 'cnngpu' environment, there might be an issue like:
# 'Failed to get convolution algorithm. This is probably because cuDNN failed to initialize'
# In this case, close all Jupyter Notebooks, close Terminal Window or Anaconda Prompt
# Open again just this one Jupyter Notebook and run it


# Defining list to collect results in
h = []


# Training models in a loop
for i in range(5):
    # Current model
    temp = model[i].fit(x_train, y_train,
                        batch_size=50,
                        epochs=epochs,
                        validation_data=(x_validation, y_validation),
                        callbacks=[learning_rate],
                        verbose=1)
    
    # Adding results of current model in the list
    h.append(temp)


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 7: Showing and plotting accuracies

In [None]:
# Accuracies of the models
for i in range(5):
    print('Model {0}: Training accuracy={1:.5f}, Validation accuracy={2:.5f}'.
                                                         format(i + 1,
                                                                max(h[i].history['accuracy']),
                                                                max(h[i].history['val_accuracy'])))


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Setting default size of the plot
plt.rcParams['figure.figsize'] = (12.0, 6.0)


# Plotting accuracies for every model
plt.plot(h[0].history['val_accuracy'], '-o')
plt.plot(h[1].history['val_accuracy'], '-o')
plt.plot(h[2].history['val_accuracy'], '-o')
plt.plot(h[3].history['val_accuracy'], '-o')
plt.plot(h[4].history['val_accuracy'], '-o')


# Setting limit along Y axis
plt.ylim(0.984, 0.9951)


# Showing legend
plt.legend(['model_1', 'model_2', 'model_3', 'model_4', 'model_5'],
           loc='lower right',
           fontsize='xx-large')


# Giving name to axes
plt.xlabel('Epoch', fontsize=16)
plt.ylabel('Accuracy', fontsize=16)


# Giving name to the plot
plt.title('Models accuracies: MNIST dataset', fontsize=16)


# Showing the plot
plt.show()


In [None]:
# Showing list of scheduled learning rate for every epoch
print(h[0].history['lr'])


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Plotting scheduled learning rate
plt.plot(h[0].history['lr'], '-mo')


# Showing the plot
plt.show()


### GRAY MNIST dataset (255.0 ==> mean ==> std)

## Step 8: Making a conclusion

In [None]:
# According to validation accuracy, the 4th and 5th models have the highest values

# The choice for MNIST dataset is 4th model
# GRAY input --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> 256-D30 --> 10

# GRAY input: (28, 28, 1)


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 1: Opening preprocessed dataset

In [None]:
# Opening saved Traffic Signs dataset from HDF5 binary file
# Initiating File object
# Opening file in reading mode by 'r'
# (!) On Windows, it might be needed to change
# this: + '/' +
# to this: + '\\' +
# (a single backslash, + '\' +, is not a valid Python string)
with h5py.File(full_path_to_Section4 + '/' + 'ts' + '/' + 
               'dataset_ts_rgb_255_mean_std.hdf5', 'r') as f:
    
    # Showing all keys in the HDF5 binary file
    print(list(f.keys()))
    
    # Extracting saved arrays for training by appropriate keys
    # Saving them into new variables    
    x_train = f['x_train']  # HDF5 dataset
    y_train = f['y_train']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_train = np.array(x_train)  # Numpy arrays
    y_train = np.array(y_train)  # Numpy arrays
    
    
    # Extracting saved arrays for validation by appropriate keys
    # Saving them into new variables 
    x_validation = f['x_validation']  # HDF5 dataset
    y_validation = f['y_validation']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_validation = np.array(x_validation)  # Numpy arrays
    y_validation = np.array(y_validation)  # Numpy arrays
    
    
    # Extracting saved arrays for testing by appropriate keys
    # Saving them into new variables 
    x_test = f['x_test']  # HDF5 dataset
    y_test = f['y_test']  # HDF5 dataset
    # Converting them into Numpy arrays
    x_test = np.array(x_test)  # Numpy arrays
    y_test = np.array(y_test)  # Numpy arrays


In [None]:
# Showing types of loaded arrays
print(type(x_train))
print(type(y_train))
print(type(x_validation))
print(type(y_validation))
print(type(x_test))
print(type(y_test))
print()


# Showing shapes of loaded arrays
print(x_train.shape)
print(y_train.shape)
print(x_validation.shape)
print(y_validation.shape)
print(x_test.shape)
print(y_test.shape)


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 2: Converting classes vectors to classes matrices

In [None]:
# Showing class index from the vector
print('Class index from vector:', y_train[3])
print()

# Preparing classes to be passed into the model
# Transforming them from vectors of class indexes to binary (one-hot) matrices
# Every row holds 1 at the position of its class and 0 everywhere else
# This format is expected by the 'categorical_crossentropy' loss used for training
y_train = to_categorical(y_train, num_classes=43)
y_validation = to_categorical(y_validation, num_classes=43)


# Showing shapes of converted vectors into matrices
print(y_train.shape)
print(y_validation.shape)
print()


# Showing class index from the matrix
print('Class index from matrix:', y_train[3])
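
The conversion in Step 2 can be reproduced with plain NumPy, which makes the meaning of "binary matrix" concrete. The sketch below is only an illustration: the helper name `one_hot` and the three example labels are assumptions and are not part of the notebook or of Keras.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn a vector of integer class indexes into a binary matrix."""
    labels = np.asarray(labels, dtype=int).ravel()
    matrix = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Putting 1 in every row at the column given by the class index
    matrix[np.arange(labels.size), labels] = 1.0
    return matrix


# Hypothetical class indexes, chosen only for illustration
y_example = np.array([0, 2, 1])
y_matrix = one_hot(y_example, num_classes=3)

print(y_matrix.shape)
print(y_matrix)
```

Every row contains exactly one 1, so the index of the class can be recovered with `np.argmax(y_matrix, axis=1)`.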


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 3: Choosing percentage of dropout

### Notation

**C** - convolutional layer  
**P** - pooling  
**D** - dropout  
  
Examples:
* **8C5** - convolutional layer with 8 feature maps and kernels of spatial size 5x5  
* **P2** - pooling operation with 2x2 window and stride 2  
* **128** - fully connected layer (dense layer) with 128 neurons  
* **D15** - 15% of dropout  
  
Definitions:
* **filters** (also called kernels) are small windows of trainable parameters that slide over the input  
* **weights** are the values of the filters that the network learns during training  
* **strides** are the steps by which the filter window moves across the input  
* **padding** is a zero-valued frame added around the input so that its edges can be processed  
* **Dropout** is a regularization technique that randomly sets a fraction of activations to zero during training, which helps to prevent overfitting  
  
Some keyword values:
* **kernel_size=5** sets the filter size to 5x5  
* **strides=1** is the default value  
* **padding='valid'** is the default value and means no padding: with strides=1, the output height and width are reduced by kernel_size - 1  
* **padding='same'** means that, with strides=1, the output has the same spatial size as the input  
* **activation='relu'** sets ReLU (Rectified Linear Unit) as the activation function  
  
Calculations of spatial size for feature maps after convolutional layer:  
* **height_output = 1 + (height_input + 2 * pad - kernel_size) / stride**
* **width_output = 1 + (width_input + 2 * pad - kernel_size) / stride**
  
Example without pad frame:
* **height_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
* **width_output = 1 + (64 + 2 * 0 - 5) / 1 = 60**
  
Example with pad frame:
* **height_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
* **width_output = 1 + (64 + 2 * 2 - 5) / 1 = 64**
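
The two examples above can be checked with a few lines of Python. This is a minimal sketch: the helper name `conv_output_size` is an assumption and is not a Keras function. It uses integer division, which gives the same result as the formula whenever the division is exact.

```python
def conv_output_size(input_size, kernel_size, pad=0, stride=1):
    """Spatial size (height or width) of feature maps after a convolutional layer."""
    return 1 + (input_size + 2 * pad - kernel_size) // stride


# Example without pad frame
size_without_pad = conv_output_size(64, kernel_size=5, pad=0, stride=1)
print(size_without_pad)  # 60

# Example with pad frame of 2 pixels on every side
size_with_pad = conv_output_size(64, kernel_size=5, pad=2, stride=1)
print(size_with_pad)  # 64
```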
  

In [None]:
# Building 5 models
# RGB --> {128C5-P2} --> {256C5-P2} --> {512C5-P2} --> {1024C3-P2} --> 2048 --> 43
# RGB --> {128C5-P2-D10} --> {256C5-P2-D10} --> {512C5-P2-D10} --> {1024C3-P2-D10} --> 2048-D10 --> 43
# RGB --> {128C5-P2-D20} --> {256C5-P2-D20} --> {512C5-P2-D20} --> {1024C3-P2-D20} --> 2048-D20 --> 43
# RGB --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> {1024C3-P2-D30} --> 2048-D30 --> 43
# RGB --> {128C5-P2-D40} --> {256C5-P2-D40} --> {512C5-P2-D40} --> {1024C3-P2-D40} --> 2048-D40 --> 43


# Defining list to collect models in
model = []


# Building models in a loop
for i in range(5):
    # Initializing model as a linear stack of layers
    temp = Sequential()

    # Adding first convolutional-pooling pair
    temp.add(Conv2D(128, kernel_size=5, padding='same', activation='relu', input_shape=(48, 48, 3)))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding second convolutional-pooling pair
    temp.add(Conv2D(256, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding third convolutional-pooling pair
    temp.add(Conv2D(512, kernel_size=5, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding fourth convolutional-pooling pair
    temp.add(Conv2D(1024, kernel_size=3, padding='same', activation='relu'))
    temp.add(MaxPool2D())
    temp.add(Dropout(0.1 * i))

    # Adding fully connected layers
    temp.add(Flatten())
    temp.add(Dense(2048, activation='relu'))
    temp.add(Dropout(0.1 * i))
    temp.add(Dense(43, activation='softmax'))

    # Compiling created model
    temp.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    
    # Adding current model to the list
    model.append(temp)
    

# Check point
print('5 models are compiled successfully')
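
To see why the input of 48x48 pixels can pass through four convolutional-pooling pairs, the spatial sizes can be traced by hand. The sketch below is an illustration under the settings used above (`padding='same'`, `strides=1`, and `MaxPool2D()` with its default 2x2 window and stride 2); the helper names `same_conv` and `max_pool` are assumptions.

```python
def same_conv(size):
    # padding='same' with strides=1 keeps height and width unchanged
    return size


def max_pool(size, pool=2):
    # 2x2 window with stride 2 halves height and width
    return size // pool


# Spatial size of the input: (48, 48, 3)
size = 48

# Number of feature maps in every convolutional layer
feature_maps = [128, 256, 512, 1024]

for maps in feature_maps:
    size = max_pool(same_conv(size))
    print('After pair with {0} feature maps: ({1}, {1}, {0})'.format(maps, size))

# Number of values that Flatten passes to the first Dense layer
flatten_size = size * size * feature_maps[-1]
print('Flatten output:', flatten_size)
```

The sizes go 48, 24, 12, 6, 3, so the last pooling layer still produces a valid 3x3 feature map.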


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 4: Visualizing built CNN models

In [None]:
# Plotting model's layers in the form of a flowchart
plot_model(model[4],
           to_file='model.png',
           show_shapes=True,
           show_layer_names=False,
           rankdir='TB',
           dpi=500)


In [None]:
# Showing model's summary in the form of a table
model[4].summary()
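
The number of trainable parameters reported in the summary can be cross-checked by hand. The sketch below assumes the layer sizes defined in Step 3 and a Flatten output of 3 * 3 * 1024 values; the helper names `conv_params` and `dense_params` are assumptions. Pooling, Dropout and Flatten layers have no trainable parameters.

```python
def conv_params(kernel_size, channels_in, filters):
    # One kernel of kernel_size x kernel_size x channels_in plus one bias per filter
    return kernel_size * kernel_size * channels_in * filters + filters


def dense_params(units_in, units_out):
    # One weight per input-output pair plus one bias per output neuron
    return units_in * units_out + units_out


layers = [
    ('128C5', conv_params(5, 3, 128)),
    ('256C5', conv_params(5, 128, 256)),
    ('512C5', conv_params(5, 256, 512)),
    ('1024C3', conv_params(3, 512, 1024)),
    ('2048', dense_params(3 * 3 * 1024, 2048)),
    ('43', dense_params(2048, 43)),
]

for name, count in layers:
    print('{0:>7}: {1:,}'.format(name, count))

total = sum(count for _, count in layers)
print('  Total: {0:,}'.format(total))
```

Most of the parameters sit in the first fully connected layer, which is one reason why dropout after that layer is a natural candidate for regularization.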


In [None]:
# Showing dropout rate of the first Dropout layer in the 4th model
model[3].layers[2].rate
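
The rate shown for the 4th model may appear as 0.30000000000000004 rather than 0.3. This comes from floating point arithmetic in the expression `0.1 * i`, not from Keras. A short sketch of the rates generated by the loop in Step 3:

```python
# Dropout rates produced by the expression 0.1 * i for the 5 models
rates = [0.1 * i for i in range(5)]
print(rates)

# Rounded values are easier to read and to compare
rounded_rates = [round(rate, 1) for rate in rates]
print(rounded_rates)
```

The first model gets a rate of 0.0, so its Dropout layers leave the data unchanged and it serves as the baseline without dropout.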


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 5: Setting up learning rate & epochs

In [None]:
# Defining number of epochs
epochs = 20

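# Defining schedule to update learning rate
# 'x' is the index of the current epoch, starting from 0
# Learning rate is multiplied by 0.95 after every epoch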
learning_rate = LearningRateScheduler(lambda x: 1e-3 * 0.95 ** (x + epochs), verbose=1)

# Check point
print('Number of epochs and schedule for learning rate are set successfully')
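
The values produced by the schedule can be previewed without Keras by evaluating the same expression for every epoch index. This is a minimal sketch; the function name `scheduled_lr` is an assumption.

```python
# Number of epochs, as defined above
epochs = 20


def scheduled_lr(epoch, total_epochs=epochs):
    """Same expression as in the lambda passed to LearningRateScheduler."""
    return 1e-3 * 0.95 ** (epoch + total_epochs)


# Learning rate for every epoch index 0 .. epochs - 1
schedule = [scheduled_lr(epoch) for epoch in range(epochs)]

for epoch, lr in enumerate(schedule):
    print('Epoch {0:2d}: learning rate = {1:.6f}'.format(epoch + 1, lr))
```

Because of the `+ epochs` offset in the exponent, training starts at 1e-3 * 0.95 ** 20, which is roughly 3.6e-4, rather than at 1e-3.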


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 6: Training built CNN models

In [None]:
# If you're using an Nvidia GPU and the 'cnngpu' environment, there might be an issue like:
# 'Failed to get convolution algorithm. This is probably because cuDNN failed to initialize'
# In this case, close all Jupyter Notebooks, close the Terminal Window or Anaconda Prompt
# Then open just this one Jupyter Notebook again and run it


# Defining list to collect results in
h = []


# Training models in a loop
for i in range(5):
    # Current model
    temp = model[i].fit(x_train, y_train,
                        batch_size=50,
                        epochs=epochs,
                        validation_data=(x_validation, y_validation),
                        callbacks=[learning_rate],
                        verbose=1)
    
    # Adding results of current model to the list
    h.append(temp)


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 7: Showing and plotting accuracies

In [None]:
# Best accuracies of the models (maximum over all epochs)
for i in range(5):
    print('Model {0}: Training accuracy={1:.5f}, Validation accuracy={2:.5f}'.format(
        i + 1,
        max(h[i].history['accuracy']),
        max(h[i].history['val_accuracy'])))
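
Besides the maximum value, it can be useful to know at which epoch it was reached. The sketch below shows one way to find it; the helper name `best_epoch` is an assumption, and the list of values is a made-up placeholder, not a result from this notebook.

```python
def best_epoch(values):
    """Return (epoch number starting from 1, best value) for a list of per-epoch values."""
    best_index = max(range(len(values)), key=lambda k: values[k])
    return best_index + 1, values[best_index]


# Made-up placeholder values, NOT results from this notebook
val_accuracy_example = [0.90, 0.95, 0.97, 0.96]

epoch, value = best_epoch(val_accuracy_example)
print('Best validation accuracy {0:.5f} at epoch {1}'.format(value, epoch))
```

With the trained models, the same helper could be applied to `h[i].history['val_accuracy']`.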


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Setting default size of the plot
plt.rcParams['figure.figsize'] = (12.0, 6.0)


# Plotting accuracies for every model
plt.plot(h[0].history['val_accuracy'], '-o')
plt.plot(h[1].history['val_accuracy'], '-o')
plt.plot(h[2].history['val_accuracy'], '-o')
plt.plot(h[3].history['val_accuracy'], '-o')
plt.plot(h[4].history['val_accuracy'], '-o')


# Setting limit along Y axis
plt.ylim(0.9915, 0.9986)


# Showing legend
plt.legend(['model_1', 'model_2', 'model_3', 'model_4', 'model_5'],
           loc='lower right',
           fontsize='xx-large')


# Giving name to axes
plt.xlabel('Epoch', fontsize=16)
plt.ylabel('Accuracy', fontsize=16)


# Giving name to the plot
plt.title('Models accuracies: Traffic Signs dataset', fontsize=16)


# Showing the plot
plt.show()


In [None]:
# Showing list of scheduled learning rate for every epoch
print(h[0].history['lr'])


In [None]:
# Magic function that renders the figure in a jupyter notebook
# instead of displaying a figure object
%matplotlib inline


# Plotting scheduled learning rate
plt.plot(h[0].history['lr'], '-mo')


# Showing the plot
plt.show()


### RGB Traffic Signs dataset (255.0 ==> mean ==> std)

## Step 8: Making a conclusion

In [None]:
# According to validation accuracy, the 4th and 5th models have the highest values

# The choice for Traffic Signs dataset is 4th model
# RGB input --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> {1024C3-P2-D30} --> 2048-D30 --> 43
# GRAY input --> {128C5-P2-D30} --> {256C5-P2-D30} --> {512C5-P2-D30} --> {1024C3-P2-D30} --> 2048-D30 --> 43

# RGB input: (48, 48, 3)
# GRAY input: (48, 48, 1)


### Some comments

For more details on using the 'Dropout' class:  
**help(Dropout)**  
  
More details and examples are here:  
https://keras.io/api/layers/regularization_layers/dropout/


For more details on using the 'Sequential' class:  
**help(Sequential)**  
  
More details and examples are here:  
https://keras.io/api/models/sequential/


For more details on using the function 'to_categorical':  
**help(to_categorical)**  

More details and examples are here:  
https://keras.io/api/utils/python_utils/#to_categorical-function


For more details on using the function 'plot_model':  
**help(plot_model)**  

More details and examples are here:  
https://keras.io/api/utils/model_plotting_utils/#plot_model-function  


For more details on using the function 'plt.plot':  
**help(plt.plot)**  

More details and examples are here:  
https://matplotlib.org/3.1.3/api/_as_gen/matplotlib.pyplot.plot.html


In [None]:
# help() prints the documentation itself, wrapping it in print() would also print 'None'
help(Dropout)

In [None]:
help(Sequential)

In [None]:
help(to_categorical)

In [None]:
help(plot_model)

In [None]:
help(plt.plot)

In [None]:
# Importing needed libraries
import numpy as np
import tensorflow as tf


# Defining array with random values
x_train = np.random.randint(5, size=(5, 2)).astype(np.float32)


# Showing array
print(x_train)
print()


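# Initializing Dropout layer with rate 0.3
layer = tf.keras.layers.Dropout(0.3)

# Passing array to the layer
# (!) Dropout is applied only when training=True
output = layer(x_train, training=True)


# Showing array after Dropout layer
# Dropped elements are set to 0, kept elements are scaled by 1 / (1 - 0.3)
print(output)
print()

# Showing initial array, it stays unchanged
print(x_train)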
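
The behaviour of the Dropout layer in training mode can be imitated with plain NumPy: every element is set to zero with probability `rate`, and the remaining elements are divided by `1 - rate` so that the expected sum stays the same. The sketch below is an illustration of this idea, not the Keras implementation; the function name `inverted_dropout` and the seed are assumptions.

```python
import numpy as np


def inverted_dropout(x, rate, rng):
    """Zero every element with probability `rate` and rescale the kept ones."""
    # True where the element is kept
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0).astype(x.dtype)


# Seed is chosen only to make the example reproducible
rng = np.random.default_rng(seed=0)

# Array of ones makes the scaling of kept elements easy to see
x = np.ones((5, 2), dtype=np.float32)

output = inverted_dropout(x, rate=0.3, rng=rng)
print(output)
```

Every element of the output is either 0 or 1 / 0.7, and the input array itself is not modified.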
