***Reference:***

**Kapoor, Amita; Gulli, Antonio; Pal, Sujit. Deep Learning with TensorFlow and Keras: Build and deploy supervised, unsupervised, deep, and reinforcement learning models, 3rd Edition. Packt Publishing.**

# Chapter 3: Convolutional Neural Networks

### Spatial Structure in Images

When we classified the MNIST handwritten digits with a dense deep neural network, each pixel of the input image was assigned to its own input neuron, for a total of 784 (28 x 28 pixels) input neurons.

**However, this strategy does not leverage the spatial structure and relationships within each image.**

In particular, the code below prepares the input for a dense network by transforming the [bitmap](https://www.google.com/search?q=bitmap&oq=bitmap&aqs=chrome..69i57j0i433i512j0i512l8.1399j0j7&sourceid=chrome&ie=UTF-8) representing each written digit into a flat vector, where the local spatial structure is removed. **Removing the spatial structure is a problem because important information is lost:**

```
# X_train is 60000 rows of 28x28 values --> reshaped to 60000 x 784
X_train = X_train.reshape(60000, 784)
# X_test is 10000 rows of 28x28 values --> reshaped to 10000 x 784
X_test = X_test.reshape(10000, 784)
```

## 1. Deep Convolutional Neural Network (DCNN)

A Deep Convolutional Neural Network (DCNN) consists of many neural network layers. Two different types of layers,
>**convolutional and pooling (i.e., subsampling), are typically alternated**

The depth of each filter increases from left to right in the network. The last stage is typically made of one or more fully connected layers.
<div align='center'>
    <img src='images/dcnn.png'/>
</div>

***There are three key underlying concepts for ConvNets:***
- **local receptive fields**,
- **shared weights**, and
- **pooling**

### 1.1 **Local Receptive Fields**

If we want to preserve the spatial information of an image or other form of data, then it is convenient to represent each image with a matrix of pixels.

Given this, a simple way to encode the local structure is to connect a submatrix of adjacent input neurons into one single hidden neuron belonging to the next layer. That single hidden neuron represents one local receptive field.

Note that this operation is named convolution, and this is where the name for this type of network is derived.

>**You can think about convolution as the treatment of a matrix by another matrix, referred to as a kernel.**

Of course, we can encode more information by having overlapping submatrices. For instance, let's suppose that the size of every single submatrix is 5 x 5 and that those submatrices are used with MNIST images of 28 x 28 pixels. Then we will be able to generate 24 x 24 local receptive field neurons in the hidden layer. In fact, it is possible to slide the submatrices by only 23 positions before touching the borders of the images.
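
The 24 x 24 figure can be checked by literally sliding a window along one axis and counting where it fits. The sketch below is only an illustration of that counting argument; the function name `count_receptive_fields` is hypothetical and not part of TensorFlow.

```python
def count_receptive_fields(image_size: int, kernel_size: int, stride: int = 1) -> int:
    """Count the window positions along one axis by sliding the window."""
    positions = 0
    start = 0
    # The window covers pixels [start, start + kernel_size); it must stay inside the image.
    while start + kernel_size <= image_size:
        positions += 1
        start += stride
    return positions


per_axis = count_receptive_fields(image_size=28, kernel_size=5)
slides = per_axis - 1  # moves needed to get from the first position to the last

print(f"{per_axis} x {per_axis} receptive fields")  # 24 x 24 receptive fields
print(f"{slides} slides before touching the border")  # 23 slides before touching the border
```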


In TensorFlow,
- the number of pixels along one edge of the kernel, or submatrix, is the **kernel size**, and
- the **stride length** is the number of pixels by which the kernel is moved at each step in the convolution.


Let's define the feature map from one layer to another. Of course, we can have multiple feature maps that learn independently from each hidden layer. For example, we can start with 28 x 28 input neurons for processing MNIST images, and then define k feature maps of size 24 x 24 neurons each (again using a 5 x 5 kernel) in the next hidden layer.

### 1.2 **Shared Weights and Bias**

Let's suppose that we want to move away from the pixel representation in a raw image, by gaining the ability to detect the same feature independently from the location where it is placed in the input image.

A simple approach is to use the same set of weights and biases for all the neurons in the hidden layers.

In this way, each layer will learn a set of position-independent latent features derived from the image, bearing in mind that a layer consists of a set of kernels in parallel, and each kernel only learns one feature.
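
To get a feel for what weight sharing saves, the following back-of-the-envelope sketch compares the number of trainable parameters with and without sharing. It assumes the 5 x 5 kernel and 24 x 24 feature maps from section 1.1, and picks k = 20 feature maps purely for illustration (the same number the first LeNet stage uses later in these notes).

```python
KERNEL = 5          # 5 x 5 local receptive field
OUT = 24            # each feature map has 24 x 24 hidden neurons
FEATURE_MAPS = 20   # illustrative choice of k

# One hidden neuron sees a 5 x 5 patch of a single-channel image, plus one bias.
params_per_neuron = KERNEL * KERNEL + 1

# Shared: every neuron in a feature map reuses the same kernel and bias.
shared = FEATURE_MAPS * params_per_neuron

# Not shared: every hidden neuron would own its private kernel and bias.
unshared = FEATURE_MAPS * OUT * OUT * params_per_neuron

print(shared)              # 520
print(unshared)            # 299520
print(unshared // shared)  # 576
```

The factor of 576 is simply 24 x 24: with sharing, the parameter count no longer depends on how many positions the kernel is applied at.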

### **1.3 Example** : `Convolutional`, `Padding` & `Stride`

> **One simple way to understand convolution is to think about a sliding window function applied to a matrix.**

In the following example, given the input matrix **J** and the kernel **K**, we get the convolved output.

The $3 \times 3$ **kernel K** (sometimes called the **filter** or **feature detector**) is multiplied elementwise with a $3 \times 3$ patch of the input matrix, and the products are summed to get one cell in the output matrix. All the other cells are obtained by sliding the window over **J**:

<div align='center'>
    <img src='images/conv2d_ex.png' title='Conv2D_Example'/>
    <img src='images/conv2d.gif' title='Conv2D_GIF'/>
</div>

In this example, we decided to stop the sliding window as soon as we touch the borders of **J** (so the output is $3 \times 3$).
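
The sliding-window view can be written out in a few lines of plain Python. This is a minimal sketch, not how TensorFlow implements it: the values of `J` and `K` below are made up for illustration (they are not read from the figure), and, as in deep learning libraries, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
def conv2d_valid(matrix, kernel):
    """Slide `kernel` over `matrix`, stopping at the borders (no padding, stride 1)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for r in range(n_rows - k_rows + 1):
        out_row = []
        for c in range(n_cols - k_cols + 1):
            # Elementwise product of the kernel with the patch at (r, c), then sum.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += matrix[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 5 x 5 input and 3 x 3 kernel.
J = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
K = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

convolved = conv2d_valid(J, K)
for row in convolved:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```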

* **

**Padding:**
>Alternatively, we could have chosen to pad the input with zeros (so that the output would have been $5 \times 5$). This decision relates to the **padding** choice adopted. Note that kernel depth is equal to input depth (channel).

<div align='center'>
    <img src='images/conv2d_padding_no_stride.gif' title='Conv2D_With-Padding_No-Stride'/>
</div>

* **

**Stride:**
>Another choice is about how far along we slide our sliding windows with each step. This is called the **stride** and it can be one or more.

<div align='center'>
    <img src='images/con2d_padding_and_strides.gif' title='Conv2D_With-Padding_and_Stride'>
</div>                                                         


A larger stride generates fewer applications of the kernel and a smaller output size, while a smaller stride generates more output and retains more information.
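
The effect of padding and stride on the output size follows the usual formula floor((n + 2p - k) / s) + 1 along each axis, where n is the input size, k the kernel size, p the number of zeros added on each side, and s the stride. The helper below is a small sketch of that formula (the name `conv_output_size` is hypothetical), applied to a 5 x 5 input and 3 x 3 kernel as in the example above.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


no_padding = conv_output_size(5, 3)                             # stop at the borders
zero_padded = conv_output_size(5, 3, padding=1)                 # one ring of zeros
padded_stride_2 = conv_output_size(5, 3, padding=1, stride=2)   # same padding, stride 2

print(no_padding)       # 3
print(zero_padded)      # 5
print(padded_stride_2)  # 3
```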

* **

The size of the filter, the stride, and the type of padding are hyperparameters: they are fixed before training starts and can be tuned by comparing the results of different training runs.

### 1.4 ConvNets in TensorFlow

```
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)))
```

**This means that we are applying a 3x3 convolution on 28x28 images with 1 input channel (or input filters) resulting in 32 output channels (or output filters).**
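
As a sanity check on what this layer produces, the output shape and the number of trainable parameters can be worked out by hand. The sketch below assumes the Keras defaults of `padding='valid'` and a stride of 1; `conv2d_summary` is a hypothetical helper, not a Keras function.

```python
def conv2d_summary(height, width, in_channels, filters, kernel):
    """Output shape and parameter count of a stride-1, 'valid' 2D convolution."""
    out_height = height - kernel + 1
    out_width = width - kernel + 1
    # Each filter has kernel x kernel x in_channels weights plus one bias.
    params = (kernel * kernel * in_channels + 1) * filters
    return (out_height, out_width, filters), params


shape, params = conv2d_summary(height=28, width=28, in_channels=1, filters=32, kernel=3)
print(shape)   # (26, 26, 32)
print(params)  # 320
```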


<!-- Random Example:
<div align='center'>
    <img src='images/conv2d_pad.gif'>
</div>  -->

* **

### 1.5 **Pooling Layers**

>What is a **feature map?**
>>**In CNNs, a feature map is the output of a convolutional layer representing specific features in the input image or feature map.**
<img src='images/feature_map.png'/>


- https://www.geeksforgeeks.org/cnn-introduction-to-pooling-layer/

- https://www.geeksforgeeks.org/introduction-convolution-neural-network/?ref=lbp

* **

- **Pooling Layers are used to summarize the output of a feature map.**

**Max Pooling:**
- **Max Pooling simply outputs the maximum activation as observed in the region.** In Keras, if we want to define a max pooling layer of size 2 x 2, we write: `model.add(layers.MaxPooling2D((2,2)))`

<div align='center'>
    <img src='images/max_pooling.png'/>
</div>

**Average Pooling:**
- **Average Pooling simply aggregates a region into the average value of the activations observed in that region.** In Keras, the corresponding layer is `layers.AveragePooling2D`.
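
Both pooling variants can be sketched with one small function that differs only in how a region is reduced. This is a plain-Python illustration with a made-up 4 x 4 feature map and non-overlapping 2 x 2 regions; `pool2d` and `mean` are hypothetical helpers.

```python
def pool2d(feature_map, size, reduce_fn):
    """Summarize non-overlapping size x size regions with `reduce_fn`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            region = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(region))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

max_pooled = pool2d(feature_map, 2, max)
avg_pooled = pool2d(feature_map, 2, mean)

print(max_pooled)  # [[6, 4], [8, 9]]
print(avg_pooled)  # [[3.75, 2.25], [4.5, 4.0]]
```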

### ConvNets Summary

**CNNs apply convolution and pooling operations in one dimension for audio and text data along the time dimension, in two dimensions for images along the (height x width) dimensions, and in three dimensions for videos along the
(height x width x time) dimensions.**

- **Convolution operation**
<div align="center">
    <img src="images/conv.png" title="Convolution operation"/>
</div>  

* **

- **Max pool operation**
<div align="center">
    <img src="images/max_pool.png" title="Max pool operation"/>
</div>  

* **
- [Kernel_(image_processing) - Wikipedia](https://en.wikipedia.org/wiki/Kernel_(image_processing))

- [tensorflow doc - conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)

    - Example is good

## 2. DCNN: LeNet in TensorFlow using MNIST Dataset

- **The core idea of LeNet is to have lower layers alternating convolutional operations with max-pooling operations.**

- The convolution operations are based on carefully chosen local receptive fields with shared weights for multiple feature maps.

- Then, higher levels are fully connected based on a traditional MLP with hidden layers and softmax as the output layer.

* **
1. To define a LeNet in code, we use the `Conv2D` layer:
```
tf.keras.layers.Conv2D(filters,
                       kernel_size,
                       strides=(1, 1),
                       padding='valid',...)
```

- The first parameter, `filters`, is the number of output filters in the convolution, and the next one, `kernel_size`, is a tuple giving the height and width of each filter.

- Another parameter `padding`:
    - `padding = 'valid'` means that the convolution is only computed where the input and the filter fully overlap and therefore the output is smaller than the input,
    
    - while `padding = 'same'` means that we have an output that is the same size as the input, for which the area around the input is padded with zeros.
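
The two padding modes can be compared numerically. The sketch below uses the output-size rules documented for TensorFlow convolutions, ceil((n - k + 1) / s) for `'valid'` and ceil(n / s) for `'same'`; the helper name `keras_output_size` is hypothetical.

```python
import math


def keras_output_size(input_size: int, kernel_size: int, stride: int, padding: str) -> int:
    """Output length along one axis for the 'valid' and 'same' padding modes."""
    if padding == "valid":
        return math.ceil((input_size - kernel_size + 1) / stride)
    if padding == "same":
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


# A 5 x 5 kernel on a 28 x 28 MNIST image, one axis at a time.
print(keras_output_size(28, 5, stride=1, padding="valid"))  # 24
print(keras_output_size(28, 5, stride=1, padding="same"))   # 28
print(keras_output_size(28, 5, stride=2, padding="valid"))  # 12
print(keras_output_size(28, 5, stride=2, padding="same"))   # 14
```

Note that `'same'` only preserves the input size exactly when the stride is 1; with a larger stride the output is the input size divided by the stride, rounded up.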
    
* **

2. In addition, we use the `MaxPooling2D` layer:
```
tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
                             strides=(2, 2),  # default is None, which falls back to pool_size
                             padding='valid')
```
where `pool_size=(2, 2)` is a tuple of 2 integers representing the factors by which the image is vertically and horizontally downscaled. So `(2, 2)` will halve the image in each dimension, and `strides=(2, 2)` is the stride used for processing.

In [1]:
```
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))
```

```
2.10.1
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```


In [2]:
```
# Params
EPOCHS = 5
BATCH_SIZE = 128
VERBOSE = 1
OPTIMIZER = tf.keras.optimizers.Adam()
VALIDATION_SPLIT = 0.20
IMG_ROWS, IMG_COLS = 28, 28 # i/p image dimensions
INPUT_SHAPE = (IMG_ROWS, IMG_COLS, 1) # 1 -> Only one color channel
NB_CLASSES = 10 # Output Classes = 10 digits

# Define the LeNet Network:
class LeNet:
    # define the convnet
    @staticmethod
    def build(input_shape, classes):
        model = keras.models.Sequential()
        model.add(keras.layers.InputLayer(input_shape=input_shape))
        # CONV => RELU => POOL : Stage 1
        model.add(keras.layers.Conv2D(filters=20, kernel_size=(5,5),
                                      activation='relu'))

        model.add(keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)))

        # CONV => RELU => POOL : Stage 2
        model.add(keras.layers.Conv2D(50, (5,5), activation='relu'))
        model.add(keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)))

        # Flatten => RELU Layers : Stage 3
        model.add(keras.layers.Flatten())
        model.add(keras.layers.Dense(500, activation='relu'))

        # a SOFTMAX classifier
        model.add(keras.layers.Dense(classes, activation='softmax'))

        return model
```

**Visualizing the LeNet architecture defined above:**
<div align='center'>
    <img src='images/lenet.png' title='Visualization of LeNet'/>
</div>

- **Stage 1:**
    - **We have a first convolutional stage with ReLU activations followed by max pooling.** Our network will learn 20 convolutional filters, each of size 5 x 5. Because `Conv2D` defaults to `padding='valid'`, the output of the convolution is smaller than the 28 x 28 input: it is 24 x 24, and the 2 x 2 max pooling then halves it to 12 x 12. Note that since `Conv2D` is the first stage of our pipeline, the input shape must be declared, either through its `input_shape` argument as in the snippet below or through an `InputLayer` as in `LeNet.build` above.
    
```
# CONV => RELU => POOL : Stage 1
model.add(keras.layers.Conv2D(filters=20, kernel_size=(5,5),
                              activation='relu',
                              input_shape=input_shape))
                              
model.add(keras.layers.MaxPooling2D(pool_size=(2,2),
                                    strides=(2,2)))
        
```

* **

- **Stage 2:**
    - **Then there is a 2nd convolutional stage with ReLU activations, followed again by a max pooling layer.** In this case, we increase the number of convolutional filters learned to 50 from the previous 20. **Increasing the number of filters in deeper layers is a common technique in deep learning.**
    
```
# CONV => RELU => POOL : Stage 2
model.add(keras.layers.Conv2D(50, (5,5), activation='relu'))

model.add(keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)))
```

* **

- **Stage 3:**
    - Then we have a pretty standard flattening and a dense network of 500 neurons, followed by a softmax classifier with 10 classes:
    
```
# Flatten => RELU Layers : Stage 3
model.add(keras.layers.Flatten())

model.add(keras.layers.Dense(500, activation='relu'))
        
# a SOFTMAX classifier
model.add(keras.layers.Dense(classes, activation='softmax'))
```
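
To see how the three stages fit together, it can help to trace the tensor shape and the parameter count layer by layer. The sketch below does this with plain arithmetic under the assumptions used in `LeNet.build` (stride-1 `'valid'` convolutions, 2 x 2 pooling with stride 2); the helper functions are hypothetical and do not call Keras.

```python
def conv_valid(shape, filters, kernel):
    """Shape and parameter count after a stride-1 'valid' convolution."""
    height, width, channels = shape
    params = (kernel * kernel * channels + 1) * filters
    return (height - kernel + 1, width - kernel + 1, filters), params


def max_pool(shape, pool):
    """Shape after non-overlapping pool x pool max pooling (no parameters)."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def dense_params(units_in, units_out):
    """One weight per input-output pair plus one bias per output unit."""
    return (units_in + 1) * units_out


shape = (28, 28, 1)
shape, conv1_params = conv_valid(shape, filters=20, kernel=5)   # stage 1 convolution
conv1_shape = shape
shape = max_pool(shape, 2)                                      # stage 1 pooling
pool1_shape = shape
shape, conv2_params = conv_valid(shape, filters=50, kernel=5)   # stage 2 convolution
conv2_shape = shape
shape = max_pool(shape, 2)                                      # stage 2 pooling
pool2_shape = shape
flat_units = shape[0] * shape[1] * shape[2]                     # stage 3 flatten

hidden_params = dense_params(flat_units, 500)
softmax_params = dense_params(500, 10)
total_params = conv1_params + conv2_params + hidden_params + softmax_params

print(conv1_shape, conv1_params)  # (24, 24, 20) 520
print(pool1_shape)                # (12, 12, 20)
print(conv2_shape, conv2_params)  # (8, 8, 50) 25050
print(pool2_shape)                # (4, 4, 50)
print(flat_units)                 # 800
print(hidden_params, softmax_params, total_params)  # 400500 5010 431080
```

Most of the parameters sit in the first dense layer rather than in the convolutions, which is the weight-sharing effect from section 1.2 at work.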

In [3]:
```
# data: shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# reshape
X_train = X_train.reshape((60000, 28, 28, 1))
X_test = X_test.reshape((10000, 28, 28, 1))

# normalize
X_train, X_test = X_train / 255.0, X_test / 255.0

# cast
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, NB_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NB_CLASSES)
```

```
60000 train samples
10000 test samples
```
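
The "binary class matrices" produced by `to_categorical` are one-hot encodings: each integer label becomes a row with a single 1 at the index of its class. The sketch below shows the idea in plain Python; `to_one_hot` is a hypothetical stand-in for illustration, not the Keras implementation.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        encoded.append(row)
    return encoded


one_hot = to_one_hot([3, 0], num_classes=10)
for row in one_hot:
    print(row)
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

This format matches the 10-way softmax output and the `categorical_crossentropy` loss used when the model is compiled.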


In [4]:
```
# Initialize the model and the optimizer
model = LeNet.build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])

# Model summary
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 24, 24, 20)        520       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 12, 12, 20)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 8, 8, 50)          25050     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 4, 4, 50)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 800)               0         
                                                                 
 dense (Dense)               (None, 500)               4
```

In [5]:
```
# use TensorBoard, princess Aurora!
callbacks = [
    # Write TensorBoard logs to `./logs` directory
    tf.keras.callbacks.TensorBoard(log_dir='./logs')
]

# Fit the model
history = model.fit(X_train, y_train,
                    batch_size=BATCH_SIZE, epochs=EPOCHS,
                    verbose=VERBOSE,
                    validation_split=VALIDATION_SPLIT,
                    callbacks=callbacks)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```
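
For orientation, the amount of work per epoch follows from the parameters defined earlier. Keras takes the validation samples from the end of the training arrays before shuffling, so with a 20% split the model is fitted on the remaining samples only. The arithmetic below is a sketch; `VALIDATION_PERCENT` is a hypothetical integer form of `VALIDATION_SPLIT = 0.20`, used to keep the calculation exact.

```python
import math

N_SAMPLES = 60000          # size of the MNIST training set
VALIDATION_PERCENT = 20    # VALIDATION_SPLIT = 0.20 written as an integer percent
BATCH_SIZE = 128

n_validation = N_SAMPLES * VALIDATION_PERCENT // 100
n_fit = N_SAMPLES - n_validation
steps_per_epoch = math.ceil(n_fit / BATCH_SIZE)

print(n_fit)            # 48000
print(n_validation)     # 12000
print(steps_per_epoch)  # 375
```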


In [6]:
```
train_score = model.evaluate(X_train, y_train, verbose=VERBOSE)
print(f"\nTrain score: {train_score[0]}")
print(f'Train accuracy: {train_score[1]}\n')

score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])
```

```
Train score: 0.016713107004761696
Train accuracy: 0.9951333403587341


Test score: 0.02745833992958069
Test accuracy: 0.9911999702453613
```


 * **