# Convolutional Neural Network
Sima Torabi, AI 3-22, August 2019

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.

Convolutional neural networks are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes. They are algorithms that can identify faces, individuals, street signs, tumors, platypuses and many other aspects of visual data.

Convolutional networks perform optical character recognition (OCR) to digitize text and make natural-language processing possible on analog and hand-written documents, where the images are symbols to be transcribed. CNNs can also be applied to sound when it is represented visually as a spectrogram. More recently, convolutional networks have been applied directly to text analytics as well as graph data with graph convolutional networks.

![1](1.png)

In computer vision, images are the training data of a network, and the input features are the pixels of an image. The number of features grows quickly with resolution. For example, a 1-megapixel image has 3 million features (1,000 x 1,000 pixels x 3 color channels). Passing it through a fully connected layer with just 1,000 hidden units requires a weight matrix with 3 billion parameters!

A layer of that size is impractical to store and train. Luckily, there is a solution: convolutional neural networks (ConvNets), which reuse a small set of weights across the whole image.
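
The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch: the variable names are illustrative, and biases are ignored.

```python
# Feature count for a 1-megapixel RGB image (1,000 x 1,000 pixels, 3 channels).
height, width, channels = 1_000, 1_000, 3
n_features = height * width * channels

# A fully connected layer needs one weight per (input feature, hidden unit) pair.
hidden_units = 1_000
n_weights = n_features * hidden_units

print(f"{n_features:,} input features")  # 3,000,000 input features
print(f"{n_weights:,} weights")          # 3,000,000,000 weights
```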

There are three types of layers in a convolutional network:

* Convolution (CONV)
* Pooling (POOL)
* Fully connected (FC)

#### Convolutional Layers
Applying a convolution to an image means sliding a filter (kernel) of a given size over the image. At each position, the filter is multiplied element-wise with the image patch beneath it, and the products are summed. The resulting number forms a single element of the output matrix.

![2](2.png)

As the filter slides over the image, a kernel weight of 1 adds the brightness of the pixel beneath it to the output, -1 subtracts it, and 0 ignores it.
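
The sliding multiply-and-sum can be sketched in NumPy as follows. This is a minimal, unoptimized illustration, not how deep learning libraries implement it. The function name `conv2d_valid`, the 6x6 image, and the vertical-edge kernel are assumptions chosen for the example and are not taken from the figure. As is usual in deep learning, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the output map."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + f_h, j:j + f_w]
            # Element-wise product of the patch and the kernel, then a sum.
            out[i, j] = np.sum(patch * kernel)
    return out

# A 6x6 image: bright (10) on the left half, dark (0) on the right half.
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# Vertical-edge kernel: a column of +1, a column of 0, a column of -1.
kernel = np.array([[1, 0, -1]] * 3, dtype=float)

result = conv2d_valid(image, kernel)
print(result)  # every row is 0, 30, 30, 0
```

The output is 4x4, and the two middle columns light up exactly where the image changes from bright to dark.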

#### Valid & Same Convolution (Padding)
* VALID padding. The simplest case: no padding at all. The input is left exactly as it is.
* SAME padding, sometimes called HALF padding. It is called SAME because, for a convolution with stride 1 (or for pooling), it produces an output of the same size as the input. It is called HALF because, for a kernel of size k, the padding added on each side is ⌊k/2⌋.

![4](4.png)

![3](3.png)

Knowing the filter size (f), stride (s), padding (p), and input size (n), the output size is:

![5](5.png)

The filter size is usually an odd value. If the fraction above is not an integer, round it down.
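
As a quick check of the output-size rule, here is a small sketch that assumes the figure shows the standard formula, floor((n + 2p - f) / s) + 1. The helper names `conv_output_size` and `same_padding` are illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding that preserves the input size when the stride is 1 and f is odd."""
    return (f - 1) // 2

print(conv_output_size(n=6, f=3))                     # VALID padding: 4
print(conv_output_size(n=6, f=3, p=same_padding(3)))  # SAME padding: 6
print(conv_output_size(n=7, f=3, s=2))                # stride 2: 3
print(conv_output_size(n=6, f=3, s=2))                # 3 / 2 rounds down: 2
```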

#### Convolution over stacked channels (images)

![6](6.png)

![7](7.png)
Therefore, in general terms we have:
![8](8.png)
(with nc’ as the number of filters, which are detecting different features)
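
The shape bookkeeping for multi-channel inputs can be illustrated with a short NumPy sketch. It assumes no padding and a stride of 1. The function name `conv_layer_forward` and the random inputs are made up for the example.

```python
import numpy as np

def conv_layer_forward(volume, filters):
    """Convolve an (n_h, n_w, n_c) volume with (n_filters, f, f, n_c) filters.

    No padding, stride 1. Returns an array of shape
    (n_h - f + 1, n_w - f + 1, n_filters).
    """
    n_h, n_w, n_c = volume.shape
    n_filters, f, _, f_c = filters.shape
    assert f_c == n_c, "filter depth must match the number of input channels"
    out = np.zeros((n_h - f + 1, n_w - f + 1, n_filters))
    for k in range(n_filters):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # Each filter spans all channels and produces a single number.
                out[i, j, k] = np.sum(volume[i:i + f, j:j + f, :] * filters[k])
    return out

rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 6, 3))      # e.g. a 6x6 RGB image
filters = rng.standard_normal((2, 3, 3, 3))  # two 3x3x3 filters
output = conv_layer_forward(volume, filters)
print(output.shape)  # (4, 4, 2)
```

Each filter collapses the channel dimension, so the depth of the output equals the number of filters, not the number of input channels.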

#### One layer of a convolutional neural network

The final step that takes us to a convolutional neural layer is to add the bias and a non-linear function.


![9](9.png)

The number of parameters in a layer is independent of the size of the input image.

For example, suppose one layer of a network has 10 filters of size 3x3x3. Each filter has 27 weights (3x3x3) + 1 bias = 28 parameters.

Therefore, the total number of parameters in the layer is 280 (10 x 28).
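
The same count can be written as a one-line function. This is a sketch with an illustrative name, `conv_layer_params`. Note that the input height and width do not appear anywhere in it.

```python
def conv_layer_params(f, n_c_in, n_filters):
    """Each filter has f * f * n_c_in weights plus one bias."""
    return (f * f * n_c_in + 1) * n_filters

print(conv_layer_params(f=3, n_c_in=3, n_filters=10))  # 280
```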

#### Deep Convolutional Network
Stacking several such layers gives a deep convolutional network. The following architecture depicts a simple example:

![10](10.png)

#### Pooling Layers

There are two types of pooling layers: max and average pooling.

Max pooling

We define a spatial neighborhood (a filter), and as we slide it through the input, we take the largest element within the region covered by the filter.

![11](11.png)

Average pooling

As the name suggests, it retains the average of the values encountered within the filter.

Note that a pooling layer has no parameters to learn. It does have hyperparameters to select: the filter size and the stride (it is common not to use any padding).
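
Both pooling types can be sketched with one small NumPy function. The name `pool2d`, its arguments, and the 4x4 input are assumptions made for illustration. No padding is used, matching the common case mentioned above.

```python
import numpy as np

def pool2d(x, f=2, s=2, mode="max"):
    """Apply max or average pooling with an f x f window and stride s (no padding)."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)

max_pooled = pool2d(x, mode="max")      # window maxima: 9, 2, 6, 3
avg_pooled = pool2d(x, mode="average")  # window means: 3.75, 1.25, 3.75, 2.0
print(max_pooled)
print(avg_pooled)
```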

#### Fully Connected Layer

A fully connected layer acts like a “standard” single neural network layer, where you have a weight matrix W and bias b.
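
A minimal sketch of such a layer is shown here. The ReLU activation is an assumption (the text does not name one), and the function name `dense_forward` and the example values are illustrative.

```python
import numpy as np

def dense_forward(x, W, b):
    """Fully connected layer followed by ReLU: a = max(0, W x + b)."""
    return np.maximum(0.0, W @ x + b)

x = np.array([1.0, -2.0, 3.0])   # flattened input with 3 features
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])  # 2 units, each connected to all 3 inputs
b = np.array([0.5, 0.5])

a = dense_forward(x, W, b)
print(a)  # values: 4.5 and 0.0
```

Unlike a convolutional layer, every output unit has its own weight for every input, so the size of W grows with the input size.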

We can see its application in the following example of a Convolutional Neural Network. This network is inspired by the LeNet-5 network:

![12](12.png)

It’s common that, as we go deeper into the network, the sizes (nh, nw) decrease, while the number of channels (nc) increases.

Another common pattern you can see in neural networks is to have CONV layers, one or more, followed by a POOL layer, and then again one or more CONV layers followed by a POOL layer and, at the end, a few FC layers followed by a Softmax.
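
To see both patterns at once, the sketch below traces the tensor shape through a CONV, POOL, CONV, POOL stack. The hyperparameters are assumed LeNet-5-style values (5x5 filters, 2x2 pooling, a 32x32x3 input) and may differ from those in the figure.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# (kind, filter size f, stride s, number of filters); pooling keeps the channel count.
layers = [
    ("CONV", 5, 1, 6),
    ("POOL", 2, 2, None),
    ("CONV", 5, 1, 16),
    ("POOL", 2, 2, None),
]

n, n_c = 32, 3  # assumed 32x32 RGB input
shapes = [(n, n, n_c)]
for kind, f, s, n_filters in layers:
    n = conv_output_size(n, f, p=0, s=s)
    if kind == "CONV":
        n_c = n_filters
    shapes.append((n, n, n_c))
    print(f"{kind}: {n} x {n} x {n_c}")

flattened = n * n * n_c
print("flattened:", flattened)  # flattened: 400
```

The height and width shrink from 32 to 5 while the channel count grows from 3 to 16. The 400 flattened values are what the first FC layer would receive.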

When choosing hyperparameters (f, s, p, ...), look at the literature and pick an architecture that has been used successfully and that fits your application. There are several “classic” networks, such as LeNet, AlexNet, and VGG.

#### Convolution filters in detail

![13](13.png)

![14](14.png)

![final](final.jpg)

![f2](f2.jpeg)

#### Case studies

There are several architectures in the field of Convolutional Networks that have a name. The most common are:

* LeNet. The first successful applications of Convolutional Networks were developed by Yann LeCun in the 1990s. Of these, the best known is the LeNet architecture, which was used to read zip codes, digits, etc.

* AlexNet. The first work that popularized Convolutional Networks in Computer Vision was AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of 16% compared to the runner-up's 26%). The network had a very similar architecture to LeNet, but was deeper, bigger, and featured convolutional layers stacked on top of each other (previously it was common to have a single CONV layer always immediately followed by a POOL layer).

* ZF Net. The ILSVRC 2013 winner was a Convolutional Network from Matthew Zeiler and Rob Fergus. It became known as ZFNet (short for Zeiler & Fergus Net). It improved on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size of the first layer smaller.

* GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. at Google. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, the paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large number of parameters that do not seem to matter much. There are also several follow-up versions of GoogLeNet, most recently Inception-v4.

* VGGNet. The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as VGGNet. Its main contribution was showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from beginning to end. Their pretrained model is available for plug-and-play use in Caffe. A downside of VGGNet is that it is more expensive to evaluate and uses a lot more memory and parameters (140M). Most of these parameters are in the first fully connected layer, and it has since been found that these FC layers can be removed with no loss in performance, significantly reducing the number of necessary parameters.

* ResNet. The Residual Network developed by Kaiming He et al. was the winner of ILSVRC 2015. It features special skip connections and heavy use of batch normalization. The architecture also omits fully connected layers at the end of the network. The reader is also referred to Kaiming's presentation and to some recent experiments that reproduce these networks in Torch. As of May 10, 2016, ResNets are by far the state-of-the-art Convolutional Neural Network models and the default choice for using ConvNets in practice. See also more recent developments that tweak the original architecture, in particular Kaiming He et al., "Identity Mappings in Deep Residual Networks" (published March 2016).


A problem with the output feature maps is that they are sensitive to the location of the features in the input. One approach to address this sensitivity is to down sample the feature maps. This has the effect of making the resulting down sampled feature maps more robust to changes in the position of the feature in the image, referred to by the technical phrase “local translation invariance.”

Pooling layers provide an approach to down sampling feature maps by summarizing the presence of features in patches of the feature map. Two common pooling methods are average pooling and max pooling that summarize the average presence of a feature and the most activated presence of a feature respectively.
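
Local translation invariance can be demonstrated with a tiny example. This is a sketch under simple assumptions: a 4x4 feature map with one active cell, and 2x2 max pooling with stride 2. The helper `max_pool_2x2` is illustrative.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an array with even height and width."""
    h, w = x.shape
    # Split the array into 2x2 blocks, then take the maximum of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4))
a[0, 0] = 1.0  # feature detected at the top-left corner

b = np.zeros((4, 4))
b[1, 1] = 1.0  # same feature shifted one pixel down and to the right

print(np.array_equal(a, b))                              # False
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```

The invariance is only local: a shift that moves the feature into a different pooling window does change the pooled output.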

from google.colab import drive

# Mount Google Drive (Colab only) and make it the working directory.
drive.mount('/gdrive')
%cd /gdrive

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
from keras.utils import normalize, to_categorical
from keras.datasets import cifar10

from tensorflow.python.client import device_lib

# List every device TensorFlow can see, then only the GPUs.
print(device_lib.list_local_devices())
print([d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU'])

# CIFAR-10: 50,000 training and 10,000 test images of shape 32x32x3, in 10 classes.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

plt.imshow(X_train[0])

help(cifar10.load_data)

# Integer class label of the first training image.
y_train[0]

**Preprocessing the data**

# L2-normalize along the last axis (the color channels of each pixel).
X_train_norm = normalize(X_train)
X_test_norm = normalize(X_test)

# One-hot encode the 10 class labels.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# The first label is now a one-hot vector of length 10.
y_train[0]







## **Build a CNN with Keras**

model = Sequential()

# --- Convolutional layers ---
# 1st layer: 32 filters of size 5x5; 'same' padding keeps the 32x32 spatial size.
# The layer name is passed to the constructor because assigning to `layer.name`
# afterwards fails in some Keras versions.
model.add(Conv2D(32, 5, strides=(1, 1), padding='same',
                 input_shape=(32, 32, 3), activation='relu',
                 name='first_conv_layer'))
model.add(Dropout(0.2))

# 2nd layer: 16 filters of size 3x3; 'valid' padding shrinks 32x32 to 30x30.
model.add(Conv2D(16, 3, strides=(1, 1), padding='valid',
                 activation='relu'))

# Max-pooling layer: halves the spatial size to 15x15.
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# Flatten the 15x15x16 tensor to a vector of 3,600 values.
model.add(Flatten())

# --- Fully connected layers ---
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))

# model.add(Dense(128, activation='relu'))

# Final layer: one softmax output per class.
model.add(Dense(10, activation='softmax'))

model.summary()

# Inspect the first convolutional layer.
model.layers[0].trainable = True
model.layers[0].get_config()

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Hold out 25% of the training data for validation.
history = model.fit(X_train_norm, y_train, validation_split=0.25,
                    epochs=25, batch_size=512)

# Returns [test loss, test accuracy].
model.evaluate(X_test_norm, y_test)