
### Introduction to Convolutional Neural Networks

In [2]:
# Instantiating a small convnet: 
# using the Functional API

from tensorflow import keras 
from tensorflow.keras import layers 

# A convnet takes as input tensors of shape (image_height, image_width, image_channels),
# not including the batch dimension. MNIST images have the shape (28, 28, 1).
inputs = keras.Input(shape=(28, 28, 1)) 

x = layers.Conv2D(filters=32, kernel_size=3, activation='relu')(inputs) 
x = layers.MaxPooling2D(pool_size=2)(x) 
x = layers.Conv2D(filters=64, kernel_size=3, activation='relu')(x)
x = layers.MaxPooling2D(pool_size=2)(x) 
x = layers.Conv2D(filters=128, kernel_size=3, activation='relu')(x) 
x = layers.Flatten()(x) 

outputs = layers.Dense(10, activation='softmax')(x) 

model = keras.Model(inputs=inputs, outputs=outputs) 

In [3]:
# Displaying the model's summary: 

model.summary() 

# The output of each Conv2D and MaxPooling2D layer is a rank-3 tensor (height, width, channels). 
# The height and width dimensions tend to shrink as we go deeper into the model. 
# The number of channels is controlled by the first argument passed to the Conv2D layers (32, 64, 128). 

# After the last Conv2D layer we end up with an output of shape (3, 3, 128): 
# a 3x3 feature map of 128 channels. 
# In the next step we need to feed this rank-3 output into a densely connected classifier:
# a stack of Dense layers, which process 1D vectors. 
# We therefore flatten the 3D outputs to 1D with a Flatten layer before adding the Dense layers. 

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_1 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 3, 3, 128)         73856 
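
The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults used in the model above (no padding and stride 1 for `Conv2D`, stride equal to `pool_size` for `MaxPooling2D`); the helper names are hypothetical and not part of Keras.

```python
# Hand-computed output shapes and parameter counts for the small convnet.
# Helper names are illustrative only; they are not Keras functions.

def conv2d_output_size(size, kernel_size):
    """Spatial size after an unpadded ('valid'), stride-1 convolution."""
    return size - kernel_size + 1

def max_pool_output_size(size, pool_size):
    """Spatial size after non-overlapping max pooling (stride == pool_size)."""
    return size // pool_size

def conv2d_param_count(kernel_size, in_channels, filters):
    """Each filter has kernel_size * kernel_size * in_channels weights plus one bias."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

size, channels = 28, 1          # MNIST input: (28, 28, 1)
shapes, params = [], []

# (filters, followed_by_pooling) for the three Conv2D layers of the model.
for filters, pooled in [(32, True), (64, True), (128, False)]:
    params.append(conv2d_param_count(3, channels, filters))
    size = conv2d_output_size(size, 3)
    channels = filters
    shapes.append((size, size, channels))
    if pooled:
        size = max_pool_output_size(size, 2)
        shapes.append((size, size, channels))

print(shapes)
print(params)
```

The computed values should match the `Output Shape` and `Param #` columns printed by `model.summary()` for the convolution and pooling layers.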

In [4]:
# Training the convnet on the MNIST dataset: 

# Since we are doing 10-way classification with a softmax output, 
# we use categorical crossentropy loss, 
# and because our labels are integers, 
# we use the sparse version, sparse_categorical_crossentropy: 

from tensorflow.keras.datasets import mnist 

(train_images, train_labels), (test_images, test_labels) = mnist.load_data() 

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255 

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255 

model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels,
          epochs=5, 
          batch_size=64) 

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f7f5e21ead0>
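
To illustrate why the sparse loss fits integer labels, here is a minimal NumPy sketch. It assumes a toy 3-class problem with made-up probabilities (not MNIST results) and shows that indexing the predicted probability with an integer label gives the same per-sample loss as the one-hot form of categorical crossentropy.

```python
import numpy as np

# Hypothetical softmax outputs for two samples over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

# Integer labels, the format in which MNIST labels are provided.
int_labels = np.array([0, 2])

# The same labels, one-hot encoded.
one_hot_labels = np.eye(3)[int_labels]

# "Sparse" form: pick the probability of the true class by index.
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels])

# "Dense" form: sum over classes of one_hot * log(prob).
dense_loss = -np.sum(one_hot_labels * np.log(probs), axis=1)

print(np.allclose(sparse_loss, dense_loss))
```

Both forms compute the same quantity; the sparse version simply avoids building the one-hot matrix.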

In [5]:
# Evaluating the convnet: 

test_loss, test_acc = model.evaluate(test_images, test_labels) 
print(f"Test accuracy: {test_acc:.3f}") 

Test accuracy: 0.988


### The convolution operation

Dense layers learn global patterns in their input feature space (for an MNIST 
digit, patterns involving all pixels). 

Convolution layers learn local patterns (for images, patterns found in small
2D windows of the inputs): 

1.   The patterns they learn are translation invariant: a pattern learned in one corner of the image can be recognized by the convnet anywhere else in the image.
2.   Convnets can learn spatial hierarchies of patterns. This allows them to learn increasingly complex and abstract visual concepts, built on previously learned small local patterns (such as edges, elementary lines, and textures). 
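
The first property can be seen in a small NumPy sketch. It assumes a single hand-written 3x3 filter and a naive sliding-window implementation (the function name `cross_correlate_2d` is hypothetical, not a Keras API): the same filter responds equally strongly to a pattern whether it sits in the top-left or the bottom-right corner, and only the location of the peak changes.

```python
import numpy as np

def cross_correlate_2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A small X-shaped pattern, and a filter that matches it exactly.
pattern = np.array([[1., 0., 1.],
                    [0., 1., 0.],
                    [1., 0., 1.]])
kernel = pattern.copy()

# Place the same pattern in two different corners of an 8x8 image.
top_left = np.zeros((8, 8))
top_left[0:3, 0:3] = pattern
bottom_right = np.zeros((8, 8))
bottom_right[5:8, 5:8] = pattern

resp_a = cross_correlate_2d(top_left, kernel)
resp_b = cross_correlate_2d(bottom_right, kernel)

# Strength and position of the strongest response in each image.
peak_a = np.unravel_index(np.argmax(resp_a), resp_a.shape)
peak_b = np.unravel_index(np.argmax(resp_b), resp_b.shape)
print(resp_a.max(), peak_a)
print(resp_b.max(), peak_b)
```

A Dense layer, by contrast, would have to learn the pattern separately for each position, because each of its weights is tied to one specific input pixel.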

Convolutions operate over rank-3 tensors called *feature maps*, with two spatial axes (height and width) and a *depth* axis (also called the *channels* axis). For an RGB image, the dimension of the depth axis is 3, since the image has 3 color channels: red, green, and blue. For a black-and-white picture, such as an MNIST digit, the depth is 1 (levels of gray). 

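The convolution operation extracts patches from its input feature map and applies the same transformation to each of them, producing an output feature map that is still a rank-3 tensor. Its depth can be arbitrary, because the output depth is a parameter of the layer. The channels in the depth axis no longer stand for specific colors as in an RGB input; they stand for *filters*.  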

Filters encode specific aspects of the input data: at a high level, a single filter could encode a concept such as "presence of a face in the input". 
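
As a rough illustration of how the output depth comes from the number of filters, the following NumPy sketch applies 32 random 3x3 filters to a random 28x28x1 input. It is a naive loop written for clarity, assuming no padding and stride 1; `conv2d_valid` and the random values are hypothetical and only mimic what the first `Conv2D` layer above computes before its activation.

```python
import numpy as np

def conv2d_valid(feature_map, kernels, biases):
    """Naive convolution layer: no padding, stride 1, no activation.

    feature_map: (height, width, in_depth)
    kernels:     (kernel_h, kernel_w, in_depth, out_depth)
    biases:      (out_depth,)
    """
    kh, kw, in_depth, out_depth = kernels.shape
    h, w, depth = feature_map.shape
    assert depth == in_depth, "kernel depth must match input depth"
    out = np.zeros((h - kh + 1, w - kw + 1, out_depth))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # One 3D patch of the input, shared by all filters.
            patch = feature_map[i:i + kh, j:j + kw, :]
            # Dot product of the patch with every filter at once.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + biases
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))               # stand-in for one MNIST image
kernels = rng.standard_normal((3, 3, 1, 32))  # 32 filters of size 3x3x1
biases = np.zeros(32)

response_map = conv2d_valid(image, kernels, biases)
print(response_map.shape)
```

Each of the 32 slices `response_map[:, :, k]` is a 2D map of the response of filter `k` at every location of the input.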

