# Convolutional neural networks

A convolutional neural network (CNN or ConvNet) is a class of neural network, a type of machine learning model. CNNs are especially good at analysing visual data, for example image classification, because they detect patterns, mostly in the form of shapes (edges, circles, etc.) and colours. Before CNNs, computer vision (CV) experts developed filters by hand to detect patterns, e.g. the Sobel filter, which detects edges. Nowadays, with deep learning and specifically with convolutional layers, these filters can be learned automatically by training a (convolutional) neural network.

## Layers

CNNs differentiate themselves from 'regular' neural networks with special hidden layers called convolutional layers, which put the convolution in convolutional neural network. Convolutional layers transform the data by performing cross-correlations, an operation that is commonly referred to as convolving. But first let's take a high-level view of the entire process.

Convolutional neural networks process images in the form of a matrix, in which each entry (row and column position) represents an individual pixel of the image. You're probably familiar with the hello-world tutorial of neural networks, in which a classic NN classifies handwritten digits. There, every pixel was used as an input neuron. It's not that different in CNNs, except that the input is a matrix instead of a (flattened) vector.


![input matrix](sources/input-matrix.png)

## Filter

A convolutional layer is the first layer that extracts features from the image. A single pixel is only related to its directly adjacent and nearby pixels. Convolving the image captures the relationships between groups of neighbouring pixels across the image: a convolutional filter detects patterns by looking at small matrices of pixels at a time while preserving their spatial relations. See the animation below, where a 3x3 filter takes in 9 pixels and stores its output in a single output cell. The resulting output matrix is also called a feature map.

![cnn filter animation](sources/cnn-filter.gif)

Considering the animation above, we still don't know what the output is. How does a filter combine the 9 input pixels, and what is the resulting outcome? A convolutional filter is more than just a width and height that determine how many pixels it looks at: the filter matrix itself has values too. Each value in the filter matrix is multiplied by the input value at the corresponding position, and the products are summed to give a single output value.
![convolution-filter](sources/convolution-filter.png)
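
The multiply-and-sum step described above can be sketched in a few lines of NumPy. This is only an illustration and not how Keras implements it; the function name `cross_correlate2d`, the 5x5 image and the kernel values are made up for this example.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]   # the pixels currently under the filter
            out[i, j] = np.sum(window * kernel)  # one output cell of the feature map
    return out

# Hypothetical image: two dark columns (0) followed by three bright columns (1)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hypothetical filter that responds where a dark column is followed by a bright one
kernel = np.array([[-1, 1, 0]] * 3, dtype=float)

feature_map = cross_correlate2d(image, kernel)
print(feature_map)
```

The feature map only lights up in the column where the dark-to-bright transition sits under the `-1, 1` part of the filter; everywhere else the products cancel out.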

Let’s suppose that we have four 3x3 filters for our first convolutional layer, and that these filters are filled with the values you see below. These values can be represented visually by having -1s correspond to black, 1s correspond to white, and 0s correspond to grey. These four 3x3 filters detect edges, as depicted in the images below, resulting in 4 feature maps.

$$\begin{bmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \end{bmatrix} \quad
\begin{bmatrix} -1 & -1 & -1 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} 0 & 1 & -1 \\ 0 & 1 & -1 \\ 0 & 1 & -1 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ -1 & -1 & -1 \end{bmatrix}$$

![filter-sides](sources/filter-sides.png)
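
To see how four filters lead to four feature maps, the matrices above can be applied to a small test image. The sketch below is an assumption-laden illustration: the 6x6 image with a bright square is made up, and `sliding_window_view` (NumPy 1.20 or newer) is used here only as a compact way to collect all 3x3 windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The four 3x3 filters from the matrices above, stacked into shape (4, 3, 3)
filters = np.array([
    [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]],
    [[-1, -1, -1], [1, 1, 1], [0, 0, 0]],
    [[0, 1, -1], [0, 1, -1], [0, 1, -1]],
    [[0, 0, 0], [1, 1, 1], [-1, -1, -1]],
], dtype=float)

# Hypothetical 6x6 image: a bright 2x2 square on a dark background
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

# All 3x3 windows of the image, shape (4, 4, 3, 3)
windows = sliding_window_view(image, (3, 3))
# Multiply every window with every filter and sum: one 4x4 feature map per filter
feature_maps = np.einsum("ijkl,fkl->fij", windows, filters)

print(feature_maps.shape)  # (4, 4, 4): 4 feature maps of 4x4
print(feature_maps[0])     # response of the first filter
```

Each feature map responds most strongly at a different side of the square, which is the behaviour the images above depict.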

## Stride
When computing the cross-correlation, we start with the convolution window at the top-left corner of the input matrix and slide it over all possible locations, from left to right and top to bottom. The animation above showed the default `stride` of 1x1, also simply called `no stride` (we must move at least 1 position/pixel or we would be stuck forever), meaning that the filter shifts by 1 pixel at a time. The stride is a configurable setting: one could set a stride of 2x3, which makes the filter shift 2 positions horizontally and 3 positions vertically. Generally a stride of 1x1 is applied unless, for computational efficiency, one wants to downsample and thus skip intermediate positions.
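
The effect of the stride on the output size can be sketched as follows. The helper `cross_correlate2d_strided` is hypothetical and written for this example; note that it takes the stride as (vertical, horizontal), so the 2x3 stride from the text (2 horizontally, 3 vertically) is passed as `(3, 2)`.

```python
import numpy as np

def cross_correlate2d_strided(image, kernel, stride=(1, 1)):
    """Cross-correlation without padding; `stride` is (vertical step, horizontal step)."""
    sh, sw = stride
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // sh + 1
    out_w = (image.shape[1] - kw) // sw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * sh, j * sw  # skip intermediate positions
            out[i, j] = np.sum(image[top:top + kh, left:left + kw] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)  # hypothetical 7x7 input
kernel = np.ones((3, 3))                          # hypothetical 3x3 filter

print(cross_correlate2d_strided(image, kernel, stride=(1, 1)).shape)  # (5, 5)
print(cross_correlate2d_strided(image, kernel, stride=(2, 2)).shape)  # (3, 3)
print(cross_correlate2d_strided(image, kernel, stride=(3, 2)).shape)  # (2, 3)
```

A larger stride visits fewer positions, so the feature map shrinks and fewer multiplications are needed.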

### Padding

Consider the animation above: the input matrix is 7x7 and the output is 5x5, so the output matrix is smaller than the input matrix. This is because a 3x3 filter cannot be centred on every position of the input matrix; at the borders it would extend past the edge. Likewise, convolving a 5x5 image with a 3x3 filter and a 1x1 `stride` results in a 3x3 output matrix: 9 values instead of 25, a 64% decrease in size.

In order to use convolutional filters without shrinking the output, one can configure a padding size. Padding adds an extra border of pixels with a value of 0 on each side of the image, allowing the filter to visit every position of the original image.

![same padding no strides](sources/cnn-all-filter.gif)
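
The shrinking, and the way padding undoes it, can be checked with a small calculation. The sketch below is not part of the original notebook: `conv_output_size` is a hypothetical helper that applies the usual output-size rule, floor((n + 2p - k) / s) + 1, along one axis, and `np.pad` adds the border of zeros.

```python
import numpy as np

def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis for input size n, filter size k, padding p per side, stride s."""
    return (n + 2 * p - k) // s + 1

print(conv_output_size(7, 3))       # 5: the 7x7 -> 5x5 case
print(conv_output_size(5, 3))       # 3: the 5x5 -> 3x3 case
print(conv_output_size(5, 3, p=1))  # 5: one pixel of padding keeps the size

# Zero padding of 1 pixel on every side of a hypothetical 5x5 image
image = np.arange(25).reshape(5, 5)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)
```

For a 3x3 filter with stride 1, a padding of 1 pixel is exactly what is needed to keep the output the same size as the input.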

## Max pooling

Convolutional layers are commonly followed by pooling layers, which reduce the spatial size of the representation and thereby the amount of computation (and the number of parameters) in the following layers. Basically, we select a pooling size and summarise each window of the input matrix by its maximum, average, or sum. Pooling layers also help to reduce overfitting, as the specific local relations are (literally) replaced by a coarser overview.

Pooling is applied for three reasons: to get local translational invariance, to get invariance against minor local changes and, most importantly, for data reduction.

Applying a 2x2 max-pooling layer to a handwritten digit results in the matrix below. One can definitely notice a loss of detail in the digit, but the pattern persists while the amount of data passed to the next layer is drastically reduced. The figure below depicts a 2x2 max-pooling: notice that the maximum value in each coloured box on the left equals the corresponding value on the right.

![filter-pool](sources/max-pool-seven3.png)
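
A 2x2 max-pooling like the one in the figure can be sketched with a reshape. The function `max_pool2d` and the 4x4 values are made up for this example; non-overlapping windows are assumed (the stride equals the pool size).

```python
import numpy as np

def max_pool2d(feature_map, pool=2):
    """Non-overlapping max pooling: keep the largest value of every pool x pool block."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a complete block
    h, w = h - h % pool, w - w % pool
    blocks = feature_map[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map
feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[6 5]
#  [7 9]]
```

The 16 input values are reduced to 4, while the strongest response in each block survives.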

## 1x1 layer (network in network)

At first glance a 1x1 filter sounds rather useless: multiplying a matrix by a matrix with a single row and column is just an ordinary scalar multiplication, as shown below.

$$\begin{bmatrix} 3 & 5 & 2 \\ 2 & 6 & 4 \\ 1 & 3 & 4 \end{bmatrix} \times
\begin{bmatrix} 2 \end{bmatrix} =
\begin{bmatrix} 6 & 10 & 4 \\ 4 & 12 & 8 \\ 2 & 6 & 8 \end{bmatrix}$$

However, this did not take channels into account: the depth of the feature maps. To refresh: max-pooling reduces the width and height of the feature maps, but the number of feature maps (the depth) remains the same. To reduce the number of feature maps we can introduce a 1x1 convolutional layer, also called `Network in network`. A 1x1 convolution simply maps an input pixel with all its channels to an output pixel, without looking at anything around it. It is often used to reduce the number of depth channels, because multiplying volumes with (extremely) large depths is computationally expensive.
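
Seen per pixel, a 1x1 convolution is a small matrix multiplication over the channel axis. The sketch below illustrates this with NumPy; the 4x4x8 volume, the random weights and the choice of 2 filters are assumptions made for this example only, and the activation function is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature volume: 4x4 pixels with 8 channels (feature maps)
volume = rng.normal(size=(4, 4, 8))
# A 1x1 layer with 2 filters holds one weight per (input channel, filter) pair
weights = rng.normal(size=(8, 2))
bias = np.zeros(2)

# Every pixel's 8 channel values are mixed into 2 output values;
# neighbouring pixels are never involved
reduced = volume @ weights + bias

print(volume.shape)   # (4, 4, 8)
print(reduced.shape)  # (4, 4, 2)
```

Width and height are untouched; only the depth goes from 8 to 2.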

Consider the code below: a convolutional neural network with 2 layers. The first layer takes a 256x256 input (with 3 colour channels) and has 512 filters of 3x3, resulting in 512 feature maps after the very first layer. To reduce the number of feature maps (512) we introduce a 1x1 convolutional layer: 64 filters of 1x1 result in exactly 64 feature maps, each a weighted combination of the 512 input depth values at the same pixel.

In [3]:
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(512, (3,3), padding='same', activation='relu', input_shape=(256, 256, 3)))
model.add(Conv2D(64, (1,1), activation='relu'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 256, 256, 512)     14336     
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 256, 256, 64)      32832     
Total params: 47,168
Trainable params: 47,168
Non-trainable params: 0
_________________________________________________________________
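
The `Param #` column in the summary above can be reproduced by hand: every filter has one weight per kernel position per input channel, plus one bias. The helper `conv2d_params` below is a hypothetical name used only for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer: weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

first = conv2d_params(3, 3, 3, 512)    # 3x3 filters on 3 input channels
second = conv2d_params(1, 1, 512, 64)  # 1x1 filters on 512 input channels

print(first)           # 14336
print(second)          # 32832
print(first + second)  # 47168
```

These numbers match the summary: the 1x1 layer costs about 33 thousand parameters and reduces the depth from 512 to 64.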


More advanced uses of 1x1 conv layers, such as promoting learning across channels and the Inception architecture, are far beyond the scope of this notebook.

# Implementation

Let's start off with a simple introduction to convolutional layers; eventually we'll move on to more complex architectures and compare results. The code below should be interpreted more as pseudocode with further explanations; later on, a more capable model will be created, trained, tested and applied.

### Importing libraries

In [8]:
print("Import libraries...")
import tensorflow as tf
import keras
from tensorflow.examples.tutorials.mnist import input_data
from keras.models import Sequential
from keras.optimizers import Adam
from keras.layers import Dense, Conv2D, MaxPool2D , Flatten, Dropout
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, EarlyStopping

import numpy as np
np.set_printoptions(suppress=True)
np.set_printoptions(precision=2)

from mnist.loader import MNIST

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

print("Imported all libraries!")

Import libraries...
Imported all libraries!


In [13]:
print("Loading MNIST dataset...")
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print("Done!")

Loading MNIST dataset...
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Done!


In [28]:
print("Creating model...")
# Creating model
model = Sequential()
# Conv2D layer with 64 filters of 3x3
# Note: input_shape=(28,28,3) expects 3-channel input; raw MNIST digits are grayscale, i.e. (28,28,1)
model.add(Conv2D(input_shape=(28,28,3),filters=64,kernel_size=(3,3), activation="relu"))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
# Max pooling layer with a 2x2 pool size
model.add(MaxPool2D(pool_size=(2,2)))
# Drop out, not covered in the notebook so far
model.add(Dropout(0.25))
# Flatten data as required for dense (fully connected) layer
model.add(Flatten())
# Classic NN layer with 1024 nodes
model.add(Dense(units=1024,activation="relu"))
# Drop out, not covered in the notebook so far
model.add(Dropout(0.5))
# Final NN layer with 10 output nodes
model.add(Dense(units=10, activation="softmax"))
print("model created!")

Creating model...
model created!


### Measuring model accuracy

For regression models, the error is commonly measured with the Mean Squared Error (MSE). For neural networks that classify, however, it's generally more informative to use the `Cross-Entropy Error` (CSE). Suppose a neural network is classifying data and outputs, for each possible label, a score from 0 to 1 expressing how sure it is of that label. Consider 3 separate inputs for the NN to predict: 2 were predicted correctly and 1 was wrong. A plain classification error only counts hits and misses: 1/3 wrong and 2/3 correct, which results in a 0.67 accuracy score. `Cross-Entropy error` (CSE), on the other hand, measures how correct or wrong each prediction is. E.g. a prediction that was only just right is penalised more than a confident correct one, and a prediction that was only just wrong is penalised less than a confident wrong one, whereas a plain hit-or-miss count treats both cases as entirely right or entirely wrong.
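
The difference between counting hits and measuring cross-entropy can be made concrete with two made-up sets of predictions that both get 2 out of 3 right. The functions and numbers below are assumptions for illustration; Keras computes its `categorical_crossentropy` loss internally.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy for one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most likely label is the true label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Hypothetical model A: confidently right twice, only just wrong once
model_a = np.array([[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.45, 0.15, 0.40]])
# Hypothetical model B: only just right twice, confidently wrong once
model_b = np.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.90, 0.05, 0.05]])

print(accuracy(y_true, model_a), accuracy(y_true, model_b))            # both 2/3
print(cross_entropy(y_true, model_a), cross_entropy(y_true, model_b))  # A is lower
```

Both models have the same accuracy, but the cross-entropy of model A is clearly lower, which gives the optimizer a much more useful signal than the hit count alone.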

In [26]:
model.compile(optimizer=Adam(lr=0.001), loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_13 (Conv2D)           (None, 26, 26, 64)        1792      
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 26, 26, 64)        36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 13, 13, 64)        0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 13, 13, 64)        0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 10816)             0         
_________________________________________________________________
dense_7 (Dense)              (None, 1024)              11076608  
_________________________________________________________________
dropout_8 (Dropout)          (None, 1024)              0         
__________

### Model checkpoint
ModelCheckpoint helps us to save the model by monitoring a specific metric of the model. In this case I am monitoring validation accuracy by passing val_acc to ModelCheckpoint. With save_best_only=True, the model is only saved to disk if the validation accuracy in the current epoch is greater than the best value seen in any earlier epoch.

### Early stopping
EarlyStopping helps us to stop the training of the model early if there is no increase in the metric that it is set to monitor. In this case I am monitoring validation accuracy by passing val_acc to EarlyStopping. The patience parameter sets how many epochs without any rise in validation accuracy are tolerated before training stops; the training cell below uses a patience of 2.
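
The patience logic can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras implementation; the function name and the validation accuracies are made up.

```python
def early_stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score                    # a checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation accuracies per epoch
scores = [0.50, 0.70, 0.80, 0.78, 0.79, 0.85]

print(early_stopping_epoch(scores, patience=2))  # 5
print(early_stopping_epoch(scores, patience=3))  # None
```

With a patience of 2 the run stops at epoch 5 and never sees the better score in epoch 6, which is the trade-off a small patience value makes.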

### Epoch and batches

In fit_generator, steps_per_epoch sets the number of batches drawn from the training generator in each epoch, and validation_steps does the same for the validation data. The batch size itself is set on the generator. You can tweak these values based on your system specifications.
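
As a rough guide, the number of steps needed to see every sample once per epoch is the dataset size divided by the batch size, rounded up. The helper below is hypothetical; the batch size of 32 is the default of `flow_from_directory`, and the dataset sizes are the ones reported further on in this notebook.

```python
import math

def steps_for_full_pass(num_samples, batch_size):
    """Number of batches needed to go through all samples once."""
    return math.ceil(num_samples / batch_size)

print(steps_for_full_pass(25000, 32))  # 782
print(steps_for_full_pass(2000, 32))   # 63
```

Smaller values, such as steps_per_epoch=1, make an epoch much faster but show the model only a single batch per epoch.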

In [None]:
checkpoint = ModelCheckpoint("conv2d_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=2, verbose=1, mode='auto')
hist = model.fit_generator(steps_per_epoch=1,generator=mnist.train, validation_data=mnist.test, validation_steps=10,epochs=30,callbacks=[checkpoint,early])

The resulting model has a best accuracy score of 0.876.

## VGG16
Let's examine a well-known image recognition model called VGG16.

![vgg16](sources/vgg16.jpg)

## Creating and training VGG16 CNN

### Imports

In [2]:
import keras,os
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D , Flatten
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, EarlyStopping

import numpy as np

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

Using TensorFlow backend.


In [3]:
trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="train",target_size=(224,224))
tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory="test", target_size=(224,224))

Found 25000 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.


In [7]:
model = Sequential()
model.add(Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=2, activation="softmax"))
opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_27 (Conv2D)           (None, 224, 224, 64)      1792      
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 56, 56, 256)       295168    
__________

In [None]:
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=2, verbose=1, mode='auto')
hist = model.fit_generator(steps_per_epoch=1,generator=traindata, validation_data= testdata, validation_steps=10,epochs=30,callbacks=[checkpoint,early])

The model created above replicates the VGG16 architecture, with the final layer reduced to 2 output classes.
![vgg16 architecture](sources/vgg16_architecture.png)

### Epoch and batches

As before, steps_per_epoch sets the number of batches drawn from the training generator in each epoch and validation_steps the number of batches drawn from the validation generator; the batch size itself is set on the generator. You can tweak these values based on your system specifications.

In [None]:
from keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=20, verbose=1, mode='auto')
hist = model.fit_generator(steps_per_epoch=100,generator=traindata, validation_data= testdata, validation_steps=10,epochs=100,callbacks=[checkpoint,early])

40 minutes per epoch with a total of 100 epochs? That's 4000 minutes, or about 67 hours of continuous training.

## Transfer learning


In [11]:
import keras
from keras.models import Model
from keras.layers import Dense
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image
from keras.callbacks import ModelCheckpoint, EarlyStopping

In [12]:
trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="cats_and_dogs_filtered/train",target_size=(224,224))
tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory="cats_and_dogs_filtered/validation", target_size=(224,224))

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.


In [13]:
from keras.applications.vgg16 import VGG16
vggmodel = VGG16(weights='imagenet', include_top=True)

In [14]:
# Freeze the first 19 layers so their pretrained weights are not updated during training
for layer in vggmodel.layers[:19]:
    print(layer)
    layer.trainable = False

<keras.engine.input_layer.InputLayer object at 0x000001D3020F2588>
<keras.layers.convolutional.Conv2D object at 0x000001D37FDF1B00>
<keras.layers.convolutional.Conv2D object at 0x000001D302117748>
<keras.layers.pooling.MaxPooling2D object at 0x000001D30212B6A0>
<keras.layers.convolutional.Conv2D object at 0x000001D30212B048>
<keras.layers.convolutional.Conv2D object at 0x000001D3021D72B0>
<keras.layers.pooling.MaxPooling2D object at 0x000001D308E54CF8>
<keras.layers.convolutional.Conv2D object at 0x000001D308E40C88>
<keras.layers.convolutional.Conv2D object at 0x000001D308EFE198>
<keras.layers.convolutional.Conv2D object at 0x000001D308F2BBE0>
<keras.layers.pooling.MaxPooling2D object at 0x000001D307D20EF0>
<keras.layers.convolutional.Conv2D object at 0x000001D307D20A20>
<keras.layers.convolutional.Conv2D object at 0x000001D307D584A8>
<keras.layers.convolutional.Conv2D object at 0x000001D307D6EF98>
<keras.layers.pooling.MaxPooling2D object at 0x000001D307D85EB8>
<keras.layers.convoluti

In [15]:
# Take the output of the second-to-last layer and attach a new 2-class softmax layer
X = vggmodel.layers[-2].output
predictions = Dense(2, activation="softmax")(X)
model_final = Model(inputs=vggmodel.input, outputs=predictions)

model_final.compile(loss = "categorical_crossentropy", optimizer = optimizers.SGD(lr=0.0001, momentum=0.9), metrics=["accuracy"])
model_final.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [18]:
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=2, verbose=1, mode='auto')

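model_final.fit_generator(generator= traindata, steps_per_epoch= 2, epochs= 100, validation_data= testdata, validation_steps=1, callbacks=[checkpoint,early])
# Note: this writes the final-epoch weights to the same file as the best checkpoint saved above, overwriting it
model_final.save_weights("vgg16_1.h5")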

Epoch 1/100

Epoch 00001: val_acc improved from -inf to 0.43750, saving model to vgg16_1.h5
Epoch 2/100

Epoch 00002: val_acc improved from 0.43750 to 0.75000, saving model to vgg16_1.h5
Epoch 3/100

Epoch 00003: val_acc improved from 0.75000 to 0.96875, saving model to vgg16_1.h5
Epoch 4/100

Epoch 00004: val_acc did not improve from 0.96875
Epoch 5/100

Epoch 00005: val_acc did not improve from 0.96875
Epoch 00005: early stopping


## Edge case model testing
The model is trained to classify pictures of cats and dogs, and its performance is measured with a 0.94 accuracy score. But what happens with edge cases, e.g. when pictures of different animals are fed into the CNN? Lions are cats, however vastly different they might be. Dogs stem from the wolf, but does a CNN trained only on dogs think so too?

In [33]:
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16

image = load_img('wolf.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# predict the probability across all output classes
print(model_final.predict(image))

[[0.08076258 0.91923743]]
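
To read this output, the two probabilities have to be mapped back to class names. The sketch below assumes that the class folders are named `cats` and `dogs` and that they are indexed alphabetically; in the notebook itself, `traindata.class_indices` would give the actual mapping.

```python
import numpy as np

# Prediction copied from the output above
prediction = np.array([[0.08076258, 0.91923743]])

# Assumption: folder names and alphabetical class order, i.e. cats -> 0, dogs -> 1
class_indices = {"cats": 0, "dogs": 1}
index_to_label = {index: label for label, index in class_indices.items()}

predicted_index = int(np.argmax(prediction, axis=1)[0])
predicted_label = index_to_label[predicted_index]
confidence = float(prediction[0, predicted_index])

print(predicted_label, round(confidence, 2))
```

Under that assumption, the wolf is classified as a dog with roughly 92% confidence.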

