# Introduction to Convolutional Neural Network (CNN)
Machine Learning and Deep Learning have been around for decades, and they have evolved progressively through theoretical and practical implementation through trial and error of the theoretical concept. However, there are differences between machine and deep learning model architectures based on data. Assuming you are already familiar with a traditional neural network (NN) known as a multilayer perceptron (MLP), it is the final topic of traditional machine learning and basic concepts before diving into deep learning.

The mlp concept is based on human neurons, with each neuron activating based on specific threshold values. However, it has a disadvantage when working with image data because an image must be transformed into a single vector before being sent as input to mlp. Consider a color image with 54 X 54 pixels that are transformed into a single vector, which results in 8748 (54 X 54 X 3) pixels because the color image is composed of three channels (RGB). The amount of trainable weights in a vector after transformation increases considerably, resulting in a huge number of computations based on the number of pixels in an image. Aside from that, the transformation results in spatial information loss, i.e. increased pair pixel distance. To address this issue, a convolutional neural network (CNN) that mimics the human visual cortex has been proposed.

# CNN Architecture
CNN architecture composed of three major building block **convolutional**, **pooling** and **fully connected layer**.

**Convolutional Layer (CL):** A single CL is made up of input, filter, and activation functions. A filter, also known as a kernel, is a user-defined matrix that moves across (also known as stride) an input matrix or image from top left to right and bottom. It took the dot product of a single area of the image covered by the kernel and using stride value, the location of the kernel will change until all feasible positions have been covered, i.e. the kernel has gone through all pixel values of the image. Like MLP, the output of the CL will pass through the activation (non-linearity) function, and this activation function is made up of CL. The concept of kernel reduce the number of computation and keep spatial information as selecting a part of the image keep the pixel position data. The kernel concept reduces the number of computations while retaining spatial information by picking a portion of the image and retaining pixel location data. Also, if we take a kernel of size 3 X 3 and train it on an RGB image, the parameter we have to train is 27 (3 X 3 X 3), which is significantly less than the 8748 trainable weights mentioned earlier.

**Pooling Layer (PL):** PL functioned similar to CL with minor differences; it reduced the dimension of the output from CL and ensured that subsequent layers could capture the detail of the input. In the CNN architecture, maximum, minimum, and average PL are often utilized.

**Fully Connected Layer (FCL):** The FCL is used to classify the output similarly to MLP; we flatten the last CL output into a vector and map that flattened vector to the various classes we have, i.e. mnist classification dataset has 0-9.

Consider a face detection task and consider CNN in backward-like backpropagation. The closer the CL is to the output, the more small features like edges and curves are likely to detect, and when we backpropagate from the last layer, we first do it for an edge, and curve, then objects like nose, eyes for face image, and finally layer before input image we can identify it as the face.


# Example
In this noteboook I will describe how we can train CNN models using TensorFlow framework.

# Data collect

Collecting data image classification dataset (cat, dog) from github

In [1]:
!git clone https://github.com/MojammelHossain/tutorial.git

Cloning into 'tutorial'...
remote: Enumerating objects: 8, done.[K
remote: Counting objects: 100% (8/8), done.[K
remote: Compressing objects: 100% (6/6), done.[K
remote: Total 8 (delta 1), reused 4 (delta 1), pack-reused 0[K
Unpacking objects: 100% (8/8), 14.33 MiB | 14.48 MiB/s, done.


In [2]:
%cd /content/tutorial

/content/tutorial


In [3]:
!unzip test.zip

Archive:  test.zip
   creating: cats/
  inflating: cats/cat_190.jpg        
  inflating: __MACOSX/cats/._cat_190.jpg  
  inflating: cats/cat_147.jpg        
  inflating: __MACOSX/cats/._cat_147.jpg  
  inflating: cats/cat_542.jpg        
  inflating: __MACOSX/cats/._cat_542.jpg  
  inflating: cats/cat_595.jpg        
  inflating: __MACOSX/cats/._cat_595.jpg  
  inflating: cats/cat_422.jpg        
  inflating: __MACOSX/cats/._cat_422.jpg  
  inflating: cats/cat_583.jpg        
  inflating: __MACOSX/cats/._cat_583.jpg  
  inflating: cats/cat_384.jpg        
  inflating: __MACOSX/cats/._cat_384.jpg  
  inflating: cats/cat_586.jpg        
  inflating: __MACOSX/cats/._cat_586.jpg  
  inflating: cats/cat_545.jpg        
  inflating: __MACOSX/cats/._cat_545.jpg  
  inflating: cats/cat_223.jpg        
  inflating: __MACOSX/cats/._cat_223.jpg  
  inflating: cats/cat_551.jpg        
  inflating: __MACOSX/cats/._cat_551.jpg  
  inflating: cats/cat_587.jpg        
  inflating: __MACOSX/cats/._cat_

# Import necessary libraries

In [4]:
import os
import cv2
import glob
import math
import json
import numpy as np
import pandas as pd
from PIL import Image
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical, Sequence

# Dataset and Dataloader
We will create a dataset class using the TensorFlow Sequence class to generate data batches for training the model.

In this example, we will read the image using a function read_img() that takes the image path as an argument and read using opencv library. We will resize our image so that each image follows the same dimension which is necessary for batching the data.

In [5]:
IMG_SIZE = (128, 128)
BATCH_SIZE = 8
NUM_CLASS = 2
IN_CHANNELS = 3

def read_img(path):
    """
    Summary:
        read, resize and normalize an image given a path
    Arguments:
        path (string): image path
    Return:
        numpy array
    """
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, IMG_SIZE)
    return cv2.normalize(img, None, 0, 1.0, cv2.NORM_MINMAX, dtype=cv2.CV_32F)

class MyDataset(Sequence):

    def __init__(self, img_dir, tgt, in_channels, batch_size, num_class):
        
        """
        Summary:
            initialize class variables
        Arguments:
            img_dir (list): all image directory
            tgt (list): corresponding image label
            in_channels (int): number of input channels
            batch_size (int): how many data to pass in a single step
            num_class (int): number of class in mask image
        Return:
            class object
        """

        self.img_dir = img_dir
        self.tgt = tgt
        self.in_channels = in_channels
        self.batch_size = batch_size
        self.num_class = num_class

    def __len__(self):
        
        """
        return total number of batch to travel full dataset
        """
        
        return math.ceil(len(self.img_dir) // self.batch_size)  # get the total number of batch

    def __getitem__(self, idx):
        
        """
        Summary:
            create a single batch for training
        Arguments:
            idx (int): sequential batch number
        Return:
            images and masks as numpy array for a single batch
        """
        
        # get a single batch for given idx. Ex: for idx=0, batch[0:batch_size] again for idx=1, batch[batch_size:2*batch_size]
        batch_x = self.img_dir[idx * self.batch_size:(idx + 1) *self.batch_size]
        batch_y = self.tgt[idx * self.batch_size:(idx + 1) *self.batch_size]

        imgs = []
        tgts = []
        
        for i in range(len(batch_x)):   # get all image and target for single batch
            imgs.append(read_img(batch_x[i]))
            tgts.append(to_categorical(batch_y[i], num_classes = self.num_class))

        # converting list to numpy array
        tgts = np.array(tgts)
        imgs = np.array(imgs)        
        return tf.convert_to_tensor(imgs), tf.convert_to_tensor(tgts)   # return non-weighted images and targets

## Dataloader
We will create dataloader object using the dataset class written above to pass through the model. Also partition the dataset into train, valid, and test.

In [6]:
def get_train_val_test_dataloader(path, label):
    
    """
    Summary:
        read train and valid image and mask directory and return dataloader
    Arguments:
        config (dict): Configuration directory
    Return:
        train and valid dataloader
    """
    x_train, x_rem, y_train, y_rem = train_test_split(path, label, train_size = 0.75, random_state=42)
    x_valid, x_test, y_valid, y_test = train_test_split(x_rem, y_rem, test_size = 0.5, random_state=42)

    print("train Example : {}".format(len(x_train)))
    print("valid Example : {}".format(len(x_valid)))
    print("test Example : {}".format(len(x_test)))

    train_dataloader = MyDataset(x_train,
                              y_train,
                              in_channels = IN_CHANNELS,
                              batch_size = BATCH_SIZE,
                              num_class = NUM_CLASS)

    # create dataloader object for validation dataset
    val_dataloader = MyDataset(x_valid,
                            y_valid,
                            in_channels = IN_CHANNELS,
                            batch_size = BATCH_SIZE,
                            num_class = NUM_CLASS)
    test_dataloader = MyDataset(x_test,
                            y_test,
                            in_channels = IN_CHANNELS,
                            batch_size = BATCH_SIZE,
                            num_class = NUM_CLASS)
    
    return train_dataloader, val_dataloader, test_dataloader

## Data preparation
Using the glob library we will fetch the image paths and generate labels. We consider cat as 0 and dog as 1 further we will initialize our train, valid, and test dataloader.


In [7]:
# fetch image paths
cat_dir = glob.glob("cats/*.*")
dogs_dir = glob.glob("dogs/*.*")

# generate labels
label = []
for i in range(len(cat_dir)):
  label.append(0)
for i in range(len(dogs_dir)):
  label.append(1)

# create python dataframe and shuffle data
data_dir = cat_dir + dogs_dir
df = pd.DataFrame(np.array([data_dir, label]).T, columns=["path", "label"]).sample(frac=1, random_state=42).reset_index()
df["label"] = pd.to_numeric(df["label"])

# initialize dataloader object
train_data, val_data, test_data = get_train_val_test_dataloader(list(df['path']), list(df["label"]))

train Example : 105
valid Example : 17
test Example : 18


# Model

Over the year different CNN architecture has been proposed for different tasks. However, we will create our own CNN architecture. As previously describe CNN is composed of three major layers so we will use those layers to create our architecture. in Tensorflow CL known as Conv2D, for PL we will be using MaxPooling2D which chooses the max value from a sub-part of the image, and FCL known as Dense.

In [8]:
def model():
    """
    Summary:
        create a define CNN model
    Return:
        return a keras model object
    """
    input = tf.keras.layers.Input((IMG_SIZE[0], IMG_SIZE[1], IN_CHANNELS))
    c1 = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(input)
    c1 = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c1)
    p1 = tf.keras.layers.MaxPooling2D((2, 2))(c1)

    c2 = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p1)
    c2 = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c2)
    p2 = tf.keras.layers.MaxPooling2D((2, 2))(c2)

    flat = tf.keras.layers.Flatten()(p2)
    d1 = tf.keras.layers.Dense(256, activation='relu')(flat)
    d2 = tf.keras.layers.Dense(128, activation='relu')(d1)
    out = tf.keras.layers.Dense(2, activation='softmax')(d2)

    model = tf.keras.models.Model(inputs=[input], outputs=[out])
    return model

## Model initialization

Let us initialize our model and see how many parameter/weights we need to train.

In [9]:
model = model()
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 128, 128, 3)]     0         
                                                                 
 conv2d (Conv2D)             (None, 128, 128, 64)      1792      
                                                                 
 conv2d_1 (Conv2D)           (None, 128, 128, 64)      36928     
                                                                 
 max_pooling2d (MaxPooling2D  (None, 64, 64, 64)       0         
 )                                                               
                                                                 
 conv2d_2 (Conv2D)           (None, 64, 64, 128)       73856     
                                                                 
 conv2d_3 (Conv2D)           (None, 64, 64, 128)       147584    
                                                             

# Training
To train a TensorFlow model we need compile our model first and define some important arguments.

Loss: As we are doing binary classification so we will be using BinaryCrossentropy loss function which is responsible for penalizing bad prediction.

Metrics: It is a performance score to measure how much our model predictions are accurate. Here we will use the accuracy metric.

Now let us train our model for 10 epoch.

In [10]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
history = model.fit(train_data,
                    verbose = 1, 
                    epochs = 10,
                    validation_data = val_data, 
                    shuffle = False,
                    )

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Testing

Now we will test our trained model on our test data to evaluate its performance in unseen data.

In [11]:
model.evaluate(test_data)



[5.118557929992676, 0.4375]

# Conclusion and Application
We can see that our model can identify the training data with 97% accuracy; however, in both validation and test data it can identify only 43.75% of the data correctly which could be for various reasons in which one is overfitting on train data which can be reduced by increasing the number data.

CNN architecture has different application

*  Object detection
*   Tracking vehicles
*  Identifying Flood casuality from Satelite image

