### Overview of Convolutional Neural Networks (CNNs) for Image Classification

Convolutional Neural Networks (CNNs) are a class of deep learning models specifically designed for processing structured grid data, such as images. They are particularly effective for image classification tasks due to their ability to automatically and adaptively learn spatial hierarchies of features from input images.

#### Problem Representation
In the context of image classification, the problem involves categorizing images into predefined classes. For example, in the MNIST dataset, the task is to classify 28x28 pixel grayscale images of handwritten digits (0-9) into one of the 10 digit classes.

#### Main Steps in CNN Algorithm Implementation

1. **Data Loading and Transformation**:
   - Load the MNIST dataset.
   - Apply transformations to the images, such as resizing and normalization.

2. **Define the CNN Model**:
   - Create a CNN model with convolutional layers, activation functions, pooling layers, and fully connected layers.

3. **Training the Model**:
   - Train the CNN model using the training dataset.
   - Optimize the model parameters using backpropagation and an optimization algorithm (e.g., Adam).

4. **Testing the Model**:
   - Evaluate the trained model on the test dataset to measure its performance.


In [3]:
import torch 
import torch.nn as nn #To build the Neural Network
import heading #Local helper module (source not shown) with utility functions
from torchvision import datasets #To load the dataset
import matplotlib.pyplot as plt #To plot results
from torch.utils.data import DataLoader #To load the data

In [4]:
def transform_img(img_dimension):
    """Returns a transformation for images including resizing.

    Args:
        img_dimension (int): The dimension of the image.

    Returns:
        torchvision.transforms: The transformation for the image.
    """
    return heading.resize_image(img_dimension) #transform the image

In [5]:
def load_MNIST_data(train_bool, transform_img, batch_size=1):
    """Loads the MNIST dataset.

    Args:
        train_bool (boolean): True if training, False if testing.
        transform_img (torchvision.transforms): The transformation for the image.
        batch_size (int): Number of samples per batch. Defaults to 1.

    Returns:
        torch.utils.data.DataLoader: The data loader for the MNIST dataset.
    """
    #Load the dataset
    load_dataset = datasets.MNIST(root='./Data', train=train_bool, download=True, transform=transform_img)

    return DataLoader(dataset=load_dataset, batch_size=batch_size, shuffle=True)

#### Side Note:
* PyTorch represents a batch of images as a tensor of shape (batch_size, channels, height, width). The batch size is the number of images in the batch, the channels are the color channels of the image, and the height and width are its spatial dimensions. For example, a batch of 32 grayscale images, each 28 pixels high and 28 pixels wide with 1 color channel, has shape (32, 1, 28, 28).
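
PyTorch's convolution layers expect channels-first input, (batch_size, channels, height, width). The short sketch below builds such a batch and shows how channels-last data could be reordered with `permute`; the tensors are zero-filled placeholders, not MNIST data.

```python
import torch

# Channels-first batch, the layout nn.Conv2d expects: (N, C, H, W)
batch_nchw = torch.zeros(32, 1, 28, 28)

# Channels-last data (N, H, W, C), as produced by some other libraries
batch_nhwc = torch.zeros(32, 28, 28, 1)

# permute reorders the axes so channels come right after the batch axis
converted = batch_nhwc.permute(0, 3, 1, 2)

print(batch_nchw.shape)  # torch.Size([32, 1, 28, 28])
print(converted.shape)   # torch.Size([32, 1, 28, 28])
```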

The code cell below defines a class that implements a Convolutional Neural Network for image classification using PyTorch. Its layers are as follows:

### Convolutional Layer 1: 
* 1 input channel (Gray scale)
* 64 output channels - Meaning 64 different filters are applied to the input image
* 2x2 kernel size
* stride 1
* padding 1
* Activation Function: ReLU
* Pooling Layer: Max Pooling with 2x2 kernel size
* Batch Normalisation: Applied to 64 features

### Convolutional Layer 2:
* 64 input channels
* 128 output channels - Meaning 128 different filters are applied to the 64-channel output of the first layer
* 2x2 kernel size
* stride 1
* padding 1
* Activation Function: ReLU
* Pooling Layer: Max Pooling with 2x2 kernel size
* Batch Normalisation: Applied to 128 features

### Flattening Layer:
* After the second convolutional layer, the output has 128 channels and spatial dimensions 7x7.
* Therefore, it is flattened to a 2D tensor of size (batch_size, 128x7x7), i.e. 6272 features per image.
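
The 7x7 figure can be checked with the standard output-size formula, floor((size + 2 x padding - kernel) / stride) + 1. The sketch below traces a 28x28 input through both blocks; the helper names are illustrative. With a 2x2 kernel and padding 1, each convolution grows the feature map by one pixel before pooling halves it (rounding down).

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel):
    """Spatial size after max pooling with stride equal to the kernel size (no padding)."""
    return (size - kernel) // kernel + 1


size = 28
trace = [size]
for _ in range(2):  # two conv blocks with identical geometry
    size = conv_output_size(size, kernel=2, stride=1, padding=1)
    trace.append(size)
    size = pool_output_size(size, kernel=2)
    trace.append(size)

print(trace)              # [28, 29, 14, 15, 7]
print(128 * size * size)  # 6272
```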

### Fully Connected Layer 1:
* 7x7x128 input features
* 512 output features
* Dropout ratio of 0.1
* Activation Function: ReLU

### Fully Connected Layer 2:
* 512 input features
* 128 output features
* Dropout ratio of 0.1
* Activation Function: ReLU

### Fully Connected Layer 3:
* 128 input features
* 10 output features

### Output Layer:
* 10 output features corresponding to the 10 classes (10 numbers) of the dataset

In [6]:
class CONV_IMG(nn.Module):
    def __init__(self, num_input_channels=1, num_output_classes=10):
        super().__init__()
        
        #Convolutional Layer 1
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(num_input_channels, out_channels=64, kernel_size=2, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
            torch.nn.BatchNorm2d(num_features=64)  # Batch normalization for 64 features
        )
        #conv1 - Input size = 28x28x1, after convolution = 29x29x64 (2x2 kernel, padding 1), after pooling = 14x14x64

        #Convolutional Layer 2
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=2, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
            torch.nn.BatchNorm2d(num_features=128)  # Batch normalization for 128 features
        )
        #conv2 - Input size = 14x14x64, after convolution = 15x15x128 (2x2 kernel, padding 1), after pooling = 7x7x128

        #Fully Connected Layer 1
        self.fc1 = torch.nn.Sequential(
            torch.nn.Linear(in_features=7*7*128, out_features=512),
            torch.nn.Dropout(0.1),  #Dropout for regularization
            torch.nn.ReLU()
        )

        #Fully Connected Layer 2
        self.fc2 = torch.nn.Sequential(
            torch.nn.Linear(in_features=512, out_features=128),
            torch.nn.Dropout(0.1),  #Dropout for regularization
            torch.nn.ReLU()
        )

        #Fully Connected Layer 3
        self.fc3 = torch.nn.Sequential(
            torch.nn.Linear(in_features=128, out_features=num_output_classes)
            #No softmax here: nn.CrossEntropyLoss expects raw logits and applies log-softmax internally
        )

    def forward(self, x):
        """Performs a forward pass through the neural network.

        Args:
            x (torch.Tensor): The input tensor.
            
        Returns:
            torch.Tensor: The output tensor after passing through the network.
        """
        #forward pass through convolutional layers
        x = self.conv1(x)
        x = self.conv2(x)

        #Flatten the final output from the conv layers
        #x.size(0) is the batch size and specifies the size of the first dimension of the reshaped tensor
        #The second argument -1 is a placeholder that allows PyTorch to automatically infer
        #the size of the second dimension based on the original size of the tensor and the specified size of the first dimension
        x = x.view(x.size(0), -1) 

        #Forward pass through fully connected layers
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

### How the CNN Model Works to solve the problem:

1) **Input Tensor:** The `forward` method takes an input tensor x, a batch of images of shape (batch_size, channels, height, width), and passes it through the network.

2) **Convolutional Layers:**

* x = self.conv1(x): The input tensor x is passed through the first convolutional layer (conv1). This layer applies a set of filters to the input image, performing convolution operations to extract local features such as edges, textures, etc.
* x = self.conv2(x): The output from the first convolutional layer is then passed through the second convolutional layer (conv2). This layer further processes the features extracted by the first layer, capturing more complex patterns and structures.

3) **Flattening:**

* x = x.view(x.size(0), -1): After the convolutional layers, the tensor is flattened. Flattening transforms the multi-dimensional tensor into a 2D tensor where the first dimension is the batch size, and the second dimension is the flattened feature vector. This step is necessary to prepare the data for the fully connected layers.

4) **Fully Connected Layers:**

* x = self.fc1(x): The flattened tensor is passed through the first fully connected layer (fc1). This layer combines the features extracted by the convolutional layers to form higher-level representations.
* x = self.fc2(x): The output from the first fully connected layer is then passed through the second fully connected layer (fc2). This layer further refines the representations.
* x = self.fc3(x): Finally, the output from the second fully connected layer is passed through the third fully connected layer (fc3). This layer produces the final output of the network: one logit (unnormalized class score) per class.

5) **Output Tensor:** The final output tensor x is returned. This tensor represents the network's predictions based on the input data.
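
The flattening step and the interpretation of the output logits can be illustrated with random stand-in tensors. This is a sketch only: the values below are random placeholders, not outputs of the trained model.

```python
import torch

torch.manual_seed(0)

# Stand-in for the conv output: batch of 4, 128 channels, 7x7 feature maps
features = torch.randn(4, 128, 7, 7)
flat = features.view(features.size(0), -1)
print(flat.shape)  # torch.Size([4, 6272])

# Stand-in logits for 4 images and 10 classes
logits = torch.randn(4, 10)
probabilities = torch.softmax(logits, dim=1)  # each row sums to 1
predicted_classes = logits.argmax(dim=1)      # softmax preserves order, so this matches probabilities.argmax(dim=1)
print(predicted_classes.shape)  # torch.Size([4])
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits is enough to obtain the predicted digit.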


In [7]:
def train_cnn(num_input_channels, num_output_classes, training_dataloader, num_epochs, learning_rate):
    """Trains a Convolutional Neural Network (CNN) model.

    Args:
        num_input_channels (int): Number of input channels.
        num_output_classes (int): Number of output classes.
        training_dataloader (DataLoader): DataLoader for training data.
        num_epochs (int): Number of training epochs.
        learning_rate (float): Learning rate for the optimizer.
        
    Returns:
        CONV_IMG: Trained CNN model.
    """

    #create an instance of the CNN
    cnn = CONV_IMG(num_input_channels=num_input_channels, num_output_classes=num_output_classes)

    #create the neural network
    network = heading.create_network(cnn)

    #train model using heading.train_model()
    trained_model = heading.train_model(net=network, train_loader=training_dataloader, epochs=num_epochs, LR=learning_rate)

    return trained_model
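
Since `heading.train_model` is not shown, the sketch below illustrates what a training loop of this kind might look like, assuming cross-entropy loss and the Adam optimizer mentioned in the overview. The function name `train_model_sketch` and the toy data are hypothetical and exist only to exercise the loop.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_model_sketch(net, train_loader, epochs, lr):
    """Hypothetical training loop; heading.train_model may differ."""
    criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    net.train()  # enable dropout and batch-norm statistic updates
    epoch_losses = []
    for _ in range(epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            optimizer.zero_grad()  # clear gradients from the previous step
            loss = criterion(net(images), labels)
            loss.backward()        # backpropagation
            optimizer.step()       # parameter update
            running_loss += loss.item()
        epoch_losses.append(running_loss / len(train_loader))
    return net, epoch_losses


# Tiny random dataset and linear model, only to exercise the loop
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_net, losses = train_model_sketch(toy_net, loader, epochs=2, lr=0.001)
print(len(losses))  # 2
```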

In [11]:
def test_cnn(trained_model, test_dataloader):
    """Tests a Convolutional Neural Network (CNN) model. Returns accuracy.

    Args:
        trained_model (CONV_IMG): Trained CNN model.
        test_dataloader (DataLoader): DataLoader for testing data.

    Returns:
        float: Accuracy of the model
    """
    return heading.test_cnn(trained_model, test_dataloader)
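
`heading.test_cnn` is likewise not shown. A minimal sketch of an accuracy computation, assuming accuracy is reported as a percentage of correctly classified test images; `accuracy_sketch` and the `AlwaysZero` toy model are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy_sketch(model, loader):
    """Hypothetical evaluation loop returning accuracy as a percentage."""
    model.eval()  # disable dropout, use running batch-norm statistics
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for images, labels in loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total


class AlwaysZero(nn.Module):
    """Toy model that always predicts class 0, used to check the arithmetic."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 10)
        logits[:, 0] = 1.0
        return logits


labels = torch.tensor([0, 0, 1, 2])
loader = DataLoader(TensorDataset(torch.zeros(4, 1, 28, 28), labels), batch_size=2)
print(accuracy_sketch(AlwaysZero(), loader))  # 50.0
```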

In [14]:
def main():
    img_dimension = 28  #dimension of the input images
    num_input_channels = 1  #number of input channels (grayscale images)
    num_output_classes = 10  #number of output classes (digits 0-9)
    batch_size = 16  #number of samples per batch
    num_epochs = 50  #number of training epochs
    learning_rate = 0.001  #learning rate for the optimizer

    #load the training and testing data with transformations
    train_dataloader = load_MNIST_data(train_bool=True, transform_img=transform_img(img_dimension), batch_size=batch_size)
    test_dataloader = load_MNIST_data(train_bool=False, transform_img=transform_img(img_dimension), batch_size=batch_size)

    #train the CNN model
    model = train_cnn(num_input_channels=num_input_channels, num_output_classes=num_output_classes, training_dataloader=train_dataloader, num_epochs=num_epochs, learning_rate=learning_rate)

    #test the CNN model
    accuracy = test_cnn(model, test_dataloader)

    #print the test accuracy
    print(f'Test Accuracy: {accuracy:.2f}%')

    #save the trained model to a file
    torch.save(model.state_dict(), 'CNN_model.pth')

if __name__ == "__main__":
    main()

[1,100] loss:0.884
[1,200] loss:0.450
[1,300] loss:0.417
[1,400] loss:0.285
[1,500] loss:0.276
[1,600] loss:0.276
[1,700] loss:0.242
[1,800] loss:0.189
[1,900] loss:0.214
[1,1000] loss:0.193
[1,1100] loss:0.244
[1,1200] loss:0.198
[1,1300] loss:0.170
[1,1400] loss:0.202
[1,1500] loss:0.211
[1,1600] loss:0.212
[1,1700] loss:0.172
[1,1800] loss:0.207
[1,1900] loss:0.167
[1,2000] loss:0.152
[1,2100] loss:0.160
[1,2200] loss:0.153
[1,2300] loss:0.179
[1,2400] loss:0.152
[1,2500] loss:0.122
[1,2600] loss:0.168
[1,2700] loss:0.133
[1,2800] loss:0.126
[1,2900] loss:0.152
[1,3000] loss:0.151
[1,3100] loss:0.116
[1,3200] loss:0.141
[1,3300] loss:0.135
[1,3400] loss:0.159
[1,3500] loss:0.158
[1,3600] loss:0.164
[1,3700] loss:0.164
[2,100] loss:0.136
[2,200] loss:0.099
[2,300] loss:0.134
[2,400] loss:0.126
[2,500] loss:0.116
[2,600] loss:0.157
[2,700] loss:0.128
[2,800] loss:0.116
[2,900] loss:0.109
[2,1000] loss:0.134
[2,1100] loss:0.116
[2,1200] loss:0.141
[2,1300] loss:0.105
[2,1400] loss:0.12

#### Loading the saved model

In [22]:
# Create an instance of the model class
model = CONV_IMG(num_input_channels=1, num_output_classes=10)  

# Load the saved model weights
model.load_state_dict(torch.load('CNN_model.pth'))


<All keys matched successfully>
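
Before using a reloaded model for prediction, it is usually switched to evaluation mode so that dropout is disabled and batch normalization uses its running statistics. The sketch below shows the save/load round trip on a small stand-in model, with an in-memory buffer in place of `CNN_model.pth`; the same pattern should apply to `CONV_IMG`.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)


def make_model():
    """Small stand-in model (not CONV_IMG), used only to show the pattern."""
    return nn.Sequential(nn.Linear(4, 3), nn.Dropout(0.5), nn.Linear(3, 2))


original = make_model()
buffer = io.BytesIO()  # in-memory file instead of 'CNN_model.pth'
torch.save(original.state_dict(), buffer)
buffer.seek(0)

restored = make_model()  # fresh instance with different random weights
restored.load_state_dict(torch.load(buffer))
restored.eval()  # inference mode: dropout off
original.eval()

x = torch.randn(5, 4)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print(same)  # True
```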