## 3.1 Convolutional Neural Network
- Intro to Convolution Layer
- Train Simple CNN using PyTorch on MNIST Dataset

⚠️⚠️⚠️ *Please open this notebook in Google Colab* by clicking the link below ⚠️⚠️⚠️<br><br>
<a href="https://colab.research.google.com/github/Muhammad-Yunus/Belajar-Image-Classification/blob/main/Pertemuan%203/3.1%20intro_to_cnn.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><br><br><br>
- Click the `Connect` button in the top right of the Google Colab notebook,<br>
<img src="resource/cl-connect-gpu.png" width="250px">
- Once the connection is established, it will look something like this<br>
<img src="resource/cl-connect-gpu-success.png" width="250px">

### 3.1.1 Intro to Convolution Layer
| Fully Connected Layer Neural Network | Convolutional Neural Network |
|-----------------------|----------------------------|
|<img src="resource/MNIST_NN.gif" width="500px">|<img src="resource/MNIST_CNN.gif" width="500px">|


#### 3.1.1.1 Convolution Operation
- Convolutional neural networks apply a series of <font color="orange">learnable filters</font> to the input image. 
- A convolutional layer is defined by the <font color="orange">filter (or kernel) size</font>, the <font color="orange">number of filters</font> applied and the <font color="orange">stride</font>.
- The input and the output of a convolutional layer each have <font color="orange">three dimensions</font> (width, height, number of channels).
- The <font color="orange">depth of the output</font> (number of channels) is adjusted by using <font color="orange">more or fewer filters</font>.
- <font color="orange">Padding</font> in a convolutional layer is like <font color="orange">adding a border</font> of extra pixels around the edges of an image before applying the convolution operation. 
    - It helps <font color="orange">maintain the original size</font> of the image after the convolution. <br><br>
<img src="resource/CNN.gif" width="500px"><img src="resource/CNN2.png" width="400px"><br>
<i>Illustration: filtering an image with <font color="cyan">two successive filters</font> made of <font color="cyan">4x4x3=48 learnable weights</font> each.</i><br><br>
<img src="resource/CNN3.png" width="600px"><br>
<i>Illustration: a convolutional neural network transforms one <font color="cyan">"feature map"</font> into <font color="cyan">another "feature map"</font>.</i>
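
To make the weight count in the illustration concrete, here is a minimal PyTorch sketch. It assumes a 3-channel input and two 4x4 filters, as in the caption; the tensor sizes and variable names are illustrative only. Note that PyTorch orders tensor dimensions as (batch, channels, height, width).

```python
import torch
import torch.nn as nn

# Two 4x4 filters over a 3-channel input (bias disabled so only filter weights are counted)
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=4, bias=False)

# Weight tensor layout: (out_channels, in_channels, kernel_height, kernel_width)
weights_per_filter = conv.weight[0].numel()   # 4 * 4 * 3 = 48
total_weights = conv.weight.numel()           # 2 filters * 48 = 96

# A dummy batch holding one 3-channel 28x28 image
x = torch.randn(1, 3, 28, 28)

# Without padding the spatial size shrinks: 28 - 4 + 1 = 25
out_no_pad = conv(x)

# With a 3x3 kernel and padding=1 the spatial size is preserved
conv_same = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, padding=1)
out_same = conv_same(x)

print(weights_per_filter, total_weights)  # 48 96
print(out_no_pad.shape)                   # torch.Size([1, 2, 25, 25])
print(out_same.shape)                     # torch.Size([1, 2, 28, 28])
```

The number of output channels (2) equals the number of filters, which is how the depth of the output is controlled.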


#### 3.1.1.2 Strided convolutions & Max Pooling Layer
- When <font color="orange">stacking convolutional layers</font>, the width and height of the output can be adjusted by using a <font color="orange">stride >1</font> or with a <font color="orange">max-pooling</font> operation.<br><br>
- <font color="cyan">Strided convolution</font>: by performing the convolutions with a <font color="orange">stride</font> of 2 or 3, we can also <font color="orange">shrink</font> the resulting feature map in its <font color="orange">spatial dimensions</font> (width and height).<br><br> 
- <font color="cyan">Max pooling</font>: a <font color="orange">sliding window</font> applying the <font color="orange">MAX</font> operation (typically on 2x2 patches, repeated every 2 pixels).<br>
    <img src="resource/CNN4.gif" width="500px"><br>
    <i>Illustration: <font color="orange">sliding</font> the computing window by <font color="orange">3 pixels</font> results in <font color="orange">fewer</font> output values.</i>
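
The two downsampling options can be compared side by side. The following is a small sketch, assuming an 8-channel 28x28 feature map; the channel count and layer settings are chosen only for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 28, 28)  # (batch, channels, height, width)

# Strided convolution: learnable weights, stride=2 halves height and width
strided_conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, stride=2, padding=1)

# Max pooling: no learnable weights, keeps the max of each 2x2 patch, moving 2 pixels at a time
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)

conv_out = strided_conv(x)
pool_out = max_pool(x)

# Max pooling has nothing to train
pool_param_count = sum(p.numel() for p in max_pool.parameters())

print(conv_out.shape)     # torch.Size([1, 8, 14, 14])
print(pool_out.shape)     # torch.Size([1, 8, 14, 14])
print(pool_param_count)   # 0
```

Both layers reduce 28x28 to 14x14; the difference is that the strided convolution learns how to combine pixels, while max pooling always keeps the largest value in each window.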


#### 3.1.1.3 Final Layer
- After the last convolutional layer, the data is in the form of a "feature map". 
- There are <font color="orange">two ways</font> of feeding it through the final layer:
    1. <font color="cyan">Flatten the feature map</font> into a <font color="orange">1D array</font> and then feed it to the <font color="orange">softmax layer</font>.
        - This kind of final layer is similar to what we did at the beginning when creating a <font color="orange">fully connected layer</font>.
        - We can also add an extra hidden layer between the flattened feature map and the softmax.
    2. Apply <font color="cyan">Global average pooling</font> to the feature map and then feed it to the <font color="orange">softmax layer</font>.
        - This technique is <font color="orange">computationally cheaper</font> than flattening the feature map into a 1D array.<br><br>
<img src="resource/CNN5.png" width="600px"><br>
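
The cost difference between the two options can be seen by counting parameters. This is a rough sketch, assuming a final feature map of 32 channels at 7x7 and a single linear layer in front of the softmax in both cases; the names `flatten_head` and `gap_head` are hypothetical.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10
feature_map = torch.randn(4, 32, 7, 7)  # assumed output of the last convolutional layer

# Option 1: flatten the feature map, then a linear classifier
flatten_head = nn.Sequential(
    nn.Flatten(),                        # (4, 32, 7, 7) -> (4, 1568)
    nn.Linear(32 * 7 * 7, NUM_CLASSES),
)

# Option 2: global average pooling, then a linear classifier
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),             # (4, 32, 7, 7) -> (4, 32, 1, 1)
    nn.Flatten(),                        # (4, 32, 1, 1) -> (4, 32)
    nn.Linear(32, NUM_CLASSES),
)

def count_params(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())

flatten_params = count_params(flatten_head)  # 1568 * 10 + 10 = 15690
gap_params = count_params(gap_head)          # 32 * 10 + 10 = 330

# Softmax turns the class scores into probabilities that sum to 1 per sample
flatten_probs = torch.softmax(flatten_head(feature_map), dim=1)
gap_probs = torch.softmax(gap_head(feature_map), dim=1)

print(flatten_params, gap_params)  # 15690 330
```

Global average pooling collapses each 7x7 channel to a single number, so the classifier that follows needs far fewer weights.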

<br><br>
_______________
<br><br>
### 3.1.2 Train Simple CNN using PyTorch on MNIST Dataset

- Check that the GPU connected to the Colab environment is active

In [None]:
!nvidia-smi

- Load library

In [None]:
!pip install gdown

import os
import cv2
import gdown
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split

import torchvision
from torchvision import transforms

from IPython import display

# clear output cell
display.clear_output()

print(f"torch : {torch.__version__}")
print(f"torch vision : {torchvision.__version__}")

- Download MNIST Dataset

In [None]:
DATASET_NAME = 'MNIST' # the dataset name
DATASET_NUM_CLASS = 10 # number of class in dataset

In [None]:
# default using gdrive_id Dataset `mnist_dataset.zip` (1-FfwJrllyHofQwIbMb_IxAkxnfMGSFmR)
gdrive_id = '1-FfwJrllyHofQwIbMb_IxAkxnfMGSFmR' # <-----  ⚠️⚠️⚠️ USE YOUR OWN GDrive ID FOR CUSTOM DATASET ⚠️⚠️⚠️

# download zip from GDrive
url = f'https://drive.google.com/uc?id={gdrive_id}'
gdown.download(url, DATASET_NAME + ".zip", quiet=False)

# unzip dataset
!unzip {DATASET_NAME}.zip -d {DATASET_NAME}

# clear output cell
display.clear_output()

- Load MNIST Dataset
    - <font color="orange">DO NOT FLATTEN THE INPUT IMAGE IN THE DATA LOADER</font>,
    - WE WILL FEED THE 2D 28x28 MNIST DIGIT IMAGES INTO THE MODEL

In [None]:
# Define a custom Dataset class
# it's a helper that loads each image with OpenCV and converts it to a PyTorch tensor
# it also one-hot encodes the label
class CustomDataset(Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir
        self.image_files = sorted([file for file in os.listdir(root_dir) if file.lower().endswith('.png')])

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        # Read image from corresponding .png file
        image_path = os.path.join(self.root_dir, self.image_files[idx])
        image = cv2.imread(image_path)  # Load image using OpenCV
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # Convert BGR to GRAY
        image = torch.from_numpy(image).to(torch.float32)  # Convert NumPy array to PyTorch tensor
        image = image.view(1, 28, 28) # reshape loaded MNIST image into 1x28x28 format (required by the model)

        # Read label from corresponding .txt file
        label_path = os.path.splitext(image_path)[0] + ".txt"
        with open(label_path, 'r') as label_file:
            label = int(label_file.read().strip())  # Assuming labels are integers

        # Apply one-hot encoding into label
        labels_tensor = torch.tensor(label)
        one_hot_encoded = F.one_hot(labels_tensor, num_classes=DATASET_NUM_CLASS).to(torch.float32)

        return image, one_hot_encoded



# instantiate the datasets
# the images are not loaded yet at this point
# we only read the image file names in the dataset folder
all_train_dataset = CustomDataset(root_dir=f'{DATASET_NAME}/dataset/train')
test_dataset = CustomDataset(root_dir=f'{DATASET_NAME}/dataset/test')

In [None]:
print(f"All Train Dataset : {len(all_train_dataset)} data")
print(f"Test Dataset : {len(test_dataset)} data")

In [None]:
# Split 'all_train_dataset' into 'train' and 'validation' set using `random_split()` function
train_dataset, validation_dataset = random_split(all_train_dataset, [50000, 10000])

print(f"Train Dataset : {len(train_dataset)} data")
print(f"Validation Dataset : {len(validation_dataset)} data")

In [None]:
# Create data loaders
BATCH_SIZE = 128

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
validation_loader = DataLoader(validation_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
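
To see what the split and the loaders produce without downloading anything, here is a small sketch on synthetic data. The random tensors, the 500/100 split, and the `demo_` names are assumptions made for illustration; passing a seeded `generator` to `random_split` is optional and only makes the split reproducible.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader, random_split

# Synthetic stand-in for the MNIST images and one-hot labels (values are random)
demo_images = torch.rand(600, 1, 28, 28)
demo_labels = F.one_hot(torch.randint(0, 10, (600,)), num_classes=10).to(torch.float32)
demo_dataset = TensorDataset(demo_images, demo_labels)

# A seeded generator makes the split identical between runs
generator = torch.Generator().manual_seed(42)
demo_train, demo_val = random_split(demo_dataset, [500, 100], generator=generator)

demo_loader = DataLoader(demo_train, batch_size=128, shuffle=True)

# Each batch keeps the 1x28x28 image layout expected by the convolution layers
batch_images, batch_labels = next(iter(demo_loader))

print(batch_images.shape)   # torch.Size([128, 1, 28, 28])
print(batch_labels.shape)   # torch.Size([128, 10])
print(len(demo_loader))     # 4 batches: 128 + 128 + 128 + 116
```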

#### 3.1.2.1 Create a CNN Model
- Let us build a convolutional network for handwritten digit recognition. 
- We will use :
    - <font color="orange">Three Convolutional Layers</font> at the top,  
    - <font color="orange">Flatten</font> the feature map of last convolution layer,
    - Connect them with one <font color="orange">Fully-connected Layer</font>,
    - And <font color="orange">Softmax Layer</font> at the end. <br><br>
<img src="resource/CNN6.png" width="600px"><br><br>
> ___
> Notice that the <font color="orange">second</font> and <font color="orange">third</font> convolutional layers have a <font color="orange">stride of two</font> which explains why they bring the number of output values down from <font color="orange">28x28</font> to <font color="orange">14x14</font> and then <font color="orange">7x7</font>.
> ___



- Here is how to create a PyTorch 2D convolution over input data:
    >`
    >    - With square kernels and equal stride <br>
    >    ```m = nn.Conv2d(in_channels=16, out_channels=33, kernel_size=3, stride=2)```<br><br>
    >    - non-square kernels with stride and padding <br> 
    >    ```m = nn.Conv2d(in_channels=16, out_channels=33, kernel_size=(3, 5), stride=2, padding=2)```<br><br>
    >    - non-square kernels and unequal stride with padding <br>
    >    ```m = nn.Conv2d(in_channels=16, out_channels=33, kernel_size=(3, 5), stride=(2, 1), padding=2)```
    >
    >`
    - More about [Pytorch Conv2D](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
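
As a quick check of how these arguments affect the output, the sketch below applies three such layers to a dummy tensor. The input size (20, 16, 50, 100) is an assumption for illustration, and the third layer uses `stride=(2, 1)` as an example of a stride that differs between height and width.

```python
import torch
import torch.nn as nn

x = torch.randn(20, 16, 50, 100)  # (batch, channels, height, width)

# Square kernel, same stride in both directions
square = nn.Conv2d(in_channels=16, out_channels=33, kernel_size=3, stride=2)

# Non-square kernel (3 high, 5 wide) with stride and padding
non_square = nn.Conv2d(in_channels=16, out_channels=33, kernel_size=(3, 5), stride=2, padding=2)

# Non-square kernel with a different stride for height (2) and width (1)
unequal = nn.Conv2d(in_channels=16, out_channels=33, kernel_size=(3, 5), stride=(2, 1), padding=2)

square_out = square(x)
non_square_out = non_square(x)
unequal_out = unequal(x)

print(square_out.shape)      # torch.Size([20, 33, 24, 49])
print(non_square_out.shape)  # torch.Size([20, 33, 26, 50])
print(unequal_out.shape)     # torch.Size([20, 33, 26, 100])
```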

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define the model using nn.Sequential
model = nn.Sequential(
    # Convolutional layers
    # the first convolution layer processes the 1x28x28 input tensor
    # with kernel_size=3, padding=1 keeps the output feature map at 28x28 so the border is not lost
    nn.Conv2d(in_channels=1, out_channels=12, kernel_size=3, padding=1), 
    nn.ReLU(),
    nn.Conv2d(in_channels=12, out_channels=24, kernel_size=6, stride=2, padding=2),
    nn.ReLU(),
    nn.Conv2d(in_channels=24, out_channels=32, kernel_size=6, stride=2, padding=2),
    nn.ReLU(),
    
    # Flatten layer
    nn.Flatten(),
    
    # Fully connected layers
    nn.Linear(in_features=32 * 7 * 7, out_features=200),  # Adjust in_features based on flattened output
    nn.ReLU(),
    nn.Linear(in_features=200, out_features=10),
    nn.LogSoftmax(dim=1)
).to(device)

In [None]:
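# setup optimizer & loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# nn.CrossEntropyLoss applies log-softmax internally and accepts one-hot (probability) targets;
# the model's final LogSoftmax is therefore redundant but harmless, because log-softmax is idempotent
loss_function = nn.CrossEntropyLoss()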
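
Since the labels are one-hot encoded and the model ends with `LogSoftmax`, it may help to see how `nn.CrossEntropyLoss` treats these inputs. The sketch below uses random scores and made-up labels; it assumes a PyTorch version that supports probability targets (1.10 or later).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)                 # raw model scores for 4 samples
class_ids = torch.tensor([3, 1, 7, 0])      # integer class labels
one_hot = F.one_hot(class_ids, num_classes=10).to(torch.float32)

loss_fn = nn.CrossEntropyLoss()

# Integer class indices as targets
loss_from_ids = loss_fn(logits, class_ids)

# One-hot (probability) targets give the same value
loss_from_one_hot = loss_fn(logits, one_hot)

# Applying log-softmax before the loss does not change the result,
# because log-softmax of log-probabilities returns the same log-probabilities
loss_after_log_softmax = loss_fn(F.log_softmax(logits, dim=1), one_hot)

print(loss_from_ids.item(), loss_from_one_hot.item(), loss_after_log_softmax.item())
```

All three values agree up to floating-point rounding, which is why the model trains correctly even though log-softmax is effectively applied twice.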

- To run the training process, we can use the following code

In [None]:
!pip install tqdm

from tqdm import tqdm

In [None]:
def train(model, train_loader, optimizer, loss_function):
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    # Add progress bar for training loop
    progress_bar = tqdm(train_loader, desc='Training', leave=False)

    for inputs, labels in progress_bar:
        inputs = inputs.to(device) # move inputs to device
        labels = labels.to(device) # move labels to device

        # resets the gradients of all the model's parameters before the backward pass
        optimizer.zero_grad()
        # pass 2D 28x28 input tensor to CNN model
        outputs = model(inputs)
        # calc loss value
        loss = loss_function(outputs, labels)
        # computes the gradient of the loss with respect to each parameter in model
        loss.backward()
        # adjust model parameter
        optimizer.step()
        # sum loss value
        running_loss += loss.item()

        # Calculate correct & total prediction
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == labels.argmax(1)).sum().item()
        total_predictions += labels.size(0)

        # Update progress bar description with current loss
        progress_bar.set_postfix(loss=loss.item())

    # Calculate average training loss
    # (running_loss sums per-batch mean losses, so divide by the number of batches)
    average_train_loss = running_loss / len(train_loader)
    # Calculate training accuracy
    train_accuracy = correct_predictions / total_predictions
    return average_train_loss, train_accuracy

def validate(model, val_loader, loss_function):
    model.eval()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    # Add progress bar for validation loop
    progress_bar = tqdm(val_loader, desc='Validating', leave=False)

    with torch.no_grad():
        for inputs, labels in progress_bar:
            inputs = inputs.to(device) # move inputs to device
            labels = labels.to(device) # move labels to device

            # pass 2D 28x28 input tensor to CNN model
            outputs = model(inputs)
            # calc loss value
            loss = loss_function(outputs, labels)
            # sum loss value
            running_loss += loss.item()

            # Calculate correct & total prediction
            _, predicted = torch.max(outputs, 1)
            correct_predictions += (predicted == labels.argmax(1)).sum().item()
            total_predictions += labels.size(0)

            # Update progress bar description with loss
            progress_bar.set_postfix(loss=loss.item())

    # Calculate average validation loss
    # (running_loss sums per-batch mean losses, so divide by the number of batches)
    average_val_loss = running_loss / len(val_loader)
    # Calculate validation accuracy
    val_accuracy = correct_predictions / total_predictions
    return average_val_loss, val_accuracy





# Training loop over the selected number of epochs
# each epoch processes the whole training and validation set in small batches,
# then measures the loss & accuracy on both sets
NUM_EPOCH = 10      # you can change this value

train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

for epoch in range(NUM_EPOCH):
    print(f"Epoch {epoch+1}/{NUM_EPOCH}")

    train_loss, train_accuracy = train(model, train_loader, optimizer, loss_function)
    val_loss, val_accuracy = validate(model, validation_loader, loss_function)

    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accuracies.append(train_accuracy * 100)  # convert to percentage
    val_accuracies.append(val_accuracy * 100)  # convert to percentage

    print(f"Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}, Train Accuracy = {train_accuracy:.4f}, Val Accuracy = {val_accuracy:.4f}\n")

- Plot Loss and Accuracy of Training vs Validation Set 

In [None]:
# visualize Loss & Accuracy
import matplotlib.pyplot as plt

epochs = list(range(1, NUM_EPOCH + 1))

# Plotting loss
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(epochs, train_losses, 'b', label='Training Loss')
plt.plot(epochs, val_losses, 'r', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# Plotting accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracies, 'b', label='Training Accuracy')
plt.plot(epochs, val_accuracies, 'r', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.grid(True)

plt.tight_layout()


- Evaluate the model: compute per-class precision and recall, measure accuracy, and compute the confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import seaborn as sns
import numpy as np

# define evaluate function for test set
def evaluate(model, test_loader):
    model.eval()
    all_labels = []
    all_preds = []

    # Add progress bar for evaluation loop
    progress_bar = tqdm(test_loader, desc='Evaluating', leave=False)

    with torch.no_grad():
        # iterate over all batched test set
        for inputs, labels in progress_bar:
            inputs = inputs.to(device) # move inputs to device
            labels = labels.to(device) # move labels to device

            # pass 2D 28x28 input tensor to CNN model
            outputs = model(inputs)
            # get prediction
            _, preds = torch.max(outputs, 1)
            # collect all labels & preds
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(preds.cpu().numpy())

    return all_labels, all_preds

# Evaluation on test set
all_labels, all_preds = evaluate(model, test_loader)
all_labels = np.argmax(all_labels, axis=1)

# Calculate classification report
labels = [str(i) for i in range(DATASET_NUM_CLASS)]
print(classification_report(all_labels, all_preds, target_names=labels))

# Confusion Matrix
conf_matrix = confusion_matrix(all_labels, all_preds)

# Plotting the confusion matrix
plt.figure(figsize=(10, 7))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues")
plt.xlabel('Predicted Class')
plt.ylabel('Actual Class')
plt.title('Confusion Matrix')
plt.show()
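
To interpret the report and the heatmap, it can help to see how precision, recall and accuracy fall out of a confusion matrix. The sketch below rebuilds these numbers with NumPy only on a tiny made-up example with three classes; the labels are invented for illustration. As in the plot above, rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Made-up labels for 6 samples and 3 classes
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])
num_classes = 3

# Build the confusion matrix: cm[actual, predicted] counts the samples
cm = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

true_positives = np.diag(cm)

# Precision: of all samples predicted as class k, how many really are class k (column-wise)
precision = true_positives / cm.sum(axis=0)

# Recall: of all samples that really are class k, how many were found (row-wise)
recall = true_positives / cm.sum(axis=1)

# Accuracy: fraction of all samples on the diagonal
accuracy = true_positives.sum() / cm.sum()

print(cm)         # [[2 0 0] [0 1 1] [0 0 2]]
print(precision)  # class 2 has precision 2/3
print(recall)     # class 1 has recall 0.5
print(accuracy)   # 5 of 6 correct
```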

- Download Model 

In [None]:
# Save the model
torch.save(model.state_dict(), 'trained_cnn_model.pt')

# Download the model file
from google.colab import files
files.download('trained_cnn_model.pt')
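
Because only the `state_dict` is saved, loading the file later requires building the same architecture first. Here is a minimal round-trip sketch; `build_model` is a hypothetical stand-in with a much smaller network, and the temporary file name is arbitrary.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # Hypothetical small stand-in; in practice rebuild the exact architecture that was trained
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

trained = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pt")

    # Save only the learned parameters
    torch.save(trained.state_dict(), path)

    # Rebuild the architecture, then load the parameters into it
    restored = build_model()
    state_dict = torch.load(path, map_location="cpu")
    restored.load_state_dict(state_dict)

# Switch to inference mode before predicting
trained.eval()
restored.eval()

x = torch.randn(2, 1, 28, 28)
with torch.no_grad():
    same_output = torch.allclose(trained(x), restored(x))

print(same_output)  # True
```

`map_location="cpu"` lets a model trained on a GPU be loaded on a machine without one.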

>
>## Discussion
>- Now the model performs <font color="orange">better and achieves ~98% accuracy</font>,
><img src="resource/CNN7.png" width="800px">
>- But look at the <font color="orange">validation loss</font> curve. It looks like a sign of <font color="orange">overfitting</font>.
>- Remember how to tackle it? Yes, now we will combine <font color="orange">Convolution</font> with <font color="orange">Dropout</font> regularization to handle this.
>   - Try <font color="cyan">adding a dropout layer</font> between the fully connected layers (200 & 10 units) with probability 40% (p=0.4)<br>
>      `nn.Dropout(0.4)`
>
>.
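
One possible way to try the dropout suggestion is sketched below. It assumes the dropout layer goes after the ReLU of the 200-unit hidden layer, just before the output layer; `classifier_head` is a hypothetical name and only the fully connected part of the model is shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fully connected part of the network with dropout added (a sketch, not the only option)
classifier_head = nn.Sequential(
    nn.Linear(32 * 7 * 7, 200),
    nn.ReLU(),
    nn.Dropout(p=0.4),      # randomly zeroes 40% of the activations during training
    nn.Linear(200, 10),
)
head_out = classifier_head(torch.randn(3, 32 * 7 * 7))

# Dropout behaves differently in training and evaluation mode
dropout = nn.Dropout(p=0.4)
x = torch.ones(1, 1000)

dropout.train()
train_out = dropout(x)      # each value is either 0 or scaled by 1 / (1 - 0.4)

dropout.eval()
eval_out = dropout(x)       # identity: nothing is dropped at inference time

print(head_out.shape)                  # torch.Size([3, 10])
print(torch.equal(eval_out, x))        # True
```

This is why calling `model.train()` before training and `model.eval()` before validation matters once dropout is part of the model.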

### 📝📝📝 Other Regularization Technique

- Next, we will explore another regularization technique.
- It's called <font color="cyan">Batch Normalization</font>,<br><br>
<img src="resource/BatchNorm2.png" width="500px"><br><br>
- Open <font color="orange">'3.2 cnn_with_batch_normalization.ipynb'</font> in Google Colab to learn more...<br> 
<a href="https://colab.research.google.com/github/Muhammad-Yunus/Belajar-Image-Classification/blob/main/Pertemuan%203/3.2%20cnn_with_batch_normalization.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

_________________________________________________________________________
<br><br><br>
# Source
- https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
- https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist#9
- https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist#10