# 5-3  Fashion MNIST classification

We again use the Fashion MNIST dataset, this time classifying it with convolutional layers, batch normalization, and max pooling.



In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from torch import nn, optim
import torch.nn.functional as F

## Load the data

At this point we only load the training and test sets. The `DataLoader`s that feed the network in mini-batches are defined later, in the Training section.

In [None]:
# load dataset
train_set = torchvision.datasets.FashionMNIST(root = './data/FashionMNIST', download = True,
                                              train = True, transform = transforms.Compose([transforms.ToTensor(),]))
test_set = torchvision.datasets.FashionMNIST(root = './data/FashionMNIST', download=True,
                                             train=False, transform = transforms.Compose([transforms.ToTensor()]))

## Define network

Let's first create a CNN architecture and then explain the layers.

We first define our CNN class by inheriting from **nn.Module** and create each layer of the CNN in **`__init__`**. All operations of the neural network are implemented in the **`forward`** function. In this CNN example, there are two 2-dimensional convolutional layers, two batch normalization layers, one max pooling layer, and two fully connected (linear) layers, connected through ReLU activation functions.

In [None]:
# define the model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()

        self.conv1 = nn.Conv2d(1,10,5)    # Convolutional layer
        self.bn1 = nn.BatchNorm2d(10)     # batch normalization
        self.mp1 = nn.MaxPool2d(2, 2)     # Max Pooling

        self.conv2 = nn.Conv2d(10,20,3)
        self.bn2 = nn.BatchNorm2d(20)

        self.fc1 = nn.Linear(20*10*10,500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        batch_size = x.size(0)  # number of images in the batch
        # in: batch*1*28*28, out: batch*10*24*24  -- (28-5+1)
        x = self.conv1(x)
        x = F.relu(x)  # does not change the input size
        x = self.bn1(x)  # does not change the input size
        # in: batch*10*24*24, out: batch*10*12*12
        x = self.mp1(x)

        # in: batch*10*12*12, out: batch*20*10*10  -- (12-3+1)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.bn2(x)  # does not change the input size

        # flatten each image to a vector of 20*10*10 = 2000 features
        x = x.view(batch_size, -1)

        # in: batch*2000  out:batch*500
        x = self.fc1(x)
        x = F.relu(x)

        # in:batch*500 out:batch*10
        x = self.fc2(x)
        return x
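To check the shape comments in `forward`, the sketch below rebuilds the same sequence of layers as an `nn.Sequential` (an illustrative re-implementation, not the `CNN` class itself; `nn.Flatten` plays the role of the `x.view(...)` call) and prints the shape after every layer for a dummy batch of 4 random images. Batch normalization layers never change the shape.

```python
import torch
from torch import nn

# Same layer order as CNN.forward, written as nn.Sequential for shape tracing
layers = nn.Sequential(
    nn.Conv2d(1, 10, 5), nn.ReLU(), nn.BatchNorm2d(10),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(10, 20, 3), nn.ReLU(), nn.BatchNorm2d(20),
    nn.Flatten(),                      # (batch, 20, 10, 10) -> (batch, 2000)
    nn.Linear(20 * 10 * 10, 500), nn.ReLU(),
    nn.Linear(500, 10),
)

x = torch.randn(4, 1, 28, 28)  # dummy batch of 4 single-channel 28x28 images
shapes = []
for layer in layers:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:12s} -> {tuple(x.shape)}")
```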

### Convolutional layer
Fashion MNIST consists of two-dimensional grayscale images, so we use the two-dimensional convolutional layer **torch.nn.Conv2d**.

`torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                stride=1, padding=0, dilation=1, groups=1,
                bias=True, padding_mode='zeros')`

- in_channels (int): number of input image channels  
- out_channels (int): the number of channels after convolution  
- kernel_size (int or tuple): convolution kernel size  
- stride (int, optional): Convolution stride, default is 1

Let's look at an example of a convolutional layer applied to an input:

In [None]:
import torch

input = torch.randn(1,1,28,28)
conv1 = nn.Conv2d(1,10,5)
output = conv1(input)

print(input.shape)
print(output.shape)

The input tensor has shape `(1, 1, 28, 28)`:
* The first 1 is the batch size, which can be ignored here.
* The second 1 is the number of channels: a Fashion MNIST image is a single-channel (grayscale) 28x28 image. The `in_channels` of the convolutional layer must match the number of channels of the image, so it is also 1.
* The layer produces 10 output channels with a 5x5 kernel, so the output shape is `(1, 10, 24, 24)`: the batch size of 1 is unchanged, and each 28x28 image becomes **24x24, since 24 = 28 - 5 + 1** (stride 1, no padding).
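More generally, the PyTorch `Conv2d` documentation gives the output size along each spatial dimension as $\lfloor (H_{in} + 2p - d(k-1) - 1)/s \rfloor + 1$ for padding $p$, dilation $d$, kernel size $k$ and stride $s$. With the defaults ($s=1$, $p=0$, $d=1$) this reduces to $H_{in} - k + 1$. A small helper that evaluates the formula (`conv_output_size` is a hypothetical name, not a PyTorch function), cross-checked against `nn.Conv2d` for one case:

```python
import torch
from torch import nn

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output height/width of nn.Conv2d along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv_output_size(28, 5))                       # 24 = 28 - 5 + 1
print(conv_output_size(28, 5, padding=2))            # 28: padding keeps the size
print(conv_output_size(28, 3, stride=2, padding=1))  # 14: stride 2 halves the size

# Cross-check the stride-2 case against PyTorch
out = nn.Conv2d(1, 10, 3, stride=2, padding=1)(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 10, 14, 14])
```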



### Batch Normalization

Batch Normalization (BN) is a technique used to normalize activations within a neural network, improving **training speed** and **stability**. It helps reduce internal covariate shift by normalizing the input of each layer across a mini-batch.

For each feature in a mini-batch:

1. Compute the mean $\mu$ and variance $\sigma^2$ of the feature
2. Normalize the feature, where $\epsilon$ is a small constant added for numerical stability:

$$\hat{x} = \frac{x-\mu}{\sqrt{\sigma^2+\epsilon}}$$

3. Scale and shift using learnable parameters $\gamma$ (scale) and $\beta$ (shift):

$$y = \gamma\hat{x} + \beta$$

These parameters allow the model to adjust the normalized values if necessary.

In PyTorch, Batch Normalization over the channels of 4D inputs of shape `(N, C, H, W)` is provided by:

```python
torch.nn.BatchNorm2d(num_features)
```

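where `num_features` is the number of channels `C` of the input, i.e., the `out_channels` of the preceding convolutional layer. For the outputs of fully connected layers, `torch.nn.BatchNorm1d(num_features)` is used instead, with `num_features` equal to the number of neurons.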
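The formulas above can be checked against `nn.BatchNorm2d` directly. A minimal sketch on random data (the tensor sizes are arbitrary assumptions): in training mode, BN computes one mean and one biased variance per channel over the batch and spatial dimensions, adds a small constant `bn.eps` (default `1e-5`) to the variance, and at initialization uses $\gamma = 1$ (`bn.weight`) and $\beta = 0$ (`bn.bias`).

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4)  # batch of 8, 3 channels, 4x4 feature maps

bn = nn.BatchNorm2d(3)  # gamma = bn.weight (ones), beta = bn.bias (zeros) at initialization
bn.train()              # training mode: normalize with the current batch statistics
y = bn(x)

# Manual computation: one mean/variance per channel, over batch, height and width
mu = x.mean(dim=(0, 2, 3), keepdim=True)
var = ((x - mu) ** 2).mean(dim=(0, 2, 3), keepdim=True)  # biased variance
x_hat = (x - mu) / torch.sqrt(var + bn.eps)
y_manual = bn.weight.view(1, -1, 1, 1) * x_hat + bn.bias.view(1, -1, 1, 1)

print(torch.allclose(y, y_manual, atol=1e-5))  # True
```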

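Note that Batch Normalization behaves differently during training and evaluation. During training, each BN layer normalizes with the mean and variance of the current mini-batch, keeps running estimates of these statistics, and learns $\gamma$ and $\beta$ by gradient descent. After training, $\gamma$ and $\beta$ are fixed, and BN normalizes with its stored running mean and variance instead of the statistics of each validation/testing batch, so the output for an image does not depend on the other images in its batch. But how do we switch between these two behaviors in PyTorch?

In PyTorch, calling `.train()` on the model puts layers such as Batch Normalization into training mode, and calling `.eval()` puts them into evaluation mode, in which BN uses its running statistics. Note that `.eval()` does not disable gradient computation by itself; during evaluation we additionally use `torch.no_grad()` and simply do not call the optimizer.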

In [None]:
network = CNN()  # network defined before
print(network)

network.train()  # training mode: BN normalizes with batch statistics and updates its running estimates

network.eval()  # evaluation mode: BN normalizes with its stored running statistics
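To see the difference between the two modes, here is a small sketch with a single-channel `nn.BatchNorm2d` on random data (the sizes and data are arbitrary assumptions). With the default `momentum=0.1`, each training-mode forward pass updates `running_mean` as `0.9 * running_mean + 0.1 * batch_mean`, starting from 0; evaluation mode uses the running statistics and leaves them unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(1)                # one channel, default momentum=0.1
x = torch.randn(16, 1, 4, 4) * 2 + 5  # random data with mean about 5 and std about 2

bn.train()
y_train = bn(x)                       # normalized with this batch's own mean/variance
print(bn.running_mean)                # 0.9 * 0 + 0.1 * batch mean, roughly 0.5

saved_mean = bn.running_mean.clone()
bn.eval()
y_eval = bn(x)                        # normalized with running_mean / running_var instead
print(torch.equal(saved_mean, bn.running_mean))  # True: eval mode does not update the statistics
print(torch.allclose(y_train, y_eval))           # False: the two modes use different statistics
```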

### Max Pooling
The pooling layer downsamples the feature maps (matrices), which reduces the computational cost and redundancy of the network. PyTorch provides several [pooling layers](https://pytorch.org/docs/stable/nn.html#pooling-layers), such as `MaxPool2d`, which keeps the maximum value of each window:


```python
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0)
```

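If `stride` is left as `None`, it defaults to `kernel_size`, so the pooling windows do not overlap. Here is an example of `MaxPool2d`: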

In [None]:
import torch
from torch import nn
from torch.nn import MaxPool2d

input_5x5 = torch.tensor([[1, 2, 0, 3, 1],
                          [0, 1, 2, 3, 1],
                          [1, 2, 1, 0, 0],
                          [5, 2, 3, 1, 1],
                          [2, 1, 0, 1, 1]], dtype=torch.float32)
input_5x5 = torch.reshape(input_5x5, (-1, 1, 5, 5))
print("Input Size: ", input_5x5.shape)

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)  # 3x3

    def forward(self, input_data):
        output = self.maxpool1(input_data)
        return output

test = Test()
output = test(input_5x5)
print("Output: ", output.shape)

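With a 3x3 max-pooling window (the stride defaults to 3) and `ceil_mode=True`, the 5x5 input becomes 2x2 because the partial windows at the right and bottom edges are kept. With the default `ceil_mode=False`, these partial windows would be dropped and the output would be 1x1.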

The exact relationship between the input and output sizes is given in the PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d
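The effect of `ceil_mode` can be seen by printing the pooled values for the same 5x5 input with both settings (a minimal sketch; variable names are illustrative). With `ceil_mode=True`, the four windows (one full, three partial) give a 2x2 output; with `ceil_mode=False`, only the single complete 3x3 window at the top left remains.

```python
import torch
from torch import nn

x = torch.tensor([[1, 2, 0, 3, 1],
                  [0, 1, 2, 3, 1],
                  [1, 2, 1, 0, 0],
                  [5, 2, 3, 1, 1],
                  [2, 1, 0, 1, 1]], dtype=torch.float32).reshape(1, 1, 5, 5)

# stride defaults to kernel_size (3), so the 3x3 windows do not overlap
out_ceil = nn.MaxPool2d(kernel_size=3, ceil_mode=True)(x)
out_floor = nn.MaxPool2d(kernel_size=3, ceil_mode=False)(x)

print(out_ceil.squeeze())   # maxima of the four windows: [[2, 3], [5, 1]]
print(out_floor.squeeze())  # only the top-left 3x3 window: 2
```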



### Putting it all together

Now that we understand the individual operations, let's look at the most basic usage: after instantiating the `CNN` class, we can pass an input tensor directly to the network, which calls its `forward` function and returns the output.
Of course, this is only a demonstration: the model has not been trained yet, and `input` is random noise rather than a real image, so the prediction is meaningless.

In [None]:
network = CNN()  # network defined before
print(network)

print('input', input.shape)
output = network(input)
pred = F.softmax(output, dim=1).argmax(1)
print(pred)  # prediction
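The call to `F.softmax` is not needed to obtain the predicted class. Softmax turns each row of scores (logits) into probabilities that sum to 1, but because it is strictly increasing it preserves the ordering of the scores, so `argmax` picks the same class either way. A quick check on random logits (standing in for network outputs; the sizes are assumptions):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)       # stand-in for raw network outputs: 4 images, 10 classes

probs = F.softmax(logits, dim=1)  # each row becomes a probability distribution
print(probs.sum(dim=1))           # every row sums to 1 (up to rounding)

# softmax preserves the ordering within each row, so the predicted class is unchanged
print(torch.equal(probs.argmax(dim=1), logits.argmax(dim=1)))  # True
```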

## Hyperparameters and Optimizer

Let's define some hyperparameters, a loss function, and an optimizer.

The learning rate controls the size of each stochastic gradient descent (SGD) step. A larger learning rate can make training converge faster, but if it is too large the loss oscillates or diverges, and if it is too small training is slow, so it usually has to be tuned. Here we use plain SGD with a small learning rate of `1e-3`.
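To see both effects, here is a minimal sketch of plain gradient descent on the one-dimensional function $f(x) = x^2$ (a toy example, not the CNN loss; `gradient_descent` is a hypothetical helper). Since $f'(x) = 2x$, each step multiplies $x$ by $1 - 2\cdot\text{lr}$, so the iteration converges to the minimum at 0 only when $|1 - 2\cdot\text{lr}| < 1$, i.e. $0 < \text{lr} < 1$.

```python
def gradient_descent(lr, x0=1.0, steps=10):
    """Minimize f(x) = x**2 (gradient 2x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # each step multiplies x by (1 - 2 * lr)
    return x

for lr in (0.01, 0.1, 0.4, 1.1):
    print(f"lr={lr}: x after 10 steps = {gradient_descent(lr):.4g}")
```

With these settings, `lr=0.01` moves only slowly toward 0, `lr=0.4` is essentially at the minimum, and `lr=1.1` overshoots further on every step and diverges.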

In [None]:
learning_rate = 1e-3
batch_size = 60
epochs = 5

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(network.parameters(), lr=learning_rate) # Stochastic Gradient Descent

In a training loop, the optimization has 3 steps:

1. Call `optimizer.zero_grad()` to reset the gradients of the model parameters. By default, PyTorch accumulates gradients across `backward()` calls, so they must be cleared before each batch.
2. Backpropagate the prediction loss by calling `loss.backward()`. PyTorch stores the gradient of the loss with respect to each parameter in that parameter's `.grad` attribute.
3. Call `optimizer.step()` to update the parameters using these gradients.
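The three steps can be seen on a single scalar parameter (a toy example with a hypothetical loss `3 * w`, not the CNN). Calling `backward()` twice without clearing adds the gradients together, which is why `zero_grad()` comes first; `step()` then applies the plain SGD update $w \leftarrow w - \text{lr}\cdot\partial L/\partial w$.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zero_grad: the gradients add up
for _ in range(2):
    loss = 3 * w        # d(loss)/dw = 3
    loss.backward()
accumulated = w.grad.item()
print(accumulated)      # 6.0

# One proper optimization iteration
optimizer.zero_grad()   # 1. clear the accumulated gradient
loss = 3 * w
loss.backward()         # 2. w.grad is now 3
optimizer.step()        # 3. w <- 2.0 - 0.1 * 3 = 1.7
print(round(w.item(), 4))  # 1.7
```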



In [None]:
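def train(network, train_loader, optimizer):
    network.train()  # training mode for Batch Normalization
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()                 # 1. clear gradients from the previous batch

        output = network(data)                # forward pass
        loss = loss_function(output, target)
        loss.backward()                       # 2. backpropagate the loss

        optimizer.step()                      # 3. update the parameters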

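The following function implements the validation/testing pipeline: it computes the classification accuracy of the network on the data of a given loader.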

In [None]:
def accuracy(epoch_idx, test_loader, network, set_type = None):
    network.eval()  # BN uses its running statistics

    correct = 0
    with torch.no_grad():      # to calculate accuracy, we do not need the gradient any more
        for data, target in test_loader:
            outputs = network(data)
            predicted = outputs.argmax(dim=1)  # class with the largest score
            correct += (predicted == target).sum().item()

    total = len(test_loader.dataset)
    if set_type in ("train", "test"):
        print('\nEpoch{}: {} accuracy: {}/{} ({:.0f}%)\n'.format(
            epoch_idx, set_type.capitalize(), correct, total, 100. * correct / total))

    return correct / total
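The core of `accuracy` can be tried on a toy loader. In this sketch, the "network" is `nn.Identity()` applied to hand-written logits (a stand-in assumption, so the predictions are known in advance): three of the four predictions match their targets, so the accuracy is 0.75.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

logits = torch.tensor([[2.0, 0.1, 0.3],   # predicts class 0
                       [0.2, 1.5, 0.1],   # predicts class 1
                       [0.9, 0.8, 0.1],   # predicts class 0
                       [0.1, 0.2, 3.0]])  # predicts class 2
targets = torch.tensor([0, 1, 1, 2])
loader = DataLoader(TensorDataset(logits, targets), batch_size=2, shuffle=False)
network = nn.Identity()  # hypothetical stand-in: returns the stored logits unchanged

network.eval()
correct = 0
with torch.no_grad():
    for data, target in loader:
        predicted = network(data).argmax(dim=1)
        correct += (predicted == target).sum().item()
acc = correct / len(loader.dataset)
print(acc)  # 0.75
```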

## Training
In our example, we train a simple CNN for only 5 epochs. Deeper CNN architectures, such as VGG or ResNet, and more training epochs are common ways to improve accuracy.

In [None]:
from torch.utils.data import DataLoader

train_loader = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True)  # shuffle the training data every epoch
test_loader = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False)   # keep the test data in a fixed order

network = CNN()
optimizer = optim.SGD(network.parameters(), lr=learning_rate)

for i in range(1,epochs+1):
  print(f"Epoch {i}\n-------------------------------")
  train(network = network, train_loader = train_loader, optimizer = optimizer)
  train_accuracy = accuracy(epoch_idx=i, test_loader = train_loader, network = network, set_type = "train")
  val_accuracy = accuracy(epoch_idx=i, test_loader = test_loader, network = network, set_type = "test")