<a href="https://colab.research.google.com/github/ashraj98/fashion-mnist-cnn/blob/main/fashion_mnist_cnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CNN on Fashion MNIST Dataset
### Ashwin Rajgopal

To start off, import the PyTorch libraries, along with `pyplot` for plotting the comparison results.

In [1]:
# import standard PyTorch modules
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# import torchvision module to handle image manipulation
import torchvision
import torchvision.transforms as transforms

# import pyplot to show comparison results
import matplotlib.pyplot as plt

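Download the Fashion MNIST dataset and save it under `./data/FashionMNIST` for future runs. The test split is loaded with `download = False` because the download for the training split already fetches the test files.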

In [2]:
# Use standard FashionMNIST dataset
train_set = torchvision.datasets.FashionMNIST(
    root = './data/FashionMNIST',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()                                 
    ])
)

test_set = torchvision.datasets.FashionMNIST(
    root = './data/FashionMNIST',
    train = False,
    download = False,
    transform = transforms.Compose([
        transforms.ToTensor()                                 
    ])
)

This is the base neural network as specified in the description.
* layer 1:
  * 2d convolution of 5x5, 8 feature maps (channels out), no padding
  * each channel then passes through relu
  * each channel then passes through max pooling 2x2, stride 2x2
  * output of the layer will be 12x12x8  (12x12 images, 8 channels)
* layer 2:
  * 2d convolution of 5x5x8, 12 feature maps out, and padding is added to preserve width and height
  * each channel then passes through relu
  * each channel then passes through max pooling 2x2, stride 2x2
  * output of the layer will be 6x6x12 (6x6 image maps, 12 channels)
* layer 3:
  * fully connected layer, 256 outputs
  * outputs passed through relu
* layer 4: 
  * softmax layer with 10 outputs corresponding to classes
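
The sizes listed above can be checked with the usual output-size formula for convolution and pooling windows, `(size + 2 * padding - kernel) // stride + 1`. The helper name `conv_out` is only an illustration and is not part of the notebook; this is a small sketch of the arithmetic, not of the network itself.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # Fashion MNIST images are 28x28
size = conv_out(size, kernel=5)             # layer 1 convolution, no padding
after_conv1 = size
size = conv_out(size, kernel=2, stride=2)   # layer 1 max pool
after_pool1 = size
# The layer 2 convolution keeps the size through padding,
# so only the max pool changes it.
size = conv_out(size, kernel=2, stride=2)   # layer 2 max pool
after_pool2 = size
flattened = after_pool2 * after_pool2 * 12  # 12 channels out of layer 2

print(after_conv1, after_pool1, after_pool2, flattened)  # 24 12 6 432
```

The final value, 432, is the number of inputs the fully connected layer has to accept.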

---

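To allow the kernel size of this network's second layer to be varied for comparison testing, I added an initialization parameter to the network. Since the width and height need to be preserved in layer two, I calculated how much padding is required on each side and used `nn.ConstantPad2d` to apply it. For an even kernel size the total padding is odd, so the extra pixel goes on the right and bottom. Note that in the code below the padding layer sits after the convolution, so it zero-pads the convolution's output back to 12x12 rather than padding its input. With this calculation, the following layers work without having to adjust the number of units.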

In [3]:
# Build the neural network, expand on top of nn.Module
class Network(nn.Module):
  def __init__(self, layer2_kernel_size=5):
    super().__init__()

    self.layer2_kernel_size = layer2_kernel_size
    self.layer2_padding_size_start = (layer2_kernel_size - 1) // 2
    self.layer2_padding_size_end = self.layer2_padding_size_start if layer2_kernel_size % 2 == 1 else self.layer2_padding_size_start + 1
    self.layer2_padding = (
        self.layer2_padding_size_start, self.layer2_padding_size_end,
        self.layer2_padding_size_start, self.layer2_padding_size_end,
    )
    # define layers
    self.layer1 = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=8, kernel_size=5),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2)
    )
    self.layer2 = nn.Sequential(
        nn.Conv2d(
            in_channels=8, out_channels=12, kernel_size=self.layer2_kernel_size,
        ),
        nn.ConstantPad2d(padding=self.layer2_padding, value=0),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2)
    )
    self.layer3 = nn.Sequential(
        nn.Flatten(),
        nn.Linear(432, 256),
        nn.ReLU()
    )
    self.layer4 = nn.Sequential(
        nn.Linear(256, 10),
        nn.Softmax(dim=1)
    )

  def forward(self, t):
    t = self.layer1(t)
    t = self.layer2(t)
    t = self.layer3(t)
    t = self.layer4(t)

    return t
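
The padding arithmetic in `Network.__init__` can be checked in isolation. The sketch below repeats the same two expressions in a standalone helper (the name `split_padding` is hypothetical) and confirms that, for every kernel size used in the comparison, the 12x12 size is restored.

```python
def split_padding(kernel_size):
    """Return (start, end) padding that adds kernel_size - 1 pixels in total."""
    start = (kernel_size - 1) // 2
    end = start if kernel_size % 2 == 1 else start + 1
    return start, end

paddings = {}
for k in [3, 4, 5, 6, 7, 8]:
    start, end = split_padding(k)
    conv_size = 12 - k + 1              # unpadded convolution on a 12x12 input
    restored = conv_size + start + end  # size once the padding is applied
    paddings[k] = (start, end, restored)
    print(k, start, end, restored)
```

Odd kernel sizes split the padding evenly, while even kernel sizes need one extra pixel on the end side.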

This function puts the given model in eval mode, runs it over a validation set, and returns the accuracy. It switches the model back to train mode before returning.

In [4]:
def get_accuracy(model, dataloader):
  count=0
  correct=0

  model.eval()
  with torch.no_grad():
    for batch in dataloader:
      images = batch[0]
      labels = batch[1]
      preds=model(images)
      batch_correct=preds.argmax(dim=1).eq(labels).sum().item()
      batch_count=len(batch[0])
      count+=batch_count
      correct+=batch_correct
  model.train()
  return correct/count

This function sets the network to train mode and trains it on the given training set, shuffling the dataset every epoch and using the Adam optimizer with a cross-entropy loss. It then evaluates the network on the test set with the previous function and returns the test accuracy.

In [5]:
def train_network(network, train_set, test_set, lr=0.001, batch_size=1000, epochs=10, shuffle=True):
  loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=shuffle)
  optimizer = optim.Adam(network.parameters(), lr=lr)

  # set the network to training mode
  network.train()
  for epoch in range(epochs):
    for batch in loader:
      images = batch[0]
      labels = batch[1]
      preds = network(images)
      loss = F.cross_entropy(preds, labels)

      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
  test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)
  return get_accuracy(network, test_loader)
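
One detail worth being aware of: `F.cross_entropy` applies a log-softmax to its input internally, while the networks in this notebook already end in `nn.Softmax`. The loss therefore treats the output probabilities as raw scores. The pure-Python sketch below (the `cross_entropy` helper here is a hand-written stand-in for a single sample, not the PyTorch function) illustrates the consequence: with 10 classes the loss cannot drop below about 1.46, even for a perfect prediction.

```python
import math

def cross_entropy(scores, label):
    """Cross entropy of one sample: negated log-softmax of the scores at the label."""
    log_sum = math.log(sum(math.exp(s) for s in scores))
    return log_sum - scores[label]

# Best case for a network that already outputs probabilities:
# all of the mass is on the correct class.
perfect_probs = [1.0] + [0.0] * 9
floor = cross_entropy(perfect_probs, label=0)

# The same confident prediction expressed as raw scores (logits).
confident_logits = [20.0] + [0.0] * 9
near_zero = cross_entropy(confident_logits, label=0)

print(floor, near_zero)
```

Training still works in this setup, as the results in this notebook show. Returning raw scores from the last layer would be an alternative, but that is a change to the architecture described here and not something this notebook does.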

Use the network and train function defined above to train the network using different kernel sizes.

In [None]:
kernel_sizes = [3, 4, 5, 6, 7, 8]
accuracies = []
for ks in kernel_sizes:
  network = Network(layer2_kernel_size=ks)
  acc = train_network(network=network, train_set=train_set, test_set=test_set)
  accuracies.append(acc)
fig = plt.figure()
ax = plt.axes()
ax.plot(kernel_sizes, accuracies, '-b')
plt.xlabel('Kernel Size')
plt.ylabel('Accuracy')

The plot above shows that the network performs better with a smaller kernel size than with a larger one.

---

To optimize the network, I first updated the convolution layers to use 3x3 kernels. To keep the convolution outputs easily poolable with a 2x2 max pool kernel, I used a padding of 3 on the first-layer convolution so that its output is 32x32 and the max pool result is 16x16. In the second layer, I again use a 3x3 convolution, with a padding of 1 to maintain 16x16. The max pool then reduces the size to 8x8.

I then optimized the model by increasing the number of channels to 15 in the first layer and 30 in the second layer. Increasing the channels increases the number of inputs to the dense layer (8x8x30 = 1920), so to account for that, I added another dense layer so that the number of outputs wouldn't shrink so quickly from 1920 to 10.
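
The sizes quoted in the two paragraphs above can be verified with a few lines of arithmetic. This is a sketch only: `conv_out` and `linear_params` are illustrative helper names, and the layer widths are taken from the `BetterNetwork` definition in this notebook.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = conv_out(28, kernel=3, padding=3)    # layer 1 convolution -> 32
size = conv_out(size, kernel=2, stride=2)   # layer 1 max pool    -> 16
size = conv_out(size, kernel=3, padding=1)  # layer 2 convolution -> 16
size = conv_out(size, kernel=2, stride=2)   # layer 2 max pool    -> 8
flattened = size * size * 30                # 30 channels out of layer 2

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

widths = [flattened, 512, 256, 10]
dense_params = [linear_params(a, b) for a, b in zip(widths, widths[1:])]
print(flattened, dense_params)  # 1920 [983552, 131328, 2570]
```

Most of the dense parameters sit in the first fully connected layer, which is where the 1920 flattened inputs arrive.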

After these optimizations, I reached around 87% accuracy using the default training parameters from the `train_network` function defined above. To improve this, I added dropout to every layer except the output layer, using `nn.Dropout2d` on layers before `nn.Flatten`, and `nn.Dropout` afterwards. I started with 50% dropout, but this hurt performance a little, so I reduced dropout to 25%, which improved the results.
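
As a rough illustration of what a 25% dropout rate does, the sketch below implements element-wise inverted dropout in plain Python. The `dropout` function is a hypothetical stand-in, not the PyTorch layer: it mirrors the behavior of `nn.Dropout`, whereas `nn.Dropout2d` zeroes entire channels rather than single values.

```python
import random

def dropout(values, p=0.25, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training:
        return list(values)          # evaluation mode leaves values untouched
    keep_scale = 1.0 / (1.0 - p)     # keeps the expected value unchanged
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
activations = [1.0] * 10000
train_out = dropout(activations, p=0.25, training=True, rng=rng)
eval_out = dropout(activations, p=0.25, training=False)

dropped_fraction = train_out.count(0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)
print(dropped_fraction, mean_train)
```

Roughly a quarter of the values are zeroed during training while the mean stays close to 1, and evaluation mode passes values through unchanged. This difference is why `get_accuracy` calls `model.eval()` before measuring accuracy.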

Finally, I added batch normalization after every ReLU and max pool, to help normalize the values coming out of the convolutions and max pooling.
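
For intuition, here is a minimal sketch of the normalization step that batch normalization performs on one feature during training. It assumes the learnable scale and shift are at their initial values of 1 and 0 and leaves them out; `batch_norm` is an illustrative helper, not the PyTorch layer.

```python
import math

def batch_norm(column, eps=1e-5):
    """Normalize one feature over a batch to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n   # biased batch variance
    return [(x - mean) / math.sqrt(var + eps) for x in column]

feature = [0.0, 2.0, 4.0, 10.0]      # one feature across a batch of four
normalized = batch_norm(feature)

new_mean = sum(normalized) / len(normalized)
new_var = sum(x ** 2 for x in normalized) / len(normalized)
print(new_mean, new_var)
```

The normalized values have a mean of approximately 0 and a variance just under 1. In evaluation mode the PyTorch layers use running statistics gathered during training instead of the current batch, which is another reason `get_accuracy` switches the model to eval mode.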

In [None]:
class BetterNetwork(nn.Module):
  def __init__(self):
    super().__init__()

    # define layers
    # 28x28x1
    self.layer1 = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=15, kernel_size=3, padding=3),
        nn.ReLU(),
        nn.BatchNorm2d(num_features=15),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.BatchNorm2d(num_features=15),
        nn.Dropout2d(.25),
    )
    # 16x16x15
    self.layer2 = nn.Sequential(
        nn.Conv2d(in_channels=15, out_channels=30, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(num_features=30),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.BatchNorm2d(num_features=30),
        nn.Dropout2d(.25),
    )
    # 8x8x30
    self.layer3 = nn.Sequential(
        nn.Flatten(),
        nn.Linear(1920, 512),
        nn.ReLU(),
        nn.BatchNorm1d(num_features=512),
        nn.Dropout(.25)
    )
    # 512x1
    self.layer4 = nn.Sequential(
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.BatchNorm1d(num_features=256),
        nn.Dropout(.25)
    )
    # 256x1
    self.layer5 = nn.Sequential(
        nn.Linear(256, 10),
        nn.Softmax(dim=1)
    )

  def forward(self, t):
    t = self.layer1(t)
    t = self.layer2(t)
    t = self.layer3(t)
    t = self.layer4(t)
    t = self.layer5(t)

    return t

In [None]:
network = BetterNetwork()
acc = train_network(network=network, train_set=train_set, test_set=test_set)
print(acc)

After running this network using default training parameters, I was able to get 90.23% accuracy.