# Naive Implementation of Digit Recognizer

This notebook is my implementation of the MNIST digit recognizer. The datasets are provided by Kaggle as two CSV files. The result of this implementation was submitted to Kaggle and scored 0.98157, a fairly standard score.

The aim of this notebook is for me, as a novice in deep learning, to write a CNN from scratch using PyTorch.

I call this implementation naive because there is no preprocessing or data augmentation, and not much hyperparameter tuning either.

As mentioned, Kaggle provides the images as CSVs, so I converted them to .npy files in a separate script, which is not presented here.
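The conversion script itself is not part of this notebook. As a rough, hypothetical sketch of what such a step might look like, assuming Kaggle's layout (`train.csv` has a `label` column followed by 784 pixel columns, `test.csv` has only the pixel columns), each row can be reshaped into a 28×28 array and saved with `np.save`. The helper name `csv_to_arrays` is illustrative, and a tiny synthetic CSV stands in for the real file.

```python
import io

import numpy as np
import pandas as pd


def csv_to_arrays(csv_source, has_label=True):
    """Hypothetical helper: turn a Kaggle-style MNIST CSV into NumPy arrays.

    Assumes each row holds 784 pixel values (optionally preceded by a
    'label' column) that form a 28x28 image in row-major order.
    """
    df = pd.read_csv(csv_source)
    labels = df.pop("label").to_numpy() if has_label else None
    images = df.to_numpy(dtype=np.float32).reshape(-1, 28, 28)
    return labels, images


# Tiny synthetic CSV with two rows, standing in for train.csv.
header = "label," + ",".join(f"pixel{i}" for i in range(784))
rows = ["3," + ",".join(["0"] * 784), "7," + ",".join(["255"] * 784)]
labels, images = csv_to_arrays(io.StringIO("\n".join([header] + rows)))
print(labels, images.shape)

# Saving would then look like this (file names taken from this notebook):
# np.save("train_labels.npy", labels)
# np.save("train_images.npy", images)
```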

## Import necessary packages

In [1]:
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## Use CUDA (if available) to speed up training

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## Setup dataset and dataloader

- First, define a transform class, separate from the dataset class, that handles each input-target pair. It converts the NumPy arrays to tensors and sends the image and label to the GPU when one is available

- Then define a `Dataset` subclass that loads the image and label files and uses `ToTensor` to convert each sample for training

In [5]:
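class ToTensor(object):
    """Convert a NumPy image/label sample to tensors, on the GPU if one is available."""

    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        # Add a channel dimension, (28, 28) -> (1, 28, 28), and cast to the dtypes
        # the network and loss expect, whether or not CUDA is available.
        image = torch.from_numpy(image).unsqueeze(0).to(dtype=torch.float)
        label = torch.tensor(label, dtype=torch.long)
        if torch.cuda.is_available():
            image = image.to('cuda')
            label = label.to('cuda')
        sample = {'image': image, 'label': label}

        return sample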


In [6]:
class MNISTDataset(Dataset):
    def __init__(self, LabelFile, ImageFile, transform = ToTensor()):        
        self.labels = np.load(LabelFile)
        self.images = np.load(ImageFile)
        self.transform = transform

    def __len__(self):
        # Check number of labels and images match
        assert len(self.labels) == len(self.images)
        return len(self.labels)
    
    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        
        image = self.images[idx]
        label = self.labels[idx]
        
        # organize image and label to one sample as dictionary
        sample = {'image': image, 'label': label}
        
        if self.transform:
            sample = self.transform(sample)

        return sample
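Moving each sample to the GPU inside the transform works here because `num_workers=0`, but PyTorch's `DataLoader` documentation recommends against returning CUDA tensors from worker processes. A common alternative keeps the dataset on the CPU and moves whole batches to the device inside the loop. The sketch below uses random in-memory arrays instead of the .npy files; `InMemoryMNIST` is a hypothetical name. It also shows that the default collate function turns a list of sample dicts into one dict of stacked tensors.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class InMemoryMNIST(Dataset):
    """Hypothetical variant of MNISTDataset that keeps samples on the CPU."""

    def __init__(self, labels, images):
        assert len(labels) == len(images)
        self.labels = labels
        self.images = images

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = torch.from_numpy(self.images[idx]).unsqueeze(0).float()  # (1, 28, 28)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return {"image": image, "label": label}


# Random stand-in data: 25 fake 28x28 images with labels 0-9.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=25)
images = rng.random((25, 28, 28), dtype=np.float32)

loader = DataLoader(InMemoryMNIST(labels, images), batch_size=10, shuffle=True, num_workers=0)
batch_shapes = []
for batch in loader:
    # The default collate function stacks the dicts field by field.
    image = batch["image"].to(device)
    label = batch["label"].to(device)
    batch_shapes.append((tuple(image.shape), tuple(label.shape)))
print(batch_shapes)
```

With 25 samples and `batch_size=10`, the last batch holds the remaining 5 samples. Because nothing touches CUDA inside the dataset, `num_workers` could be raised in this variant.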

## Define network

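- Here we use a very simple network consisting of two convolution/pooling/activation blocks followed by three linear layers

- The output layer applies `log_softmax`, which turns the 10 output scores into log-probabilities for this classification task; it pairs with the negative log-likelihood loss used in training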
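`log_softmax` turns raw scores into log-probabilities. Exponentiating them gives a distribution that sums to 1. Because the transform only subtracts a per-row constant, the predicted class (the argmax) is the same as for the raw scores. The quick check below uses random scores as a stand-in for network outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 10)               # a batch of 3 raw outputs, one score per digit
log_probs = F.log_softmax(logits, dim=1)  # normalise across the 10 classes

probs = log_probs.exp()
print(probs.sum(dim=1))                   # each row sums to 1 (up to rounding)
print(log_probs.argmax(dim=1).equal(logits.argmax(dim=1)))  # class ranking is unchanged
```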

In [7]:
class MyNet(nn.Module):

    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2,stride=2)
        self.activation1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2,stride=2)
        self.activation2 = nn.ReLU()
        self.linear1 = nn.Linear(in_features=16*7*7, out_features=120)
        self.linear2 = nn.Linear(120, 84)
        self.linear3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.activation1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.activation2(x)
        x = x.view(-1, 16*7*7)
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)
        y = F.log_softmax(x, dim=1)  # log-probabilities over the 10 digit classes
        return y
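The `16*7*7` in `linear1` comes from the spatial size. A 3×3 convolution with `padding=1` keeps the 28×28 size, and each 2×2 max-pool with stride 2 halves it, so the size goes 28 → 14 → 7, ending with 16 channels. One way to confirm this is to push a dummy image through the same convolution/pooling stack, rebuilt standalone here rather than taken from `MyNet`:

```python
import torch
import torch.nn as nn

# Same convolution/pooling/activation stack as MyNet, rebuilt for a shape check.
features = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=3, padding=1),   # padding=1 keeps 28x28
    nn.MaxPool2d(kernel_size=2, stride=2),       # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(6, 16, kernel_size=3, padding=1),  # keeps 14x14
    nn.MaxPool2d(kernel_size=2, stride=2),       # 14x14 -> 7x7
    nn.ReLU(),
)

x = torch.zeros(1, 1, 28, 28)  # one dummy grayscale image
for layer in features:
    x = layer(x)
    print(f"{layer.__class__.__name__:>9}: {tuple(x.shape)}")
print("flattened features:", x.flatten(1).shape[1])  # 16 * 7 * 7 = 784
```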

## Setup for training

Load the training dataset

In [8]:
TrainDataset = MNISTDataset("./train_labels.npy", "./train_images.npy")

In [10]:
TrainLoader = DataLoader(TrainDataset, batch_size=10, shuffle=True,num_workers=0)

Initialize the network and move it to `device`

In [11]:
net = MyNet().to(device)

A quick preview of the network

In [12]:
net

MyNet(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (activation1): ReLU()
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (activation2): ReLU()
  (linear1): Linear(in_features=784, out_features=120, bias=True)
  (linear2): Linear(in_features=120, out_features=84, bias=True)
  (linear3): Linear(in_features=84, out_features=10, bias=True)
)
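The preview lists the layers but not their sizes. In the notebook, `sum(p.numel() for p in net.parameters())` gives the total directly. The sketch below rebuilds layers with the same shapes so it can run on its own, and breaks the count down per layer (pooling and ReLU have no parameters). Most of the parameters sit in `linear1`.

```python
import torch.nn as nn

# Layers with the same shapes as MyNet's parameterised layers.
layers = {
    "conv1": nn.Conv2d(1, 6, kernel_size=3, padding=1),    # 6*1*3*3 weights + 6 biases
    "conv2": nn.Conv2d(6, 16, kernel_size=3, padding=1),   # 16*6*3*3 weights + 16 biases
    "linear1": nn.Linear(16 * 7 * 7, 120),
    "linear2": nn.Linear(120, 84),
    "linear3": nn.Linear(84, 10),
}
counts = {name: sum(p.numel() for p in layer.parameters()) for name, layer in layers.items()}
for name, n in counts.items():
    print(f"{name:>7}: {n:6d}")
print("  total:", sum(counts.values()))
```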

Initialize the optimizer; here we use Adam with PyTorch's default settings

In [13]:
optimizer = optim.Adam(net.parameters())
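`optim.Adam(net.parameters())` uses PyTorch's defaults: `lr=1e-3`, `betas=(0.9, 0.999)`, `eps=1e-8`. To make one optimizer step concrete, the sketch below re-implements the basic Adam update (bias-corrected running averages of the gradient and its square, without weight decay). It then checks the result against `torch.optim.Adam` on a few random gradients. `manual_adam` and `torch_adam` are illustrative helpers, not part of PyTorch.

```python
import torch


def manual_adam(params0, grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply Adam updates by hand for a fixed sequence of gradients."""
    p = params0.clone()
    m = torch.zeros_like(p)  # running mean of gradients (first moment)
    v = torch.zeros_like(p)  # running mean of squared gradients (second moment)
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        p = p - lr * m_hat / (v_hat.sqrt() + eps)
    return p


def torch_adam(params0, grads):
    """Run torch.optim.Adam with default settings on the same gradients."""
    p = params0.clone().requires_grad_(True)
    opt = torch.optim.Adam([p])
    for g in grads:
        opt.zero_grad()
        p.grad = g.clone()  # inject the gradient instead of calling backward()
        opt.step()
    return p.detach()


torch.manual_seed(0)
p0 = torch.randn(5, dtype=torch.float64)
grads = [torch.randn(5, dtype=torch.float64) for _ in range(3)]
print(torch.allclose(manual_adam(p0, grads), torch_adam(p0, grads)))
```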

## Start training

Here we train on the training set for 20 epochs and, within each epoch, print the mean loss over every 2,000 batches.

The loss is the __negative log-likelihood loss__: maximizing the likelihood of the training labels is equivalent to minimizing their negative log-likelihood. Because the logarithm is strictly increasing, it does not change where the optima are, but it turns the product of per-sample probabilities into a sum, which is simpler to differentiate and numerically more stable.
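For training pairs $(x_i, y_i)$ in a batch of size $N$, the objective is $-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)$. Because the network already emits log-probabilities, `F.nll_loss` only has to pick out the entry for the true class and average. The small check below uses random scores as stand-ins for network outputs. It also shows that `log_softmax` followed by `nll_loss` matches `F.cross_entropy` applied to the raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for a batch of 4 images
targets = torch.tensor([3, 0, 9, 3])  # their true digits
log_probs = F.log_softmax(logits, dim=1)

# nll_loss takes -log p(true class) for each sample and averages over the batch.
loss = F.nll_loss(log_probs, targets)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(torch.allclose(loss, manual))                             # same quantity computed by hand
print(torch.allclose(loss, F.cross_entropy(logits, targets)))  # log_softmax + nll_loss == cross_entropy
```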

In [None]:
for epoch in range(20):
    running_loss = 0.0
    for batch_idx, sample in enumerate(TrainLoader):
        image, label = sample['image'], sample['label']
        optimizer.zero_grad()
        output = net(image)
        loss = F.nll_loss(output, label)
        loss.backward()
        optimizer.step()

        running_loss+=loss.item()
        if batch_idx%2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, batch_idx + 1, running_loss / 2000))
            running_loss = 0.0
print("Finished Training")

## Apply the trained model to the test dataset


First, read in the test images. The array's shape shows there are 28,000 images of 28×28 pixels.

In [None]:
test_images = np.load('./test_data.npy')
test_images.shape

(28000, 28, 28)

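Set the model to evaluation mode instead of training mode. This network has no dropout or batch-normalization layers, so its outputs are the same either way, but switching modes before inference is good practice.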

In [None]:
net.eval()


Predict each image's label using the trained model. For each image, the model outputs a vector of 10 log-probabilities, one per digit; the index of the largest element is the predicted digit.

In [None]:
predictions = []
with torch.no_grad():  # inference only: no need to track gradients
    for i in range(len(test_images)):
        sample = torch.from_numpy(test_images[i]).to(device, dtype=torch.float)
        sample = sample.unsqueeze(0).unsqueeze(0)  # (28, 28) -> (1, 1, 28, 28)
        output = net(sample)
        _, predicted = torch.max(output, 1)
        predictions.append(predicted.item())
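Feeding the 28,000 test images one at a time works but is slow. Below is a sketch of batched inference, assuming the same (N, 28, 28) NumPy layout as `test_images`. `predict_batched` is a hypothetical helper, and a tiny stand-in model replaces `net` so the snippet runs on its own.

```python
import numpy as np
import torch
import torch.nn as nn


def predict_batched(model, images, device, batch_size=1000):
    """Hypothetical helper: predict digits for an (N, 28, 28) array in batches."""
    model.eval()
    predictions = []
    with torch.no_grad():  # inference only, so skip building the autograd graph
        for start in range(0, len(images), batch_size):
            chunk = torch.from_numpy(images[start:start + batch_size])
            chunk = chunk.to(device, dtype=torch.float).unsqueeze(1)  # (B, 28, 28) -> (B, 1, 28, 28)
            predictions.extend(model(chunk).argmax(dim=1).tolist())
    return predictions


# Stand-in model and data; in the notebook these would be net and test_images.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
toy_images = np.random.default_rng(0).random((2500, 28, 28), dtype=np.float32)

batched = predict_batched(toy_model, toy_images, device)
print(len(batched), batched[:10])
```

In the notebook, `predictions = predict_batched(net, test_images, device)` would replace the per-image loop. Because `MyNet.forward` flattens with `view(-1, 16*7*7)`, it accepts any batch size.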



## Formulate output for submission

First import pandas

In [None]:
import pandas as pd

Create an index column 'ImageId' (numbered from 1) and a prediction column 'Label'. Combine these two columns into a pandas DataFrame and export it to .csv

In [None]:
index = np.arange(1, len(predictions) + 1, dtype=int)  # 1-based ImageId, one per prediction

In [None]:
d = {'ImageId':index, 'Label':predictions}

In [None]:
df = pd.DataFrame(d)

In [None]:
df

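|       | ImageId | Label |
|-------|---------|-------|
| 0     | 1       | 2     |
| 1     | 2       | 0     |
| 2     | 3       | 9     |
| 3     | 4       | 0     |
| 4     | 5       | 3     |
| ...   | ...     | ...   |
| 27995 | 27996   | 9     |
| 27996 | 27997   | 7     |
| 27997 | 27998   | 3     |
| 27998 | 27999   | 9     |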


In [None]:
df.to_csv('submission.csv',index=False)
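The submission steps can be wrapped in one helper that derives `ImageId` from the number of predictions. The sketch below uses a handful of example labels and writes to an in-memory buffer so the CSV text can be inspected; `make_submission` is a hypothetical name.

```python
import io

import numpy as np
import pandas as pd


def make_submission(predictions):
    """Hypothetical helper: build the two-column frame written to submission.csv."""
    image_ids = np.arange(1, len(predictions) + 1)  # ImageId is numbered from 1
    return pd.DataFrame({"ImageId": image_ids, "Label": predictions})


df = make_submission([2, 0, 9, 0, 3])  # example labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)         # index=False drops pandas' own 0-based index
print(buffer.getvalue())
```

Passing `index=False` matters: without it, pandas would write its 0-based row index as an extra unnamed first column.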