# ResNet Implementation

For this repository, I implemented the network from the [ResNet paper](https://arxiv.org/pdf/1512.03385.pdf) in PyTorch from scratch, without referencing any prior implementations. The data is the full [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, downloaded as pickled batch files.

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from resnet import ResNet

### Importing Data

We use the functions `unpickle` and `array_to_tensor` to convert the data batches to PyTorch tensors. The original files store each image as a 3072-element row vector (1024 red values, then 1024 green, then 1024 blue), which we rearrange into a 3x32x32 tensor.

In [4]:
def unpickle(file):
    import pickle
    # The `with` block closes the file on exit, so no explicit close() is needed.
    with open(file, 'rb') as fo:
        d = pickle.load(fo, encoding='latin1')
    return d

In [5]:
def array_to_tensor(a): # a: B x 3072 array; returns a B x 3 x 32 x 32 float tensor
    B = len(a)
    red = torch.tensor(a[:,0:1024],dtype=torch.float)
    green = torch.tensor(a[:,1024:2048],dtype=torch.float)
    blue = torch.tensor(a[:,2048:],dtype=torch.float)

    return torch.stack((
        red.reshape((B,32,32)),
        green.reshape((B,32,32)),
        blue.reshape((B,32,32))
    )).permute(1,0,2,3)
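
The channel layout that `array_to_tensor` relies on can be checked with plain index arithmetic. The sketch below assumes the layout described on the CIFAR-10 page (1024 red values, then green, then blue, each in row-major order). `flat_index` is a hypothetical helper for illustration and is not part of this repository.

```python
def flat_index(channel, row, col, size=32):
    """Map a (channel, row, col) position to its offset in a 3072-element CIFAR-10 row.

    Assumes each channel occupies a contiguous block of size*size values
    stored in row-major order (hypothetical helper, for illustration only).
    """
    return channel * size * size + row * size + col

# First red pixel, first green pixel, last blue pixel.
print(flat_index(0, 0, 0), flat_index(1, 0, 0), flat_index(2, 31, 31))
```

Under this layout, slicing the three channels and stacking them should be equivalent to a single `reshape` of the array to `(B, 3, 32, 32)`.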

In [44]:
files = ['data_batch_1','data_batch_2','data_batch_3','data_batch_4','data_batch_5']

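inputs = torch.zeros((0,3,32,32))
labels = torch.zeros((0), dtype=torch.long)

for f in files:
  data = unpickle(f)
  # Avoid the name `input`, which shadows a Python built-in.
  batch_input = array_to_tensor(data['data'])
  batch_label = torch.tensor(data['labels'])

  inputs = torch.cat((inputs,batch_input),dim=0)
  labels = torch.cat((labels,batch_label))

  # Free the per-file copies before loading the next batch file.
  del data
  del batch_input
  del batch_label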

test_data = unpickle('test_batch')
test_inputs = array_to_tensor(test_data['data'])
test_labels = torch.tensor(test_data['labels'])

inputs = inputs.type(torch.FloatTensor)
labels = labels.type(torch.LongTensor)

test_inputs = test_inputs.type(torch.FloatTensor)
test_labels = test_labels.type(torch.LongTensor)

### Initialization

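We initialize a network `res` from the `resnet.py` file in this repository. The loss function and optimizer hyperparameters (learning rate 0.1, momentum 0.9, weight decay 0.0001) match the ones specified by the paper. The batch size of 256 follows the paper's ImageNet setup; its CIFAR-10 experiments use 128. We use PyTorch's `ReduceLROnPlateau` scheduler, whose default factor of 0.1 mirrors the paper's rule of dividing the learning rate by 10 when the error plateaus. Finally, we move the training data and network to CUDA if it is available.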

In [6]:
res = ResNet(3)

batch_size = 256
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(
    res.parameters(),
    lr=0.1,
    weight_decay=0.0001,
    momentum=0.9)

scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer)

if torch.cuda.is_available():
  res = res.to('cuda')
  inputs = inputs.to('cuda')
  labels = labels.to('cuda')
  

res = res.train()
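
For comparison, the paper's CIFAR-10 experiments use a fixed step schedule rather than plateau detection: the learning rate starts at 0.1 and is divided by 10 at 32k and 48k iterations, with training ending at 64k. A minimal sketch of that schedule in plain Python follows. `paper_cifar_lr` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def paper_cifar_lr(iteration, base_lr=0.1, milestones=(32_000, 48_000)):
    """Learning rate at a given iteration under a fixed step schedule.

    The rate is divided by 10 each time a milestone is reached
    (hypothetical helper; defaults follow the paper's CIFAR-10 setup).
    """
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr / (10 ** drops)

for it in (0, 32_000, 48_000):
    print(it, paper_cifar_lr(it))
```

In PyTorch, `optim.lr_scheduler.MultiStepLR` with `gamma=0.1` can express the same rule when stepped once per iteration.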

### Training Loop

We train for 100 epochs, since I don't have the time or compute to train for longer. The training loss plateaus around 1.45.

In [47]:
for e in range(100):
    # Iterate over full batches only.
    for i in range(len(inputs) // batch_size):
        batch_inputs = inputs[i*batch_size:(i+1)*batch_size]
        batch_labels = labels[i*batch_size:(i+1)*batch_size]

        optimizer.zero_grad()

        vals = res(batch_inputs)
        loss = criterion(vals,batch_labels)

        loss.backward()
        optimizer.step()

    # The scheduler monitors the loss of the last batch in the epoch.
    scheduler.step(loss.item())

    if e % 2 == 0:
        lr = optimizer.param_groups[0]['lr']
        print(f'epoch: {e}, loss: {loss.item()}, lr: {lr}')

OutOfMemoryError: ignored
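
The index arithmetic in the training loop can be sketched on its own. With 50,000 training images and a batch size of 256, dividing and truncating yields 195 full batches and leaves the last 80 images unused in every epoch. The hypothetical helper `batch_bounds` below also yields the final partial batch. It is an illustration, not the loop used for the reported results.

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, end) slice bounds covering every sample, including a final partial batch."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

bounds = list(batch_bounds(50_000, 256))
print(len(bounds), bounds[-1])
```

Each pair can be used directly as `inputs[start:end]` and `labels[start:end]`.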

### Results

With ResNet-20, the original paper reports an error rate of 8.75%. With just the free Google Colab GPU, I was able to get an error rate of 18.5%, measured on the first 1,000 test images.

In [52]:
res.eval()

torch.cuda.empty_cache()

# Evaluate on the CPU with the first 1,000 test images to limit memory use.
test_inputs = test_inputs[:1000].to('cpu')
test_labels = test_labels[:1000].to('cpu')
res.to('cpu')

# Gradients are not needed for inference.
with torch.no_grad():
    preds = torch.argmax(res(test_inputs), dim=1)

num_error = (preds != test_labels).sum().item()

1 - num_error/len(test_labels)

0.815
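
Because the accuracy above is computed on 1,000 images rather than the full test set, it carries some sampling noise. As a rough sketch, treating each prediction as an independent Bernoulli trial (an assumption), the standard error of the error rate can be estimated as follows. `binomial_standard_error` is a hypothetical helper.

```python
import math

def binomial_standard_error(rate, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(rate * (1.0 - rate) / n)

error_rate = 1 - 0.815  # error rate implied by the accuracy reported above
se = binomial_standard_error(error_rate, 1000)
print(f'error rate: {error_rate:.3f} +/- {se:.3f}')
```

Under that assumption the standard error is roughly 1.2 percentage points, which is small compared with the gap to the paper's 8.75%.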