In this notebook I'll be tweaking the [Kaggle notebook ResNet for MNIST with PyTorch](https://www.kaggle.com/readilen/resnet-for-mnist-with-pytorch?scriptVersionId=6942243) and the [PyTorch Tutorial on implementing a ResNet model](https://pytorch-tutorial.readthedocs.io/en/latest/tutorial/chapter03_intermediate/3_2_2_cnn_resnet_cifar10/) in order to learn more about the ResNet model and how to use PyTorch. The purpose is to change, add, and remove certain parts of the code and see exactly what happens, while also trying to improve the accuracy (it's approximately 10% in the other notebook that uses this model and the CIFAR-10 dataset).

The code is based on [Liu Kuang's extensive code](https://github.com/kuangliu) and on the [TorchVision ResNet model](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) source code.

***
<span style="color: blue"><b>Study session observations: </b></span>

#### 20/07/2020:
Had difficulties implementing a 4th layer due to a size mismatch during forward propagation through the residual blocks. Will look more into that after gaining a clearer understanding of the architecture and the classes used. Changed the notebook's name so that it better reflects the notebook's content.

Will attempt to change a few input values of certain variables (like learning_rate, train and test loader batch size, etc.) and also both forward methods to see what happens. 

Also want to learn more about how many epochs (num_epochs) are needed, and about epochs in general. With 80 epochs the accuracy was ~10-11%; with 1 epoch, ~9-10%. Both are roughly chance level for 10 classes, so I want to see what can influence the accuracy of the model.

#### 21/07/2020:
Attempted to train the model using different layers ([3, 4, 6, 3] instead of [2, 2, 2, 2]). The fourth value of the layers list isn't used (yet). Also tried changing the number of epochs and the learning_rate. Tried to understand more about the ResNet class from torchvision and each of its layers by reading a bit more of its source.
***

In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

### Dataset
***

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

In [23]:
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()])

In [24]:
train_dataset = torchvision.datasets.CIFAR10(root='../cifar-10-batches-py/',
                                             train=True, 
                                             transform=transform)

test_dataset = torchvision.datasets.CIFAR10(root='../cifar-10-batches-py/',
                                            train=False, 
                                            transform=transforms.ToTensor())

In [25]:
print(train_dataset.data.shape)
print(test_dataset.data.shape)

(50000, 32, 32, 3)
(10000, 32, 32, 3)


In [26]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=100, 
                                           shuffle=True) #Original tutorial has shuffle=True

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=100, 
                                          shuffle=False)

### Creating classes and functions related to ResNet

In [27]:
def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels,
                     out_channels,
                     kernel_size=3,
                     stride=stride,
                     padding=1,
                     bias=False)

In [28]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
    
    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        
        if self.downsample:
            residual = self.downsample(x)
            
        out += residual
        out = self.relu(out)
        
        return out

In [94]:
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 16
        self.conv = conv3x3(1, 16)  # in_channels: 1 for MNIST (grayscale), 3 for CIFAR-10 (RGB)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
        
        print("\nlayer1")
        self.layer1 = self.make_layer(block, 16, layers[0])
        print("\nlayer2")
        self.layer2 = self.make_layer(block, 32, layers[1], 2)
        print("\nlayer3")
        self.layer3 = self.make_layer(block, 64, layers[2], 2)
#         print("\nlayer4")
#         self.layer4 = self.make_layer(block, 128, layers[3], 2)
        self.avg_pool = nn.AvgPool2d(8)
        self.fc = nn.Linear(64, num_classes)
        
    
    def make_layer(self, block, out_channels, blocks, stride=1):
        
        downsample = None
        
        if (stride != 1) or (self.in_channels != out_channels):
            downsample = nn.Sequential(
                conv3x3(self.in_channels, out_channels, stride=stride),
                nn.BatchNorm2d(out_channels))
        
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        
        for i in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        
        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.maxpool(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
#         out = self.layer4(out)
        out = self.avg_pool(out)
        out = out.view(out.size(0), -1)  # flatten each sample to a 1-D feature vector for the fc layer
        out = self.fc(out)
        
        return out
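
The size problems noted in the study observations can be checked on paper before running anything. Below is a minimal sketch that traces the spatial size of a 32x32 input through the layers with the usual output-size formula, floor((n + 2p - k) / s) + 1. The helper names `out_size` and `trace` and the stage lists are illustrative and not part of the notebook; the sketch assumes square inputs and the kernel, stride and padding values used in the classes here.

```python
def out_size(n, kernel, stride, padding=0):
    """Output width of a conv/pool layer for a square input of width n."""
    return (n + 2 * padding - kernel) // stride + 1

def trace(n, stages):
    """Apply each (name, kernel, stride, padding) stage and record the sizes."""
    sizes = [("input", n)]
    for name, kernel, stride, padding in stages:
        n = out_size(n, kernel, stride, padding)
        sizes.append((name, n))
    return sizes

# Stem conv plus three residual stages (only the first conv of a stage strides).
no_pool = [("conv", 3, 1, 1), ("layer1", 3, 1, 1),
           ("layer2", 3, 2, 1), ("layer3", 3, 2, 1)]
# Same network with the 3x3, stride-2 max pool right after the stem.
with_pool = [no_pool[0], ("maxpool", 3, 2, 1)] + no_pool[1:]
# Hypothetical fourth stage with stride 2, like the commented-out layer4.
with_layer4 = no_pool + [("layer4", 3, 2, 1)]

for label, stages in [("no maxpool", no_pool),
                      ("maxpool", with_pool),
                      ("layer4", with_layer4)]:
    sizes = trace(32, stages)
    final = sizes[-1][1]
    # AvgPool2d(8) needs a feature map that is at least 8x8.
    print(label, sizes, "fits AvgPool2d(8):", final >= 8)
```

Under these assumptions only the variant without max pooling reaches `AvgPool2d(8)` with an 8x8 map. With the max pool, or with a stride-2 layer4, the map is 4x4, so the pooling size would have to change (and for layer4 the `nn.Linear` input width would need to be 128 instead of 64).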

In [98]:
class ResNetX(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNetX, self).__init__()
        self.in_channels = 16
        self.conv = conv3x3(1, 16)  # in_channels: 1 for MNIST (grayscale), 3 for CIFAR-10 (RGB)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        
        print("\nlayer1")
        self.layer1 = self.make_layer(block, 16, layers[0])
        print("\nlayer2")
        self.layer2 = self.make_layer(block, 32, layers[1], 2)
        print("\nlayer3")
        self.layer3 = self.make_layer(block, 64, layers[2], 2)
#         print("\nlayer4")
#         self.layer4 = self.make_layer(block, 128, layers[3], 2)
        self.avg_pool = nn.AvgPool2d(8)
        self.fc = nn.Linear(64, num_classes)
        
    
    def make_layer(self, block, out_channels, blocks, stride=1):
        
        downsample = None
        
        if (stride != 1) or (self.in_channels != out_channels):
            downsample = nn.Sequential(
                conv3x3(self.in_channels, out_channels, stride=stride),
                nn.BatchNorm2d(out_channels))
        
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        
        for i in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        
        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
#         out = self.layer4(out)
        out = self.avg_pool(out)
        out = out.view(out.size(0), -1)  # flatten each sample to a 1-D feature vector for the fc layer
        out = self.fc(out)
        
        return out
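
The `view` call near the end of `forward` is indeed a flatten. A small sketch, assuming the average pool leaves a 1x1 map with 64 channels (the random tensor is only a stand-in for that output):

```python
import torch

# Stand-in for the avg_pool output: batch of 100, 64 channels, 1x1 spatial map.
out = torch.randn(100, 64, 1, 1)

# Keep the batch dimension and merge everything else into one dimension.
flat_view = out.view(out.size(0), -1)

# torch.flatten with start_dim=1 produces the same result.
flat_fn = torch.flatten(out, start_dim=1)

print(flat_view.shape)
print(torch.equal(flat_view, flat_fn))
```

The resulting second dimension is what `nn.Linear(64, num_classes)` expects as its input width.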

### Training model
***

In [45]:
def train_model(train_loader, model, num_epochs, error, optimizer, curr_lr, total_step, device):
    print("Training model")
    
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
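            # NOTE: CIFAR-10 batches are (N, 3, 32, 32). resize_ to 1 channel does not convert
            # to grayscale; it keeps only the first third of the batch's values, so the
            # resulting "images" no longer line up with their labels.
            images = images.resize_(100, 1, 32, 32).to(device)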
            labels = labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = error(outputs, labels)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if (i+1) % 100 == 0:
                print ("Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}"
                       .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

        # Decay learning rate
        if (epoch+1) % 20 == 0:
            curr_lr /= 3
            update_lr(optimizer, curr_lr)

        print("Current learning rate: ", curr_lr)
        
    return model
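
One likely reason the accuracy stays near 10% is the `resize_` call on the image batch. The sketch below uses a tiny stand-in batch (2 images, 3 channels, 2x2 pixels, an assumption made only to keep the numbers readable) to show what `resize_` does when the new shape has fewer elements. The grayscale conversion by channel mean at the end is just one possible alternative, not something taken from the original tutorials.

```python
import torch

# Tiny stand-in for a CIFAR-10 batch: 2 images, 3 channels, 2x2 pixels.
# Values 0..23 make it easy to see where each element came from.
batch = torch.arange(2 * 3 * 2 * 2, dtype=torch.float32).reshape(2, 3, 2, 2)

# Same call pattern as in train_model, applied to a copy so `batch` stays intact.
resized = batch.clone().resize_(2, 1, 2, 2)

# "Image 1" of the resized batch is really channel 1 of original image 0.
print(torch.equal(resized[1, 0], batch[0, 1]))

# A shape-preserving alternative (assumption: grayscale via the channel mean).
gray = batch.mean(dim=1, keepdim=True)
print(gray.shape)
```

With a real batch of 100 RGB images, the resized tensor would hold the first 100 channel planes, which come from roughly the first 34 images, while the 100 labels still refer to all 100 images. Feeding the full 3-channel batch to a model whose first convolution takes 3 input channels avoids the problem altogether.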

In [31]:
net_args = {
    "block" : ResidualBlock,
    "layers": [2, 2, 2, 2]
}

model1 = ResNet(**net_args).to(device)


layer1

layer2

layer3


In [62]:
net_args = {
    "block" : ResidualBlock,
    "layers": [3, 4, 6, 3]
}

model4 = ResNet(**net_args).to(device)


layer1

layer2

layer3


In [39]:
num_epochs = 10
learning_rate = 0.001

In [40]:
error = nn.CrossEntropyLoss()
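# NOTE: this optimizer updates only `model`'s parameters; any other model passed to
# train_model (model2, model3, ...) needs an optimizer built from its own parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)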
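
If `model` in the cell above is a different object from the model that is later passed to `train_model`, the optimizer never touches the model being trained. A minimal sketch of that effect with two small linear layers (`model_a` and `model_b` are hypothetical stand-ins, not the ResNets from this notebook):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Linear(4, 2)   # the model the optimizer is built from
model_b = nn.Linear(4, 2)   # a different model that gets "trained"

optimizer = torch.optim.Adam(model_a.parameters(), lr=0.001)
before = model_b.weight.detach().clone()

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model_b(x), y)

optimizer.zero_grad()
loss.backward()     # gradients land on model_b's parameters
optimizer.step()    # but the optimizer only knows model_a's parameters

# model_b's weights are unchanged after the step.
print(torch.equal(before, model_b.weight.detach()))
```

Creating the optimizer from the parameters of the model that is actually being trained, for example inside the training function, avoids this kind of silent no-op.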

In [41]:
def update_lr(optimizer, lr):    
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
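
The step decay used in `train_model` (divide the learning rate by 3 every 20 epochs, applied through `update_lr`) can be tabulated without training anything. A small sketch, where `lr_schedule` is an illustrative helper and not part of the notebook:

```python
def lr_schedule(initial_lr, num_epochs, every=20, factor=3):
    """Learning rate in effect when each epoch's summary line is printed."""
    lr = initial_lr
    rates = []
    for epoch in range(num_epochs):
        # Same condition as in train_model: decay at the end of every 20th epoch.
        if (epoch + 1) % every == 0:
            lr /= factor
        rates.append(lr)
    return rates

rates = lr_schedule(0.01, 80)
for epoch in (1, 20, 34, 65, 80):
    print(epoch, rates[epoch - 1])
```

Starting from 0.01, this gives 0.01/3 at epoch 34 and 0.01/27 at epoch 65, which matches the learning rates printed in the 80-epoch training log.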

In [42]:
total_step = len(train_loader)
curr_lr = learning_rate

In [47]:
model2 = train_model(train_loader, model2, num_epochs, error, optimizer, curr_lr, total_step, device)

Training model
Epoch [1/10], Step [100/500] Loss: 2.5775
Epoch [1/10], Step [200/500] Loss: 2.4368
Epoch [1/10], Step [300/500] Loss: 2.5746
Epoch [1/10], Step [400/500] Loss: 2.5905
Epoch [1/10], Step [500/500] Loss: 2.5330
Current learning rate:  0.001
Epoch [2/10], Step [100/500] Loss: 2.5420
Epoch [2/10], Step [200/500] Loss: 2.4997
Epoch [2/10], Step [300/500] Loss: 2.5480
Epoch [2/10], Step [400/500] Loss: 2.4075
Epoch [2/10], Step [500/500] Loss: 2.5638
Current learning rate:  0.001
Epoch [3/10], Step [100/500] Loss: 2.5754
Epoch [3/10], Step [200/500] Loss: 2.4558
Epoch [3/10], Step [300/500] Loss: 2.4161
Epoch [3/10], Step [400/500] Loss: 2.4186
Epoch [3/10], Step [500/500] Loss: 2.5052
Current learning rate:  0.001
Epoch [4/10], Step [100/500] Loss: 2.4800
Epoch [4/10], Step [200/500] Loss: 2.6370
Epoch [4/10], Step [300/500] Loss: 2.5812
Epoch [4/10], Step [400/500] Loss: 2.5622
Epoch [4/10], Step [500/500] Loss: 2.5759
Current learning rate:  0.001
Epoch [5/10], Step [100/5

In [56]:
model3 = train_model(train_loader, model3, 80, error, optimizer, 0.01, total_step, device)

Training model
Epoch [1/80], Step [100/500] Loss: 2.6815
Epoch [1/80], Step [200/500] Loss: 2.8038
Epoch [1/80], Step [300/500] Loss: 2.8482
Epoch [1/80], Step [400/500] Loss: 2.8780
Epoch [1/80], Step [500/500] Loss: 2.7598
Current learning rate:  0.01
Epoch [2/80], Step [100/500] Loss: 2.7672
Epoch [2/80], Step [200/500] Loss: 2.9197
Epoch [2/80], Step [300/500] Loss: 2.9226
Epoch [2/80], Step [400/500] Loss: 2.7696
Epoch [2/80], Step [500/500] Loss: 2.6829
Current learning rate:  0.01
Epoch [3/80], Step [100/500] Loss: 2.7945
Epoch [3/80], Step [200/500] Loss: 2.7920
Epoch [3/80], Step [300/500] Loss: 2.7224
Epoch [3/80], Step [400/500] Loss: 2.9192
Epoch [3/80], Step [500/500] Loss: 2.7877
Current learning rate:  0.01
Epoch [4/80], Step [100/500] Loss: 2.8258
Epoch [4/80], Step [200/500] Loss: 2.9351
Epoch [4/80], Step [300/500] Loss: 2.7965
Epoch [4/80], Step [400/500] Loss: 2.7749
Epoch [4/80], Step [500/500] Loss: 2.7781
Current learning rate:  0.01
Epoch [5/80], Step [100/500] 

Epoch [34/80], Step [100/500] Loss: 2.8145
Epoch [34/80], Step [200/500] Loss: 2.7828
Epoch [34/80], Step [300/500] Loss: 2.6384
Epoch [34/80], Step [400/500] Loss: 2.8361
Epoch [34/80], Step [500/500] Loss: 2.7623
Current learning rate:  0.0033333333333333335
Epoch [35/80], Step [100/500] Loss: 2.6467
Epoch [35/80], Step [200/500] Loss: 2.5736
Epoch [35/80], Step [300/500] Loss: 2.8314
Epoch [35/80], Step [400/500] Loss: 2.8587
Epoch [35/80], Step [500/500] Loss: 3.0090
Current learning rate:  0.0033333333333333335
Epoch [36/80], Step [100/500] Loss: 2.9356
Epoch [36/80], Step [200/500] Loss: 2.8161
Epoch [36/80], Step [300/500] Loss: 2.8004
Epoch [36/80], Step [400/500] Loss: 2.7757
Epoch [36/80], Step [500/500] Loss: 2.8072
Current learning rate:  0.0033333333333333335
Epoch [37/80], Step [100/500] Loss: 2.6152
Epoch [37/80], Step [200/500] Loss: 2.8019
Epoch [37/80], Step [300/500] Loss: 2.7942
Epoch [37/80], Step [400/500] Loss: 2.8023
Epoch [37/80], Step [500/500] Loss: 2.9453
Cu

Epoch [65/80], Step [400/500] Loss: 2.7568
Epoch [65/80], Step [500/500] Loss: 2.7690
Current learning rate:  0.00037037037037037035
Epoch [66/80], Step [100/500] Loss: 2.8885
Epoch [66/80], Step [200/500] Loss: 2.6624
Epoch [66/80], Step [300/500] Loss: 2.8301
Epoch [66/80], Step [400/500] Loss: 2.6889
Epoch [66/80], Step [500/500] Loss: 2.9316
Current learning rate:  0.00037037037037037035
Epoch [67/80], Step [100/500] Loss: 2.6524
Epoch [67/80], Step [200/500] Loss: 2.8228
Epoch [67/80], Step [300/500] Loss: 2.7594
Epoch [67/80], Step [400/500] Loss: 2.8792
Epoch [67/80], Step [500/500] Loss: 2.8735
Current learning rate:  0.00037037037037037035
Epoch [68/80], Step [100/500] Loss: 2.7397
Epoch [68/80], Step [200/500] Loss: 2.6089
Epoch [68/80], Step [300/500] Loss: 2.8961
Epoch [68/80], Step [400/500] Loss: 2.8936
Epoch [68/80], Step [500/500] Loss: 2.8848
Current learning rate:  0.00037037037037037035
Epoch [69/80], Step [100/500] Loss: 2.8813
Epoch [69/80], Step [200/500] Loss: 2.

In [63]:
num_epochs = 50
model4 = train_model(train_loader, model4, num_epochs, error, optimizer, curr_lr, total_step, device)

Training model
Epoch [1/50], Step [100/500] Loss: 2.8199
Epoch [1/50], Step [200/500] Loss: 2.7521
Epoch [1/50], Step [300/500] Loss: 2.6919
Epoch [1/50], Step [400/500] Loss: 2.8673
Epoch [1/50], Step [500/500] Loss: 2.6657
Current learning rate:  0.001
Epoch [2/50], Step [100/500] Loss: 2.6928
Epoch [2/50], Step [200/500] Loss: 2.6839
Epoch [2/50], Step [300/500] Loss: 2.6584
Epoch [2/50], Step [400/500] Loss: 2.7283
Epoch [2/50], Step [500/500] Loss: 2.7093
Current learning rate:  0.001
Epoch [3/50], Step [100/500] Loss: 2.7731
Epoch [3/50], Step [200/500] Loss: 2.8198
Epoch [3/50], Step [300/500] Loss: 2.7120
Epoch [3/50], Step [400/500] Loss: 2.8589
Epoch [3/50], Step [500/500] Loss: 2.5999
Current learning rate:  0.001
Epoch [4/50], Step [100/500] Loss: 2.8063
Epoch [4/50], Step [200/500] Loss: 2.7485
Epoch [4/50], Step [300/500] Loss: 2.8649
Epoch [4/50], Step [400/500] Loss: 2.8521
Epoch [4/50], Step [500/500] Loss: 2.8000
Current learning rate:  0.001
Epoch [5/50], Step [100/5

Epoch [34/50], Step [100/500] Loss: 2.7334
Epoch [34/50], Step [200/500] Loss: 2.6441
Epoch [34/50], Step [300/500] Loss: 2.7575
Epoch [34/50], Step [400/500] Loss: 2.6941
Epoch [34/50], Step [500/500] Loss: 2.7010
Current learning rate:  0.0003333333333333333
Epoch [35/50], Step [100/500] Loss: 2.8377
Epoch [35/50], Step [200/500] Loss: 2.8226
Epoch [35/50], Step [300/500] Loss: 2.8312
Epoch [35/50], Step [400/500] Loss: 2.7407
Epoch [35/50], Step [500/500] Loss: 2.7282
Current learning rate:  0.0003333333333333333
Epoch [36/50], Step [100/500] Loss: 2.8697
Epoch [36/50], Step [200/500] Loss: 2.7334
Epoch [36/50], Step [300/500] Loss: 2.7546
Epoch [36/50], Step [400/500] Loss: 2.6936
Epoch [36/50], Step [500/500] Loss: 2.7428
Current learning rate:  0.0003333333333333333
Epoch [37/50], Step [100/500] Loss: 2.6760
Epoch [37/50], Step [200/500] Loss: 2.8001
Epoch [37/50], Step [300/500] Loss: 2.8856
Epoch [37/50], Step [400/500] Loss: 2.6673
Epoch [37/50], Step [500/500] Loss: 2.7252
Cu

In [99]:
net_args = {
    "block" : ResidualBlock,
    "layers": [3, 4, 6, 3]
}

model6 = ResNetX(**net_args).to(device)


layer1

layer2

layer3


In [100]:
model6 = train_model(train_loader, model6, num_epochs, error, optimizer, curr_lr, total_step, device)

Training model
Epoch [1/50], Step [100/500] Loss: 2.4178
Epoch [1/50], Step [200/500] Loss: 2.4513
Epoch [1/50], Step [300/500] Loss: 2.3688
Epoch [1/50], Step [400/500] Loss: 2.3529
Epoch [1/50], Step [500/500] Loss: 2.3530
Current learning rate:  0.001
Epoch [2/50], Step [100/500] Loss: 2.3721
Epoch [2/50], Step [200/500] Loss: 2.3819
Epoch [2/50], Step [300/500] Loss: 2.3370
Epoch [2/50], Step [400/500] Loss: 2.3484
Epoch [2/50], Step [500/500] Loss: 2.3434
Current learning rate:  0.001
Epoch [3/50], Step [100/500] Loss: 2.3743
Epoch [3/50], Step [200/500] Loss: 2.4079
Epoch [3/50], Step [300/500] Loss: 2.3831
Epoch [3/50], Step [400/500] Loss: 2.3770
Epoch [3/50], Step [500/500] Loss: 2.4426
Current learning rate:  0.001
Epoch [4/50], Step [100/500] Loss: 2.3814
Epoch [4/50], Step [200/500] Loss: 2.3416
Epoch [4/50], Step [300/500] Loss: 2.3675
Epoch [4/50], Step [400/500] Loss: 2.3036
Epoch [4/50], Step [500/500] Loss: 2.3887
Current learning rate:  0.001
Epoch [5/50], Step [100/5

Epoch [34/50], Step [100/500] Loss: 2.3758
Epoch [34/50], Step [200/500] Loss: 2.3518
Epoch [34/50], Step [300/500] Loss: 2.4662
Epoch [34/50], Step [400/500] Loss: 2.4451
Epoch [34/50], Step [500/500] Loss: 2.4124
Current learning rate:  0.0003333333333333333
Epoch [35/50], Step [100/500] Loss: 2.4145
Epoch [35/50], Step [200/500] Loss: 2.4176
Epoch [35/50], Step [300/500] Loss: 2.4160
Epoch [35/50], Step [400/500] Loss: 2.4071
Epoch [35/50], Step [500/500] Loss: 2.3949
Current learning rate:  0.0003333333333333333
Epoch [36/50], Step [100/500] Loss: 2.4179
Epoch [36/50], Step [200/500] Loss: 2.3982
Epoch [36/50], Step [300/500] Loss: 2.3337
Epoch [36/50], Step [400/500] Loss: 2.3699
Epoch [36/50], Step [500/500] Loss: 2.3813
Current learning rate:  0.0003333333333333333
Epoch [37/50], Step [100/500] Loss: 2.3376
Epoch [37/50], Step [200/500] Loss: 2.3535
Epoch [37/50], Step [300/500] Loss: 2.3285
Epoch [37/50], Step [400/500] Loss: 2.3396
Epoch [37/50], Step [500/500] Loss: 2.3316
Cu

### Testing model
***

In [49]:
def test_model(model, test_loader, device): 
    print("Testing model")
    model.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in test_loader:
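            # NOTE: as in train_model, resize_ to 1 channel keeps only the first third of the
            # batch's values, so the inputs no longer line up with their labels.
            images = images.resize_(100, 1, 32, 32).to(device)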
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))

In [50]:
test_model(model, test_loader, device)

Testing model
Accuracy of the model on the test images: 9.87 %


In [51]:
test_model(model2, test_loader, device)

Testing model
Accuracy of the model on the test images: 10.47 %


In [57]:
test_model(model3, test_loader, device)

Testing model
Accuracy of the model on the test images: 10.0 %


In [64]:
test_model(model4, test_loader, device)

Testing model
Accuracy of the model on the test images: 9.99 %


In [104]:
test_model(model6, test_loader, device)

Testing model
Accuracy of the model on the test images: 10.05 %
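
For reference, the numbers above can be compared with what pure guessing would give. A quick sketch, assuming 10 balanced classes (which holds for the CIFAR-10 test set):

```python
import math

num_classes = 10

# Accuracy of uniformly random guessing on balanced classes.
chance_accuracy = 100.0 / num_classes

# Cross-entropy when every class is assigned probability 1 / num_classes.
uniform_loss = -math.log(1.0 / num_classes)

print(f"chance accuracy: {chance_accuracy:.1f} %")
print(f"uniform-prediction loss: {uniform_loss:.4f}")
```

All five test accuracies (9.87% to 10.47%) sit at this chance level, and the model6 training loss hovers just above ln(10), about 2.30. That is consistent with the networks not learning anything from the inputs, rather than with an unlucky choice of epochs or learning rate.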


In [68]:
from torchvision.models import resnet50

In [70]:
model5 = resnet50(pretrained=False)

In [84]:
dataloaders = {
    "train": train_loader,
    "test": test_loader
}

In [85]:
model5

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [101]:
model5 = model5.cuda()

In [102]:
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])
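
The mean and std values passed to `Normalize` are the commonly used ImageNet channel statistics. `Normalize` computes (value - mean) / std for each channel; the sketch below applies that arithmetic to single RGB pixels in plain Python (`normalize_pixel` is an illustrative helper, not a torchvision function):

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
average = normalize_pixel(mean)

print(black)
print(white)
print(average)
```

A pixel equal to the channel means maps to zero, black maps to negative values and white to positive ones, so the normalized inputs are roughly centred around zero.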