## Student Identity
Name: Arman Lotfalikhani <br>
Student Number: 99109166

## Theoretical questions
### A:
In a standard convolutional layer, every output is computed from samples on a fixed, regular rectangular grid. Deformable convolutions add a learned offset to each sampling location, so the sampling pattern can be irregular and the sampled points may even lie far apart.
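
As a small illustration of this difference, the sketch below lists the sampling positions of a regular 3x3 kernel around a centre pixel and then shifts each position by its own offset. The function names and the offset values are made up for this example; in a real deformable layer the offsets are learned and are generally fractional.

```python
def regular_grid(center, k=3):
    """Sampling positions (row, col) of a k x k kernel centred on `center` (odd k)."""
    r0, c0 = center
    half = k // 2
    return [(r0 + dr, c0 + dc)
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]


def deformed_grid(center, offsets, k=3):
    """Shift every regular position by its own (d_row, d_col) offset."""
    return [(r + dr, c + dc)
            for (r, c), (dr, dc) in zip(regular_grid(center, k), offsets)]


# Hypothetical offsets, one pair per kernel tap (values chosen arbitrarily).
example_offsets = [(-2.0, 0.5), (0.0, 0.0), (1.5, 3.0),
                   (0.0, -1.0), (0.0, 0.0), (0.0, 2.5),
                   (2.0, -0.5), (0.0, 0.0), (-1.0, 4.0)]

print(regular_grid((5, 5)))                    # a compact 3x3 block
print(deformed_grid((5, 5), example_offsets))  # scattered positions
```

The regular positions always form a compact block, while the shifted ones can spread over a much larger region of the input.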
### B:
Because the sampling pattern is not fixed, it can adapt to geometric transformations such as rotation: the sampling locations themselves shift to follow the transformed object.
### C:
Because the sampling grid of a standard convolution is a fixed rectangle, rotating the input can change the activation of a single neuron substantially. With irregular sampling, the sampling locations can follow the rotation, so the output of that particular neuron stays more or less the same.
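
A toy example may make this concrete. The sketch below rotates a 3x3 image by 90 degrees and compares the response of one kernel on a fixed grid with its response on a grid whose sampling positions are rotated the same way. The image, the weights, and the helper names are made up for illustration, and the rotated positions are written by hand here rather than learned.

```python
def rot90_ccw(img):
    """Rotate a square image (list of rows) by 90 degrees counter-clockwise."""
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]


def response(img, weights, positions):
    """Weighted sum of img, sampled at one (row, col) position per kernel tap."""
    n = len(weights)
    return sum(weights[a][b] * img[positions[a][b][0]][positions[a][b][1]]
               for a in range(n) for b in range(n))


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
weights = [[1, 2, 0],
           [0, 1, 0],
           [0, 0, -1]]
n = 3

# Fixed grid: tap (a, b) reads pixel (a, b).
regular = [[(a, b) for b in range(n)] for a in range(n)]
# Rotated grid: tap (a, b) reads the place where pixel (a, b) ended up
# after the counter-clockwise rotation, which is (n - 1 - b, a).
rotated_grid = [[(n - 1 - b, a) for b in range(n)] for a in range(n)]

rotated_img = rot90_ccw(img)
original_response = response(img, weights, regular)
fixed_grid_response = response(rotated_img, weights, regular)
adapted_response = response(rotated_img, weights, rotated_grid)
print(original_response, fixed_grid_response, adapted_response)
```

With the fixed grid the response changes once the image is rotated, whereas the rotated sampling grid reproduces the original response exactly.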
### D:
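An additional standard convolutional layer, applied to the same input, computes the offsets. It has $C^\prime = 2k^2$ output channels: one vertical and one horizontal offset for each of the $k^2$ kernel positions.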
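
In the `MyDeformConv2d` class defined later in this notebook, the offset-predicting convolution has `2*kernel_size**2` output channels. Because the predicted offsets are real-valued, deformable convolution reads the input at fractional positions by bilinear interpolation. The sketch below shows both ideas in plain Python; the helper names are hypothetical, and treating samples outside the image as zero is an assumption of this sketch.

```python
import math


def offset_channels(kernel_size):
    """Channels of the offset map: two shifts for each kernel position."""
    return 2 * kernel_size ** 2


def bilinear_sample(img, row, col):
    """Read img (list of rows) at a fractional (row, col) position.

    Positions outside the image contribute zero (assumption of this sketch).
    """
    r0, c0 = math.floor(row), math.floor(col)
    fr, fc = row - r0, col - c0

    def px(r, c):
        inside = 0 <= r < len(img) and 0 <= c < len(img[0])
        return img[r][c] if inside else 0.0

    # Interpolate along columns first, then along rows.
    top = (1 - fc) * px(r0, c0) + fc * px(r0, c0 + 1)
    bottom = (1 - fc) * px(r0 + 1, c0) + fc * px(r0 + 1, c0 + 1)
    return (1 - fr) * top + fr * bottom


print(offset_channels(3), offset_channels(2))
print(bilinear_sample([[0, 1], [2, 3]], 0.5, 0.5))
```

Sampling at the centre of the four pixels returns their average, and sampling at an integer position returns the pixel itself.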

## Importing Libraries

In [1]:
import torch
import torchvision.datasets
from torchvision.datasets import CIFAR10
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F

import numpy as np
import time

In [2]:
train_set = torchvision.datasets.CIFAR10(root='.', train=True, download=True, transform=transforms.ToTensor())
test_set = torchvision.datasets.CIFAR10(root='.', train=False, download=True, transform=transforms.ToTensor())
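trainloader = DataLoader(train_set, batch_size=64, shuffle=True)
# Shuffling is not needed for evaluation; the accuracy does not depend on batch order.
testloader = DataLoader(test_set, batch_size=64, shuffle=False)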

Files already downloaded and verified
Files already downloaded and verified


In [3]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1=nn.Conv2d(3,40,kernel_size=3,stride=1)   # 3x32x32 -> 40x30x30
        self.avg_p1=nn.MaxPool2d(kernel_size=2,stride=2)    # max pooling despite the name: -> 40x15x15
        self.conv2=nn.Conv2d(40,25,kernel_size=2,stride=1)  # -> 25x14x14
        self.avg_p2=nn.AvgPool2d(kernel_size=2,stride=2)    # -> 25x7x7

        self.lin1=nn.Linear(25*7*7,300)
        self.lin2=nn.Linear(300,100)
        self.lin3=nn.Linear(100,10)

    def forward(self, x):
        x=F.leaky_relu(self.conv1(x))
        x=self.avg_p1(x)
        x=F.leaky_relu(self.conv2(x))
        x=self.avg_p2(x)
        x=F.leaky_relu(self.lin1(torch.flatten(x,1)))
        x=F.leaky_relu(self.lin2(x))
        x=self.lin3(x)
        return x

In [4]:
net = Net()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss_func=nn.CrossEntropyLoss()
gpu_net=net.to(device)
optimizer=torch.optim.Adam(gpu_net.parameters())

In [5]:
epoch_nums = 3
t=time.time()
for epoch in range(epoch_nums):
    correct = 0
    total = 0
    running_loss = 0.0
    for data in trainloader:
        images=data[0].to(device)
        my_classes=data[1].to(device)

        logits=gpu_net(images)
        loss=loss_func(logits,my_classes)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

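        # Training accuracy is measured after the weight update; no gradients are needed here.
        with torch.no_grad():
            predictions=torch.argmax(gpu_net(images),1)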
        correct+=torch.sum(torch.eq(predictions,my_classes)).item()
        total+=len(predictions)

        running_loss=running_loss+loss.item()/len(trainloader)
    print("Epoch: %i Running loss: %f Accuracy: %i"%(epoch+1,running_loss,100*correct//total))
print('Elapsed time: ',time.time()-t)
print('Finished Training')

Epoch: 1 Running loss: 1.519513 Accuracy: 45
Epoch: 2 Running loss: 1.162651 Accuracy: 59
Epoch: 3 Running loss: 0.995600 Accuracy: 66
Elapsed time:  34.957897424697876
Finished Training


In [6]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images_gpu=data[0].to(device)
        my_classes2_gpu=data[1].to(device)
        predictions=torch.argmax(gpu_net(images_gpu),1)
        correct+=torch.sum(torch.eq(predictions,my_classes2_gpu)).item()
        total+=len(predictions)

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Accuracy of the network on the 10000 test images: 64 %


In [7]:
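class MyDeformConv2d(nn.Module):
    """Deformable convolution whose offsets are predicted by a regular convolution."""
    def __init__(self, in_channels, out_channels,
                 kernel_size, stride=1, padding=1, bias=False):
        super().__init__()
        # Note: `bias` is accepted but not forwarded, so DeformConv2d keeps its default (bias=True).
        self.deformable=torchvision.ops.DeformConv2d(in_channels=in_channels,
                                      out_channels=out_channels,
                                      kernel_size=kernel_size,
                                      stride=stride,
                                      padding=padding)
        # Offset predictor: 2 values (vertical and horizontal shift) per kernel position.
        # It uses the same kernel size, stride and padding, so its output has the same
        # spatial size as the output of the deformable convolution.
        self.normal_conv=nn.Conv2d(in_channels=in_channels,
                                   out_channels=2*kernel_size**2,
                                   kernel_size=kernel_size,
                                   stride=stride,
                                   padding=padding,
                                   bias=True)
    def forward(self, x):
        offset = self.normal_conv(x)  # shape: (N, 2*k*k, H_out, W_out)
        out= self.deformable(x, offset, mask=None)  # mask=None: no modulation of the samples
        return out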

class Deformable_Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1=MyDeformConv2d(3,40,kernel_size=3,stride=1)
        self.max_p1=nn.MaxPool2d(kernel_size=2,stride=2)# conv1 keeps 32*32 (padding=1); pooling gives 40 16*16
        self.conv2=MyDeformConv2d(40,25,kernel_size=2,stride=1)#becomes 25 17*17 (padding=1)
        self.max_p2=nn.MaxPool2d(kernel_size=2,stride=2)#becomes 25 8*8

        self.lin1=nn.Linear(25*8*8,300)
        self.lin2=nn.Linear(300,100)
        self.lin3=nn.Linear(100,10)

    def forward(self, x):
        x=F.leaky_relu(self.conv1(x))
        x=self.max_p1(x)
        x=F.leaky_relu(self.conv2(x))
        x=self.max_p2(x)
        x=F.leaky_relu(self.lin1(torch.flatten(x,1)))
        x=F.leaky_relu(self.lin2(x))
        x=self.lin3(x)
        return x

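We kept the deformable network as similar as possible to the original one, so that the comparison reflects the effect of the deformable layers. Two details still differ: `MyDeformConv2d` pads by 1 by default, so the flattened feature size is `25*8*8` instead of `25*7*7`, and the second pooling layer is a max pool rather than an average pool.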
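
The input sizes of the first linear layers can be checked with the usual output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below traces the spatial size through `Net` and `Deformable_Net` for 32x32 CIFAR-10 images; `conv_out` is a helper introduced only for this check.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# Net: the convolutions use no padding.
s = conv_out(32, kernel=3)                         # conv1 -> 30
s = conv_out(s, kernel=2, stride=2)                # pool  -> 15
s = conv_out(s, kernel=2)                          # conv2 -> 14
baseline_side = conv_out(s, kernel=2, stride=2)    # pool  -> 7

# Deformable_Net: MyDeformConv2d defaults to padding=1.
s = conv_out(32, kernel=3, padding=1)              # conv1 -> 32
s = conv_out(s, kernel=2, stride=2)                # pool  -> 16
s = conv_out(s, kernel=2, padding=1)               # conv2 -> 17
deformable_side = conv_out(s, kernel=2, stride=2)  # pool  -> 8

print(25 * baseline_side ** 2, 25 * deformable_side ** 2)
```

The two results match the `25*7*7` and `25*8*8` input sizes of `lin1` in the two classes.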

In [8]:
net2= Deformable_Net()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss_func=nn.CrossEntropyLoss()
gpu_net=net2.to(device)

optimizer=torch.optim.Adam(gpu_net.parameters())
epoch_nums = 3
t=time.time()
for epoch in range(epoch_nums):

    running_loss = 0.0
    for data in trainloader:
        images=data[0].to(device)
        my_classes=data[1].to(device)

        logits=gpu_net(images)
        loss=loss_func(logits,my_classes)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        running_loss=running_loss+loss.item()/len(trainloader)
    print("Epoch: %i Running loss: %f"%(epoch+1,running_loss))
print('Elapsed time: ',time.time()-t)
print('Finished Training')

Epoch: 1 Running loss: 1.530542
Epoch: 2 Running loss: 1.164199
Epoch: 3 Running loss: 0.988943
Elapsed time:  81.86091017723083
Finished Training


In [7]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images_gpu=data[0].to(device)
        my_classes2_gpu=data[1].to(device)
        predictions=torch.argmax(gpu_net(images_gpu),1)
        correct+=torch.sum(torch.eq(predictions,my_classes2_gpu)).item()
        total+=len(predictions)

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Accuracy of the network on the 10000 test images: 66 %


In [9]:
class Deformable_Net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1=MyDeformConv2d(3,100,kernel_size=3,stride=1)
        self.max_p1=nn.MaxPool2d(kernel_size=2,stride=2)# conv1 keeps 32*32 (padding=1); pooling gives 100 16*16
        self.conv2=MyDeformConv2d(100,40,kernel_size=3,stride=1)#becomes 40 16*16
        self.max_p2=nn.MaxPool2d(kernel_size=2,stride=2)#becomes 40 8*8
        self.conv3=MyDeformConv2d(40,25,kernel_size=3,stride=1)#becomes 25 8*8, i.e. 1600 features

        self.lin1=nn.Linear(1600,300)
        self.lin2=nn.Linear(300,100)
        self.lin3=nn.Linear(100,10)

    def forward(self, x):
        x=F.leaky_relu(self.conv1(x))
        x=self.max_p1(x)
        x=F.leaky_relu(self.conv2(x))
        x=self.max_p2(x)
        x=F.leaky_relu(self.conv3(x))
        x=F.leaky_relu(self.lin1(torch.flatten(x,1)))
        x=F.leaky_relu(self.lin2(x))
        x=self.lin3(x)
        return x

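This is a second deformable network with one more deformable convolutional layer. It is also wider (100 and 40 channels instead of 40 and 25) and uses 3x3 kernels in every convolution.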
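
To put the size difference between the two deformable networks in numbers, the sketch below counts the trainable parameters of their `MyDeformConv2d` blocks (main weights plus the offset-predicting convolution). It assumes `DeformConv2d` keeps its default bias and a single group, as in the class above; the helper names are made up for this example, and the linear layers are not counted.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Weights (and optional biases) of a k x k convolution with one group."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def deform_block_params(in_ch, out_ch, k):
    """Parameters of one MyDeformConv2d block."""
    main = conv_params(in_ch, out_ch, k)        # deformable convolution
    offset = conv_params(in_ch, 2 * k * k, k)   # offset-predicting convolution
    return main + offset


# (in_channels, out_channels, kernel_size) of each deformable block.
deformable_net = [(3, 40, 3), (40, 25, 2)]
deformable_net2 = [(3, 100, 3), (100, 40, 3), (40, 25, 3)]

total1 = sum(deform_block_params(*spec) for spec in deformable_net)
total2 = sum(deform_block_params(*spec) for spec in deformable_net2)
print(total1, total2)
```

Parameter count is only a rough proxy for cost: the run time also depends on the feature-map sizes and on the sampling work done inside each deformable layer.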

In [10]:
net2= Deformable_Net2()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss_func=nn.CrossEntropyLoss()
gpu_net=net2.to(device)
optimizer=torch.optim.Adam(gpu_net.parameters())
epoch_nums = 3
t=time.time()
for epoch in range(epoch_nums):
    correct = 0
    total = 0
    running_loss = 0.0
    for data in trainloader:
        images=data[0].to(device)
        my_classes=data[1].to(device)

        logits=gpu_net(images)
        loss=loss_func(logits,my_classes)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

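        # Training accuracy is measured after the weight update; no gradients are needed here.
        with torch.no_grad():
            predictions=torch.argmax(gpu_net(images),1)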
        correct+=torch.sum(torch.eq(predictions,my_classes)).item()
        total+=len(predictions)

        running_loss=running_loss+loss.item()/len(trainloader)
    print("Epoch: %i Running loss: %f Accuracy: %i"%(epoch+1,running_loss,100*correct//total))
print('Elapsed time: ',time.time()-t)
print('Finished Training')

Epoch: 1 Running loss: 1.562213 Accuracy: 44
Epoch: 2 Running loss: 1.136853 Accuracy: 61
Epoch: 3 Running loss: 0.963186 Accuracy: 68
Elapsed time:  261.97031593322754
Finished Training


In [11]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images_gpu=data[0].to(device)
        my_classes2_gpu=data[1].to(device)
        predictions=torch.argmax(gpu_net(images_gpu),1)
        correct+=torch.sum(torch.eq(predictions,my_classes2_gpu)).item()
        total+=len(predictions)

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Accuracy of the network on the 10000 test images: 66 %


### Conclusion:
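The deformable network reaches a slightly higher test accuracy than the baseline: 66% versus 64%, a gain of 2 percentage points. In practice, both networks tend to overfit the dataset when trained for more epochs, so the difference cannot be observed in those cases. Since the error is simply $1-\text{accuracy}$, we have not printed it separately. The disadvantage of the deformable network is its longer training time: it took more than twice as long to train as the normal network (about 82 vs 35 seconds), even though the baseline loop also ran an extra forward pass per batch to track the training accuracy.
Adding one more deformable layer, together with wider layers, raises the training time to 262 seconds, which is much higher than the previous one, while the test accuracy stays at 66%.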
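
The timing comparison can be restated as ratios relative to the baseline. The short sketch below only reuses the elapsed times printed by the training cells; the dictionary and variable names are chosen for this example.

```python
# Elapsed training times in seconds, copied from the printed outputs.
times = {
    "Net": 34.957897424697876,
    "Deformable_Net": 81.86091017723083,
    "Deformable_Net2": 261.97031593322754,
}

baseline = times["Net"]
ratios = {name: t / baseline for name, t in times.items()}

for name, ratio in ratios.items():
    print(f"{name}: {times[name]:.1f} s ({ratio:.2f}x the baseline)")
```

These figures come from single runs of three epochs each, so they are best read as rough indications rather than precise measurements.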