#### **Welcome to Assignment 2 on Deep Learning for Computer Vision.**
This assignment consists of three parts. Part-1 is based on the content you learned in Week-3 of the course, and Part-2 is based on the content you learned in Week-4. Part-3 is **un-graded** and mainly designed to help you flex the deep learning muscles built in Part-2.

Unlike the first two parts, you'll have to implement everything from scratch in Part-3. If you find answers to the questions in Part-3, feel free to head to the forums and discuss them with your classmates!

#### **Instructions**
1. Use Python 3.x to run this notebook
2. Write your code only between the lines 'YOUR CODE STARTS HERE' and 'YOUR CODE ENDS HERE'. Do not change anything else in the code cells; if you do, the answers you are supposed to get at the end of this assignment might be wrong.
3. Read the documentation of each function carefully.
4. All the Best!

# Part-1

In [20]:
# Imports
import numpy as np
import cv2
import matplotlib.pyplot as plt
from PIL import Image

# %matplotlib inline uncomment this line if you're running this notebook on your local PC

In [21]:
# DO NOT CHANGE THIS CODE
np.random.seed(10)

### Question 1: Point matching using RANSAC

Given two sets of points related by an affine transformation (with some rate of outliers), use the RANSAC method to estimate the affine transformation parameters between them and the number of inliers (matching points).

Which of the following is the estimated number of inliers for an outlier rate of 0.7?

1. 76
2. 157
3. 223
4. 300
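The estimation step in the cell below stacks the correspondences into the 6x6 system M θ = b. Because x' and y' share the same design row [x, y, 1], the same least-squares fit can also be written in a decoupled form. A minimal sketch under that assumption; `fit_affine` is a hypothetical helper name, not part of the assignment, and it needs at least 3 non-collinear points, just like the docstring's method:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of dst approx A @ src + t for 2xN arrays of points (N >= 3)."""
    n = src.shape[1]
    # One design row [x, y, 1] per point; shape (N, 3)
    X = np.vstack([src, np.ones((1, n))]).T
    # Solve X @ P approx dst.T, where column 0 of P is [a, b, tx] and column 1 is [c, d, ty]
    P, *_ = np.linalg.lstsq(X, dst.T, rcond=None)
    A = P[:2].T              # [[a, b], [c, d]]
    t = P[2].reshape(2, 1)   # [[tx], [ty]]
    return A, t

# Usage: recover a known transformation from 3 exact correspondences
rng = np.random.default_rng(0)
A_true = np.array([[1.5, -0.3], [0.2, 0.8]])
t_true = np.array([[4.0], [-2.0]])
src = 100 * rng.random((2, 3))
dst = A_true @ src + t_true
A_est, t_est = fit_affine(src, dst)
print(np.allclose(A_est, A_true), np.allclose(t_est, t_true))  # True True
```

With exactly 3 points the system is square and the fit is exact; with more points (for example, all RANSAC inliers) the same call returns the least-squares solution.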

In [22]:
import numpy as np
np.random.seed(0)


# Affine Transform
# |x'|  = |a, b| * |x|  +  |tx|
# |y'|    |c, d|   |y|     |ty|
# pts_t =    A   * pts_s  + t

# -------------------------------------------------------------
# Test Class Affine

class Affine_Transform():

    def create_test_case(self, outlier_rate=0):
        ''' CREATE_TEST_CASE

            Randomly generate a test case of affine transformation.

            Input arguments:

            - outlier_rate : the fraction of outliers in the test case
            (between 0 and 1), default is 0

            Outputs:

            - pts_s : Source points that will be transformed
            - pts_t : warped points
            - A, t : parameters of affine transformation, A is a 2x2
            matrix, t is a 2x1 vector, both of them are created randomly

        '''

        # Randomly generate affine transformation
        # A is a 2x2 matrix, the range of each value is from -2 to 2
        A = 4 * np.random.rand(2, 2) - 2

        # t is a 2x1 vector, the range of each value is from -10 to 10
        t = 20 * np.random.rand(2, 1) - 10

        # Set the number of points in test case
        num = 1000

        # Compute the number of outliers and inliers respectively
        outliers = int(np.round(num * outlier_rate))
        inliers = int(num - outliers)

        # Generate source points in the range (0, 0) to (100, 100)
        pts_s = 100 * np.random.rand(2, num)
        # Initialize warped points matrix
        pts_t = np.zeros((2, num))

        # Compute inliers in warped points matrix by applying A and t
        pts_t[:, :inliers] = np.dot(A, pts_s[:, :inliers]) + t

        # Generate outliers in warped points matrix
        pts_t[:, inliers:] = 100 * np.random.rand(2, outliers)

        # Reset the order of warped points matrix,
        # outliers and inliers will scatter randomly in test case
        rnd_idx = np.random.permutation(num)
        pts_s = pts_s[:, rnd_idx]
        pts_t = pts_t[:, rnd_idx]

        return A, t, pts_s, pts_t

    def estimate_affine(self, pts_s, pts_t):
        ''' ESTIMATE_AFFINE

            Estimate affine transformation by the given points
            correspondences.

            Input arguments:
            - pts_t : points in target image
            - pts_s : points in source image

            Outputs:

            - A, t : the affine transformation, A is a 2x2 matrix
            that encodes rotation, scaling and shear,
            t is a 2x1 vector that determines the translation

            Method:

            To estimate an affine transformation between two images,
            at least 3 corresponding points are needed.
            Here, the 6-parameter affine transformation is considered,
            as shown below:

            | x' | = | a b | * | x | + | tx |
            | y' |   | c d |   | y |   | ty |

            For 3 corresponding points, 6 equations can be formed as below:

            | x1 y1 0  0  1 0 |       | a  |       | x1' |
            | 0  0  x1 y1 0 1 |       | b  |       | y1' |
            | x2 y2 0  0  1 0 |   *   | c  |   =   | x2' |
            | 0  0  x2 y2 0 1 |       | d  |       | y2' |
            | x3 y3 0  0  1 0 |       | tx |       | x3' |
            | 0  0  x3 y3 0 1 |       | ty |       | y3' |

            |------> M <------|   |-> theta <-|   |-> b <-|

            Solve for theta in the least-squares sense:
            theta = argmin || M * theta - b ||^2
            Thus, affine transformation can be obtained as:

            A = | a b |     t = | tx |
                | c d |         | ty |

        '''

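        # Accept points indexed with a 2-D index array (shape 2xKx1)
        # by flattening both inputs to shape (2, N)
        pts_s = np.asarray(pts_s).reshape(2, -1)
        pts_t = np.asarray(pts_t).reshape(2, -1)

        # Get the number of corresponding points
        pts_num = pts_s.shape[1]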

        # Initialize the matrix M,
        # M has 6 columns, since the affine transformation
        # has 6 parameters in this case
        M = np.zeros((2 * pts_num, 6))

        for i in range(pts_num):
            # Fill in the two rows of M contributed by point i
            temp = [[pts_s[0, i], pts_s[1, i], 0, 0, 1, 0],
                    [0, 0, pts_s[0, i], pts_s[1, i], 0, 1]]
            M[2 * i: 2 * i + 2, :] = np.array(temp)

        # Form the matrix b,
        # b contains all known target points
        b = pts_t.T.reshape((2 * pts_num, 1))

        try:
            # Solve the linear system in the least-squares sense
            theta = np.linalg.lstsq(M, b, rcond=None)[0]

            # Form the affine transformation
            A = theta[:4].reshape((2, 2))
            t = theta[4:]
        except np.linalg.LinAlgError:
            # lstsq raises LinAlgError only if the SVD fails to converge;
            # a singular M still yields the minimum-norm solution
            A = None
            t = None

        return A, t
# -------------------------------------------------------------

# Create instance
af = Affine_Transform()

# Generate a test case as validation with
# a rate of outliers
### YOUR CODE STARTS HERE
outlier_rate = 0.7
### YOUR CODE ENDS HERE
A_true, t_true, pts_s, pts_t = af.create_test_case(outlier_rate)

# At least 3 corresponding points to
# estimate affine transformation
K = 4
# Randomly select K pairs of points to do estimation
idx = np.random.randint(0, pts_s.shape[1], K)

A_test, t_test = af.estimate_affine(pts_s[:, idx], pts_t[:, idx])

# Display known parameters with estimations
# They should match when outlier_rate is 0;
# otherwise, they may differ substantially
#print(A_true, '\n', t_true)
#print(A_test, '\n', t_test)

# -------------------------------------------------------------
# Test Class Ransac
# The number of iterations in RANSAC
ITER_NUM = 2000


class Ransac():

    def __init__(self, K=3, threshold=1):
        ''' __INIT__

            Initialize the instance.

            Input arguments:

            - K : the number of corresponding points,
            default is 3
            - threshold : residual length below which a point
            is counted as an inlier

        '''

        self.K = K
        self.threshold = threshold

    def residual_lengths(self, A, t, pts_s, pts_t):
        ''' RESIDUAL_LENGTHS

            Compute residual length (Euclidean distance) between
            estimated and real target points. Estimates are
            calculated by applying the given affine transformation
            (A & t) to the source points.

            Input arguments:

            - A, t : the estimated affine transformation calculated
            by least squares method
            - pts_s : key points from source image
            - pts_t : key points from target image

            Output:

            - residual : Euclidean distance between estimated points
            and real target points

        '''

        if A is not None and t is not None:
            # Calculate estimated points:
            # pts_e = A * pts_s + t
            pts_e = np.dot(A, pts_s) + t

            # Calculate the residual length between estimated points
            # and target points
            diff_square = np.power(pts_e - pts_t, 2)
            residual = np.sqrt(np.sum(diff_square, axis=0))
        else:
            residual = None

        return residual

    def ransac_fit(self, pts_s, pts_t):
        ''' RANSAC_FIT

            Apply the method of RANSAC to obtain the estimation of
            affine transformation and inliers as well.

            Input arguments:

            - pts_s : key points from source image
            - pts_t : key points from target image

            Output:

            - A, t : estimated affine transformation
            - inliers : indices of inliers that will be applied to refine the
            affine transformation

        '''
        pts_num = pts_s.shape[1]

        #### YOUR CODE STARTS HERE
        estimator = Affine_Transform()
        best_A, best_t = None, None
        best_inliers = np.array([], dtype=int)
        for i in range(ITER_NUM):
            # Draw a minimal random sample of K correspondences
            idx = np.random.randint(0, pts_num, self.K)
            A, t = estimator.estimate_affine(pts_s[:, idx], pts_t[:, idx])
            residual = self.residual_lengths(A, t, pts_s, pts_t)
            if residual is None:
                continue
            # Inliers: points whose residual is below the threshold
            inliers = np.flatnonzero(residual < self.threshold)
            # Keep the model with the largest consensus set
            if len(inliers) > len(best_inliers):
                best_A, best_t, best_inliers = A, t, inliers

        # Refine the transformation using all inliers of the best model
        if len(best_inliers) >= 3:
            best_A, best_t = estimator.estimate_affine(
                pts_s[:, best_inliers], pts_t[:, best_inliers])
        ### YOUR CODE ENDS HERE
        return best_A, best_t, best_inliers
# -------------------------------------------------------------

# Create instance
rs = Ransac(K=3, threshold=1)

residual = rs.residual_lengths(A_test, t_test, pts_s, pts_t)

# Run RANSAC to estimate the affine transformation when
# the point set contains many outliers
A_rsc, t_rsc, inliers = rs.ransac_fit(pts_s, pts_t)

# print the number of inliers or point matches
print(inliers.shape)

(300,)
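The cell above uses a fixed `ITER_NUM = 2000`. A common rule of thumb picks the number of RANSAC iterations N so that, with confidence p, at least one minimal sample of s points is outlier-free: N = log(1 - p) / log(1 - w^s), where w is the inlier ratio. A small stdlib sketch; the 0.99 confidence is an assumption, not something the assignment specifies:

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Minimal samples needed so that, with probability `confidence`,
    at least one sample consists only of inliers."""
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_all_inliers))

# Outlier rate 0.7 -> inlier ratio 0.3; an affine model needs 3 points
print(ransac_iterations(0.3, 3))  # 169
```

Under these assumptions roughly 169 iterations suffice, so 2000 leaves a wide safety margin.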


### Question 2: Detect corners in a given image using Harris Corner Detection Algorithm

Find the number of detected corner points in a given image using the Harris corner detection algorithm. Note that the following criteria MUST be satisfied when applying the Harris corner detection algorithm:

1. Size of the neighbourhood considered for corner detection (`blockSize`) = 2.
2. Aperture parameter of the Sobel derivative (`ksize`) = 3.
3. Harris detector free parameter in the equation (`k`) = 0.04.

How many corners are detected?

1. 1068
2. 780
3. 1106
4. 976

In [23]:
import matplotlib.pyplot as plt
import numpy as np
import cv2

%matplotlib inline

# Read in the image
image = cv2.imread('image.png')

# Make a copy of the image
image_copy = np.copy(image)

# Change color to RGB (from BGR)
image_copy = cv2.cvtColor(image_copy, cv2.COLOR_BGR2RGB)

###YOUR CODE STARTS HERE

## STEP 1: Convert to grayscale
gray_image = cv2.cvtColor(image_copy, cv2.COLOR_RGB2GRAY)

## STEP 2: Detect corners (blockSize=2, ksize=3, k=0.04)
dest_initial = cv2.cornerHarris(gray_image, 2, 3, 0.04)

## STEP 3: Dilate corner image to enhance corner points
dest = cv2.dilate(dest_initial, None)

## STEP 4: Set threshold value as 0.1 * (maximum value of dilated corner image obtained from STEP 3)
corner_mask = dest > 0.1 * dest.max()

## STEP 5: Count number of detected corner points and draw them on the image
# Note: this counts pixels above the threshold, not distinct corner clusters
num_corners = np.sum(corner_mask)
print(num_corners)
image_copy[corner_mask] = [255, 0, 0]  # mark corners in red (image_copy is RGB)
# cv2.imshow needs a GUI window and cv2.waitKey(); use matplotlib inside a notebook
plt.imshow(image_copy)
plt.title('Image with Corners')
plt.show()
### YOUR CODE ENDS HERE

1068
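`cv2.cornerHarris` scores every pixel with R = det(M) - k * trace(M)^2, where M is the structure tensor (sums of Ix², Iy² and Ix·Iy over a neighbourhood) and k is the free parameter set to 0.04 above. Corners give large positive R, edges give negative R, and flat regions give R ≈ 0. The NumPy sketch below illustrates only that formula on a synthetic square; it is a simplification that uses central differences instead of Sobel derivatives and a 3x3 box window instead of `blockSize=2`, so it will not reproduce OpenCV's numbers. `harris_response` and `box_sum` are illustrative helpers:

```python
import numpy as np

def box_sum(a, r=1):
    """Sum over each (2r+1)x(2r+1) neighbourhood, zero-padded at the borders."""
    padded = np.pad(a, r)
    h, w = a.shape
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, k=0.04, r=1):
    """Harris response R = det(M) - k * trace(M)^2 for every pixel."""
    Iy, Ix = np.gradient(img.astype(float))  # central differences, not Sobel
    Sxx = box_sum(Ix * Ix, r)
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# Synthetic image: a bright 10x10 square on a dark background
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
print(R[5, 5] > 0, R[5, 10] < 0, R[10, 10] == 0)  # corner, edge, flat: True True True
```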


## Line detection from a given image. (Optional)


Find the start and end point coordinates of the lines detected in a given image using the probabilistic Hough transform.

The following criteria must be satisfied for a segment to qualify as a line:

1. Minimum line length = 100
2. Maximum allowed gap between line segments = 200
3. Accumulator threshold parameter = 50 (only lines that get enough votes are returned)
4. Distance resolution of the accumulator in pixels = 1
5. Angle resolution of the accumulator in radians = pi/180

What is the mean of the start and end points of all the detected lines?

1. [324.6,  37.6], [490.4,  81.2]
2. [314.2, 34.2], [489.1,  76.4]
3. [312.9, 39.4], [492.3,  77.1]
4. None of the above

In [None]:
# Read image
img = cv2.imread('image.png', cv2.IMREAD_COLOR)

# Visualize the input image (OpenCV loads BGR; matplotlib expects RGB)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title('Input Image')
plt.show()


# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Find the edges in the image using the Canny detector
edges = cv2.Canny(gray, 200, 300)

#### YOUR CODE STARTS HERE #####


#### YOUR CODE ENDS HERE #####

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # OpenCV images are BGR
plt.title('Detected Line Image')
plt.show()
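The criteria above map onto the parameters of OpenCV's probabilistic Hough transform, `cv2.HoughLinesP`, which returns an array of shape (N, 1, 4) holding `[x1, y1, x2, y2]` per segment, or `None` when nothing is found. A sketch of averaging those endpoints, assuming that output format; `mean_endpoints` is a hypothetical helper, and the demo segments are made up rather than taken from `image.png`:

```python
import numpy as np

def mean_endpoints(lines):
    """Mean start point and mean end point of segments in cv2.HoughLinesP format.

    `lines` has shape (N, 1, 4) with rows [x1, y1, x2, y2]; HoughLinesP
    returns None when no segment is found.
    """
    if lines is None or len(lines) == 0:
        return None, None
    segs = np.asarray(lines, dtype=float).reshape(-1, 4)
    return segs[:, :2].mean(axis=0), segs[:, 2:].mean(axis=0)

# With OpenCV, the segments would come from a call such as:
# lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
#                         minLineLength=100, maxLineGap=200)
demo = np.array([[[0, 0, 10, 20]], [[4, 2, 6, 8]]])  # made-up segments
start, end = mean_endpoints(demo)
print(start.tolist(), end.tolist())  # [2.0, 1.0] [8.0, 14.0]
```

The sketch does not reorder endpoints, so which point counts as "start" is whatever order HoughLinesP reports.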

# Part-2

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import torch.nn.functional as F
import timeit
import unittest

## Please DO NOT remove these lines.
torch.manual_seed(2021)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(2021)

### Data Loading and Pre-processing

In [2]:
# check availability of GPU and set the device accordingly
#### YOUR CODE STARTS HERE ####

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#### YOUR CODE ENDS HERE ####

# define transforms for preparing the dataset:
#   - convert the image to a PyTorch tensor
#   - normalize the images with the mean and std of the MNIST dataset (mean=0.1307, std=0.3081)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
print(device)

cuda:0
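`ToTensor()` scales uint8 pixels from [0, 255] to [0.0, 1.0], and `Normalize((0.1307,), (0.3081,))` then computes (x - mean) / std per channel. A plain-Python sketch of what this pipeline does to a single pixel; `normalize_pixel` is an illustrative name, not a torchvision function:

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def normalize_pixel(value_uint8):
    """Mimic ToTensor() followed by Normalize((0.1307,), (0.3081,)) for one pixel."""
    x = value_uint8 / 255.0            # ToTensor: [0, 255] -> [0.0, 1.0]
    return (x - MNIST_MEAN) / MNIST_STD

print(round(normalize_pixel(0), 4), round(normalize_pixel(255), 4))  # -0.4242 2.8215
```

So a black pixel maps to about -0.42, a white pixel to about 2.82, and a pixel equal to the dataset mean maps to 0.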


In [3]:
# Load the MNIST training and test datasets with `torchvision.datasets.MNIST`, using the transform defined above
#### YOUR CODE STARTS HERE ####
train_dataset = datasets.MNIST('~/.pytorch/MNIST_data/',download=True,train=True,transform=transform)
test_dataset = datasets.MNIST('~/.pytorch/MNIST_data/',download=True,train=False,transform=transform)
#### YOUR CODE ENDS HERE ####

In [4]:
# create dataloaders for training and test datasets
# use a batch size of 32 and set shuffle=True for the training set
#### YOUR CODE STARTS HERE ####
train_dataloader = torch.utils.data.DataLoader(train_dataset,batch_size=32,shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset,batch_size=32,shuffle=True)
#### YOUR CODE ENDS HERE ####

### Network Definition

In [14]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #### YOUR CODE STARTS HERE ####
        # define a linear layer with 1024 output features
        self.linear1 = nn.Linear(784,1024)
        # define a linear layer with 512 output features
        self.linear2 = nn.Linear(1024,512)
        # define a linear layer with 256 output features
        self.linear3 = nn.Linear(512,256)
        # define dropout layer with a probability of 0.25
        self.dropout1 = nn.Dropout(p=0.25)
        # define a linear layer with 128 output features
        self.linear4 = nn.Linear(256,128)
        # define a linear layer with output features corresponding to the number of classes in the dataset
        self.linear5 = nn.Linear(128,10)
        #### YOUR CODE ENDS HERE ####

    def forward(self, x):
        # Use the layers defined above in sequence (follow the same order as the layer definitions above) and
        # write the forward pass; after each of linear1, linear2, linear3, and linear4 use a ReLU activation.
        # Don't forget to flatten your input x
        #### YOUR CODE STARTS HERE ####
        x = x.view(x.shape[0], -1)  # flatten (N, 1, 28, 28) -> (N, 784)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = F.relu(self.linear3(x))
        x = self.dropout1(x)
        x = F.relu(self.linear4(x))
        x = self.linear5(x)
        #### YOUR CODE ENDS HERE ####
        output = F.log_softmax(x, dim=1)
        return output

### Question 3

What is the total number of parameters in the model?

1. 1654932
2. 1852197
3. 1494154
4. 2259843

In [None]:
#### YOUR CODE STARTS HERE ####
network = Net()
# Sum the element counts of every weight and bias tensor
total_params = sum(p.numel() for p in network.parameters())
print(total_params)
#### YOUR CODE ENDS HERE ####
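Each `nn.Linear(in_features, out_features)` holds an out x in weight matrix plus one bias per output feature, and dropout has no parameters, so the total can be checked by hand. A stdlib sketch using the layer sizes of `Net` (the helper name `linear_params` is illustrative):

```python
# (in_features, out_features) of linear1..linear5 in Net
layer_sizes = [(784, 1024), (1024, 512), (512, 256), (256, 128), (128, 10)]

def linear_params(n_in, n_out):
    # weight matrix (n_out x n_in) plus one bias per output feature
    return n_in * n_out + n_out

per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]
total = sum(per_layer)
print(per_layer)  # [803840, 524800, 131328, 32896, 1290]
print(total)      # 1494154
```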

### Sanity Check
Make sure all the tests below pass without any errors, before you proceed with the training part.

In [5]:
import unittest

class TestImplementations(unittest.TestCase):
    
    # Dataloading tests
    def test_dataset(self):
        self.dataset_classes = ['0 - zero',
                                '1 - one',
                                '2 - two',
                                '3 - three',
                                '4 - four',
                                '5 - five',
                                '6 - six',
                                '7 - seven',
                                '8 - eight',
                                '9 - nine']
        self.assertTrue(train_dataset.classes == self.dataset_classes)
        self.assertTrue(train_dataset.train == True)
    
    def test_dataloader(self):        
        self.assertTrue(train_dataloader.batch_size == 32)
        self.assertTrue(test_dataloader.batch_size == 32)      

suite = unittest.TestLoader().loadTestsFromTestCase(TestImplementations)
unittest.TextTestRunner().run(suite)

..
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK


<unittest.runner.TextTestResult run=2 errors=0 failures=0>

### Training and Inference

In [6]:
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        #### YOUR CODE STARTS HERE ####
        # send the image, target to the device
        data = data.to(device)
        target = target.to(device)
        # flush out the gradients stored in optimizer
        optimizer.zero_grad()
        # pass the image to the model and assign the output to variable named output
        output = model(data)  # calling the module runs forward() plus any hooks
        # calculate the loss (use nll_loss in pytorch)
        loss = F.nll_loss(output, target)
        # do a backward pass
        loss.backward()
        # update the weights
        optimizer.step()
        #### YOUR CODE ENDS HERE ####
        if batch_idx % 20 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

In [7]:
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            #### YOUR CODE STARTS HERE ####
            # send the image, target to the device
            data = data.to(device)
            target = target.to(device)
            # pass the image to the model and assign the output to variable named output
            output = model(data)
            #### YOUR CODE ENDS HERE ####
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

### Question 4

Run the code cell below and report the final test accuracy (if you are not getting the exact number shown in the options, please report the closest number).
1. 58%
2. 69%
3. 97%
4. 89%

In [None]:
model = Net().to(device)

## Define Adam Optimiser with a learning rate of 0.0001
optimizer = optim.Adam(model.parameters(),lr=0.0001)

start = timeit.default_timer()
for epoch in range(1, 4):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)
stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )

### Question 5

Modify the network to replace ReLU activations with Sigmoid and report the final test accuracy by running the cell below. (If you are not getting the exact number shown in the options, please report the closest number.)

1. 48%
2. 11%
3. 39%
4. 69%

In [10]:
class NetSigmoid(nn.Module):
    def __init__(self):
        super(NetSigmoid, self).__init__()
        #### YOUR CODE STARTS HERE ####

        # define a linear layer with 1024 output features
        self.linear1 = nn.Linear(784,1024)
        # define a linear layer with 512 output features
        self.linear2 = nn.Linear(1024,512)
        # define a linear layer with 256 output features
        self.linear3 = nn.Linear(512,256)
        # define dropout layer with a probability of 0.25
        self.dropout1 = nn.Dropout(p=0.25)
        # define a linear layer with 128 output features
        self.linear4 = nn.Linear(256,128)
        # define a linear layer with output features corresponding to the number of classes in the dataset
        self.linear5 = nn.Linear(128,10)
        #### YOUR CODE ENDS HERE ####

    def forward(self, x):

        #### YOUR CODE STARTS HERE ####
        x = x.view(x.shape[0], -1)  # flatten (N, 1, 28, 28) -> (N, 784)
        x = torch.sigmoid(self.linear1(x))  # torch.sigmoid replaces the deprecated F.sigmoid
        x = torch.sigmoid(self.linear2(x))
        x = torch.sigmoid(self.linear3(x))
        x = self.dropout1(x)
        x = torch.sigmoid(self.linear4(x))
        x = self.linear5(x)
        #### YOUR CODE ENDS HERE ####
        output = F.log_softmax(x, dim=1)
        return output

In [19]:
model = NetSigmoid().to(device)

## Define Adam Optimiser with a learning rate of 0.01
optimizer = optim.Adam(model.parameters(),lr=0.01)

start = timeit.default_timer()

for epoch in range(1, 4):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)

stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )


Test set: Average loss: 1.8193, Accuracy: 2055/10000 (21%)




Test set: Average loss: 1.8513, Accuracy: 1949/10000 (19%)


Test set: Average loss: 1.9538, Accuracy: 2102/10000 (21%)

Total time taken: 59 seconds
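A property often cited when comparing sigmoid with ReLU is that the sigmoid derivative σ'(x) = σ(x)(1 - σ(x)) never exceeds 0.25, so backpropagating through several sigmoid layers multiplies the gradient by factors of at most 0.25 each (ignoring the weights). The stdlib sketch below only illustrates that bound; it is not a diagnosis of the accuracy reported above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # d/dx sigmoid(x)

print(sigmoid_grad(0.0))  # 0.25, the maximum (at x = 0)
print(sigmoid_grad(5.0))  # much smaller once the unit saturates
print(0.25 ** 4)          # 0.00390625: bound on the product of 4 sigmoid derivatives
```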


### Question 6

Train the network defined in Question-4 with the same Adam optimizer, but change the learning rate to 10. Report the final test accuracy by running the cell below. (If you are not getting the exact number shown in the options, please report the closest number.)

1. 89%
2. 97%
3. 22%
4. 10%

In [None]:
model = Net().to(device)

## Define Adam Optimiser with a learning rate of 10
optimizer = optim.Adam(model.parameters(),lr=10)

start = timeit.default_timer()

for epoch in range(1, 4):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)

stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )
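One way to see why the learning rate matters so much with Adam: after bias correction, its very first update for a parameter is lr * g / (|g| + eps), which is about ±lr whatever the gradient's scale. With lr = 10, every weight moves by roughly 10 on the first step, versus roughly 0.0001 with the learning rate used in Question-4. A stdlib sketch of that single-parameter update, assuming PyTorch's default `betas=(0.9, 0.999)` and `eps=1e-8` and no weight decay:

```python
import math

def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Magnitude of Adam's very first update for a single parameter."""
    m = (1 - beta1) * grad          # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2     # second-moment estimate after one step
    m_hat = m / (1 - beta1)         # bias-corrected moments
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

# Tiny and huge gradients produce (almost) the same step size
for g in (0.003, -50.0):
    print(g, adam_first_step(g, lr=10), adam_first_step(g, lr=0.0001))
```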

### Question 7

Modify the network from Question-4 (`Net`) to replace ReLU activations with Tanh and initialise the `Linear` layer weights to zero. Report the final test accuracy by running the cell below. (If you are not getting the exact number shown in the options, please report the closest number.)

1. 11%
2. 74%
3. 87%
4. 99%

In [16]:
class NetTanh(nn.Module):
    def __init__(self):
        super(NetTanh, self).__init__()
        #### YOUR CODE STARTS HERE ####
        # define a linear layer with 1024 output features
        self.linear1 = nn.Linear(784,1024)
        # define a linear layer with 512 output features
        self.linear2 = nn.Linear(1024,512)
        # define a linear layer with 256 output features
        self.linear3 = nn.Linear(512,256)
        # define dropout layer with a probability of 0.25
        self.dropout1 = nn.Dropout(p=0.25)
        # define a linear layer with 128 output features
        self.linear4 = nn.Linear(256,128)
        # define a linear layer with output features corresponding to the number of classes in the dataset
        self.linear5 = nn.Linear(128,10)
        #### YOUR CODE ENDS HERE ####

    def forward(self, x):
        # Use the layers defined above in sequence (follow the same order as the layer definitions above) and
        # write the forward pass; after each of linear1, linear2, linear3 and linear4 use a tanh activation.
        #### YOUR CODE STARTS HERE ####
        x = x.view(x.shape[0], -1)  # flatten (N, 1, 28, 28) -> (N, 784)
        x = torch.tanh(self.linear1(x))  # torch.tanh replaces the deprecated F.tanh
        x = torch.tanh(self.linear2(x))
        x = torch.tanh(self.linear3(x))
        x = self.dropout1(x)
        x = torch.tanh(self.linear4(x))
        x = self.linear5(x)
        #### YOUR CODE ENDS HERE ####
        output = F.log_softmax(x, dim=1)
        return output


In [17]:
model = NetTanh().to(device)

def init_weights(m):
    #### YOUR CODE STARTS HERE ####
    if isinstance(m, nn.Linear):
        torch.nn.init.zeros_(m.weight)
        # biases keep PyTorch's default (random) initialization
    #### YOUR CODE ENDS HERE ####


model.apply(init_weights)
## Define Adam Optimiser with a learning rate of 0.01
optimizer = optim.Adam(model.parameters(),lr=0.01)

start = timeit.default_timer()

for epoch in range(1, 4):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)

stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )




Test set: Average loss: 2.4010, Accuracy: 1010/10000 (10%)




Test set: Average loss: 2.3739, Accuracy: 982/10000 (10%)


Test set: Average loss: 2.4268, Accuracy: 982/10000 (10%)

Total time taken: 55 seconds
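Why zero weights are a problem can be seen on a toy network: if every weight is zero, each layer's output ignores its input, so the network makes the same prediction for every image, and the gradient reaching any weight matrix that sits below a zero weight matrix is itself zero. A NumPy sketch on a hypothetical two-layer tanh network (not `NetTanh` itself), with random biases as in the cell above, where only the weights are zeroed; the upstream gradient `g` is an arbitrary made-up value:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 2

# Toy 2-layer tanh network: zero weights, random biases
W1, b1 = np.zeros((n_hidden, n_in)), rng.normal(size=n_hidden)
W2, b2 = np.zeros((n_out, n_hidden)), rng.normal(size=n_out)

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return h, W2 @ h + b2

x_a, x_b = rng.normal(size=n_in), rng.normal(size=n_in)
h_a, out_a = forward(x_a)
_, out_b = forward(x_b)
print(np.allclose(out_a, out_b))  # True: the output ignores the input

# Backpropagate an arbitrary upstream gradient dL/d(out)
g = np.array([1.0, -1.0])
grad_W2 = np.outer(g, h_a)                  # non-zero
delta_h = (W2.T @ g) * (1.0 - h_a ** 2)     # zero, because W2 is zero
grad_W1 = np.outer(delta_h, x_a)
print(grad_W1.any(), grad_W2.any())         # False True
```

In this toy, only the last layer receives a non-zero gradient at first; the first layer starts changing only after the layer above it has moved away from zero.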


### Question 8

Initialize the network defined in Question-4 (`Net`) with Xavier's initialization ([torch.nn.init.xavier_normal_](https://pytorch.org/docs/stable/nn.init.html)), using zero for the biases. Train the network with the Adam optimizer and report the final test accuracy by running the cell below. (If you are not getting the exact number shown in the options, please report the closest number.)

1. 82%
2. 76%
3. 93%
4. 69%

In [15]:
model = Net().to(device)

def init_weights(m):
    #### YOUR CODE STARTS HERE ####
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)
    #### YOUR CODE ENDS HERE ####

model.apply(init_weights)  
## Define Adam Optimiser with a learning rate of 0.01
optimizer = optim.Adam(model.parameters(),lr=0.01)

start = timeit.default_timer()

for epoch in range(1, 4):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)

stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )


Test set: Average loss: 0.4401, Accuracy: 9131/10000 (91%)




Test set: Average loss: 0.4676, Accuracy: 8898/10000 (89%)


Test set: Average loss: 0.3747, Accuracy: 9148/10000 (91%)

Total time taken: 54 seconds
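Xavier (Glorot) normal initialization draws weights from N(0, std²) with std = gain * sqrt(2 / (fan_in + fan_out)), which is the formula `torch.nn.init.xavier_normal_` documents (gain defaults to 1). A stdlib sketch of the standard deviation it would use for each layer of `Net` (`xavier_normal_std` is an illustrative name):

```python
import math

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Std of the zero-mean normal used by Xavier (Glorot) normal initialization."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

# (in_features, out_features) of linear1..linear5 in Net
for fan_in, fan_out in [(784, 1024), (1024, 512), (512, 256), (256, 128), (128, 10)]:
    print(fan_in, fan_out, round(xavier_normal_std(fan_in, fan_out), 4))
```

Wider layers get smaller weights, which keeps the variance of activations roughly constant from layer to layer.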


# Part-3 (**Optional**)
This section is un-graded and purely for practice. 

The main focus of this part is to help you flex the deep learning muscles built in Part-2. You should build a network for the [SVHN dataset](http://ufldl.stanford.edu/housenumbers/). This dataset is similar to MNIST, but unlike MNIST, its images are in color and more complex.

As of this writing, the state-of-the-art (SoTA) performance on this dataset is 98.98%. You can start with the simple network we defined above for the MNIST dataset (with some modifications to handle the differently sized color images). But to achieve SoTA performance, you need a lot of hackery. Here are a few things we encourage you to try:

- Use data augmentation wisely. Read and understand how to perform the augmentations listed below. 
    * RandomFlips, Color Jittering
    * Cutout, Cutmix
    * Mixup
    * Auto-augment

- Try upscaling the images using standard image interpolation techniques. Try tricks like progressive resizing of images and see if they help.

- After a certain number of layers, adding more layers might not help much; run experiments on SVHN and see whether you observe this.

- To understand the difficulties in training deeper networks read this paper: [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)

- To improve the performance on SVHN, try architectures like [ResNet](https://arxiv.org/abs/1512.03385), [DenseNet](https://arxiv.org/abs/1608.06993) or [EfficientNet](https://arxiv.org/abs/1905.11946). Implementations of all three are available in `torchvision.models`.
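
As one example from the augmentation list, Mixup blends pairs of images and their one-hot labels with a weight λ drawn from Beta(α, α). A minimal NumPy sketch on a fake batch shaped like SVHN's 32x32 RGB crops; `mixup_batch` is a hypothetical helper and α = 0.2 is an arbitrary choice:

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mixup: blend each example (and its one-hot label) with a shuffled partner."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

# Fake batch: 8 images of shape (3, 32, 32) with 10 classes
rng = np.random.default_rng(0)
x = rng.random((8, 3, 32, 32)).astype(np.float32)
y = np.eye(10)[rng.integers(0, 10, size=8)]
x_mix, y_mix, lam = mixup_batch(x, y, rng=rng)
print(x_mix.shape, y_mix.sum(axis=1))  # each mixed label row still sums to 1
```

Training on the mixed labels needs a loss that accepts soft targets, for example the cross-entropy between the model's log-probabilities and `y_mix`, rather than `F.nll_loss` with integer class indices.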
