# Implementing a convolutional layer

#### General instructions

Each week you will be given an assignment related to the associated module. You have roughly one week to complete and submit each of them. There are 3 weekly group sessions available to help you complete the assignments. Attendance is not mandatory but recommended. However, assignments are graded each week, and not submitting them or submitting them after the deadline will give you no points.

**FORMAT**: Jupyter notebook  
**DEADLINE**: Sunday 28th February, 23:59

## Introduction

The objective of this assignment is to get an in-depth understanding of the convolutional layer, as it is a crucial layer in deep neural networks, especially when applied to image datasets. To do so, we will implement our own custom convolutional layer ``MyConv2d`` and make sure that we get the same results as if we used [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d).


## Contents:

1. Utils
2. Implement a custom layer in Pytorch: ``MyConv2d``
3. Use ``MyConv2d`` inside a neural network model
4. Test ``MyConv2d`` by comparing it to [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d)

## Andrew's Videos related to today's assignment

- [C4W1L02 Edge Detection Examples](https://www.youtube.com/watch?v=XuD4C8vJzEQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=2)
- [C4W1L04 Padding](https://www.youtube.com/watch?v=smHa2442Ah4&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=4)
- [C4W1L05 Strided Convolutions](https://www.youtube.com/watch?v=tQYZaDn_kSg&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=5)
- [C4W1L06 Convolutions Over Volumes](https://www.youtube.com/watch?v=KTB_OFoAQcc&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=6)
- [C4W1L07 One Layer of a Convolutional Net](https://www.youtube.com/watch?v=jPOAS7uCODQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=7)



In [1]:
import sys
import torch
from torch.optim import Optimizer
from torch import nn, optim
import torch.nn.functional as F
from torchvision import datasets, transforms
import datetime
import numpy as np
import collections
from typing import Sequence


## 1. Utils

Nothing to see in the cell below, just the definitions of 3 functions we'll need later. You don't even need to read them, just know that there are:

- ``load_MNIST``: Return the MNIST train and validation datasets
- ``training_loop``: Train a model and store the weight values of its conv layer after each batch
- ``int_to_pair``: Return `(n, n)` if `n` is an int or `n` if `n` is already a tuple of length 2

In [2]:
device = torch.device('cpu')
print(f"Device {device}.")



def load_MNIST(data_path='../data/'):
    """
    Return MNIST train and val dataset
    """
    MNIST_train = datasets.MNIST(
        data_path,       
        train=True,      
        download=True,   
        transform=transforms.Compose([
            transforms.CenterCrop(20),
            transforms.Grayscale(),
            transforms.ToTensor(),
        ]))

    MNIST_val = datasets.MNIST(
        data_path, 
        train=False,      
        download=True,   
        transform=transforms.Compose([
            transforms.CenterCrop(20),
            transforms.Grayscale(),
            transforms.ToTensor(),
            
        ]))

    print('Size of the training dataset: ', len(MNIST_train))
    print('Size of the validation dataset: ', len(MNIST_val))

    return MNIST_train, MNIST_val


def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    """
    Train our model and save weight values
    """
    model.train()
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device) 
            labels = labels.to(device=device)

            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()

            # Here we store weight values at each step of the training process (see also last cell and MyNet definition)
            model.conv1_weight_values.append(model.conv1.weight.data.clone().detach())
            if model.conv1.bias is not None:
                model.conv1_bias_values.append(model.conv1.bias.data.clone().detach())
            
            optimizer.zero_grad()    
            
            loss_train += loss.item()

        print('{}  |  Epoch {}  |  Training loss {:.3f}'.format(
            datetime.datetime.now(), epoch,
            loss_train / len(train_loader)))

def int_to_pair(n):
    """
    Return `(n, n)` if `n` is a int or `n` if `n` is already a tuple of length 2
    """
    # If n is a float or integer
    if not isinstance(n, Sequence):
        return (int(n), int(n))
    elif len(n) == 1:
        return (int(n[0]), int(n[0]))
    elif len(n) == 2:
        return ( int(n[0]), int(n[1]) )
    else:
        raise ValueError("Please give an int or a pair of int")

MNIST_train, MNIST_val = load_MNIST()


Device cpu.
Files already downloaded and verified
Size of the training dataset:  60000
Size of the validation dataset:  10000


## 2. Implement a custom layer in Pytorch: MyConv2d 

In the cell below, there is a template of a ``MyConv2d`` class that would re-create a [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d) layer. By solving the 4 problems below you will complete this class step by step.

First of all, defining a custom layer in PyTorch is very similar to defining a custom neural network or a custom block of layers, as they all require subclassing the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=nn%20module#torch.nn.Module) class (see the 3rd tutorial ``03 - Define a custom deep Neural Network in Pytorch`` for more details about defining your custom neural network). So as usual, we have to create a class that subclasses nn.Module and that implements a [forward](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward) method defining what happens to a given input.


## TODO:

### 1. Compute the output shape of a convolutional layer

In pytorch's documentation, you will often encounter these notations:

- ``N``: batch size,         (how many inputs do you feed at the same time)
- ``C``: number of channels (for the input, this refers to the colors: RGB=3, RGBA=4, etc.; for the output of a convolutional layer, it refers to the number of filters)
- ``H``: height of the image
- ``W``: width of the image


Take a look at the [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d) documentation and scroll down to examine the output shape formula. 

You can also watch Andrew's video  [C4W1L07 One Layer of a Convolutional Net](https://www.youtube.com/watch?v=jPOAS7uCODQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=7) to get an illustration of this formula 

Write a method ``__get_output_size(self, x)`` (so inside the ``MyConv2d``  class and with ``x`` being an input batch of dimension ``(N, C_in, H_in, W_in)``) that returns the output shape ``(N, C_out, H_out, W_out)`` (following Pytorch's notations) of a convolutional layer whose structure would be defined according to:

- ``N = x.shape[0]``
- ``C_in = self.in_channels``
- ``H_in = x.shape[-2]``
- ``W_in = x.shape[-1]``
- ``kernel_size = self.kernel_size`` (Note: it's a tuple (kernel_size_height, kernel_size_width))
- ``padding = self.padding`` (Note: it's a tuple (padding_height, padding_width))
- ``stride = self.stride`` (Note: it's a tuple (stride_height, stride_width))
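
As a quick sanity check of the output shape formula, here is a minimal sketch (not part of the template). The helper name ``conv_output_size`` and the example values are assumptions chosen for illustration; dilation is assumed to be 1, as everywhere in this assignment. The result is compared with the shape actually produced by ``nn.Conv2d``.

```python
import torch
from torch import nn


def conv_output_size(size_in, kernel, padding, stride):
    # floor((size_in + 2*padding - kernel) / stride) + 1, assuming dilation = 1
    return (size_in + 2 * padding - kernel) // stride + 1


# Hypothetical example values (same layer settings as used later for testing)
N, C_in, H_in, W_in = 4, 1, 20, 20
C_out, kernel, stride, padding = 2, (3, 4), (2, 1), (1, 2)

H_out = conv_output_size(H_in, kernel[0], padding[0], stride[0])
W_out = conv_output_size(W_in, kernel[1], padding[1], stride[1])
expected_shape = (N, C_out, H_out, W_out)

# Reference: let PyTorch compute the shape on a dummy batch
reference = nn.Conv2d(C_in, C_out, kernel_size=kernel, stride=stride, padding=padding)
actual_shape = tuple(reference(torch.zeros(N, C_in, H_in, W_in)).shape)

print(expected_shape, actual_shape)
```

Height and width are handled independently, each with its own kernel size, padding and stride.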

### 2. Apply padding to an image

In this video [C4W1L04 Padding](https://www.youtube.com/watch?v=smHa2442Ah4&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=4) you saw how to apply padding to an image

Write a method ``__apply_padding(self, x)`` (so inside the ``MyConv2d``  class and with ``x`` being an input batch of dimension ``(N, C_in, H_in, W_in)``) that applies padding to ``x``, i.e. it returns a tensor ``x_pad`` whose center values are the same as ``x`` values but with extra zeros on the border (the numbers of zeros to add are defined by ``self.padding``) 

**Note** the output value is then of dimension ``(N, C_in, H_in + 2*self.padding[0], W_in + 2*self.padding[1])``
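
To make the expected result concrete, here is a small sketch that uses PyTorch's built-in ``F.pad`` as a reference on a tiny 2x2 image. It is only meant as something to compare your own ``__apply_padding`` against; the tensor values and the padding ``(1, 2)`` are made up for the example.

```python
import torch
import torch.nn.functional as F

# One image, one channel, 2x2 pixels: [[1, 2], [3, 4]]
x = torch.arange(1.0, 5.0).reshape(1, 1, 2, 2)
padding = (1, 2)  # (padding_height, padding_width)

# F.pad lists the padding of the LAST dimension first: (left, right, top, bottom)
x_pad = F.pad(x, (padding[1], padding[1], padding[0], padding[0]))

print(x_pad.shape)  # (N, C_in, H_in + 2*padding[0], W_in + 2*padding[1])
print(x_pad[0, 0])
```

The original values end up in the center, shifted by ``padding[0]`` rows and ``padding[1]`` columns, and everything else is zero.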

### 3. Apply Convolution to an image

You saw how to apply convolution to an image, in the following video:

- [C4W1L02 Edge Detection Examples](https://www.youtube.com/watch?v=XuD4C8vJzEQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=2)
- [C4W1L05 Strided Convolutions](https://www.youtube.com/watch?v=tQYZaDn_kSg&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=5)
- [C4W1L06 Convolutions Over Volumes](https://www.youtube.com/watch?v=KTB_OFoAQcc&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=6)
- [C4W1L07 One Layer of a Convolutional Net](https://www.youtube.com/watch?v=jPOAS7uCODQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=7)

Write a method ``__apply_conv(self, x_pad, i, j)`` (so inside the ``MyConv2d``  class and with ``x_pad`` of dimension ``(N, C_in, H_in + 2*self.padding[0], W_in + 2*self.padding[1])``) that computes the ``[i,j]`` output of the convolutional operation applied to ``x_pad``. Note that since we follow Pytorch's implementation of convolution, we can choose to have a bias or not. So somewhere in your ``__apply_conv`` there must be a condition ``if self.bias is None: ...... else: ....... ``

**Note**: A few words about vectorization: in Python you should always favour vectorized computations because they are much faster. BUT here the most important thing is that you get the right result. So in your ``__apply_conv`` method you can choose between:

- Vectorizing computations with respect to the number of filters (``C_out`` dimension) and with respect to the batch size (``N`` dimension).
- Not vectorizing and having some for loops either inside your ``__apply_conv`` method or your ``forward`` method (next question).

Do as you wish.
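
The two options can be sketched side by side for a single output position. This is an illustrative sketch, not the reference solution: the random tensors, the position ``(i, j)`` (row, column of the output) and all variable names are assumptions. Both variants are compared with ``F.conv2d`` at the same position.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
stride, kernel_size = (2, 1), (3, 4)
x_pad = torch.randn(5, 3, 9, 8)           # (N, C_in, H_pad, W_pad), already padded
weight = torch.randn(2, 3, *kernel_size)  # (C_out, C_in, kernel_height, kernel_width)
bias = torch.randn(2)                     # one bias per filter

i, j = 1, 2  # output position (row, column)
top, left = i * stride[0], j * stride[1]
# Patch of the input seen by the kernel: (N, C_in, kernel_height, kernel_width)
window = x_pad[:, :, top:top + kernel_size[0], left:left + kernel_size[1]]

# Option 1: vectorized over the batch (N) and the filters (C_out)
vectorized = (window.unsqueeze(1) * weight).sum(dim=(-3, -2, -1)) + bias  # (N, C_out)

# Option 2: explicit for loops over the batch and the filters
looped = torch.zeros(5, 2)
for n in range(5):
    for c in range(2):
        looped[n, c] = (window[n] * weight[c]).sum() + bias[c]

# Reference value taken from PyTorch's own convolution
reference = F.conv2d(x_pad, weight, bias, stride=stride)[:, :, i, j]
print(torch.allclose(vectorized, reference, atol=1e-5))
print(torch.allclose(looped, reference, atol=1e-5))
```

Note that the window starts at ``i * stride[0]`` and ``j * stride[1]``: the stride applies to positions in the input, not to indices in the output.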

### 4. Implementing a Convolutional layer (forward method)

In the [forward](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward) method, combine your previously defined methods to:

1. Figure out the output shape expected by calling ``self.__get_output_size``
2. Apply padding to ``x`` by calling ``self.__apply_padding`` and store the result in ``x_pad``
3. Instantiate an ``out`` tensor with the right shape
4. Compute and store in ``out`` the result of the convolution operation by calling ``self.__apply_conv`` (depending on how you solved problem 3, you might need some for loops)
5. Return ``out``
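
One detail worth keeping in mind for steps 3 and 4: the values written into ``out`` must stay connected to ``self.weight`` in the autograd graph, otherwise the layer cannot be trained. The toy sketch below (made-up 1D tensors, unrelated to the actual layer shapes) shows that assigning into a pre-allocated tensor keeps the gradient, whereas rebuilding a tensor from Python numbers with ``torch.tensor`` does not.

```python
import torch

weight = torch.ones(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

# Writing results into a pre-allocated tensor keeps the computation graph
out = torch.zeros(2)
out[0] = (weight * x).sum()
out[1] = (2 * weight * x).sum()
out.sum().backward()
print(weight.grad)  # tensor([3., 6., 9.])

# Rebuilding a tensor from plain numbers cuts the graph: no gradient can flow back
detached = torch.tensor([(weight * x).sum().item()])
print(detached.requires_grad)  # False
```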

In [3]:
class MyConv2d(nn.Module):
    """
    Custom convolutional 2d layer
    """
    
    def __init__(
        self, 
        in_channels:int,
        out_channels:int,
        kernel_size, 
        stride = 1, 
        padding = 0, 
        bias:bool = False,
    ):
        """
        in_channels: Number of input channels 
        out_channels: Number of output channels (number of filters)
        kernel_size: Size of your filter (either a int or a tuple of int)
        stride: Length of the kernel's jumps  (either a int or a tuple of int)
        padding: How many pixels do you add around the border of your image (either a int or a tuple of int)
        bias: Should we add a bias parameter to each filter or not?
        """

        super().__init__()
        # Will NOT be automatically added to the list
        # of trainable parameter (see doc of nn.Parameter)
        self.in_channels = int(in_channels)
        self.out_channels = int(out_channels)
        self.padding = int_to_pair(padding)
        self.stride = int_to_pair(stride)
        self.kernel_size = int_to_pair(kernel_size)

        # Will be automatically added to the list of 
        # model's trainable parameters (see doc quoted above)
        # Dim = (C_out, C_in, kernel_height, kernel_width)
        self.weight = nn.Parameter(torch.Tensor(self.out_channels, self.in_channels, self.kernel_size[0], self.kernel_size[1]))
        # Initialize weight (bad initialization here but we'll re-initialize them later anyway)
        self.weight.data = torch.zeros((self.out_channels, self.in_channels, self.kernel_size[0], self.kernel_size[1]))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(self.out_channels))
            # Initialize bias (bad initialization here but we'll re-initialize them later anyway)
            self.bias.data = torch.zeros((self.out_channels))
        else:
            self.bias = None
            #print(self.bias) # Uncomment this to make sure that the printed output starts with  "Parameter containing: ..."
        #print(self.weight) # Uncomment this to make sure that the printed output starts with  "Parameter containing: ..."


    def __get_output_size(self, x):
        # shape of x: (N, C_in, H_in, W_in)
        # Note: batch_size is the only dimension that is not influenced
        # by the convolution operation
        h_out = (x.shape[-2] + 2*self.padding[0] - self.kernel_size[0]) // self.stride[0] + 1
        w_out = (x.shape[-1] + 2*self.padding[1] - self.kernel_size[1]) // self.stride[1] + 1
        return x.shape[0], self.out_channels, h_out, w_out

    def __apply_padding(self, x):
        # shape of x: (N, C_in, H_in, W_in)
        pad_h, pad_w = self.padding
        n, c, h_in, w_in = x.shape
        # Start from zeros everywhere, then copy x into the center
        x_pad = torch.zeros((n, c, h_in + 2*pad_h, w_in + 2*pad_w), dtype=x.dtype, device=x.device)
        x_pad[:, :, pad_h:pad_h + h_in, pad_w:pad_w + w_in] = x
        return x_pad




    def __apply_conv(self, x_pad, i, j):
        # shape of x_pad: (N, C_in, H_pad, W_pad); (i, j) is the output position (row, column)
        h_start = i * self.stride[0]
        w_start = j * self.stride[1]
        # Patch seen by the kernel: (N, C_in, kernel_height, kernel_width)
        window = x_pad[:, :, h_start:h_start + self.kernel_size[0], w_start:w_start + self.kernel_size[1]]
        # Broadcast against weight (C_out, C_in, kernel_height, kernel_width),
        # then sum over C_in, kernel_height and kernel_width -> (N, C_out).
        # Staying with tensor operations keeps the autograd graph intact.
        res = (window.unsqueeze(1) * self.weight).sum(dim=(-3, -2, -1))
        if self.bias is None:
            return res
        else:
            return res + self.bias

    def forward(self, x):
        """
        Required method for any nn.Module class
        """
        # shape of x: (N, C_in, H_in, W_in)
        self.output_size = self.__get_output_size(x)
        x_pad = self.__apply_padding(x)
        out = torch.zeros(self.output_size, dtype=x.dtype, device=x.device)
        # Loop over output positions; the stride is applied inside __apply_conv
        for i in range(self.output_size[2]):
            for j in range(self.output_size[3]):
                out[:, :, i, j] = self.__apply_conv(x_pad, i, j)
        return out

    def __str__(self):
        """
        Standard Python method to implement if you want to customize ``print(MyConv2d(...))``

        This method is not mandatory, I just wrote it so that you can print your convolutional layer
        the same way it is printed when you use a Conv2d layer
        """
        string = (
            "MyConv2d(" + str(self.in_channels) + ", " + str(self.out_channels)
            + ", kernel_size=" + str(self.kernel_size) + ", stride=" + str(self.stride)
            + ", padding=" + str(self.padding) + ", bias=" + str(self.bias is not None) + ")"
        )
        return string


In [4]:
# x = torch.zeros(5, 3, 6, 6)
# for n in range(5):
#     for c in range(3):
#         for h in range(1, 5):
#             for w in range(1, 5):
#                 x[n, c, h, w] = np.random.randint(1, 10, 1)[0]
# k = torch.zeros(2, 3, 3, 3)
# for c in range(3):
#     for h in range(3):
#         for w in range(0, 3, 2):
#             if w == 0:
#                 k[0, c, h, w] = 1
#                 k[1, c, w, h] = 1
#             else:
#                 k[0, c, h, w] = -1
#                 k[1, c, w, h] = -1
# 
# print(k)
# print(x)


In [5]:
def __apply_padding(x, padding, in_channels):
    return torch.tensor([[[[0 if w == 0 or h == 0 or w == x.shape[-1]+2*padding[1]-1 or h == x.shape[-2]+2*padding[0]-1 else x[n, c, h-1, w-1] for w in range(x.shape[-1]+2*padding[1])] for h in range(x.shape[-2]+2*padding[0])] for c in range(in_channels)]for n in range(x.shape[0])])
# x = torch.ones(5, 3, 6, 6)
# print(__apply_padding(x, (1,1), 3))

In [6]:
def test_conv(x_pad, i, j, weight):
    sub = torch.narrow(torch.narrow(x_pad , -2, j, 3), -1, i, 3)
    print(sub)
    t = torch.multiply(torch.narrow(torch.narrow(x_pad , -2, j, 3), -1, i, 3), weight)
    print(t.shape)
    return [torch.sum(t[i]) for i in range(t.shape[0])]


# print("narrow")
# print(torch.narrow(x , 2, 0, 3))
# print("narrow")
# print(torch.narrow(torch.narrow(x , -2, 0, 3), -1, 0, 3))
# print(test_conv(x, 0, 0, k))

In [7]:
# practice_loader = torch.utils.data.DataLoader(MNIST_train, batch_size=1, shuffle=False)
# c_in = 1
# c_out = 2
# kernel = (3,4)
# stride = (2,1)
# padding = (1,2)
# my_conv = MyConv2d(c_in, c_out, kernel_size=kernel, stride=stride, padding=padding, bias=False)
# nn.init.uniform_(my_conv.weight.data, -1, 1)
# n = next(iter(practice_loader))[0]

In [8]:
# out = my_conv(n)
# print(out)



## 3. Use MyConv2d inside a neural network model

Just a very basic neural network so that we can test our conv layer on MNIST data. It consists of: 

- One 2d Convolutional layer (ours or Pytorch's) (see ``conv_type`` parameter)
- A tanh activation function
- One 2d MaxPooling layer to reduce the size of the image and therefore reduce the number of parameters
- One Fully Connected layer with 10 outputs for the 10 classes of the MNIST dataset
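
The number of inputs of the fully connected layer follows from the shapes above. Here is a short sketch, using ``nn.Conv2d`` and the 20x20 cropped images, with example layer settings chosen for illustration, that checks this arithmetic against the shapes PyTorch actually produces:

```python
import torch
from torch import nn
import torch.nn.functional as F

c_in, c_out = 1, 2
kernel, stride, padding = (3, 4), (2, 1), (1, 2)  # example settings

# Shape after the convolution (input images are 20x20 after CenterCrop(20))
H_out = (20 + 2 * padding[0] - kernel[0]) // stride[0] + 1
W_out = (20 + 2 * padding[1] - kernel[1]) // stride[1] + 1
# The 2x2 max pooling halves height and width (rounding down)
fc_in = (H_out // 2) * (W_out // 2) * c_out

conv = nn.Conv2d(c_in, c_out, kernel_size=kernel, stride=stride, padding=padding)
x = torch.zeros(8, c_in, 20, 20)  # dummy batch of 8 images
features = F.max_pool2d(torch.tanh(conv(x)), 2)
flat = features.view(features.shape[0], -1)

print(features.shape, flat.shape, fc_in)
```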

In [9]:
class MyNet(nn.Module):
    """
    Simple net with only one conv layer and one fc layer. conv layer can be ours or Pytorch's. 
    """

    def __init__(
        self,
        in_channels:int,
        out_channels:int,
        kernel_size = 3, 
        stride = 1, 
        padding = 0, 
        bias:bool = False,
        conv_type:str = 'custom',
    ):
        """
        in_channels: Number of input channels 
        out_channels: Number of output channels (number of filters)
        kernel_size: Size of your filter (either a int or a tuple of int)
        stride: Length of the kernel's jumps  (either a int or a tuple of int)
        padding: How many pixels do you add around the border of your image (either a int or a tuple of int)
        bias: Should we add a bias parameter to each filter or not?
        conv_type: Should we use MyConv2d or nn.Conv2d? 
        """
        
        super().__init__()
        # Make sure these parameters are all pairs of int
        kernel_size = int_to_pair(kernel_size)
        stride = int_to_pair(stride)
        padding = int_to_pair(padding)
        # Use MyConv2d
        if conv_type == 'custom':
            self.conv1 = MyConv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=bias)
        # Use pytorch's Conv2d
        else:
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=bias)
        # Formulas for output shape (Hope you won't look at that before answering the first question in the cell above hehe)
        H_out = (20 + 2*padding[0] - kernel_size[0]) // stride[0] + 1 
        W_out = (20 + 2*padding[1] - kernel_size[1]) // stride[1] + 1 
        
        # We divide by 2 here because we know that we will apply a pooling layer
        self.fc2 = nn.Linear((H_out//2)*(W_out//2)*out_channels, 10)

        # Where we will store all the weight values so that we can compare between Conv2d and MyConv2d
        self.conv1_weight_values = []
        self.conv1_bias_values = []

    def forward(self, x):
        # x shape: (batch_size, C_in, H, W)
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = out.view(-1, out.shape[-3]*out.shape[-2]*out.shape[-1])
        out = self.fc2(out)
        return out

## 4. Test MyConv2d by comparing it to nn.Conv2d

We compare:

- The successive weight values throughout the training (after each batch)
- The successive training loss (after each epoch)

We absolutely don't care about the model performance itself nor if the model overfits, underfits etc. We just want to test our custom conv layer
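
Besides an averaged squared error, it can help to look at the worst-case deviation between the two weight histories. A minimal sketch, assuming two equally long lists of tensors; the helper name ``max_abs_diff`` and the toy histories are hypothetical:

```python
import torch


def max_abs_diff(history_a, history_b):
    """Largest absolute element-wise difference between two lists of tensors."""
    assert len(history_a) == len(history_b)
    return max(float((a - b).abs().max()) for a, b in zip(history_a, history_b))


# Toy histories: two snapshots of a (C_out, C_in, kernel_height, kernel_width) weight
history_a = [torch.zeros(2, 1, 3, 4), torch.ones(2, 1, 3, 4)]
history_b = [h.clone() for h in history_a]
history_b[1][0, 0, 0, 0] += 1e-3  # introduce a small deviation in one entry

diff = max_abs_diff(history_a, history_b)
print(diff)
```

With float32 arithmetic, tiny differences between two correct implementations are expected, so compare against a small tolerance rather than testing for exact equality.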

**NOTE**

- Your implementation will necessarily be much slower than Pytorch's implementation. That's normal. To give you an idea, my implementation is 3-4 times slower than Pytorch's
- As reminded below: You can play with the following parameters if you want, especially for debugging purpose **BUT BEFORE SUBMITTING YOUR NOTEBOOK:**
  - set ``n_epochs`` to ``4`` 
  - set ``c_out`` to something ``>= 2`` before submitting your (run) notebook 
  - set ``kernel`` to something like ``(n1, n2)`` with ``n1`` not equal to ``n2`` before submitting your (run) notebook 
  - set ``stride`` to something like ``(n3, n4)`` with ``n3`` not equal to ``n4`` before submitting your (run) notebook 
  - set ``padding`` to something like ``(n5, n6)`` with ``n5`` not equal to ``n6`` before submitting your (run) notebook 
  - set ``bias`` to ``True`` before submitting your (run) notebook 


## TODO

1. According to you, why is your implementation slower? (There could be multiple reasons)

In [10]:
# DONT USE GPU!!! IT WOULD REQUIRE USING register_buffer TO MOVE THE MODEL CORRECTLY TO
# THE GPU. (See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_buffer)
# And you probably don't want to spend the entire week learning how to use this properly
device = torch.device('cpu')
print(f"Training on device {device}.")

# We set shuffle to False so that we can compare both models more accurately
train_loader = torch.utils.data.DataLoader(MNIST_train, batch_size=32, shuffle=False)
loss_fn = nn.CrossEntropyLoss()

# We don't care about these parameters here in this assignment
n_epochs = 4    # Just high enough to make sure weight values don't diverge (you can temporarily reduce this number for debugging purpose)
lr = 0.1
c_in = 1        # Grey scale images so c_in = 1

# You can play with the following parameters if you want, especially for debugging purpose BUT BEFORE SUBMITTING YOUR NOTEBOOK:
# - set c_out to something >= 2 before submitting your notebook
# - set kernel to something like (n1, n2) with n1 not equal to n2
# - set stride to something like (n3, n4) with n3 not equal to n4
# - set padding to something like (n5, n6) with n5 not equal to n6
# - set bias to True
c_out = 2
kernel = (3,4)
stride = (2,1)
padding = (1,2)
bias = True 

model01 = MyNet(c_in, c_out, kernel_size=kernel, padding=padding, stride=stride, bias=bias, conv_type='custom').to(device=device)
model02 = MyNet(c_in, c_out, kernel_size=kernel, padding=padding, stride=stride, bias=bias, conv_type='pytorch').to(device=device) 

# Initialize all our weights
nn.init.uniform_(model02.conv1.weight.data, -1, 1) 
if model02.conv1.bias is not None:
    nn.init.uniform_(model02.conv1.bias.data, -1, 1) 
nn.init.uniform_(model02.fc2.weight.data, -1, 1)
nn.init.uniform_(model02.fc2.bias.data, -1, 1) 

# Make sure both models start with the same weights
model01.conv1.weight.data = model02.conv1.weight.data.clone()
if model01.conv1.bias is not None:
    model01.conv1.bias.data = model02.conv1.bias.data.clone()
model01.fc2.bias.data = model02.fc2.bias.data.clone()
model01.fc2.weight.data = model02.fc2.weight.data.clone()

# Keep track of the weight and bias values throughout the training (see also training loop and MyNet)
model01.conv1_weight_values.append(model01.conv1.weight.data.clone().detach())
if model01.conv1.bias is not None:
    model01.conv1_bias_values.append(model01.conv1.bias.data.clone().detach())
model02.conv1_weight_values.append(model02.conv1.weight.data.clone().detach())
if model02.conv1.bias is not None:
    model02.conv1_bias_values.append(model02.conv1.bias.data.clone().detach())

print("\n ========= Training using our Conv2d =========")

optimizer = optim.SGD(model01.parameters(), lr=lr)
training_loop(
    n_epochs = n_epochs,
    optimizer = optimizer, 
    model = model01,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

print("\n ========= Training using Pytorch's Conv2d =========")


optimizer = optim.SGD(model02.parameters(), lr=lr)
training_loop(
    n_epochs = n_epochs,
    optimizer = optimizer, 
    model = model02,
    loss_fn = loss_fn,
    train_loader = train_loader,
)


print("\n ======= MSE:     Pytorch's Conv2d    VS    MyConv2d   =========")
print("MSE weight:  ", np.mean([ float(torch.sum( (model02.conv1_weight_values[i] - model01.conv1_weight_values[i])**2 )) for i in range(len(model02.conv1_weight_values))])  )
if bias:
    print("MSE bias:    ", np.mean([ float(torch.sum( (model02.conv1_bias_values[i] - model01.conv1_bias_values[i])**2 )) for i in range(len(model02.conv1_bias_values))])  )

Training on device cpu.





KeyboardInterrupt: 

In [None]:
print(model01.conv1_bias_values)
print(model01.conv1_weight_values)