In [1]:
#@title ## Mount Your Google Drive

#@markdown The next two cells are **magic** cells.
#@markdown They look like text cells, but they run code behind the scenes.
#@markdown You can run them by either clicking on the ▶️ button (to the left of the cell), or by clicking on the cell and typing `Ctrl+Enter` (or `Shift+Enter`).

#@markdown Please run this cell and follow the steps printed after running it. Specifically, it will print a URL: open it, follow the instructions there, paste the code into the textbox below and press `Enter`.

from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
#@title ## Map Your Directory
import os

def check_assignment(assignment_dir, files_list):
  files_in_dir = set(os.listdir(assignment_dir))
  for fname in files_list:
    if fname not in files_in_dir:
      raise FileNotFoundError(f'could not find file: {fname} in {assignment_dir}')

assignment_dest = "/content/hw2"
assignment_dir = "/content/gdrive/MyDrive/DL4CV/hw2"  #@param{type:"string"}
assignment_files = ['hw2.ipynb', 'autograd.py', 'functional.py', 'nn.py', 'optim.py',
                    'models.py', 'models_torch.py', 'train.py', 'train_torch.py', 'utils.py',
                    'test_autograd.py', 'test_functional.py', 'test_nn.py', 'test_optim.py']

# check Google Drive is mounted
if not os.path.isdir("/content/gdrive"):
  raise FileNotFoundError("Your Google Drive isn't mounted. Please run the above cell.")

# check all files there
check_assignment(assignment_dir, assignment_files)

# create symbolic link
!rm -f {assignment_dest}
!ln -s "{assignment_dir}" "{assignment_dest}"
print(f'Successfully mapped (ln -s) "{assignment_dest}" -> "{assignment_dir}"')

# cd to linked dir
%cd -q {assignment_dest}
print(f'Successfully changed directory (cd) to "{assignment_dest}"')
#@markdown Set the path `assignment_dir` to the assignment directory in your Google Drive and run this cell.

#@markdown If you are not sure what the path is, you can use the **Files (📁)** menu (on the left side) to check the path.

Successfully mapped (ln -s) "/content/hw2" -> "/content/gdrive/MyDrive/DL4CV/hw2"
Successfully changed directory (cd) to "/content/hw2"


## Imports and `autoreload`-Magic
Please run the cell below (only once) to load and set the `autoreload` magic, which automatically reloads the imported Python files containing your solutions. That means that you can edit the files (in the right-side window), save them (`Ctrl+S`) and just re-run the relevant cells -- the new code will kick in automatically.

**Note:** You **MUST NOT** install any package. If you can't load something, you probably didn't follow the instructions (either you didn't upload all the files, didn't mount your Google Drive or didn't map your directory).

**Note:** The exercise works as is. If you add or modify imports, it may break things in the notebook. You may do so **AT YOUR OWN RISK**. We will not assist with issues in notebooks with modified imports.

**Note:** Make sure you run **all the cells** up to this point. Some cells depend on previous cells (mainly imports). Furthermore, make sure to run the cell below (with the autoreload magic) before any cell below it.

In [3]:
import torch

%load_ext autoreload
%autoreload 2

# (A) Implement Components for Deep Neural Network From Scratch

In this part you will implement a deep neural network from scratch, including the necessary building blocks. You will implement it in the following order:

1. **Differentiable Functions:** a set of differentiable functions that are used as atomic building blocks.
2. **Autograd's backward:** the back-propagation `backward` method.
3. **Learnable Layers:** the Linear layer.
4. **Optimizer:** the SGD optimizer which will be used for training.


## (A.1) Differentiable Functions

In this section you will implement a set of differentiable functions from scratch. For each function, you will implement the forward and backward methods. After the description of each function, there is a testing cell which tests the correctness of your code.

The skeletons of the differentiable functions to implement are in the `functional.py` file. Open this file by clicking on this link: `/content/hw2/functional.py`. Alternatively, you can go to the left menu, click on **Files (📁)**, go to the directory `hw2` (or `content/hw2`) and double-click on `functional.py` to open it. The tests can be found in `test_functional.py` (link: `/content/hw2/test_functional.py`).

In each step you should fill in the blanks (between `# BEGIN SOLUTION` and `# END SOLUTION`) in the relevant methods. DO NOT change any other code segments. You are provided with a cell to run the tests, and with a cell to debug your code (with the relevant imports). As a reminder, this notebook uses the `autoreload` magic which automatically reloads the imported `.py` files (just make sure you save these files with `Ctrl+S`).

### `ctx`
In the "from scratch" implementation, you should use a `ctx` (context) variable. This variable is a simplified version of the computation graph, and is needed for the back propagation algorithm.

Specifically, `ctx` is just a list (or stack) of "backward calls", where each "backward call" is a pair (list/tuple) of two objects:

1. **`backward_fn`:** The backward function. A reference to the backward function to be called in the backward pass.
2. **`args`:** A list (or tuple) of arguments to be passed to `backward_fn`. This list usually consists of the inputs and the outputs of the forward function. Sometimes additional arguments are passed as well. It's important to pass the actual inputs and outputs (the same objects, not copies), otherwise the chain of gradient propagation would break.

The "backward calls" in `ctx` should be ordered according to the time of addition. That is, a backward call that was added later should have a higher index in the list `ctx`. If `ctx` is `None`, it means that gradients (i.e. backward calls) should not be tracked.

You will use `ctx` in the backward pass in section (A.2). You can read it now to get a little context (pun intended).

**Note:** You are given an example of the forward and backward implementation of `mean`. You should read and understand how new backward calls are appended to `ctx`, and use this pattern in your solutions.
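To make the pattern concrete, here is a minimal sketch that is separate from the provided `mean`. The functions `square`, `total` and the helper `_accumulate` are hypothetical and are not part of `functional.py`; the loop at the end is only one plausible way to consume `ctx`, shown here for illustration.

```python
import torch

def _accumulate(t, g):
    # Hypothetical helper: create t.grad on first use, then accumulate into it.
    if t.grad is None:
        t.grad = torch.zeros_like(t)
    t.grad += g

def square_backward(y, x):
    # y = x * x, so dL/dx = dL/dy * 2x
    _accumulate(x, y.grad * 2 * x)

def square(x, ctx=None):
    y = x * x
    if ctx is not None:
        # Store the backward function and the *same* tensor objects used here.
        ctx.append((square_backward, (y, x)))
    return y

def total_backward(s, y):
    # s = sum(y), so dL/dy is dL/ds broadcast to the shape of y
    _accumulate(y, s.grad * torch.ones_like(y))

def total(y, ctx=None):
    s = y.sum()
    if ctx is not None:
        ctx.append((total_backward, (s, y)))
    return s

ctx = []
x = torch.tensor([1.0, -2.0, 3.0])
s = total(square(x, ctx=ctx), ctx=ctx)

# Seed the gradient of the scalar output, then replay ctx from newest to oldest.
s.grad = torch.ones_like(s)
for backward_fn, args in reversed(ctx):
    backward_fn(*args)

print(x.grad)
```

Since `s = sum(x**2)`, the printed gradient should equal `2 * x`. Note that each backward call reads the gradient of its output and accumulates into the gradient of its input, which is why the stored tensors must be the very objects used in the forward pass.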

**Note:** When new tensors are created (using `zeros`, `ones`, `rand`, etc.), it's important to make sure they are on the same device (and have the correct `dtype`) as the tensors they will be used together with (compared to, multiplied by, etc.). You may find the functions `torch.X_like` and `Tensor.new_X` handy.
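As a small illustration of the difference, the sketch below creates three tensors next to an existing `float64` tensor. The variable names are arbitrary; `torch.zeros_like` and `Tensor.new_ones` are two instances of the `torch.X_like` and `Tensor.new_X` families.

```python
import torch

x = torch.arange(6, dtype=torch.float64).reshape(2, 3)

a = torch.zeros_like(x)   # same shape, dtype and device as x
b = x.new_ones((4,))      # different shape, but same dtype and device as x
c = torch.zeros(2, 3)     # default dtype and CPU, regardless of x

print(a.dtype, b.dtype, c.dtype)
```

Only `c` ignores the properties of `x`, which is the kind of mismatch that tends to surface once the model is moved to a GPU.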

### (A.1.1) Implement the Linear Function

Here you will implement a differentiable `linear` function. This includes the forward `linear` function and the backward `linear_backward` function.

#### `linear`
The `linear` function receives three arguments (in addition to the autograd context `ctx`):

  * `x`: The batched input. Has shape `(batch_size, in_dim)`.
  * `w`: The weight matrix. Has shape `(out_dim, in_dim)`.
  * `b`: The bias term. Has shape `(out_dim,)`.

It computes the (batched version of the) function: $$ \mathbf{y} = W \mathbf{x} + \mathbf{b} $$
The output `y` should have shape `(batch_size, out_dim)`.

#### `linear_backward`
The `linear_backward` function receives four arguments:

  * `y`: The batched output. Has shape `(batch_size, out_dim)`.
  * `x`: The batched input. Has shape `(batch_size, in_dim)`.
  * `w`: The weight matrix. Has shape `(out_dim, in_dim)`.
  * `b`: The bias term. Has shape `(out_dim,)`.

It computes the gradients of the loss w.r.t. `x`, `w` and `b`, given the gradient of the loss w.r.t. `y` (in `y.grad`), and accumulates these gradients in `x.grad`, `w.grad` and `b.grad`, respectively.
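The shapes involved can be checked against PyTorch itself. The sketch below is a standalone reference check, not code for `functional.py`: it deliberately uses PyTorch's built-in autograd (`requires_grad`), which the from-scratch part must not use, and compares the result with the standard matrix-calculus identities for a linear map.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, in_dim, out_dim = 4, 3, 2
x = torch.randn(batch_size, in_dim, requires_grad=True)
w = torch.randn(out_dim, in_dim, requires_grad=True)
b = torch.randn(out_dim, requires_grad=True)

# Batched form of y = W x + b: every row of x is multiplied by W^T.
y = x @ w.T + b
assert y.shape == (batch_size, out_dim)
assert torch.allclose(y, F.linear(x, w, b), atol=1e-5)

# Reference gradients from autograd for an arbitrary upstream gradient.
y_grad = torch.randn(batch_size, out_dim)
y.backward(y_grad)

# The same gradients in closed form.
with torch.no_grad():
    assert torch.allclose(x.grad, y_grad @ w, atol=1e-5)         # (batch_size, in_dim)
    assert torch.allclose(w.grad, y_grad.T @ x, atol=1e-5)       # (out_dim, in_dim)
    assert torch.allclose(b.grad, y_grad.sum(dim=0), atol=1e-5)  # (out_dim,)

print(x.grad.shape, w.grad.shape, b.grad.shape)
```

Each gradient has the same shape as the tensor it belongs to, which is a quick sanity check for any backward function.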

---
You should test your solution by running the following cell. You can debug your solution in the cell below it.

In [4]:
!python -m unittest test_functional.TestLinear

......
----------------------------------------------------------------------
Ran 6 tests in 0.309s

OK


In [5]:
# Playground for debugging linear
from functional import linear, linear_backward

### (A.1.2) Implement the ReLU Activation

Here you will implement a differentiable `relu` activation. This includes the forward `relu` function and the backward `relu_backward` function.

#### `relu`
The `relu` function receives one argument (in addition to the autograd context `ctx`):

  * `x`: The input. Has an arbitrary shape.

It computes the (element-wise) function:
$$ y = \max(x, 0) $$
The output `y` should have the same shape as `x`.

#### `relu_backward`
The `relu_backward` function receives two arguments:

  * `y`: The output. Has the same shape as `x`.
  * `x`: The input. Has an arbitrary shape.

It computes the gradient of the loss w.r.t. `x`, given the gradient of the loss w.r.t. `y` (in `y.grad`), and accumulates this gradient in `x.grad`.
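A small numeric illustration, assuming the common convention that the gradient at exactly `x = 0` is taken to be zero (the tests may or may not probe that point). The upstream gradient `y_grad` is made up for the example.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 3.0])
y = torch.clamp(x, min=0)       # element-wise max(x, 0)

y_grad = torch.tensor([10.0, 20.0, 30.0, 40.0])  # pretend upstream gradient
x_grad = y_grad * (x > 0)       # the gradient passes only where x > 0

print(y)
print(x_grad)
```

Entries with negative input are zeroed in both the output and the gradient; the remaining entries pass through unchanged.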

---
You should test your solution by running the following cell. You can debug your solution in the cell below it.

In [6]:
!python -m unittest test_functional.TestReLU

.......
----------------------------------------------------------------------
Ran 7 tests in 0.009s

OK


In [7]:
# Playground for debugging relu
from functional import relu, relu_backward

### (A.1.3) Implement the Softmax Activation

Here you will implement a differentiable `softmax` activation. This includes the forward `softmax` function and the backward `softmax_backward` function.

**Note:** Similarly to homework assignment #1, your solution should be numerically stable.

#### `softmax`
The `softmax` function receives one argument (in addition to the autograd context `ctx`):

  * `x`: The batched input. Has shape `(batch_size, num_classes)`.

It computes the (batched version of the) function: $$ \mathbf{y}_i = \frac{e^{\mathbf{x}_i}}{\sum_j{e^{\mathbf{x}_j}}} $$
The output `y` should have the shape `(batch_size, num_classes)`. Each row in `y` should be a probability distribution over the classes.
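To see why numerical stability matters here, the sketch below compares a naive evaluation of the formula with a shifted one. Subtracting the row-wise maximum is a standard trick that does not change the mathematical result; `stable_softmax` is a hypothetical name used for illustration only.

```python
import torch

def stable_softmax(x):
    # Shifting by the row-wise max keeps exp() from overflowing
    # without changing the value of the ratio.
    shifted = x - x.max(dim=1, keepdim=True).values
    e = torch.exp(shifted)
    return e / e.sum(dim=1, keepdim=True)

x = torch.tensor([[1000.0, 1001.0, 1002.0],
                  [-1.0, 0.0, 1.0]])

naive = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
stable = stable_softmax(x)

print(naive[0])   # exp(1000) overflows to inf, and inf / inf is nan
print(stable[0])
```

The first row of `naive` is all `nan`, while `stable` agrees with `torch.softmax(x, dim=1)` and every row sums to 1.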


#### `softmax_backward`
The `softmax_backward` function receives two arguments:

  * `y`: The batched output. Has shape `(batch_size, num_classes)`.
  * `x`: The batched input. Has shape `(batch_size, num_classes)`.

It computes the gradient of the loss w.r.t. `x`, given the gradient of the loss w.r.t. `y` (in `y.grad`), and accumulates this gradient in `x.grad`.
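For reference, differentiating the softmax formula gives, for each row, $\frac{\partial L}{\partial x_i} = y_i \left( g_i - \sum_j g_j y_j \right)$, where $g$ is the upstream gradient stored in `y.grad`. The sketch below checks this closed form against PyTorch's built-in autograd; it is a standalone sanity check that uses `requires_grad`, so it should not be copied into the from-scratch code as is.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 5, requires_grad=True)
y = torch.softmax(x, dim=1)
y_grad = torch.randn(4, 5)      # arbitrary upstream gradient

# Reference: let autograd propagate y_grad back to x.
y.backward(y_grad)

# Closed form of the same vector-Jacobian product, computed row by row.
with torch.no_grad():
    manual = y * (y_grad - (y_grad * y).sum(dim=1, keepdim=True))

print(torch.allclose(manual, x.grad, atol=1e-5))
```

The same comparison strategy (arbitrary upstream gradient, autograd as reference) can be reused to debug any of the other backward functions.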

---
You should test your solution by running the following cell. You can debug your solution in the cell below it.

In [8]:
!python -m unittest test_functional.TestSoftmax

.......
----------------------------------------------------------------------
Ran 7 tests in 0.026s

OK


In [9]:
# Playground for debugging softmax
from functional import softmax, softmax_backward

### (A.1.4) Implement the Cross-Entropy Loss

Here you will implement a differentiable `cross_entropy` loss. This includes the forward `cross_entropy` function and the backward `cross_entropy_backward` function.

**Note:** Similarly to homework assignment #1, your solution should be numerically stable.

**Note:** The signature of this function differs from PyTorch's `F.cross_entropy`. The function you should implement doesn't "reduce" (i.e. average over) the batch (similarly to `F.cross_entropy(..., reduction='none')`). Furthermore, while `F.cross_entropy` receives the predictions **before** `softmax`, the function you should implement receives the predictions **after** `softmax`. We provide you with `cross_entropy_loss`, which uses your implementation of `softmax` and `cross_entropy`, and has the same API as `F.cross_entropy`.

#### `cross_entropy`
The `cross_entropy` function receives two arguments (in addition to the autograd context `ctx`):

  * `pred`: The predicted _probabilities_. Has shape `(batch_size, num_classes)`. Each row is a probability distribution (non-negative values; sums to 1).
  * `target`: The batched correct labels. Has type of `torch.long` (integer values), and has shape `(batch_size,)`. Its values are between `0` and `num_classes - 1` (inclusive).

It computes the (batched version of the) function:
$$ \text{CE}(\hat{\mathbf{y}}, \ell) = -\sum_i \log(\hat{\mathbf{y}}_i) \cdot \delta_{i,\ell} = -\log(\hat{\mathbf{y}}_\ell) $$
Where $\hat{\mathbf{y}}$ (also called `pred` or `y_hat`) is the predicted probability measure over the classes and $\ell$ (also called `target` or `y`) is the target class label.

The output `loss` should have the shape `(batch_size,)`. Each entry in `loss` should be the cross-entropy loss of the corresponding entry in the batch.
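The relationship to `F.cross_entropy` can be verified on a tiny batch. In the sketch below the logits and targets are made up; the per-sample loss is obtained by picking the probability of the target class in each row. It does not address numerical stability for probabilities close to zero.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
target = torch.tensor([0, 2])

pred = torch.softmax(logits, dim=1)      # probabilities, rows sum to 1
rows = torch.arange(pred.shape[0])
loss = -torch.log(pred[rows, target])    # shape (batch_size,)

# PyTorch applies softmax internally, so it receives the logits instead.
reference = F.cross_entropy(logits, target, reduction='none')

print(loss)
print(reference)
```

Both tensors hold one loss value per sample and agree up to floating-point error; averaging them gives the scalar that `F.cross_entropy` returns by default.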

#### `cross_entropy_backward`
The `cross_entropy_backward` function receives three arguments:

  * `loss`: The batched loss. Has shape `(batch_size,)`.
  * `pred`: The batched predicted _probabilities_. Has shape `(batch_size, num_classes)`. Each row is a probability distribution (non-negative values; sums to 1).
  * `target`: The batched correct labels. Has type of `torch.long` (integer values), and has shape `(batch_size,)`. Its values are between `0` and `num_classes - 1` (inclusive).

It computes the gradient of the (final scalar) loss w.r.t. `pred`, given the gradient of the (final scalar) loss w.r.t. the batched `loss` (in `loss.grad`), and accumulates this gradient in `pred.grad`.

#### `cross_entropy_loss`
This function is provided for your use. It calls `softmax` to compute the probability distribution over the labels, then `cross_entropy` to compute the batched loss, and finally `mean` to reduce it into a scalar loss (that can be used as the origin of gradients; see next part). You should NOT modify this function, and may use it later on.

**Note:** Please see how three differentiable functions (`softmax`, `cross_entropy` and `mean`) are chained to create a new differentiable function, without explicitly implementing its backward pass. You will chain differentiable functions to create a model in section (B).

---
You should test your solution by running the following cell. You can debug your solution in the cell below it.


In [10]:
!python -m unittest test_functional.TestCrossEntropy

.......
----------------------------------------------------------------------
Ran 7 tests in 0.036s

OK


In [11]:
# Playground for debugging cross_entropy
from functional import cross_entropy, cross_entropy_backward
from functional import cross_entropy_loss

## (A.2) Autograd

In this section you will implement a general `backward` method from scratch. This method stands at the core of back-propagation and autograd differentiation.

This method receives two arguments:

* `loss`: The loss tensor. This tensor must be a scalar (has shape `()`). The gradients of the other tensors will be computed w.r.t. this `loss`.
* `ctx`: The autograd context. A list of backward calls. These backward calls should be evaluated to back-propagate the gradient from `loss` to the tensors used in the computation of `loss`.

This method has two main steps:

* Setting the gradient of `loss` (to what?).
* Propagating the gradients backward using the computation history in `ctx` (how?).

The skeleton of the `backward` method is in the `autograd.py` file (link: `/content/hw2/autograd.py`). The tests can be found in `test_autograd.py` (link: `/content/hw2/test_autograd.py`). You should fill in the blanks between `# BEGIN SOLUTION` and `# END SOLUTION`. DO NOT change any other code segments. You can use the provided `create_grad_if_necessary`, which makes sure that tensors that need gradients have one (if not, it creates a `.grad` attribute in the tensor's shape filled with zeros). As a reminder, this notebook uses the `autoreload` magic which automatically reloads the imported `.py` files (just make sure you save these files with `Ctrl+S`).


In [12]:
!python -m unittest test_autograd.TestBackward

...
----------------------------------------------------------------------
Ran 3 tests in 0.008s

OK


In [13]:
# Playground for debugging backward
from autograd import backward
from functional import cross_entropy_loss

# You CAN modify the content of the cell below. It is just an example.
ctx = []
x = torch.randn(4, 5)
l = torch.randint(5, size=(4,), dtype=torch.long)
y = softmax(x, ctx=ctx)
loss = cross_entropy_loss(y, l, ctx=ctx)

print('before backward')
print('loss.grad:', loss.grad)
print('y.grad:', y.grad)
print('x.grad:', x.grad)

backward(loss, ctx)

print('\n\nafter backward')
print('loss.grad:', loss.grad)
print('y.grad:', y.grad)
print('x.grad:', x.grad)

before backward
loss.grad: None
y.grad: None
x.grad: None


after backward
loss.grad: tensor(1.)
y.grad: tensor([[ 0.0420, -0.1991,  0.0733,  0.0425,  0.0414],
        [ 0.0487,  0.0736, -0.2055,  0.0418,  0.0415],
        [ 0.0490,  0.0415,  0.0507,  0.0418, -0.1830],
        [ 0.0444,  0.0548,  0.0629,  0.0445, -0.2067]])
x.grad: tensor([[ 0.0020, -0.0490,  0.0430,  0.0026,  0.0014],
        [ 0.0026,  0.0230, -0.0261,  0.0003,  0.0002],
        [ 0.0231,  0.0033,  0.0275,  0.0040, -0.0579],
        [ 0.0005,  0.0048,  0.0106,  0.0005, -0.0165]])


## (A.3) Learnable Layers

In this section you will implement a learnable Linear layer. The implementation is similar to vanilla PyTorch.

The skeleton of the learnable Linear layer to implement is in the `nn.py` file (link: `/content/hw2/nn.py`). The tests can be found in `test_nn.py` (link: `/content/hw2/test_nn.py`).

Learnable layers (and networks) inherit from the provided class `Module` (which is similar to PyTorch's `nn.Module`). This abstract class implements some utility methods (some are not used in this assignment). Please read the list of `Module`'s methods and attributes in its documentation (link: `/content/hw2/nn.py`).

In the `nn.py` file, you should fill in the blanks (between `# BEGIN SOLUTION` and `# END SOLUTION`) in the relevant methods. DO NOT change any other code segments. You are provided with a cell to run the tests, and with a cell to debug your code (with the relevant imports). As a reminder, this notebook uses the `autoreload` magic which automatically reloads the imported `.py` files (just make sure you save these files with `Ctrl+S`).

**Note:** To see how "atomic" differentiable functions are composed into a complex differentiable function, please look at the provided `cross_entropy_loss` in `/content/hw2/functional.py`.

**Note:** Since this part doesn't use PyTorch's built-in autograd mechanism, please do not use tensors' `requires_grad` (this will result in errors/warnings).
Furthermore, do not use `nn.Parameter` in _from scratch_ layers.

### (A.3.1) Implement the Linear Layer

So far you have implemented *stateless* differentiable functions, and the autograd mechanism. In this section, you will implement a *stateful* layer, with learnable parameters. That is the `Linear` layer.

The parameters of the `Linear` layer are the weight matrix `weight` and the bias term `bias`. In your layer, you should:

1. **Create parameter tensors:** create tensors for the parameters in the correct shape. The parameters should be attributes of the layer, i.e. set as `self.<param> = <tensor>`. This is done in `Linear.__init__`.
2. **Register them as parameters:** add their names to `self._parameters`. This will be used by the provided `Module.parameters()` (to list the module's parameters) and `Module.to()` (to transfer the module's parameters to a device) methods. This is done in `Linear.__init__`.
3. **Initialize the parameters:** the initialization of the layer parameters has a significant influence on the local minimum the network reaches during training. This is done in `Linear.init_parameters()`. You should call this method from `Linear.__init__`, so newly created linear layers are initialized.
4. **Implement a forward method:** use the existing differentiable function from part A, and implement the `Linear.forward()` method.

In [14]:
!python -m unittest test_nn.TestLinear

....
----------------------------------------------------------------------
Ran 4 tests in 0.011s

OK


In [15]:
# Playground for debugging Linear
from nn import Linear

## (A.4) Optimizer

In this section you will implement an optimizer. The optimizer updates the parameters based on the gradients they have accumulated. To do so it should have three main methods:

1. `__init__`: Receives the list of parameters (weights) to update and saves them. May receive additional arguments, such as learning-rate, etc.
2. `step`: Updates the parameters values based on the value of their gradients. Doesn't receive any argument.
3. `zero_grad`: Zeros the gradients of the tracked parameters. This is necessary since gradients are accumulated in each backward pass, and we don't want to mix between batches. Doesn't receive any argument.

The skeleton of the optimizer is in the `optim.py` file (link: `/content/hw2/optim.py`). The tests can be found in `test_optim.py` (link: `/content/hw2/test_optim.py`). You should fill in the blanks between `# BEGIN SOLUTION` and `# END SOLUTION`. DO NOT change any other code segments. As a reminder, this notebook uses the `autoreload` magic which automatically reloads the imported `.py` files (just make sure you save these files with `Ctrl+S`).


### (A.4.1) SGD Optimizer
In this part, you'll implement an SGD optimizer. This optimizer has a simple update rule, which is:
$$\mathbf{x}_{n+1} = \mathbf{x}_{n} - \delta \cdot \mathbf{g}_{n} $$
Where $\mathbf{x}_{n}$ is the parameter at step $n$, $\mathbf{g}_{n}$ is its gradient at step $n$, and $\delta$ is the learning rate (also called `lr`).

You should implement the `__init__`, `step` and `zero_grad` methods of `SGD` optimizer in `optim.py`.

**Note:** Parameters (tensors) should be updated **in-place** (i.e. with the `-=` operator) in `step`.

**Note:** A gradient (`param.grad`) which is set to `None` is also considered as zero.
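The sketch below applies one update step to a made-up parameter and shows why the in-place form matters: other references to the tensor (for example, the one held by the layer) keep pointing at the updated values. The names `param`, `alias` and `rebound` are illustrative only.

```python
import torch

lr = 0.1
param = torch.tensor([1.0, 2.0, 3.0])
param.grad = torch.tensor([10.0, 0.0, -10.0])

alias = param                        # e.g. the reference held by the model
param -= lr * param.grad             # in-place: modifies the existing tensor

print(alias)                         # alias sees the update

rebound = param - lr * param.grad    # out-of-place: allocates a new tensor
print(rebound is param)              # False, so other references would go stale
```

After the in-place step the values are approximately `[0, 2, 4]`, i.e. each entry moved by `-lr * grad`, and `alias` and `param` are still the same object.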

In [16]:
!python -m unittest test_optim.TestSGD

..
----------------------------------------------------------------------
Ran 2 tests in 0.004s

OK


In [17]:
# Playground for debugging SGD
from optim import SGD

# Setup Before Training

In this part you will need to use a GPU (this will have a significant impact on the training speed). To get a GPU in Google Colab, go to the top menu and select **Runtime ➔ Change runtime type**. Then, select **GPU** as **Hardware accelerator**.

Please run the cell below to set your PyTorch device (either GPU or CPU), to load the dataset and to create data loaders.



In [18]:
from utils import load_mnist

# Set the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
pin_memory = device.type == 'cuda'

# Load the training and test sets
train_data = load_mnist(mode='train')
test_data = load_mnist(mode='test')

# Create dataloaders for training and test sets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True, pin_memory=pin_memory)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, pin_memory=pin_memory)

# (B) Define and Train Neural Networks From Scratch


In this part, you will define and train neural networks from scratch. You will use your differentiable functions from section (A).

The skeletons for this assignment can be found in the `models.py` (link: `/content/hw2/models.py`) and `train.py` (link: `/content/hw2/train.py`) files. You should fill in the blanks between `# BEGIN SOLUTION` and `# END SOLUTION`. As a reminder, this notebook uses the `autoreload` magic which automatically reloads the imported `.py` files (just make sure you save these files with `Ctrl+S`).

Please run the cell below to import the relevant objects in order to train the models.

In [19]:
from functional import cross_entropy_loss as cross_entropy_scratch
from models import SoftmaxClassifier as SoftmaxClassifierScratch
from models import MLP as MLPScratch
from optim import SGD as SGDScratch
from train import train_loop as train_loop_scratch

## (B.1) Implement and Train a SoftmaxClassifier

Here you will implement the `SoftmaxClassifier` (imported here as `SoftmaxClassifierScratch`). You have already implemented the `SoftmaxClassifier` in Homework 1, but now it will be implemented with autograd and modular differentiable functions.

Your solution should have the following parts:

1. Create a model.
2. (Optional) Transfer the model to `device`.
3. Create an optimizer (this should be done when the model is on its final device; it will not work otherwise).
4. Set other hyper-parameters (loss function, number of epochs, etc.).
5. Train the model.

**Note:** Contrary to its name, `SoftmaxClassifier` should not perform softmax. That's because softmax is part of the cross-entropy loss (in PyTorch and in the _from scratch_ section).


In [20]:
# BEGIN SOLUTION
IMAGE_SIZE = 28*28
NUM_CLASSES = 10

lr = 1e-2

# Define your model
model = SoftmaxClassifierScratch(in_dim=IMAGE_SIZE, num_classes=NUM_CLASSES)

# Transfer it to device
model = model.to(device)

# Set an optimizer
optimizer = SGDScratch(parameters=model.parameters(), lr=lr)

# Set a criterion (loss function)
criterion = cross_entropy_scratch

# Set the number of epochs
epochs = 10

# Train your model
train_loop_scratch(model=model,
                   criterion=criterion,
                   optimizer=optimizer,
                   train_loader=train_loader,
                   test_loader=test_loader,
                   device=device,
                   epochs=epochs)
# END SOLUTION

Train   Epoch: 001 / 010   Loss:  0.4728   Accuracy: 0.871
 Test   Epoch: 001 / 010   Loss:   0.331   Accuracy: 0.908
Train   Epoch: 002 / 010   Loss:  0.3331   Accuracy: 0.905
 Test   Epoch: 002 / 010   Loss:  0.3064   Accuracy: 0.915
Train   Epoch: 003 / 010   Loss:  0.3112   Accuracy: 0.912
 Test   Epoch: 003 / 010   Loss:  0.2944   Accuracy: 0.917
Train   Epoch: 004 / 010   Loss:  0.3001   Accuracy: 0.915
 Test   Epoch: 004 / 010   Loss:  0.2905   Accuracy: 0.918
Train   Epoch: 005 / 010   Loss:  0.2928   Accuracy: 0.916
 Test   Epoch: 005 / 010   Loss:  0.2845   Accuracy: 0.918
Train   Epoch: 006 / 010   Loss:  0.2873   Accuracy: 0.919
 Test   Epoch: 006 / 010   Loss:  0.2803   Accuracy: 0.920
Train   Epoch: 007 / 010   Loss:  0.2832   Accuracy: 0.920
 Test   Epoch: 007 / 010   Loss:  0.2819   Accuracy: 0.919
Train   Epoch: 008 / 010   Loss:  0.2801   Accuracy: 0.921
 Test   Epoch: 008 / 010   Loss:  0.2798   Accuracy: 0.920
Train   Epoch: 009 / 010   Loss:  0.2774   Accuracy: 0.9

## (B.2) Implement and Train a Deep Neural Network

Here you will implement a multi-layer perceptron (`MLP`) model (imported here as `MLPScratch`). You are allowed to modify the signature of `MLP.__init__` and add additional arguments of your choice. Your network must have more than a single linear layer.

Your solution should have the same parts as in (B.1).

In [21]:
# BEGIN SOLUTION
IMAGE_SIZE = 28*28
NUM_CLASSES = 10

lr = 1e-1
hidden_size = 256

# Define your model
model = MLPScratch(in_dim=IMAGE_SIZE, num_classes=NUM_CLASSES, hidden_size=hidden_size)

# Transfer it to device
model = model.to(device)

# Set an optimizer
optimizer = SGDScratch(parameters=model.parameters(), lr=lr)

# Set a criterion (loss function)
criterion = cross_entropy_scratch

# Set the number of epochs
epochs = 10

# Train your model
train_loop_scratch(model=model,
                   criterion=criterion,
                   optimizer=optimizer,
                   train_loader=train_loader,
                   test_loader=test_loader,
                   device=device,
                   epochs=epochs)
# END SOLUTION

Train   Epoch: 001 / 010   Loss:  0.2801   Accuracy: 0.916
 Test   Epoch: 001 / 010   Loss:  0.1528   Accuracy: 0.950
Train   Epoch: 002 / 010   Loss:  0.1016   Accuracy: 0.968
 Test   Epoch: 002 / 010   Loss:  0.1094   Accuracy: 0.965
Train   Epoch: 003 / 010   Loss: 0.06624   Accuracy: 0.980
 Test   Epoch: 003 / 010   Loss: 0.07922   Accuracy: 0.973
Train   Epoch: 004 / 010   Loss: 0.04675   Accuracy: 0.985
 Test   Epoch: 004 / 010   Loss: 0.06431   Accuracy: 0.979
Train   Epoch: 005 / 010   Loss: 0.03444   Accuracy: 0.989
 Test   Epoch: 005 / 010   Loss: 0.07117   Accuracy: 0.977
Train   Epoch: 006 / 010   Loss: 0.02533   Accuracy: 0.992
 Test   Epoch: 006 / 010   Loss: 0.06336   Accuracy: 0.982
Train   Epoch: 007 / 010   Loss: 0.01804   Accuracy: 0.995
 Test   Epoch: 007 / 010   Loss: 0.07014   Accuracy: 0.979
Train   Epoch: 008 / 010   Loss: 0.01326   Accuracy: 0.996
 Test   Epoch: 008 / 010   Loss: 0.06429   Accuracy: 0.981
Train   Epoch: 009 / 010   Loss: 0.008497   Accuracy: 0.

# (C) Define and Train PyTorch Neural Networks

In this part, you will define and train neural networks using PyTorch's built-in autograd mechanism. You MAY NOT use your differentiable functions from section (A). The solution to this part is very similar to the solution of part (B), with some syntax changes.

The skeletons for this assignment can be found in the `models_torch.py` (link: `/content/hw2/models_torch.py`) and `train_torch.py` (link: `/content/hw2/train_torch.py`) files. You should fill in the blanks between `# BEGIN SOLUTION` and `# END SOLUTION`. As a reminder, this notebook uses the `autoreload` magic which automatically reloads the imported `.py` files (just make sure you save these files with `Ctrl+S`).

Please run the cell below to import the relevant objects in order to train the models.

**Note:** Some objects are imported with different names in this notebook to distinguish them from the _From Scratch_ part. This is not a best practice, and is used solely as a way to avoid ambiguities in this assignment.

In [22]:
# NOTE: `cross_entropy_torch` is different from `cross_entropy_scratch`!
# cross_entropy_torch(pred, target) == cross_entropy_scratch(softmax(pred), target)
from torch.nn.functional import cross_entropy as cross_entropy_torch
from models_torch import SoftmaxClassifier as SoftmaxClassifierTorch
from models_torch import MLP as MLPTorch
from torch.optim import SGD as SGDTorch
from train_torch import train_loop as train_loop_torch
from utils import load_mnist

## (C.1) Implement and Train a Softmax Classifier

Here you will implement the `SoftmaxClassifier` class (imported as `SoftmaxClassifierTorch`).

Your solution should have the same parts as in (B.1).

In [23]:
# BEGIN SOLUTION
IMAGE_SIZE = 28*28
NUM_CLASSES = 10

lr = 1e-1

# Define your model
model = SoftmaxClassifierTorch(in_dim=IMAGE_SIZE, num_classes=NUM_CLASSES)

# Transfer it to device
model = model.to(device)

# Set an optimizer
optimizer = SGDTorch(model.parameters(), lr=lr)

# Set a criterion (loss function)
criterion = cross_entropy_torch

# Set the number of epochs
epochs = 10

# Train your model
train_loop_torch(model=model,
                 criterion=criterion,
                 optimizer=optimizer,
                 train_loader=train_loader,
                 test_loader=test_loader,
                 device=device,
                 epochs=epochs)
# END SOLUTION

Train   Epoch: 001 / 010   Loss:  0.3716   Accuracy: 0.891
 Test   Epoch: 001 / 010   Loss:    0.33   Accuracy: 0.903
Train   Epoch: 002 / 010   Loss:  0.3138   Accuracy: 0.910
 Test   Epoch: 002 / 010   Loss:  0.3322   Accuracy: 0.905
Train   Epoch: 003 / 010   Loss:  0.3023   Accuracy: 0.915
 Test   Epoch: 003 / 010   Loss:  0.3261   Accuracy: 0.911
Train   Epoch: 004 / 010   Loss:  0.2984   Accuracy: 0.915
 Test   Epoch: 004 / 010   Loss:   0.293   Accuracy: 0.917
Train   Epoch: 005 / 010   Loss:  0.2921   Accuracy: 0.917
 Test   Epoch: 005 / 010   Loss:  0.2893   Accuracy: 0.921
Train   Epoch: 006 / 010   Loss:  0.2894   Accuracy: 0.918
 Test   Epoch: 006 / 010   Loss:  0.2921   Accuracy: 0.923
Train   Epoch: 007 / 010   Loss:  0.2864   Accuracy: 0.920
 Test   Epoch: 007 / 010   Loss:  0.2982   Accuracy: 0.919
Train   Epoch: 008 / 010   Loss:  0.2846   Accuracy: 0.921
 Test   Epoch: 008 / 010   Loss:  0.3185   Accuracy: 0.911
Train   Epoch: 009 / 010   Loss:  0.2836   Accuracy: 0.9

## (C.2) Implement and Train a Deep Neural Network

Here you will implement the `MLP` class (imported as `MLPTorch`).

Your solution should have the same parts as in (B.2).

In [None]:
# BEGIN SOLUTION
IMAGE_SIZE = 28*28
NUM_CLASSES = 10

lr = 1e-1
hidden_size = 256

# Define your model
model = MLPTorch(in_dim=IMAGE_SIZE, num_classes=NUM_CLASSES, hidden_size=hidden_size)

# Transfer it to device
model = model.to(device)

# Set an optimizer
optimizer = SGDTorch(model.parameters(), lr=lr)

# Set a criterion (loss function)
criterion = cross_entropy_torch

# Set the number of epochs
epochs = 10

# Train your model
train_loop_torch(model=model,
                 criterion=criterion,
                 optimizer=optimizer,
                 train_loader=train_loader,
                 test_loader=test_loader,
                 device=device,
                 epochs=epochs)
# END SOLUTION

Train   Epoch: 001 / 010   Loss:  0.2745   Accuracy: 0.918
 Test   Epoch: 001 / 010   Loss:  0.1404   Accuracy: 0.956
Train   Epoch: 002 / 010   Loss:  0.1008   Accuracy: 0.969
 Test   Epoch: 002 / 010   Loss: 0.09733   Accuracy: 0.968
Train   Epoch: 003 / 010   Loss: 0.06668   Accuracy: 0.979
 Test   Epoch: 003 / 010   Loss: 0.07625   Accuracy: 0.976
Train   Epoch: 004 / 010   Loss: 0.04695   Accuracy: 0.985
 Test   Epoch: 004 / 010   Loss: 0.07324   Accuracy: 0.977
Train   Epoch: 005 / 010   Loss: 0.03476   Accuracy: 0.989
 Test   Epoch: 005 / 010   Loss: 0.07059   Accuracy: 0.978
Train   Epoch: 006 / 010   Loss: 0.02579   Accuracy: 0.992
 Test   Epoch: 006 / 010   Loss: 0.06584   Accuracy: 0.980
Train   Epoch: 007 / 010   Loss: 0.01852   Accuracy: 0.994
 Test   Epoch: 007 / 010   Loss: 0.06502   Accuracy: 0.980
Train   Epoch: 008 / 010   Loss: 0.01293   Accuracy: 0.996


# Submit Your Solution

In [None]:
#@title # Create and Download Your Solution

import os
import re
import zipfile
from google.colab import files

def create_zip(files, hw, name):
  zip_path = f'{hw}-{name}.zip'
  with zipfile.ZipFile(zip_path, 'w') as f:
    for fname in files:
      if not os.path.isfile(fname):
        raise FileNotFoundError(f"Couldn't find file: '{fname}' in the homework directory")
      f.write(fname, fname)
  return zip_path

# export notebook as html
!jupyter nbconvert --to html hw2.ipynb

##@markdown Please upload your typed solution (`.pdf` file) to the homework directory, and use the name `hw2-sol.pdf`.

student_name = "Itai Antebi"  #@param{type:"string"}
assignment_name = 'hw2'
assignment_sol_files = ['hw2.ipynb', 'hw2.html', 'autograd.py', 'functional.py', 'nn.py', 'optim.py',
                        'models.py', 'models_torch.py', 'train.py', 'train_torch.py']
zip_name = re.sub('[_ ]+', '_', re.sub(r'[^a-zA-Z_ ]+', '', student_name.lower()))

# create zip with your solution
zip_path = create_zip(assignment_sol_files, assignment_name, zip_name)

# download the zip
files.download(zip_path)

#@markdown Enter your name in `student_name` and run this cell to create and download a `.zip` file with your solution.

#@markdown You should submit your solution via the Dropbox link given in Piazza.

#@markdown **Note:** If you run this cell multiple times, you may be prompted by the browser to allow this page to download multiple files.