In [1]:
#@title ## Mount Your Google Drive
#@markdown Please run this cell (`Ctrl+Enter` or `Shift+Enter`) and follow the steps printed below.

from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
#@title ## Map Your Directory
import os

def check_assignment(assignment_dir, files_list):
  files_in_dir = set(os.listdir(assignment_dir))
  for fname in files_list:
    if fname not in files_in_dir:
      raise FileNotFoundError(f'could not find file: {fname} in {assignment_dir}')

assignment_dest = "/content/hw1"
assignment_dir = "/content/gdrive/MyDrive/DL4CV/hw1"  #@param{type:"string"}
assignment_files = ['hw1.ipynb', 'model.py', 'test_model.py', 'train.py', 'utils.py']

# check Google Drive is mounted
if not os.path.isdir("/content/gdrive"):
  raise FileNotFoundError("Your Google Drive isn't mounted. Please run the above cell.")

# check all files there
check_assignment(assignment_dir, assignment_files)

# create symbolic link
!rm -f {assignment_dest}
!ln -s "{assignment_dir}" "{assignment_dest}"
print(f'Successfully mapped (ln -s) "{assignment_dest}" -> "{assignment_dir}"')

# cd to linked dir
%cd -q {assignment_dest}
print(f'Successfully changed directory (cd) to "{assignment_dest}"')
#@markdown Set the path `assignment_dir` to the assignment directory in your Google Drive and run this cell.

Successfully mapped (ln -s) "/content/hw1" -> "/content/gdrive/MyDrive/DL4CV/hw1"
Successfully changed directory (cd) to "/content/hw1"


In [3]:
#@title ## Written Assignment

#@markdown In addition to this coding assignment, there is also a written assignment that can be found in `hw1.pdf`.

#@markdown Please solve this assignment and upload your solution as `hw1-sol.pdf`. It will be packed together with your coding solution in the **Submit Your Solution** section.

#@markdown Your solution to the written part should be typed, not hand-written. We recommend using LyX or LaTeX, but you can also use Word or a similar text editor.

# (A) Implement Softmax Classifier From Scratch

In this section of the exercise, you will implement a Softmax Classifier step-by-step, from scratch.

You should open the `model.py` file (by clicking on this link: `/content/hw1/model.py`). Alternatively, you can go to the left menu, click on **Files (📁)**, go to the directory `hw1` (or `content/hw1`) and double-click on `model.py`.

In each part you will be asked to implement a single method. Your solution should be between the `# BEGIN SOLUTION` and `# END SOLUTION` comments. You should also remove the `raise NotImplementedError` line in your solution.

After the description of each method in this notebook, there is a testing cell which will test the correctness of your code (the test code is in `/content/hw1/test_model.py`).

**Note:** The files in this assignment are auto-imported in this notebook. This means that you can change them, save them (`Ctrl+S`), and the change will immediately take effect in the notebook (when you use these functions again). You can use the dedicated playground cells to debug your code.

In [4]:
import torch

%load_ext autoreload
%autoreload 2

## (A.1) Implement Softmax

In this part you will implement the `softmax` activation function, which is defined as:
$$ \text{softmax}(\mathbf{x})_i = \frac{\exp(\mathbf{x}_i)}{\sum_{j=1}^{n} \exp(\mathbf{x}_j)} $$

The output of `softmax` is a probability measure over the `n` classes.

Since the use of batches is very common in ML and DL, your implementation should support running `softmax` on a batch of vectors (_i.e._ a tensor of shape `(batch_size, n)`). The softmax function is applied to each vector in the batch _independently_.

Floating-point numbers have a fixed-length representation in computers, so very large values cannot be represented: for example, `exp(x)` overflows to infinity for large `x`. Your solution should be numerically stable.

To solve this part, please implement the `softmax` function in `model.py`. You can test your solution by running the cell below.
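
As a rough illustration of what numerical stability means here (a sketch, not the reference solution): in `float32`, `exp(1000.)` overflows to `inf`, so the naive formula yields `nan`. One common remedy, assumed here, is to subtract each row's maximum before exponentiating. The name `stable_softmax` is hypothetical and is not part of `model.py`.

```python
import torch

def stable_softmax(x: torch.Tensor) -> torch.Tensor:
    """Row-wise softmax for a tensor of shape (batch_size, n)."""
    # Subtracting the per-row maximum leaves the result unchanged
    # (numerator and denominator are scaled by the same factor),
    # but keeps every exponent <= 0, so exp() cannot overflow.
    shifted = x - x.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=1, keepdim=True)

x = torch.tensor([[1000.0, 1001.0, 1002.0],
                  [0.0, 0.0, 0.0]])

# Naive formula: exp() overflows to inf on the first row, and inf / inf is nan.
naive = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
print(naive[0])

# Shifted formula: finite probabilities, each row sums to 1.
print(stable_softmax(x))
```

You could paste a similar comparison into the playground cell below to check how your own `softmax` behaves on large inputs.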


In [5]:
!python -m unittest test_model.Softmax

.....
----------------------------------------------------------------------
Ran 5 tests in 0.008s

OK


In [6]:
# playground for debugging softmax
from model import softmax

## (A.2) Cross-Entropy Loss

In this part you will implement the `cross_entropy` loss function (for hard labels), which is defined as:
$$ \text{CE}(\hat{\mathbf{y}}, \ell)_i = -\log(\hat{\mathbf{y}}_i) \cdot \delta_{i,\ell} $$

Where $\hat{\mathbf{y}}$ (also called `pred` or `y_hat`) is the predicted probability measure over the classes, $\ell$ (also called `target` or `y`) is the target class label, and $\delta_{i,\ell}$ is the Kronecker delta. Summing these terms over $i$ leaves a single non-zero term, $-\log(\hat{\mathbf{y}}_\ell)$.

As before, your solution must support batches and be numerically stable.

To solve this part, please implement the `cross_entropy` function in `model.py`. You can test your solution by running the cell below.
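
The definition above can be sketched for a batch as follows. This is a minimal illustration under stated assumptions, not the expected solution: the function name `hard_label_cross_entropy`, the averaging over the batch, and the clamping constant `eps` are all choices made for this example, and the tests in `test_model.py` may expect a different reduction.

```python
import torch

def hard_label_cross_entropy(pred: torch.Tensor, target: torch.Tensor,
                             eps: float = 1e-12) -> torch.Tensor:
    """Mean over the batch of -log(pred[k, target[k]])."""
    # For each row k, pick the probability assigned to the target class.
    picked = pred.gather(dim=1, index=target.unsqueeze(1)).squeeze(1)
    # Clamp away from zero so that log() never returns -inf (assumed remedy).
    return -torch.log(picked.clamp_min(eps)).mean()

pred = torch.tensor([[0.7, 0.2, 0.1],
                     [0.0, 1.0, 0.0]])
target = torch.tensor([0, 1])

loss = hard_label_cross_entropy(pred, target)
print(loss)  # about 0.178, i.e. (-log 0.7 - log 1.0) / 2
```

Without the clamp, a predicted probability of exactly zero for the target class would give an infinite loss.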

In [7]:
!python -m unittest test_model.CrossEntropy

....
----------------------------------------------------------------------
Ran 4 tests in 0.005s

OK


In [8]:
# playground for debugging cross_entropy
from model import cross_entropy

## (A.3) Softmax Classifier

In this part you will implement the `softmax_classifier` function, which receives an input $\mathbf{x}$, a weight matrix $W$ and a bias term $\mathbf{b}$ and returns:
$$ h_{\theta}(\mathbf{x}) = \text{softmax}\left( W \cdot \mathbf{x}  + \mathbf{b} \right) $$

Where $\theta$ is a notation for $(W,\mathbf{b})$.

Since this function has to deal with a batched input $\mathbf{x}$, it's actually represented as a matrix $X$ (also called `x`) of shape `(batch_size, in_dim)`. The weight matrix $W$ (also called `w`) is a matrix of shape `(out_dim, in_dim)`, and the bias term $\mathbf{b}$ (also called `b`) is a vector of shape `(out_dim,)`.

To solve this part, please implement the `softmax_classifier` function in `model.py`. You can test your solution by running the cell below.
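
To make the shapes above concrete, here is a small sketch of the batched affine map. The dimensions are arbitrary, and `torch.softmax` is used only as a stand-in for your own `softmax`.

```python
import torch

batch_size, in_dim, out_dim = 4, 5, 3
x = torch.randn(batch_size, in_dim)   # X: one input vector per row
w = torch.randn(out_dim, in_dim)      # W
b = torch.randn(out_dim)              # b

# Row k of `logits` equals W @ x[k] + b; `b` is broadcast across the batch.
logits = x @ w.T + b
probs = torch.softmax(logits, dim=1)  # built-in softmax, for illustration only

print(logits.shape)  # torch.Size([4, 3])
```

Because each input is stored as a row of $X$, the product $W \cdot \mathbf{x}$ for the whole batch becomes `x @ w.T`.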

In [9]:
!python -m unittest test_model.SoftmaxClassifier

..
----------------------------------------------------------------------
Ran 2 tests in 0.006s

OK


In [10]:
# playground for debugging softmax_classifier
from model import softmax_classifier

## (A.4) Softmax Classifier Backward
In this part you will implement the `softmax_classifier_backward` function, which computes the gradients of the weights of the Softmax Classifier. In your theoretical homework assignment you've derived the formula for the gradient of $W$ (also called `weight` or `w`), given the input $\mathbf{x}$ (also called `input` or `x`), the classifier's prediction $\hat{\mathbf{y}}$ (also called `pred` or `y_hat`) and the target label $\ell$ (also called `target` or `y`).

To solve this part, please implement the `softmax_classifier_backward` function in `model.py`. You can test your solution by running the cell below.
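
When debugging a hand-derived gradient, it can help to compare it against automatic differentiation. The sketch below computes reference gradients with PyTorch autograd. It assumes the loss is the cross-entropy averaged over the batch; if your derivation sums over the batch instead, the results differ by a factor of `batch_size`. The helper name `autograd_reference` is hypothetical and is not part of the assignment files.

```python
import torch
import torch.nn.functional as F

def autograd_reference(x, w, b, y):
    """Gradients of the mean cross-entropy loss w.r.t. w and b, via autograd."""
    # Work on copies so that the caller's tensors are left untouched.
    w = w.detach().clone().requires_grad_(True)
    b = b.detach().clone().requires_grad_(True)
    logits = x @ w.T + b
    # F.cross_entropy applies log-softmax internally and averages over the batch.
    loss = F.cross_entropy(logits, y)
    loss.backward()
    return w.grad, b.grad

torch.manual_seed(0)
x = torch.randn(8, 5)           # (batch_size, in_dim)
w = torch.randn(3, 5)           # (out_dim, in_dim)
b = torch.randn(3)              # (out_dim,)
y = torch.randint(0, 3, (8,))   # target labels

grad_w, grad_b = autograd_reference(x, w, b, y)
print(grad_w.shape, grad_b.shape)  # torch.Size([3, 5]) torch.Size([3])

# Compare with your own result, for example:
# torch.allclose(my_grad_w, grad_w, atol=1e-6)
```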

In [11]:
!python -m unittest test_model.SoftmaxClassifierBackward

..
----------------------------------------------------------------------
Ran 2 tests in 0.008s

OK


In [12]:
# playground for debugging softmax_classifier_backward
from model import softmax_classifier_backward

# (B) Train the Model
In this part you will create and train the Softmax Classifier to detect hand-written digits from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. The dataset consists of images of digits (of size 28×28), and their values (0-9) as supervision.

The Softmax Classifier should classify over 10 classes, one per digit. The $0$ digit is the class at index `0`. In general, the $d$ digit is the class at index `d`. The output of the classifier is a probability distribution over the 10 classes. The predicted class is the one with the highest probability (ties are broken arbitrarily; they are very rare).

Your goal is to achieve high accuracy on the test set. Accuracy (you can use the provided `accuracy` function) is defined as the fraction of examples classified correctly (_i.e._, the predicted class is the correct value of the digit). However, accuracy can't be optimized directly with gradient-based methods, so you'll train the classifier to minimize the Cross-Entropy loss instead.
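
For intuition, accuracy on a small hand-made batch can be computed as follows. This is only a sketch of the definition; in the assignment you should use the provided `accuracy` function, whose exact signature is not shown here.

```python
import torch

probs = torch.tensor([[0.1, 0.8, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.2, 0.6],
                      [0.5, 0.4, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

predicted = probs.argmax(dim=1)              # most probable class per example
acc = (predicted == labels).float().mean()   # fraction classified correctly

print(predicted)  # tensor([1, 0, 2, 0])
print(acc)        # tensor(0.7500)
```

Note that `argmax` is piecewise constant in the probabilities, so its gradient is zero almost everywhere; this is why a differentiable surrogate such as cross-entropy is minimized instead.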

The classifier is represented as a tuple `(w, b)`. Training the classifier means updating its weights. The training process consists of multiple epochs. In each epoch, the classifier is trained once over all the examples in the training set. Every several epochs, the classifier is tested (_i.e._, evaluated) on the test set. No examples are shared between these sets.

You should open the `train.py` file (by clicking on this link: `/content/hw1/train.py`). Alternatively, you can go to the left menu, click on **Files (📁)**, go to the directory `hw1` (or `content/hw1`) and double-click on `train.py`. This file contains the following methods:

1. `create_model`: You will implement this method to create (and initialize) a model.
2. `train_epoch`: You will implement this method to run a single training epoch.
3. `test_epoch`: You will implement this method to run a single evaluation (test) epoch.
4. `train_loop`: This method is **GIVEN** to you as-is. It uses `train_epoch` and `test_epoch`. You should use it to train your model.

We also recommend looking at the provided utilities file: `/content/hw1/utils.py`.

## (B.0) Load the MNIST Dataset

Please run the following cell to load the MNIST dataset.

In [13]:
from utils import load_mnist

# Load the training and test sets
train_data = load_mnist(mode='train')
test_data = load_mnist(mode='test')

# Create dataloaders for training and test sets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

Processing...


Done!


## (B.1) Create a Model
In this part you will implement the `create_model` method, which creates a new model for MNIST classification (see details above, mainly the input and output sizes).
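
As a sketch of the sizes involved: each 28×28 image flattens to a vector of 784 values, and there are 10 output classes, so `w` has shape `(10, 784)` and `b` has shape `(10,)`. The function name `create_model_sketch` and the initialization below (small Gaussian weights, zero bias) are assumptions made for this example; other schemes are equally valid.

```python
import torch

def create_model_sketch(in_dim: int = 28 * 28, out_dim: int = 10, std: float = 0.01):
    """Return (w, b) with shapes (out_dim, in_dim) and (out_dim,)."""
    w = torch.randn(out_dim, in_dim) * std  # small random weights (an assumed choice)
    b = torch.zeros(out_dim)                # zero bias (an assumed choice)
    return w, b

w, b = create_model_sketch()
print(w.shape, b.shape)  # torch.Size([10, 784]) torch.Size([10])
```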

In [14]:
# playground for debugging create_model
from train import create_model

## (B.2) Train a Single Epoch
In this part you will implement the `train_epoch` method. This method receives the model `(w, b)`, a learning rate `lr` and a data loader `loader` of the training set, and updates the model weights in order to minimize the cross-entropy loss. It also computes the average loss and accuracy over the training set\*.

You're given a skeleton of this method, mainly the iteration over the data loader. At each iteration (batch) the data loader returns two tensors: `x` and `y`. `x` is a batch of images (has shape `(batch_size, 1, 28, 28)`), and `y` is a batch of (the correct) labels (has shape `(batch_size,)`).

In your solution, you should do as follows:

1.   Reshape the inputs `x` to match the shape expected by the classifier.
2.   Run the model to get a prediction.
3.   Compute the cross-entropy loss (**MUST** be stored in a tensor `loss`) and accuracy (**MUST** be stored in a tensor `acc`). You should use the `accuracy` method from `utils.py` (already imported in `train.py`).
4.   Run the backward step to compute the gradients of the weights.
5.   Update the weights according to their gradients and the learning rate.

---
\* This is not entirely accurate (no pun intended), as the model changes throughout the training phase. The result will differ from evaluating the model over the training set after the training phase. However, since iterating over the training set a second time is expensive, this is the common practice.
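
The five steps listed above can be sketched for a single batch as follows. This is an illustration under stated assumptions rather than the expected solution: random tensors stand in for an MNIST batch, PyTorch built-ins stand in for your `softmax_classifier` and `cross_entropy`, and autograd replaces the hand-written `softmax_classifier_backward`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(64, 1, 28, 28)     # stand-in for a batch of images
y = torch.randint(0, 10, (64,))    # stand-in for the labels
w = (torch.randn(10, 784) * 0.01).requires_grad_(True)
b = torch.zeros(10, requires_grad=True)
lr = 0.02

# 1. Flatten each image: (batch_size, 1, 28, 28) -> (batch_size, 784).
x_flat = x.reshape(x.shape[0], -1)
# 2. Forward pass.
logits = x_flat @ w.T + b
pred = torch.softmax(logits, dim=1)
# 3. Loss and accuracy.
loss = F.cross_entropy(logits, y)
acc = (pred.argmax(dim=1) == y).float().mean()
# 4. Backward step (autograd here, in place of the hand-written backward).
loss.backward()
# 5. Gradient-descent update, outside of autograd tracking.
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad
    w.grad.zero_()
    b.grad.zero_()
```

In `train_epoch` the same sequence runs once per batch, with the loss and accuracy accumulated across batches.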

In [15]:
# playground for debugging train_epoch
from train import create_model, train_epoch

## (B.3) Test After Epoch
In this part you will implement the `test_epoch` method. This method receives the model `(w, b)` and a data loader `loader` of the test set, and computes the average loss and accuracy over it.

As before, you're given a skeleton of this function. Note that in `test_epoch` you **MUST NOT** update the model!

In your solution, you should do as follows:

1.   Reshape the inputs `x` to match the shape expected by the classifier.
2.   Run the model to get a prediction.
3.   Compute the cross-entropy loss (**MUST** be stored in a tensor `loss`) and accuracy (**MUST** be stored in a tensor `acc`). You should use the `accuracy` method from `utils.py` (already imported in `train.py`).
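
An evaluation pass along the lines of the steps above might look like the sketch below. It is not the provided skeleton: the name `evaluate_sketch`, the synthetic dataset, and the use of PyTorch's built-in cross-entropy are assumptions made so that the example is self-contained.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def evaluate_sketch(w, b, loader):
    """Average loss and accuracy over `loader`, without changing (w, b)."""
    total_loss, total_correct, total = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for x, y in loader:
            logits = x.reshape(x.shape[0], -1) @ w.T + b
            # Sum (not mean) per batch, so a smaller last batch is weighted correctly.
            total_loss += F.cross_entropy(logits, y, reduction='sum').item()
            total_correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.shape[0]
    return total_loss / total, total_correct / total

torch.manual_seed(0)
data = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
loader = DataLoader(data, batch_size=64)   # batches of 64 and 36 examples
w, b = torch.zeros(10, 784), torch.zeros(10)

loss, acc = evaluate_sketch(w, b, loader)
print(loss)  # about 2.3026 = log(10), since zero weights give uniform predictions
```

Accumulating per-example sums and dividing once at the end avoids over-weighting the final, smaller batch.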

In [16]:
# playground for debugging test_epoch
from train import create_model, test_epoch

## (B.4) Train A Model
In this part you will train your model. You are provided with a `train_loop` method that uses your existing `train_epoch` and `test_epoch`.

In this phase, you should:

1. Create a model (you may want to check different initialization schemes and see how it changes the convergence speed).
2. Set learning rate and number of epochs (you may want to check different parameters and see how they affect the convergence).
3. Train your model using `train_loop`. This method reports the loss and accuracy.

In [None]:
from train import create_model, train_loop

# BEGIN SOLUTION

# 1. Create a model
w, b = create_model()

# 2. Set learning rate and number of epochs
lr = 0.02
epochs = 10

# END SOLUTION

# 3. Train your model with `train_loop`
train_loop(w=w,
           b=b,
           lr=lr,
           train_loader=train_loader,
           test_loader=test_loader,
           epochs=epochs)

Train   Epoch: 001 / 010   Loss:  0.4067   Accuracy: 0.887
 Test   Epoch: 001 / 010   Loss:  0.3095   Accuracy: 0.913
Train   Epoch: 002 / 010   Loss:   0.311   Accuracy: 0.911
 Test   Epoch: 002 / 010   Loss:  0.2967   Accuracy: 0.915
Train   Epoch: 003 / 010   Loss:  0.2955   Accuracy: 0.916
 Test   Epoch: 003 / 010   Loss:  0.2825   Accuracy: 0.919
Train   Epoch: 004 / 010   Loss:  0.2868   Accuracy: 0.919
 Test   Epoch: 004 / 010   Loss:  0.2795   Accuracy: 0.922
Train   Epoch: 005 / 010   Loss:  0.2812   Accuracy: 0.921
 Test   Epoch: 005 / 010   Loss:  0.2753   Accuracy: 0.921
Train   Epoch: 006 / 010   Loss:  0.2768   Accuracy: 0.922
 Test   Epoch: 006 / 010   Loss:  0.2782   Accuracy: 0.921
Train   Epoch: 007 / 010   Loss:  0.2734   Accuracy: 0.923
 Test   Epoch: 007 / 010   Loss:  0.2698   Accuracy: 0.923
Train   Epoch: 008 / 010   Loss:  0.2709   Accuracy: 0.925
 Test   Epoch: 008 / 010   Loss:  0.2749   Accuracy: 0.922


# Submit Your Solution

In [None]:
#@title # Create and Download Your Solution

import os
import re
import zipfile
from google.colab import files

def create_zip(files, hw, name):
  zip_path = f'{hw}-{name}.zip'
  with zipfile.ZipFile(zip_path, 'w') as f:
    for fname in files:
      if not os.path.isfile(fname):
        raise FileNotFoundError(f"Couldn't find file: '{fname}' in the homework directory")
      f.write(fname, fname)
  return zip_path

# export notebook as html
!jupyter nbconvert --to html hw1.ipynb

#@markdown Please upload your typed solution (`.pdf` file) to the homework directory, and use the name `hw1-sol.pdf`.

student_name = "Itai Antebi"  #@param{type:"string"}
assignment_name = 'hw1'
assignment_sol_files = ['hw1-sol.pdf', 'hw1.ipynb', 'hw1.html', 'model.py', 'train.py']
zip_name = re.sub('[_ ]+', '_', re.sub(r'[^a-zA-Z_ ]+', '', student_name.lower()))

# create zip with your solution
zip_path = create_zip(assignment_sol_files, assignment_name, zip_name)

# download the zip
files.download(zip_path)

#@markdown Enter your name in `student_name` and run this cell to create and download a `.zip` file with your solution.

#@markdown You should submit your solution via the Dropbox link given in Piazza.

#@markdown **Note:** If you run this cell multiple times, you may be prompted by the browser to allow this page to download multiple files.