# Assignment 1 (60 points total)

You will train a convolutional neural network (aka ConvNet or CNN) to solve yet another image classification problem: the Tiny ImageNet dataset (200 classes, 100K training images, 10K validation images). Try to achieve as high an accuracy as possible.

This exercise is close to what people do in real life. No toy architectures this time.

## Grading

* 11 points for the report.
* 5 points for using an **interactive** (don't reinvent the wheel with `plt.plot`) tool for viewing progress, for example TensorBoard.
* 9 points for a network that gets $\geq$25% accuracy on the private **test** set.
* Up to 35 points for accuracy up to 50%, issued linearly (i.e. 0 points for 25%, 7 points for 30%, 21 points for 40%, 35 points for $\geq$50%).
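
The linear scale in the last item can be written as a small function. This is only a sketch of the stated rule: the function name is made up for illustration, it covers the 35-point linear part only (not the separate 9 points for reaching 25%), and whether fractional points are rounded is not stated in the assignment.

```python
def accuracy_points(test_accuracy_percent: float) -> float:
    """Points from the linear accuracy scale: 0 at 25%, 35 at 50% and above."""
    # Clamp to the [25, 50] range in which the scale is linear.
    clamped = min(max(test_accuracy_percent, 25.0), 50.0)
    return 35.0 * (clamped - 25.0) / 25.0


for acc in (25, 30, 40, 50):
    print(f"{acc}% -> {accuracy_points(acc):g} points")
```

Each extra percentage point of test accuracy between 25% and 50% is therefore worth 1.4 points.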

## Grading Explained

* *Private test set*: a part of the dataset, like the validation set, whose ground truth labels are known only to us (you won't be able to evaluate your model on it). When grading, we will compute test accuracy by running your code that computes validation accuracy after replacing the images in `'val/'` with the test set.
* *How to submit*:
  * **<font color="red">Read this in advance, don't leave until the last minute.</font> Wrong checkpoint submission = <font color="red">0 points for accuracy</font>. Be careful!**
  * After you've trained your network, [save weights](https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html) to "*checkpoint.pth*" with `model.state_dict()` and `torch.save()`.
  * Set `DO_TRAIN = False`, click "Restart and Run All" and make sure that your validation accuracy is computed correctly.
  * Compute the MD5 checksum for "*checkpoint.pth*" (e.g. run `!md5sum checkpoint.pth`) and paste it into "*solution.py*" (`get_checkpoint_metadata()`). You'll be penalized if this checksum doesn't match your submitted file.
  * Upload "*checkpoint.pth*" to Google Drive, copy the view-only link to it and paste it into "*solution.py*" as well.
  * Make sure "Restart and Run All" also works with `DO_TRAIN = True`: trains your model and computes validation accuracy.
  * <font color="red">Important</font>: At least several hours before the deadline, **upload "*solution.py*" [here](http://350e-83-69-192-100.ngrok.io/) and make sure you get a "👌"**.

* *Report*: PDF, free form; should mention:
  * Your history of tweaks and improvements. How you started, what you searched. (*I have analyzed these and those conference papers/sources/blog posts. I tried this and that to adapt them to my problem. ...*)
  * Which network architectures have you tried? Which of them didn't work, and can you guess why? What is the final one and why?
  * Same for the training method (batch size, optimization algorithm, number of iterations, ...): which and why?
  * Same for anti-overfitting (regularization) techniques. Which ones have you tried? What were their effects, and can you guess why?
  * **Most importantly**: deep learning insights you gained. Can you give several examples of how *exactly* experience from this exercise will affect how you train your future neural nets? (tricks, heuristics, conclusions, observations)
  * **List all sources of code**.
* *Progress viewing tool*: support the report with screenshots of accuracy and loss plots (training and validation) over time.
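
The grading cell further down takes the Google Drive file ID from your link with `urllib.parse.urlparse(...).path.split('/')[-2]`, so the shape of the link matters. The sketch below reproduces that expression on a hypothetical link with a placeholder ID; the helper name `drive_file_id` is an assumption for illustration and is not part of "*solution.py*".

```python
from urllib.parse import urlparse


def drive_file_id(google_drive_link: str) -> str:
    """Extract the file ID the same way the notebook does before calling gdown."""
    return urlparse(google_drive_link).path.split("/")[-2]


# Hypothetical link with a placeholder ID, in the "file/d/<ID>/view" shape.
link = "https://drive.google.com/file/d/FILE_ID_PLACEHOLDER/view?usp=sharing"
print(drive_file_id(link))
```

A link that carries the ID in the query string instead (e.g. `.../open?id=...`) would produce an empty string with this expression, so it is worth checking your own link this way before submitting.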

## Restrictions

* No pretrained networks.
* Don't enlarge images (e.g. don't resize them to $224 \times 224$ or $256 \times 256$).

## Tips

* **One change at a time**: don't test several new things at once (unless you are super confident that they will work). Train a model, introduce one change, train again.
* Google a lot: try to reinvent as few wheels as possible. Harvest inspiration from PyTorch recipes, from GitHub, from blogs...
* Use GPU.
* Regularization is very important: L2, batch normalization, dropout, data augmentation...
* Pay much attention to accuracy and loss graphs (e.g. in TensorBoard). Track failures early, stop bad experiments early.
* 2-3 hours of training (in Colab) should be enough for most models, maybe 4-6 hours if you're experimenting.
* Save checkpoints every so often in case things go wrong (optimization diverges, Colab disconnects...).
* Don't use batches that are too large: they can be slow and memory-hungry. This is true for inference too.
* Also don't forget to use `torch.no_grad()` and `.eval()` during inference.
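
The last few tips (periodic checkpoints, `torch.no_grad()` and `.eval()` at inference) could look roughly like this. It is a minimal sketch, not the required structure of "*solution.py*": the tiny linear model, the helper names, the file name "resume_checkpoint.pth" and the 64×64 input size are all assumptions made to keep the example runnable. Note that this resume checkpoint is separate from the "*checkpoint.pth*" you submit, which should hold only `model.state_dict()`.

```python
import torch
from torch import nn

# Hypothetical stand-in for a real ConvNet, just to keep the sketch runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 200))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume training after a crash or disconnect."""
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )


def batch_accuracy(model, images, labels):
    """Fraction of correct predictions, computed without building a graph."""
    was_training = model.training
    model.eval()  # batch norm uses running statistics, dropout is disabled
    with torch.no_grad():  # skip gradient bookkeeping to save memory and time
        predictions = model(images).argmax(dim=1)
    model.train(was_training)  # restore the previous mode
    return (predictions == labels).float().mean().item()


images = torch.randn(8, 3, 64, 64)  # random stand-in for a batch of RGB images
labels = torch.randint(0, 200, (8,))
print(batch_accuracy(model, images, labels))
save_checkpoint("resume_checkpoint.pth", model, optimizer, epoch=0)
```

Restoring the previous mode after evaluation avoids accidentally continuing training with dropout and batch norm still in inference mode.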

In [8]:
# Determine the locations of auxiliary libraries and datasets.
# `AUX_DATA_ROOT` is where 'tiny-imagenet-2022.zip' is.

# Detect if we are in Google Colaboratory
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

from pathlib import Path
if IN_COLAB:
    google.colab.drive.mount("/content/drive")
    
    # Change this if you created the shortcut in a different location
    AUX_DATA_ROOT = Path("/content/drive/My Drive/Deep Learning 2022 -- Home Assignment 1")
    
    assert AUX_DATA_ROOT.is_dir(), "Have you forgotten to 'Add a shortcut to Drive'?"
    
    import sys
    sys.path.append(str(AUX_DATA_ROOT))
else:
    AUX_DATA_ROOT = Path(".")

In [9]:
# Imports

# Your solution
%load_ext autoreload
%autoreload 1

%aimport solution


In [10]:
# If `True`, will train the model from scratch and validate it.
# If `False`, instead of training will load weights from './checkpoint.pth'.
# When grading, we will test both cases.
DO_TRAIN = False

In [11]:
# Put training and validation images in `./tiny-imagenet-200/train` and `./tiny-imagenet-200/val`:
if not Path("tiny-imagenet-200/train/class_000/00000.jpg").is_file():
    import zipfile
    with zipfile.ZipFile(AUX_DATA_ROOT / 'tiny-imagenet-2022.zip', 'r') as archive:
        archive.extractall()

In [12]:
# Initialize dataloaders
train_dataloader = solution.get_dataloader("./tiny-imagenet-200/", 'train')
val_dataloader   = solution.get_dataloader("./tiny-imagenet-200/", 'val')

# Initialize the raw model
model = solution.get_model()

In [13]:
if DO_TRAIN:
    # Train from scratch
    optimizer = solution.get_optimizer(model)
    solution.train_on_tinyimagenet(train_dataloader, val_dataloader, model, optimizer)
else:
    # Download the checkpoint and initialize model weights from it
    import urllib.parse  # `import urllib` alone does not guarantee the `parse` submodule is loaded
    import subprocess

    penalize = False

    # Get your link and checksum
    claimed_md5_checksum, google_drive_link = solution.get_checkpoint_metadata()

    # Use your link to download "checkpoint.pth"
    !pip install -U gdown
    !gdown --id {urllib.parse.urlparse(google_drive_link).path.split('/')[-2]} -O checkpoint.pth

    try:
        # Compute the actual checksum
        real_md5_checksum = subprocess.check_output(
            ["md5sum", "checkpoint.pth"]).decode().split()[0]
    except subprocess.CalledProcessError as err:
        # Couldn't download or the filename isn't "checkpoint.pth"
        print(f"Wrong link or filename: {err}")
        penalize = True
    else:
        # The trained checkpoint is different from the one submitted
        if real_md5_checksum != claimed_md5_checksum:
            print("Checksums differ! Late submission?")
            penalize = True

    if penalize:
        print("🔫 Prepare the penalizer! 🔫")

    # Finally load weights
    solution.load_weights(model, "./checkpoint.pth")



Downloading...
From: https://drive.google.com/uc?id=1eAf16xpXCQJbZVoSobmQ8No_eidoVhKs
To: d:\docs\Git\skoltech\term4\dl\hw1\checkpoint.pth


FileNotFoundError: [WinError 2] The system cannot find the file specified
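
The `FileNotFoundError` above most likely comes from `subprocess.check_output` failing to find the `md5sum` executable, which is usually absent on Windows (the download path in the output is a Windows path). If you need to compute the checksum locally on such a machine, a portable alternative is Python's standard `hashlib`. The helper name `md5_of_file` below is an assumption for illustration; the grading cell itself still calls `md5sum`.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file standing in for "checkpoint.pth".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for checkpoint bytes")
print(md5_of_file(tmp.name))
os.remove(tmp.name)
```

Calling `md5_of_file("checkpoint.pth")` should give the same hex string that `md5sum checkpoint.pth` prints in its first column.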

In [None]:
# Classify some validation samples
import torch

example_batch, example_batch_labels = next(iter(val_dataloader))
model.eval()
with torch.no_grad():
    _, example_predicted_labels = solution.predict(model, example_batch).max(1)

print("Predicted class / Ground truth class")
for predicted, gt in list(zip(example_predicted_labels, example_batch_labels))[:15]:
    print("{:03d} / {:03d}".format(predicted, gt))

In [None]:
# Print validation accuracy
val_accuracy, _ = solution.validate(val_dataloader, model)
val_accuracy *= 100
assert 1.5 <= val_accuracy <= 100.0
print("Validation accuracy: %.2f%%" % val_accuracy)