---
# IMPORTANT

**Please remember to save this notebook `SC201_Assignment5.ipynb` as you work on it!**

### Please make sure to use a GPU for this assignment.

Click `Runtime -> Change runtime type` and set `Hardware Accelerator` to `GPU`.

In [None]:
# this mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Enter the location of the a5 (Assignment 5) folder.
# Use a plain space (no backslash) so the path also works in sys.path below.
FOLDERNAME = 'Colab Notebooks/SC201_Assignment5'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/MyDrive/{}'.format(FOLDERNAME))

# this downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist. The path is quoted because it contains a space.
%cd "drive/MyDrive/$FOLDERNAME/sc201/datasets/"
!bash get_datasets.sh
%cd /content

# What is PyTorch?

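PyTorch is a system for executing dynamic computational graphs (a neural network is one kind of such graph). These graphs are built from PyTorch Tensor objects, which behave much like numpy arrays. PyTorch has built-in automatic differentiation, so you no longer have to write the backward pass by hand!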
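As a small illustration of the automatic differentiation described above, the sketch below builds a scalar from a tensor marked with `requires_grad=True` and lets `backward()` fill in the gradient. The tensors `w` and `x` are arbitrary illustration values, chosen so the result can be checked by hand: for L = sum((w * x) ** 2), dL/dw = 2 * w * x ** 2.

```python
import torch

# requires_grad=True asks autograd to record every operation applied to w.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.tensor([4.0, 5.0, 6.0])

# Forward pass: a tiny scalar "loss" L = sum((w * x) ** 2) = 16 + 100 + 324.
loss = ((w * x) ** 2).sum()

# Backward pass: autograd stores dL/dw = 2 * w * x**2 in w.grad.
loss.backward()

print(loss.item())  # 440.0
print(w.grad)       # values 32, 100, 216
# Compare with the hand-derived gradient (detach so no new graph is recorded).
print(torch.allclose(w.grad, 2 * w.detach() * x ** 2))  # True
```

No manual backward code was written: `loss.backward()` traverses the recorded graph, which is exactly what `loss.backward()` does for the full networks later in this notebook.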

This notebook assumes that you are using **PyTorch version 1.4+**

## Why PyTorch?

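* PyTorch supports GPU computation, so training can run on the GPU and your code will run much faster!
* PyTorch has a modular design: you can take PyTorch's existing modules (or define your own) and freely combine them into all kinds of neural networks!
* Machine learning in both academia and industry relies on PyTorch or similarly powerful computing libraries, so learning it helps you keep up with the latest research and applications!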

## How can I learn PyTorch on my own?

If you are interested, check out PyTorch tutorials online, such as https://github.com/jcjohnson/pytorch-examples

You can also consult the PyTorch [API doc](http://pytorch.org/docs/stable/index.html). For PyTorch-related questions, we recommend asking on the [PyTorch forum](https://discuss.pytorch.org/) rather than StackOverflow.

# Section I. Preparation

In previous assignments, you prepared the data by calling code we provided.

PyTorch's built-in `DataLoader` and `sampler` classes can automate this step. See the code below for details, especially data normalization and partitioning into *train / val / test*.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

In [None]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./sc201/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./sc201/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./sc201/datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

We enable PyTorch's GPU support through the `device` object.

(If CUDA is not enabled, `torch.cuda.is_available()` returns False and the notebook falls back to CPU mode.)
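A minimal sketch of how a selected `device` is used: a module's parameters and the input tensors are moved with `.to(...)`, the same calls `train_part34` makes below. The `nn.Linear(4, 2)` layer and the random input are arbitrary illustration values.

```python
import torch
import torch.nn as nn

# Same selection logic as the next cell: prefer CUDA, otherwise fall back to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A module and its inputs must live on the same device before they interact.
layer = nn.Linear(4, 2).to(device)                               # moves weight and bias
x = torch.randn(3, 4).to(device=device, dtype=torch.float32)     # moves the input batch
out = layer(x)

print(device, out.device, out.shape)  # out.shape is torch.Size([3, 2])
```

Forgetting either `.to(...)` call on a GPU runtime typically raises an error complaining that tensors are on different devices.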

In [None]:
USE_GPU = True

dtype = torch.float32 # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

In [None]:
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss_function = nn.CrossEntropyLoss()
            loss = loss_function(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()

In [None]:
def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

# PyTorch Sequential API

### Sequential API: Two-Layer Network
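Below is a two-layer fully connected network built with `nn.Sequential`: we pass the built-in layers in order and train the model with the same training loop.

You don't need to tune hyperparameters here, but even without tuning the model should reach above 40% accuracy within one epoch.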
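Before training, it can help to check the shapes flowing through an `nn.Sequential` model and how many parameters it has. The sketch below rebuilds the same two-layer architecture under a separate name (`two_layer`, hypothetical, so the notebook's `model` is untouched) and feeds it a random minibatch of 64 CIFAR-sized images.

```python
import torch
import torch.nn as nn

hidden = 4000
two_layer = nn.Sequential(
    nn.Flatten(),                      # (N, 3, 32, 32) -> (N, 3072)
    nn.Linear(3 * 32 * 32, hidden),    # (N, 3072) -> (N, 4000)
    nn.ReLU(),
    nn.Linear(hidden, 10),             # (N, 4000) -> (N, 10) class scores
)

x = torch.randn(64, 3, 32, 32)         # a fake minibatch, same size as loader_train's
scores = two_layer(x)
num_params = sum(p.numel() for p in two_layer.parameters())
print(scores.shape)   # torch.Size([64, 10])
print(num_params)     # 3072*4000 + 4000 + 4000*10 + 10 = 12332010
```

Almost all of the roughly 12.3M parameters sit in the first `nn.Linear`, which is one reason convolutional layers (which share weights across positions) are preferred for images.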

In [None]:
# nn.Flatten is itself a module, so it can be stacked inside nn.Sequential
# to turn (N, 3, 32, 32) images into (N, 3072) vectors.

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer, epochs=3)

### Sequential API: Three-Layer ConvNet
Use `nn.Sequential` to build and train a three-layer ConvNet with the same architecture as before:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes
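The fully-connected layer needs to know how many features it receives, which depends on the spatial size after the two convolutions. A convolution with kernel size K, zero-padding P, and stride S maps a spatial size H to floor((H + 2P - K) / S) + 1. The helper `conv_output_size` below is a hypothetical name used only to do this arithmetic for the architecture above (stride 1 assumed, since the spec does not mention a stride).

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a conv layer: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

h = conv_output_size(32, kernel=5, padding=2)   # first conv (item 1): 32 -> 32
h = conv_output_size(h, kernel=3, padding=1)    # second conv (item 3): 32 -> 32
fc_in = 16 * h * h                              # 16 channels after the second conv
print(h, fc_in)   # 32 16384
```

With these paddings both convolutions preserve the 32x32 size, so the final linear layer takes 16 * 32 * 32 = 16384 inputs and produces 10 scores.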

Train it with stochastic gradient descent with Nesterov momentum 0.9.

You don't need to tune hyperparameters here, but even without tuning the model should reach above 55% accuracy within one epoch.

In [None]:
model = None
optimizer = None

################################################################################
# TODO: Build the three-layer ConvNet (with biases) described above using the  #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
pass
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

train_part34(model, optimizer, epochs=3)

# Section V. CIFAR-10 open-ended challenge

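This last section is open-ended! Put your brains (and Google's GPUs) to work: design a CNN with either the `nn.Module` or the `nn.Sequential` API and train it to reach at least 70% validation accuracy on CIFAR-10 within 10 epochs. You may reuse the `check_accuracy_part34` and `train_part34` functions defined above.

See the official API documentation: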

* Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
* Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/stable/optim.html
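The notebook so far only shows `nn.Sequential`. With the `nn.Module` API you subclass `nn.Module`, create layers in `__init__`, and write the forward computation yourself. The sketch below is a hypothetical, untuned `TinyConvNet` meant only to show the pattern; it is not expected to reach the 70% target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvNet(nn.Module):
    """Hypothetical example: conv -> ReLU -> 2x2 max-pool -> linear. Not tuned."""

    def __init__(self, in_channels=3, num_filters=8, num_classes=10):
        super().__init__()
        # Layers assigned as attributes are registered, so .parameters() finds them.
        self.conv = nn.Conv2d(in_channels, num_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(num_filters * 16 * 16, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))    # (N, C, 32, 32) -> (N, F, 32, 32)
        x = F.max_pool2d(x, 2)      # -> (N, F, 16, 16)
        x = torch.flatten(x, 1)     # -> (N, F * 16 * 16)
        return self.fc(x)           # -> (N, num_classes)

net = TinyConvNet()
scores = net(torch.randn(2, 3, 32, 32))
print(scores.shape)   # torch.Size([2, 10])
```

An instance like `net`, together with an optimizer such as `optim.SGD(net.parameters(), lr=1e-2, momentum=0.9, nesterov=True)`, can be passed to `train_part34` exactly like the Sequential models above.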

### Things you might try:
- **Filter size**: The CNN above uses 5x5 filters in its first layer.
- **Number of filters**: The first layer of the CNN above has 32 filters.
- **Pooling vs Strided Convolution**: Which works better, max pooling or strided convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolutional layers and vanilla batch normalization after affine layers. Does the network train faster?
- **Network architecture**: Are deeper networks more powerful? Some patterns to try:
    - [conv-relu-pool] x N -> [affine] x M -> [softmax or SVM]
    - [conv-relu-conv-relu-pool] x N -> [affine] x M -> [softmax or SVM]
    - [batchnorm-relu-conv] x N -> [affine] x M -> [softmax or SVM]
- **Global Average Pooling**: A typical CNN flattens the output of its convolutional layers and feeds it into affine layers. Alternatively, after the convolutions you can apply global average pooling to get a 1x1 average image of shape (1, 1, Filter#), then reshape it into a vector of length Filter#. See [Google's Inception Network](https://arxiv.org/abs/1512.00567) (see Table 1).
- **Regularization**: Add an L2 regularization loss or use Dropout.
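Several of the ideas in this list map directly onto built-in `torch.nn` layers. The sketch below, using an arbitrary fake batch of 16-channel feature maps, compares the two downsampling options, adds spatial batch normalization, builds a global-average-pooling head with `nn.AdaptiveAvgPool2d(1)` plus Dropout, and expresses L2 regularization through SGD's `weight_decay` argument. All layer sizes and hyperparameter values are illustrative assumptions, not recommendations.

```python
import torch
import torch.nn as nn
import torch.optim as optim

x = torch.randn(8, 16, 32, 32)   # fake batch: N=8, 16 channels, 32x32

# Downsampling option 1: stride-1 conv, spatial batchnorm, ReLU, then 2x2 max pooling.
pool_path = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),          # spatial batchnorm: one mean/var per channel
    nn.ReLU(),
    nn.MaxPool2d(2),             # 32x32 -> 16x16
)
# Downsampling option 2: a single stride-2 convolution, no pooling layer.
strided_path = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1, stride=2),   # (32 + 2 - 3) // 2 + 1 = 16
    nn.BatchNorm2d(32),
    nn.ReLU(),
)
print(pool_path(x).shape, strided_path(x).shape)   # both torch.Size([8, 32, 16, 16])

# Global average pooling head: average each channel over H x W, then flatten
# into a length-Filter# vector before the final affine layer.
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),     # (N, 32, 16, 16) -> (N, 32, 1, 1)
    nn.Flatten(),                # -> (N, 32)
    nn.Dropout(p=0.5),           # zeroes random features in training mode only
    nn.Linear(32, 10),
)
scores = gap_head(strided_path(x))
print(scores.shape)   # torch.Size([8, 10])

# L2 regularization: SGD's weight_decay adds weight_decay * w to each gradient,
# equivalent to adding 0.5 * weight_decay * ||w||^2 to the loss.
params = list(strided_path.parameters()) + list(gap_head.parameters())
sketch_optimizer = optim.SGD(params, lr=1e-2, momentum=0.9, nesterov=True,
                             weight_decay=5e-4)
```

The strided path reaches the same output shape with one layer fewer, which is the trade-off the pooling-vs-strided question asks you to evaluate empirically on the validation set.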

### Tips for training
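Remember to tune hyperparameters such as the learning rate to find the best values. While tuning, keep the following in mind:

- Good hyperparameter settings should show improvement within a thousand iterations.
- Use coarse-to-fine tuning:
    - Start with a coarse search over a wide range, training only briefly; drop bad settings right away.
    - Once you find a promising range, search more finely within it and train for longer.
- Tune hyperparameters on the validation set, not the test set! The test set is reserved for evaluating your best model once at the end.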
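A minimal sketch of coarse-to-fine random search over the learning rate, sampled on a log scale. The helpers `sample_log_uniform` and `refine_range` are hypothetical names, and `fake_val_accuracy` is an invented stand-in for "train briefly, then measure accuracy on `loader_val`"; in the real notebook you would replace it with a short `train_part34` run followed by a validation check.

```python
import math
import random

def sample_log_uniform(low_exp, high_exp, n, rng):
    """Draw n values 10**u with u uniform in [low_exp, high_exp] (log-scale search)."""
    return [10 ** rng.uniform(low_exp, high_exp) for _ in range(n)]

def refine_range(results, top_k=3, margin=0.5):
    """From (value, val_acc) pairs, return a narrower exponent range around the top_k values."""
    best = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
    exps = [math.log10(v) for v, _ in best]
    return min(exps) - margin, max(exps) + margin

def fake_val_accuracy(lr):
    """Hypothetical stand-in for a short training run; peaks at lr = 1e-2 for illustration."""
    return 0.6 - 0.1 * abs(math.log10(lr) + 2)

rng = random.Random(0)
# Coarse stage: wide range (1e-6 to 1), short runs.
coarse = [(lr, fake_val_accuracy(lr)) for lr in sample_log_uniform(-6, 0, 20, rng)]
low, high = refine_range(coarse)
# Fine stage: narrower range; in practice each candidate would be trained longer here.
fine = [(lr, fake_val_accuracy(lr)) for lr in sample_log_uniform(low, high, 10, rng)]
best_lr, best_acc = max(fine, key=lambda r: r[1])
print(f"fine exponent range: [{low:.2f}, {high:.2f}], best lr = {best_lr:.2e}")
```

Sampling exponents rather than raw values spreads the candidates evenly across orders of magnitude, which matters because a good learning rate is usually only known to within a factor of ten at the start.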

### Going above and beyond
If you are interested, you can implement more advanced features yourself!

- Alternative optimizers: try Adam, Adagrad, RMSprop, and others.
- Alternative activation functions: try leaky ReLU, parametric ReLU, ELU, MaxOut, and others.
- Model ensembles
- Data augmentation
- New architectures ([see this blog](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32))
  - [ResNets](https://arxiv.org/abs/1512.03385): each block's input is added to its output through a skip connection before being passed to the next layer.
  - [DenseNets](https://arxiv.org/abs/1608.06993): each layer receives the concatenated outputs of all preceding layers as its input.

### Have fun and happy training! 

In [None]:
################################################################################
# TODO:                                                                        #
# Experiment with any architectures, optimizers, and hyperparameters.          #
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs.      #
#                                                                              #
# Note that you can use check_accuracy_part34 to evaluate on either the test    #
# set or the validation set, by passing either loader_test or loader_val as    #
# the first argument. You should not touch the test set until you have         #
# finished your architecture and hyperparameter tuning, and only run the test  #
# set once at the end to report a final value.                                 #
################################################################################
model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

# You should get at least 70% accuracy
train_part34(model, optimizer, epochs=10)

## Describe what you did 

Describe the strategy you used.

## Answer

[FILL THIS IN]

## Test set -- run this only once

Store your best model in `best_model` and evaluate it on the test set. How does the test accuracy below compare with the validation accuracy above?

In [None]:
best_model = model
check_accuracy_part34(loader_test, best_model)

---
# IMPORTANT

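Congratulations on finishing the assignment! **Please turn on sharing for your folder and paste the share link into the stanCode assignment submission form!**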