Homework 13 - Network Compression
===

> Author: Arvin Liu (r09922071@ntu.edu.tw), this colab is modified from ML2021-HW3

If you have any questions, feel free to ask: ntu-ml-2021spring-ta@googlegroups.com

## **Intro**

HW13 is about network compression.

There are many approaches to network/model compression; here we introduce two:
* Knowledge Distillation
* Architecture Design


This notebook proceeds as follows: <br/>
1. Introduce the depthwise, pointwise, and group convolutions used in MobileNet.
2. Design the model for this colab.
3. Introduce knowledge distillation.
4. Set up TeacherNet, which will help during training.


## **About the Dataset**  *(same as HW3)*

The dataset used here is food-11, a collection of food images in 11 classes.

To meet the requirements of this homework, the TAs slightly modified the data.
Please DO NOT access the original fully-labeled training data or testing labels.

Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden.

In [None]:
! /opt/bin/nvidia-smi

Fri Jul  2 02:08:29 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
from google.colab import drive
import os
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
### This block is same as HW3 ###
# Download the dataset
# You may choose where to download the data.

# Google Drive
# !gdown --id '1awF7pZ9Dz7X1jn1_QAiKN-_v56veCEKy' --output food-11.zip
!gdown --id '1raqwXnH2DXJSEh99gn64WdqY0LPrgFS3' --output food-11.zip
# If gdown fails, you can switch to another link. (A backup link is provided at the bottom of this colab tutorial.)

# Dropbox
# !wget https://www.dropbox.com/s/m9q6273jl3djall/food-11.zip -O food-11.zip

# MEGA
# !sudo apt install megatools
# !megadl "https://mega.nz/#!zt1TTIhK!ZuMbg5ZjGWzWX1I6nEUbfjMZgCmAgeqJlwDkqdIryfg"

# Unzip the dataset.
# This may take some time.
!unzip -q food-11.zip

Downloading...
From: https://drive.google.com/uc?id=1raqwXnH2DXJSEh99gn64WdqY0LPrgFS3
To: /content/food-11.zip
963MB [00:07, 123MB/s]


## **Import Packages**  *(same as HW3)*

First, we need to import packages that will be used later.

In this homework, we rely heavily on **torchvision**, a library of PyTorch.

In [None]:
### This block is same as HW3 ###
# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.models as models

from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.auto import tqdm

## **Dataset, Data Loader, and Transforms** *(similar to HW3)*

Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Here, since our data are stored in folders by class labels, we can directly apply **torchvision.datasets.DatasetFolder** for wrapping data without much effort.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms.

---
**The only difference from HW3 is that the transform functions are different.**

In [None]:
### This block is similar to HW3 ###
# It is important to do data augmentation in training.
# However, not every augmentation is useful.
# Please think about what kind of augmentation is helpful for food recognition.

train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 142)
    transforms.Resize((142, 142)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    # "fill" replaces the deprecated "fillcolor" argument.
    transforms.RandomAffine(degrees=30, translate=(0, 0.2), scale=(0.9, 1), shear=(6, 9), fill=(255, 255, 255)),
    transforms.ColorJitter(brightness=(0.6, 1.4), contrast=(0.6, 1.4), saturation=(0.6, 1.4)),
    transforms.RandomCrop(128),
    transforms.ToTensor(),
])

# We don't need augmentations in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 142)
    transforms.Resize((142, 142)),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
])


In [None]:
### This block is similar to HW3 ###
# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 64

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder("food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder("food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

# **Architecture / Model Design**
The following are convolution layer designs that use fewer parameters.

## **Depthwise & Pointwise Convolution**
![](https://i.imgur.com/FBgcA0s.png)
> Blue: the connection between layers \
> Green: the expansion of **receptive field** \
> (reference: arxiv:1810.04231)

(a) Regular convolution layer: every output channel is connected to every input channel, like a fully connected layer, except that the operation on each connection is a convolution instead of a multiplication.

(b) Depthwise convolution layer (DW): each feature map (channel) passes through its own filter. A pointwise convolution layer (PW, a 1×1 convolution) then combines the information across channels.


(c) Group convolution layer (GC): split the feature maps into groups; each group passes through its own filters, and the results are concatenated. If the number of groups equals the number of input channels, GC becomes DW (channels are independent). If the number of groups is 1, GC becomes a regular convolution.

<img src="https://i.imgur.com/Hqhg0Q9.png" width="500px">
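To make the connection patterns in (a) to (c) concrete, the following sketch lists which input channels each output channel reads from for a given number of groups. It is plain Python for illustration only: `group_inputs` is a hypothetical helper, and it assumes the consecutive channel split that `nn.Conv2d` uses for its `groups` argument.

```python
def group_inputs(in_chs, out_chs, groups):
    """Map each output channel to the input channels it is connected to."""
    if in_chs % groups or out_chs % groups:
        raise ValueError("in_chs and out_chs must both be divisible by groups")
    in_per_group = in_chs // groups
    out_per_group = out_chs // groups
    connections = {}
    for out_ch in range(out_chs):
        group = out_ch // out_per_group
        start = group * in_per_group
        connections[out_ch] = list(range(start, start + in_per_group))
    return connections


# groups=1: regular convolution, groups=4 (= in_chs): depthwise convolution.
for groups in (1, 2, 4):
    print(f"groups={groups}: {group_inputs(4, 4, groups)}")
```

With 4 input and 4 output channels, `groups=1` connects every output to all four inputs, while `groups=4` leaves each output with exactly one input channel.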


## **Implementation details**
```python
# Regular Convolution, # of params (excluding bias) = in_chs * out_chs * kernel_size^2
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding)

# Group Convolution, "groups" controls the connections between inputs and
# outputs. in_chs and out_chs must both be divisible by groups.
# # of params = in_chs * out_chs * kernel_size^2 / groups
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding, groups=groups)

# Depthwise Convolution, out_chs = in_chs = groups, # of params = in_chs * kernel_size^2
nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs)

# Pointwise Convolution, a.k.a. 1 by 1 convolution, # of params = in_chs * out_chs
nn.Conv2d(in_chs, out_chs, 1)

# Merge Depthwise and Pointwise Convolution
def dwpw_conv(in_chs, out_chs, kernel_size, stride, padding):
    return nn.Sequential(
        nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs),
        nn.Conv2d(in_chs, out_chs, 1),
    )
```
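As a quick check of the parameter counts quoted in the comments above, the following sketch computes them with plain arithmetic. The helper name `conv2d_params` is hypothetical (it is not part of PyTorch); it assumes square kernels and, unlike the comments, also counts the bias term that `nn.Conv2d` adds by default. The channel sizes are arbitrary example values.

```python
def conv2d_params(in_chs, out_chs, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a 2-D convolution with square kernels.

    Hypothetical helper for illustration. The weight tensor of a grouped
    convolution has shape (out_chs, in_chs // groups, kernel_size, kernel_size).
    """
    if in_chs % groups or out_chs % groups:
        raise ValueError("in_chs and out_chs must both be divisible by groups")
    weights = out_chs * (in_chs // groups) * kernel_size ** 2
    return weights + (out_chs if bias else 0)


in_chs, out_chs, k = 128, 256, 3  # example sizes

regular = conv2d_params(in_chs, out_chs, k)
depthwise = conv2d_params(in_chs, in_chs, k, groups=in_chs)
pointwise = conv2d_params(in_chs, out_chs, 1)

print("regular 3x3 conv      :", regular)
print("depthwise + pointwise :", depthwise + pointwise)
print("reduction factor      : %.1fx" % (regular / (depthwise + pointwise)))
```

For a 3x3 kernel the saving approaches a factor of 9 as the number of output channels grows, which is why the depthwise plus pointwise pair is attractive under a tight parameter budget.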

## **Model**

The basic model here is simply a stack of convolutional layers followed by some fully-connected layers. You can take advantage of depthwise & pointwise convolution to make your model deeper while still meeting the size constraint.
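To see how the spatial size evolves when such layers are stacked, the sketch below applies the standard output-size formula for convolution and pooling, `out = (in + 2 * padding - kernel) // stride + 1`, to a 128x128 input. The layer list is an assumption for illustration: four 3x3 convolutions (stride 1, padding 1) with a 2x2 max-pool after each of the last three; `out_size` is a hypothetical helper.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding)
layers = [
    ("conv3x3", 3, 1, 1),
    ("conv3x3", 3, 1, 1),
    ("maxpool", 2, 2, 0),
    ("conv3x3", 3, 1, 1),
    ("maxpool", 2, 2, 0),
    ("conv3x3", 3, 1, 1),
    ("maxpool", 2, 2, 0),
]

size = 128
sizes = [size]
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name:8s} -> {size}x{size}")
```

A 3x3 convolution with padding 1 keeps the spatial size, so only the pooling layers shrink the feature map. Global average pooling at the end then reduces whatever size remains to 1x1, which is what lets the model accept different input sizes.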

In [None]:
def dwpw_conv(in_chs, out_chs, kernel_size, stride=1, padding=1):
    return nn.Sequential(
        nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs),
        nn.Conv2d(in_chs, out_chs, 1),
    )

class StudentNet(nn.Module):
    def __init__(self):
      super(StudentNet, self).__init__()

      # ---------- TODO ----------
      # Modify your model architecture

      self.cnn = nn.Sequential(
        # nn.Conv2d(3, 32, 3),
        # nn.BatchNorm2d(32),
        # nn.ReLU(),
        dwpw_conv(3, 64, 3),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        
        # nn.Conv2d(32, 32, 3),
        # nn.BatchNorm2d(32),
        # nn.ReLU(),
        # nn.MaxPool2d(2, 2, 0),     
        dwpw_conv(64, 128, 3),
        nn.BatchNorm2d(128),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),

        # nn.Conv2d(32, 64, 3), 
        # nn.BatchNorm2d(64),
        # nn.ReLU(),
        # nn.MaxPool2d(2, 2, 0),
        dwpw_conv(128, 128, 3),
        nn.BatchNorm2d(128),
        nn.ReLU(),    
        nn.MaxPool2d(2, 2, 0),

        # nn.Conv2d(64, 100, 3), 
        # nn.BatchNorm2d(100),
        # nn.ReLU(),
        # nn.MaxPool2d(2, 2, 0),
        dwpw_conv(128, 256, 3),  
        nn.BatchNorm2d(256),
        nn.ReLU(),   
        nn.MaxPool2d(2, 2, 0),


        # Here we adopt Global Average Pooling for various input size.
        nn.AdaptiveAvgPool2d((1, 1)),
      )
      self.fc = nn.Sequential(
        nn.Linear(256, 11),
      )
      
    def forward(self, x):
      out = self.cnn(x)
      out = out.view(out.size()[0], -1)
      return self.fc(out)


## **Model Analysis**

Use `torchsummary` to get your model architecture (a screenshot or pasted text is fine) and the number of
parameters. Both pieces of information should be submitted with your NTU Cool questions.

Note that the number of parameters **should not be greater than 100,000**, or you will be penalized in this homework.
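The budget can also be estimated by hand before running `torchsummary`. The sketch below is a tally under stated assumptions: a network built from `dwpw_conv` blocks (3x3 depthwise plus 1x1 pointwise, both with bias), each followed by BatchNorm, and one final linear layer. The channel widths follow the StudentNet defined above, and the helper names are hypothetical.

```python
def dwpw_params(in_chs, out_chs, kernel_size=3):
    depthwise = in_chs * kernel_size ** 2 + in_chs  # one k x k filter per channel, plus bias
    pointwise = in_chs * out_chs + out_chs          # 1x1 convolution, plus bias
    return depthwise + pointwise


def batchnorm_params(chs):
    return 2 * chs  # learnable scale and shift per channel


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


channels = [3, 64, 128, 128, 256]  # input channels followed by each block's output width
num_classes = 11
budget = 100_000

total = 0
for in_chs, out_chs in zip(channels[:-1], channels[1:]):
    total += dwpw_params(in_chs, out_chs) + batchnorm_params(out_chs)
total += linear_params(channels[-1], num_classes)

print(f"estimated parameters: {total:,} (budget {budget:,})")
print("within budget:", total <= budget)
```

Changing the entries of `channels` shows quickly how much room a wider or deeper design would leave under the limit.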


In [None]:
from torchsummary import summary

student_net = StudentNet()
summary(student_net, (3, 128, 128), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 3, 128, 128]              30
            Conv2d-2         [-1, 64, 128, 128]             256
       BatchNorm2d-3         [-1, 64, 128, 128]             128
              ReLU-4         [-1, 64, 128, 128]               0
            Conv2d-5         [-1, 64, 128, 128]             640
            Conv2d-6        [-1, 128, 128, 128]           8,320
       BatchNorm2d-7        [-1, 128, 128, 128]             256
              ReLU-8        [-1, 128, 128, 128]               0
         MaxPool2d-9          [-1, 128, 64, 64]               0
           Conv2d-10          [-1, 128, 64, 64]           1,280
           Conv2d-11          [-1, 128, 64, 64]          16,512
      BatchNorm2d-12          [-1, 128, 64, 64]             256
             ReLU-13          [-1, 128, 64, 64]               0
        MaxPool2d-14          [-1, 128,



## **Knowledge Distillation**

<img src="https://i.imgur.com/H2aF7Rv.png=100x" width="500px">

Since we have a well-trained big model, we let it teach a small model. In the implementation, the training target is the prediction of the big model instead of (or in addition to) the ground truth.

## **Why does it work?**
* If the data is not clean, the prediction of the big model can ignore the noise from wrongly labeled data.
* The labels might be related to each other. The digit 8 is more similar to 6, 9, and 0 than to 1 and 7, for example.


## **How to implement?**
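* $Loss = \alpha T^2 \times KL(\text{softmax}(\frac{\text{Teacher's Logits}}{T}) \,\|\, \text{softmax}(\frac{\text{Student's Logits}}{T})) + (1-\alpha)(\text{Original Loss})$
* Note that both sets of logits are divided by the temperature $T$ and passed through softmax before the KL divergence is computed.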
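The formula can be checked on a single example with plain Python. This is a minimal sketch, not the training implementation: the logits are made-up numbers for a 3-class problem, and `softmax`, `kl_divergence`, and `kd_loss` are hypothetical helper names.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a larger T gives a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits, teacher_logits, label, T=1.5, alpha=0.5):
    # Soft term: KL between the softened teacher and student distributions.
    soft = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
    # Hard term: ordinary cross-entropy with the ground-truth label (T = 1).
    hard = -math.log(softmax(student_logits)[label])
    return alpha * T * T * soft + (1 - alpha) * hard


teacher = [4.0, 1.0, 0.5]  # made-up logits
student = [2.0, 1.5, 0.1]

print("teacher probs, T=1  :", [round(p, 3) for p in softmax(teacher)])
print("teacher probs, T=1.5:", [round(p, 3) for p in softmax(teacher, 1.5)])
print("KD loss             :", round(kd_loss(student, teacher, label=0), 4))
```

Raising the temperature lowers the teacher's top probability and spreads mass onto the other classes, which is how the relations between labels reach the student. The $T^2$ factor compensates for the gradients of the soft term shrinking roughly as $1/T^2$.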

In [None]:
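def loss_fn_kd(outputs, labels, teacher_outputs, T=1.5, alpha=0.5):
    # Hard loss: F.cross_entropy expects raw logits (it applies log-softmax
    # internally), so do not apply softmax to the outputs first.
    hard_loss = F.cross_entropy(outputs, labels) * (1. - alpha)
    # ---------- TODO ----------
    # Complete soft loss in knowledge distillation
    # Soft loss: KL divergence between the temperature-softened teacher and
    # student distributions. KLDivLoss takes log-probabilities as input and
    # probabilities as target; "batchmean" averages over the batch only.
    soft_loss = nn.KLDivLoss(reduction="batchmean")(F.log_softmax(outputs / T, dim=1), F.softmax(teacher_outputs / T, dim=1)) * (alpha * T * T)

    return hard_loss + soft_loss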

# reference: https://github.com/peterliht/knowledge-distillation-pytorch/issues/10

## **Teacher Model Setting**
We provide a well-trained teacher model to help you distill knowledge into the student model.
Note that if you want to change the transform function, you should consider whether it is suitable for this well-trained teacher model.
* If gdown fails, you can switch to another link. (A backup link is provided at the bottom of this colab tutorial.)


In [None]:
# Download teacherNet
!gdown --id '1gb8kS_AV-yLAgChB7zSlbUhg_4uPAOMK' --output teacher_net.ckpt
# Load teacherNet
teacher_net = torch.load('./teacher_net.ckpt')
teacher_net.eval()

Downloading...
From: https://drive.google.com/uc?id=1gb8kS_AV-yLAgChB7zSlbUhg_4uPAOMK
To: /content/teacher_net.ckpt
44.8MB [00:00, 123MB/s] 


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

## **Generate Pseudo Labels in Unlabeled Data**

Since we have a well-trained model, we can use it to predict pseudo-labels for the unlabeled data and help the student network train well. Note that you
**CANNOT** use the well-trained model to pseudo-label the test data.


---

**AGAIN, DO NOT USE TEST DATA FOR ANY PURPOSE OTHER THAN INFERENCE**

* If you use the teacher network to predict pseudo-labels for the test data, the student network can simply overfit those pseudo-labels without using the training or unlabeled data at all. Your Kaggle accuracy would then be as high as the teacher network's, but you would only have overfit the test data, and your true testing accuracy would be very low.
* This contradicts the purpose of this assignment (network compression); therefore, you should not misuse the test data.
* If you have any concerns, you can email us.
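The pseudo-labeling step itself is simple: take the arg-max of the teacher's logits for each unlabeled image and store it as that image's label. Below is a minimal sketch in plain Python with made-up file names and logits. The optional confidence threshold is an assumption added for illustration and is not part of this notebook's code; `pseudo_label` is a hypothetical helper.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(samples, teacher_logits, threshold=0.0):
    """Return (path, label) pairs labeled by the teacher's arg-max prediction.

    samples:        list of (path, placeholder_label) pairs
    teacher_logits: one list of class logits per sample
    threshold:      keep a sample only if the teacher's top probability
                    is at least this value (0.0 keeps everything)
    """
    labeled = []
    for (path, _), logits in zip(samples, teacher_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            labeled.append((path, label))
    return labeled


samples = [("img_0.jpg", 0), ("img_1.jpg", 0), ("img_2.jpg", 0)]  # made-up paths
logits = [[0.2, 3.1, 0.4], [2.5, 0.1, 0.3], [0.9, 1.0, 1.1]]      # made-up logits

print(pseudo_label(samples, logits))
print(pseudo_label(samples, logits, threshold=0.6))
```

With the threshold set, the third sample is dropped because the teacher is nearly undecided among the three classes. Filtering like this trades fewer training samples for cleaner labels.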


In [None]:
# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize a model, and put it on the device specified.
student_net = student_net.to(device)
teacher_net = teacher_net.to(device)

# Whether to do pseudo label.
do_semi = True

def get_pseudo_labels(dataset, model):
    loader = DataLoader(dataset, batch_size=batch_size*3, shuffle=False, pin_memory=True)
    pseudo_labels = []
    for batch in tqdm(loader):
        # A batch consists of image data and corresponding labels.
        img, _ = batch

        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))
            # Take the class with the highest logit as the pseudo-label.
            pseudo_labels.append(logits.argmax(dim=-1).cpu())
    pseudo_labels = torch.cat(pseudo_labels)
    # Update the labels by replacing with pseudo labels.
    for idx, ((img, _), pseudo_label) in enumerate(zip(dataset.samples, pseudo_labels)):
        dataset.samples[idx] = (img, pseudo_label.item())
    return dataset

if do_semi:
    # Generate new trainloader with unlabeled set.
    unlabeled_set = get_pseudo_labels(unlabeled_set, teacher_net)
    concat_dataset = ConcatDataset([train_set, unlabeled_set])
    train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, pin_memory=True, drop_last=True)








## **Training** *(similar to HW3)*

You can finish supervised learning by simply running the provided code without any modification.

The function "get_pseudo_labels" is used for semi-supervised learning.
It is expected to get better performance if you use unlabeled data for semi-supervised learning.
However, you have to implement the function on your own and need to adjust several hyperparameters manually.

For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).

Again, please notice that utilizing external data (or pre-trained model) for training is **prohibited**.

---
**The only difference from HW3 is that you should use the knowledge distillation loss.**




In [None]:
# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(student_net.parameters(), lr=0.0003, weight_decay=1e-5)

# The number of training epochs.
n_epochs = 150
max_valid_acc = 0
is_best = False
model_path = 'gdrive/MyDrive/ML_hw13/student.ckpt'

for epoch in range(n_epochs):
    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    student_net.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    # Iterate the training set by batches.
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # Forward the data. (Make sure data and model are on the same device.)
        logits = student_net(imgs.to(device))
        # The teacher net will not be updated, so we use torch.no_grad()
        # to tell torch not to retain the intermediate values
        # (which are only needed for backpropagation), which saves memory.
        with torch.no_grad():
          soft_labels = teacher_net(imgs.to(device))
        
        # Calculate the loss in knowledge distillation method.
        loss = loss_fn_kd(logits, labels.to(device), soft_labels)

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(student_net.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")


    # ---------- Validation ----------
    # Make sure the model is in eval mode so that modules such as dropout are disabled and batch normalization uses its running statistics.
    student_net.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
          logits = student_net(imgs.to(device))
          soft_labels = teacher_net(imgs.to(device))
        # We can still compute the loss (but not the gradient).
        loss = loss_fn_kd(logits, labels.to(device), soft_labels)

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().detach().cpu().view(-1).numpy()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs += list(acc)

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")


    # Save a checkpoint whenever the validation accuracy improves.
    if valid_acc > max_valid_acc:
      max_valid_acc = valid_acc
      print("Saving best model with valid accuracy =" + str(max_valid_acc))
      checkpoint = {'epoch': epoch + 1, 'state_dict': student_net.state_dict(), 'optimizer': optimizer.state_dict()}
      torch.save(checkpoint, model_path)

[ Train | 001/150 ] loss = 1.37362, acc = 0.21885
[ Valid | 001/150 ] loss = 1.38719, acc = 0.22727
Saving best model with valid accuracy =0.22727272727272727
[ Train | 002/150 ] loss = 1.33747, acc = 0.27303
[ Valid | 002/150 ] loss = 1.37121, acc = 0.23030
Saving best model with valid accuracy =0.23030303030303031
[ Train | 003/150 ] loss = 1.31918, acc = 0.29931
[ Valid | 003/150 ] loss = 1.35518, acc = 0.26212
Saving best model with valid accuracy =0.26212121212121214
[ Train | 004/150 ] loss = 1.30841, acc = 0.31767
[ Valid | 004/150 ] loss = 1.36069, acc = 0.27424
Saving best model with valid accuracy =0.27424242424242423
[ Train | 005/150 ] loss = 1.29903, acc = 0.32975
[ Valid | 005/150 ] loss = 1.35326, acc = 0.28030
Saving best model with valid accuracy =0.2803030303030303
[ Train | 006/150 ] loss = 1.29401, acc = 0.33989
[ Valid | 006/150 ] loss = 1.32716, acc = 0.30152
Saving best model with valid accuracy =0.3015151515151515
[ Train | 007/150 ] loss = 1.28466, acc = 0.35298
[ Valid | 007/150 ] loss = 1.33849, acc = 0.29394
[ Train | 008/150 ] loss = 1.27663, acc = 0.36262
[ Valid | 008/150 ] loss = 1.31652, acc = 0.34091
Saving best model with valid accuracy =0.3409090909090909
[ Train | 009/150 ] loss = 1.27039, acc = 0.37835
[ Valid | 009/150 ] loss = 1.34760, acc = 0.30909
[ Train | 010/150 ] loss = 1.26343, acc = 0.38677
[ Valid | 010/150 ] loss = 1.32147, acc = 0.34394
Saving best model with valid accuracy =0.34393939393939393
[ Train | 011/150 ] loss = 1.25783, acc = 0.39742
[ Valid | 011/150 ] loss = 1.32258, acc = 0.32576
[ Train | 012/150 ] loss = 1.24896, acc = 0.41122
[ Valid | 012/150 ] loss = 1.32908, acc = 0.31515
[ Train | 013/150 ] loss = 1.24876, acc = 0.40453
[ Valid | 013/150 ] loss = 1.29344, acc = 0.38182
Saving best model with valid accuracy =0.38181818181818183
[ Train | 014/150 ] loss = 1.24009, acc = 0.42319
[ Valid | 014/150 ] loss = 1.28647, acc = 0.35606
[ Train | 015/150 ] loss = 1.23677, acc = 0.42370
[ Valid | 015/150 ] loss = 1.30653, acc = 0.34848
[ Train | 016/150 ] loss = 1.23248, acc = 0.42786
[ Valid | 016/150 ] loss = 1.25729, acc = 0.39545
Saving best model with valid accuracy =0.39545454545454545
[ Train | 017/150 ] loss = 1.22868, acc = 0.43841
[ Valid | 017/150 ] loss = 1.29773, acc = 0.36364
[ Train | 018/150 ] loss = 1.22501, acc = 0.43902
[ Valid | 018/150 ] loss = 1.26070, acc = 0.40606
Saving best model with valid accuracy =0.40606060606060607
[ Train | 019/150 ] loss = 1.21800, acc = 0.44897
[ Valid | 019/150 ] loss = 1.25183, acc = 0.41515
Saving best model with valid accuracy =0.41515151515151516
[ Train | 020/150 ] loss = 1.21512, acc = 0.44988
[ Valid | 020/150 ] loss = 1.25983, acc = 0.41061
[ Train | 021/150 ] loss = 1.21108, acc = 0.45617
[ Valid | 021/150 ] loss = 1.25101, acc = 0.44091
Saving best model with valid accuracy =0.4409090909090909
[ Train | 022/150 ] loss = 1.20794, acc = 0.46236
[ Valid | 022/150 ] loss = 1.25916, acc = 0.40606
[ Train | 023/150 ] loss = 1.20296, acc = 0.46804
[ Valid | 023/150 ] loss = 1.22954, acc = 0.44545
Saving best model with valid accuracy =0.44545454545454544
[ Train | 024/150 ] loss = 1.20118, acc = 0.47514
[ Valid | 024/150 ] loss = 1.26318, acc = 0.40606
[ Train | 025/150 ] loss = 1.19831, acc = 0.47616
[ Valid | 025/150 ] loss = 1.25444, acc = 0.42424
[ Train | 026/150 ] loss = 1.19583, acc = 0.47961
[ Valid | 026/150 ] loss = 1.26013, acc = 0.40303
[ Train | 027/150 ] loss = 1.19162, acc = 0.48671
[ Valid | 027/150 ] loss = 1.22841, acc = 0.43939
[ Train | 028/150 ] loss = 1.19156, acc = 0.48397
[ Valid | 028/150 ] loss = 1.24920, acc = 0.42121
[ Train | 029/150 ] loss = 1.18372, acc = 0.49330
[ Valid | 029/150 ] loss = 1.22361, acc = 0.46364
Saving best model with valid accuracy =0.4636363636363636
[ Train | 030/150 ] loss = 1.18087, acc = 0.49655
[ Valid | 030/150 ] loss = 1.21087, acc = 0.48636
Saving best model with valid accuracy =0.4863636363636364
[ Train | 031/150 ] loss = 1.17739, acc = 0.50568
[ Valid | 031/150 ] loss = 1.23846, acc = 0.42121
[ Train | 032/150 ] loss = 1.17937, acc = 0.50304
[ Valid | 032/150 ] loss = 1.20988, acc = 0.46364
[ Train | 033/150 ] loss = 1.17603, acc = 0.50244
[ Valid | 033/150 ] loss = 1.22601, acc = 0.44091
[ Train | 034/150 ] loss = 1.17440, acc = 0.50446
[ Valid | 034/150 ] loss = 1.24625, acc = 0.43485
[ Train | 035/150 ] loss = 1.17258, acc = 0.50670
[ Valid | 035/150 ] loss = 1.20311, acc = 0.48939
Saving best model with valid accuracy =0.4893939393939394
[ Train | 036/150 ] loss = 1.17070, acc = 0.51218
[ Valid | 036/150 ] loss = 1.20029, acc = 0.50455
Saving best model with valid accuracy =0.5045454545454545
[ Train | 037/150 ] loss = 1.16552, acc = 0.52344
[ Valid | 037/150 ] loss = 1.21901, acc = 0.45909
[ Train | 038/150 ] loss = 1.16302, acc = 0.52466
[ Valid | 038/150 ] loss = 1.19665, acc = 0.49697
[ Train | 039/150 ] loss = 1.16336, acc = 0.52597
[ Valid | 039/150 ] loss = 1.20249, acc = 0.48182
[ Train | 040/150 ] loss = 1.16015, acc = 0.52628
[ Valid | 040/150 ] loss = 1.21336, acc = 0.47424
[ Train | 041/150 ] loss = 1.15572, acc = 0.52780
[ Valid | 041/150 ] loss = 1.20856, acc = 0.47727
[ Train | 042/150 ] loss = 1.15828, acc = 0.53064
[ Valid | 042/150 ] loss = 1.18371, acc = 0.50152
[ Train | 043/150 ] loss = 1.15400, acc = 0.53247
[ Valid | 043/150 ] loss = 1.19587, acc = 0.50152
[ Train | 044/150 ] loss = 1.15263, acc = 0.53328


[ Valid | 044/150 ] loss = 1.20475, acc = 0.49091


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 045/150 ] loss = 1.15069, acc = 0.53856


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 045/150 ] loss = 1.20252, acc = 0.49697


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 046/150 ] loss = 1.14956, acc = 0.54140


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 046/150 ] loss = 1.18160, acc = 0.52576
Saving best model with valid accuracy =0.5257575757575758


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 047/150 ] loss = 1.15226, acc = 0.53308


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 047/150 ] loss = 1.18773, acc = 0.50909


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 048/150 ] loss = 1.14799, acc = 0.53764


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 048/150 ] loss = 1.17557, acc = 0.49848


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 049/150 ] loss = 1.14613, acc = 0.54779


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 049/150 ] loss = 1.18486, acc = 0.51818


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 050/150 ] loss = 1.14339, acc = 0.55225


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 050/150 ] loss = 1.18384, acc = 0.51364


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 051/150 ] loss = 1.14000, acc = 0.55367
[ Valid | 051/150 ] loss = 1.15893, acc = 0.53788
Saving best model with valid accuracy =0.5378787878787878
[ Train | 052/150 ] loss = 1.13735, acc = 0.55712
[ Valid | 052/150 ] loss = 1.17147, acc = 0.52576
[ Train | 053/150 ] loss = 1.14275, acc = 0.54819
[ Valid | 053/150 ] loss = 1.18028, acc = 0.51364
[ Train | 054/150 ] loss = 1.13585, acc = 0.55591
[ Valid | 054/150 ] loss = 1.16286, acc = 0.54394
Saving best model with valid accuracy =0.543939393939394
[ Train | 055/150 ] loss = 1.13744, acc = 0.55459
[ Valid | 055/150 ] loss = 1.18418, acc = 0.50909
[ Train | 056/150 ] loss = 1.13536, acc = 0.55763
[ Valid | 056/150 ] loss = 1.17134, acc = 0.55000
Saving best model with valid accuracy =0.55
[ Train | 057/150 ] loss = 1.13146, acc = 0.56311
[ Valid | 057/150 ] loss = 1.20505, acc = 0.49848
[ Train | 058/150 ] loss = 1.13236, acc = 0.56047
[ Valid | 058/150 ] loss = 1.16548, acc = 0.52121
[ Train | 059/150 ] loss = 1.12936, acc = 0.56717
[ Valid | 059/150 ] loss = 1.16968, acc = 0.53788
[ Train | 060/150 ] loss = 1.12968, acc = 0.56686
[ Valid | 060/150 ] loss = 1.18014, acc = 0.52424
[ Train | 061/150 ] loss = 1.12854, acc = 0.57062
[ Valid | 061/150 ] loss = 1.16308, acc = 0.53333
[ Train | 062/150 ] loss = 1.12721, acc = 0.56849
[ Valid | 062/150 ] loss = 1.16405, acc = 0.54242
[ Train | 063/150 ] loss = 1.12601, acc = 0.56676
[ Valid | 063/150 ] loss = 1.19539, acc = 0.51515
[ Train | 064/150 ] loss = 1.12307, acc = 0.57397
[ Valid | 064/150 ] loss = 1.21842, acc = 0.46667
[ Train | 065/150 ] loss = 1.12213, acc = 0.57315
[ Valid | 065/150 ] loss = 1.20091, acc = 0.48030
[ Train | 066/150 ] loss = 1.12346, acc = 0.57031
[ Valid | 066/150 ] loss = 1.17052, acc = 0.53030
[ Train | 067/150 ] loss = 1.11989, acc = 0.57721
[ Valid | 067/150 ] loss = 1.15579, acc = 0.56061
Saving best model with valid accuracy =0.5606060606060606
[ Train | 068/150 ] loss = 1.12072, acc = 0.57457
[ Valid | 068/150 ] loss = 1.19497, acc = 0.50455
[ Train | 069/150 ] loss = 1.11911, acc = 0.57894
[ Valid | 069/150 ] loss = 1.20826, acc = 0.48333
[ Train | 070/150 ] loss = 1.11943, acc = 0.58076
[ Valid | 070/150 ] loss = 1.16481, acc = 0.55000
[ Train | 071/150 ] loss = 1.11861, acc = 0.57610
[ Valid | 071/150 ] loss = 1.16793, acc = 0.54697
[ Train | 072/150 ] loss = 1.11750, acc = 0.57741
[ Valid | 072/150 ] loss = 1.15207, acc = 0.53485
[ Train | 073/150 ] loss = 1.11442, acc = 0.58300
[ Valid | 073/150 ] loss = 1.19034, acc = 0.52424
[ Train | 074/150 ] loss = 1.11439, acc = 0.58482
[ Valid | 074/150 ] loss = 1.13836, acc = 0.57879
Saving best model with valid accuracy =0.5787878787878787

[ Train | 075/150 ] loss = 1.11219, acc = 0.58858
[ Valid | 075/150 ] loss = 1.18461, acc = 0.52727
[ Train | 076/150 ] loss = 1.11086, acc = 0.58502
[ Valid | 076/150 ] loss = 1.14737, acc = 0.56364
[ Train | 077/150 ] loss = 1.11301, acc = 0.58726
[ Valid | 077/150 ] loss = 1.15897, acc = 0.53333
[ Train | 078/150 ] loss = 1.11204, acc = 0.58787
[ Valid | 078/150 ] loss = 1.15516, acc = 0.53333
[ Train | 079/150 ] loss = 1.11001, acc = 0.59000
[ Valid | 079/150 ] loss = 1.15385, acc = 0.55000
[ Train | 080/150 ] loss = 1.10950, acc = 0.58878
[ Valid | 080/150 ] loss = 1.17798, acc = 0.53030
[ Train | 081/150 ] loss = 1.10527, acc = 0.59598
[ Valid | 081/150 ] loss = 1.18139, acc = 0.53485
[ Train | 082/150 ] loss = 1.10775, acc = 0.59142
[ Valid | 082/150 ] loss = 1.16901, acc = 0.54545
[ Train | 083/150 ] loss = 1.10771, acc = 0.59162
[ Valid | 083/150 ] loss = 1.13846, acc = 0.56667
[ Train | 084/150 ] loss = 1.10444, acc = 0.59497
[ Valid | 084/150 ] loss = 1.20395, acc = 0.49242
[ Train | 085/150 ] loss = 1.10551, acc = 0.59213
[ Valid | 085/150 ] loss = 1.18526, acc = 0.51061
[ Train | 086/150 ] loss = 1.10707, acc = 0.59081
[ Valid | 086/150 ] loss = 1.13562, acc = 0.56970
[ Train | 087/150 ] loss = 1.10285, acc = 0.59801
[ Valid | 087/150 ] loss = 1.13923, acc = 0.58030
Saving best model with valid accuracy =0.5803030303030303
[ Train | 088/150 ] loss = 1.10149, acc = 0.59547
[ Valid | 088/150 ] loss = 1.14155, acc = 0.56061
[ Train | 089/150 ] loss = 1.10250, acc = 0.59771
[ Valid | 089/150 ] loss = 1.15884, acc = 0.55303
[ Train | 090/150 ] loss = 1.09928, acc = 0.60400
[ Valid | 090/150 ] loss = 1.14277, acc = 0.56818
[ Train | 091/150 ] loss = 1.10209, acc = 0.59730
[ Valid | 091/150 ] loss = 1.13862, acc = 0.57121
[ Train | 092/150 ] loss = 1.10225, acc = 0.59984
[ Valid | 092/150 ] loss = 1.16286, acc = 0.54242
[ Train | 093/150 ] loss = 1.10071, acc = 0.59933
[ Valid | 093/150 ] loss = 1.14763, acc = 0.54242
[ Train | 094/150 ] loss = 1.09734, acc = 0.60755
[ Valid | 094/150 ] loss = 1.14933, acc = 0.55455
[ Train | 095/150 ] loss = 1.09820, acc = 0.60410
[ Valid | 095/150 ] loss = 1.12219, acc = 0.58636
Saving best model with valid accuracy =0.5863636363636363
[ Train | 096/150 ] loss = 1.09998, acc = 0.59761
[ Valid | 096/150 ] loss = 1.17804, acc = 0.52727
[ Train | 097/150 ] loss = 1.09731, acc = 0.60511
[ Valid | 097/150 ] loss = 1.13566, acc = 0.56667
[ Train | 098/150 ] loss = 1.09298, acc = 0.61161
[ Valid | 098/150 ] loss = 1.14397, acc = 0.57121
[ Train | 099/150 ] loss = 1.09207, acc = 0.61272
[ Valid | 099/150 ] loss = 1.12556, acc = 0.59242
Saving best model with valid accuracy =0.5924242424242424

[ Train | 100/150 ] loss = 1.09040, acc = 0.61404
[ Valid | 100/150 ] loss = 1.12786, acc = 0.58485
[ Train | 101/150 ] loss = 1.09403, acc = 0.60907
[ Valid | 101/150 ] loss = 1.13108, acc = 0.59091
[ Train | 102/150 ] loss = 1.09308, acc = 0.61049
[ Valid | 102/150 ] loss = 1.13429, acc = 0.57273
[ Train | 103/150 ] loss = 1.09263, acc = 0.60988
[ Valid | 103/150 ] loss = 1.12589, acc = 0.56667
[ Train | 104/150 ] loss = 1.09164, acc = 0.61496
[ Valid | 104/150 ] loss = 1.16750, acc = 0.54848
[ Train | 105/150 ] loss = 1.08922, acc = 0.61769
[ Valid | 105/150 ] loss = 1.18447, acc = 0.53485
[ Train | 106/150 ] loss = 1.09058, acc = 0.60978
[ Valid | 106/150 ] loss = 1.13474, acc = 0.57576
[ Train | 107/150 ] loss = 1.09059, acc = 0.61211
[ Valid | 107/150 ] loss = 1.14558, acc = 0.56970
[ Train | 108/150 ] loss = 1.09139, acc = 0.61617
[ Valid | 108/150 ] loss = 1.17929, acc = 0.54091
[ Train | 109/150 ] loss = 1.08581, acc = 0.62429
[ Valid | 109/150 ] loss = 1.15638, acc = 0.56364
[ Train | 110/150 ] loss = 1.08759, acc = 0.61780
[ Valid | 110/150 ] loss = 1.13157, acc = 0.58636
[ Train | 111/150 ] loss = 1.08706, acc = 0.61841
[ Valid | 111/150 ] loss = 1.13678, acc = 0.56364
[ Train | 112/150 ] loss = 1.08563, acc = 0.61496
[ Valid | 112/150 ] loss = 1.13167, acc = 0.56818
[ Train | 113/150 ] loss = 1.08452, acc = 0.62043
[ Valid | 113/150 ] loss = 1.13163, acc = 0.57273
[ Train | 114/150 ] loss = 1.08616, acc = 0.61668
[ Valid | 114/150 ] loss = 1.13395, acc = 0.55758
[ Train | 115/150 ] loss = 1.08324, acc = 0.62328
[ Valid | 115/150 ] loss = 1.11656, acc = 0.59697
Saving best model with valid accuracy =0.5969696969696969
[ Train | 116/150 ] loss = 1.08314, acc = 0.62033
[ Valid | 116/150 ] loss = 1.13390, acc = 0.58333
[ Train | 117/150 ] loss = 1.08201, acc = 0.62510
[ Valid | 117/150 ] loss = 1.13080, acc = 0.58182
[ Train | 118/150 ] loss = 1.08455, acc = 0.62196
[ Valid | 118/150 ] loss = 1.13585, acc = 0.58485
[ Train | 119/150 ] loss = 1.08152, acc = 0.62632
[ Valid | 119/150 ] loss = 1.17920, acc = 0.52273
[ Train | 120/150 ] loss = 1.07781, acc = 0.62561
[ Valid | 120/150 ] loss = 1.15297, acc = 0.54848
[ Train | 121/150 ] loss = 1.08173, acc = 0.62277
[ Valid | 121/150 ] loss = 1.11782, acc = 0.59091
[ Train | 122/150 ] loss = 1.07959, acc = 0.62409
[ Valid | 122/150 ] loss = 1.14230, acc = 0.56970
[ Train | 123/150 ] loss = 1.07906, acc = 0.63048
[ Valid | 123/150 ] loss = 1.23598, acc = 0.46818
[ Train | 124/150 ] loss = 1.08062, acc = 0.62754
[ Valid | 124/150 ] loss = 1.16076, acc = 0.55303
[ Train | 125/150 ] loss = 1.08033, acc = 0.62855
[ Valid | 125/150 ] loss = 1.15022, acc = 0.56818

[ Train | 126/150 ] loss = 1.07776, acc = 0.62946
[ Valid | 126/150 ] loss = 1.12857, acc = 0.58182
[ Train | 127/150 ] loss = 1.07806, acc = 0.62662
[ Valid | 127/150 ] loss = 1.11088, acc = 0.58636
[ Train | 128/150 ] loss = 1.07688, acc = 0.62906
[ Valid | 128/150 ] loss = 1.14072, acc = 0.58030
[ Train | 129/150 ] loss = 1.07699, acc = 0.63088
[ Valid | 129/150 ] loss = 1.11260, acc = 0.60758
Saving best model with valid accuracy =0.6075757575757575
[ Train | 130/150 ] loss = 1.07484, acc = 0.63433
[ Valid | 130/150 ] loss = 1.13606, acc = 0.58030
[ Train | 131/150 ] loss = 1.07460, acc = 0.63190
[ Valid | 131/150 ] loss = 1.19529, acc = 0.52576
[ Train | 132/150 ] loss = 1.07458, acc = 0.63667
[ Valid | 132/150 ] loss = 1.20307, acc = 0.50606
[ Train | 133/150 ] loss = 1.07661, acc = 0.63139
[ Valid | 133/150 ] loss = 1.14129, acc = 0.58636
[ Train | 134/150 ] loss = 1.07335, acc = 0.63241
[ Valid | 134/150 ] loss = 1.20102, acc = 0.50909
[ Train | 135/150 ] loss = 1.07181, acc = 0.64174
[ Valid | 135/150 ] loss = 1.13160, acc = 0.59545
[ Train | 136/150 ] loss = 1.07380, acc = 0.63971
[ Valid | 136/150 ] loss = 1.09609, acc = 0.63182
Saving best model with valid accuracy =0.6318181818181818
[ Train | 137/150 ] loss = 1.07095, acc = 0.64448
[ Valid | 137/150 ] loss = 1.15678, acc = 0.55909
[ Train | 138/150 ] loss = 1.07081, acc = 0.63951
[ Valid | 138/150 ] loss = 1.10796, acc = 0.61970
[ Train | 139/150 ] loss = 1.07111, acc = 0.63758
[ Valid | 139/150 ] loss = 1.13092, acc = 0.59848
[ Train | 140/150 ] loss = 1.07009, acc = 0.63748
[ Valid | 140/150 ] loss = 1.08728, acc = 0.63030
[ Train | 141/150 ] loss = 1.07147, acc = 0.63829
[ Valid | 141/150 ] loss = 1.10067, acc = 0.62121
[ Train | 142/150 ] loss = 1.06876, acc = 0.64032
[ Valid | 142/150 ] loss = 1.09141, acc = 0.61667
[ Train | 143/150 ] loss = 1.07031, acc = 0.64215
[ Valid | 143/150 ] loss = 1.12303, acc = 0.58485
[ Train | 144/150 ] loss = 1.06729, acc = 0.64823
[ Valid | 144/150 ] loss = 1.10503, acc = 0.61364
[ Train | 145/150 ] loss = 1.06638, acc = 0.64631
[ Valid | 145/150 ] loss = 1.20365, acc = 0.50909
[ Train | 146/150 ] loss = 1.06695, acc = 0.64610
[ Valid | 146/150 ] loss = 1.14629, acc = 0.58182
[ Train | 147/150 ] loss = 1.06633, acc = 0.64174
[ Valid | 147/150 ] loss = 1.16987, acc = 0.54697
[ Train | 148/150 ] loss = 1.06710, acc = 0.64783
[ Valid | 148/150 ] loss = 1.16206, acc = 0.54697
[ Train | 149/150 ] loss = 1.06513, acc = 0.64347
[ Valid | 149/150 ] loss = 1.11773, acc = 0.60000
[ Train | 150/150 ] loss = 1.06709, acc = 0.64367
[ Valid | 150/150 ] loss = 1.12327, acc = 0.58182
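
The validation accuracy in this log fluctuates noticeably between epochs (for example 0.63182 at epoch 136 but 0.50909 at epoch 145), so the last epoch is not necessarily the best one. As a minimal sketch, assuming the log has been kept as plain text in the format printed above, the best validation epoch could be recovered as follows. The names `best_valid_epoch` and `sample_log` are illustrative and not part of the notebook, and the handling of ties is an assumption.

```python
import re

# Matches lines such as "[ Valid | 136/150 ] loss = 1.09609, acc = 0.63182".
VALID_LINE = re.compile(r"\[ Valid \| (\d+)/\d+ \] loss = ([\d.]+), acc = ([\d.]+)")

def best_valid_epoch(log_text):
    """Return (epoch, loss, acc) for the validation line with the highest accuracy."""
    best = None
    for match in VALID_LINE.finditer(log_text):
        epoch = int(match.group(1))
        loss = float(match.group(2))
        acc = float(match.group(3))
        # Strict ">" keeps the earliest epoch on ties (an assumption about tie handling).
        if best is None or acc > best[2]:
            best = (epoch, loss, acc)
    return best

# A few lines copied from the log above.
sample_log = """
[ Train | 136/150 ] loss = 1.07380, acc = 0.63971
[ Valid | 136/150 ] loss = 1.09609, acc = 0.63182
[ Train | 137/150 ] loss = 1.07095, acc = 0.64448
[ Valid | 137/150 ] loss = 1.15678, acc = 0.55909
[ Train | 138/150 ] loss = 1.07081, acc = 0.63951
[ Valid | 138/150 ] loss = 1.10796, acc = 0.61970
"""

print(best_valid_epoch(sample_log))  # (136, 1.09609, 0.63182)
```

Only validation lines are inspected, so the training lines in the log are ignored.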


## **Testing** *(same as HW3)*

For inference, we need to make sure the model is in eval mode, and the order of the dataset should not be shuffled ("shuffle=False" in test_loader).

Last but not least, don't forget to save the predictions into a single CSV file.
The format of the CSV file should follow the rules given in the slides.
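
To see why eval mode matters, here is a toy illustration in plain Python. `ToyDropout` is a hypothetical stand-in written for this sketch, not `torch.nn.Dropout`; it only mimics the general idea that a dropout layer randomly zeroes inputs while training and passes them through unchanged at inference.

```python
import random

class ToyDropout:
    """Plain-Python stand-in for a dropout layer (illustration only)."""

    def __init__(self, p=0.5, seed=0):
        self.p = p                      # probability of zeroing each element
        self.training = True            # start in training mode
        self.rng = random.Random(seed)

    def eval(self):
        # Switch to inference behaviour, similar in spirit to calling .eval() on a model.
        self.training = False
        return self

    def __call__(self, xs):
        if not self.training:
            return list(xs)             # eval mode: inputs pass through unchanged
        keep = 1.0 - self.p
        # Training mode: drop each element with probability p and rescale the survivors.
        return [x / keep if self.rng.random() < keep else 0.0 for x in xs]

layer = ToyDropout(p=0.5)
inputs = [1.0, 2.0, 3.0, 4.0]
print("train:", layer(inputs))          # some entries zeroed, the rest scaled by 1 / (1 - p)
print("eval: ", layer.eval()(inputs))   # identical to the inputs on every call
```

If the model were left in training mode during inference, the same image could receive different predictions on different runs.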

### **WARNING -- Keep in Mind**

Cheating includes, but is not limited to:
1.   using testing labels,
2.   submitting results to previous Kaggle competitions,
3.   sharing predictions with others,
4.   copying code from any creature on Earth,
5.   asking other people to do it for you.

Any violation will be punished, with penalties ranging from a reduction of the final grade to failing the course.

It is your responsibility to check whether your code violates the rules.
When citing code from the Internet, you should know exactly what that code does.
You will **NOT** be excused if you break the rules and claim you didn't know what the code does.


In [None]:
### This block is same as HW3 ###
# Make sure the model is in eval mode.
# Some modules, such as Dropout and BatchNorm, behave differently in training mode.
student_net.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # But here the variable "labels" is useless since we do not have the ground-truth.
    # If printing out the labels, you will find that it is always 0.
    # This is because the wrapper (DatasetFolder) returns images and labels for each batch,
    # so we have to create fake labels to make it work normally.
    imgs, labels = batch

    # We don't need gradient in testing, and we don't even have labels to compute loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = student_net(imgs.to(device))

    # Take the class with greatest logit as prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())


In [None]:
### This block is same as HW3 ###
# Save predictions into the file.
with open("predict_hw13.csv", "w") as f:

    # The first row must be "Id,Category" (no space after the comma).
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")
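
Before submitting, it may be worth sanity-checking the file. The sketch below assumes what the cell above writes: an `Id,Category` header followed by consecutive ids starting at 0, with class indices in the range 0 to 10 for the 11 food classes. `check_submission` is a hypothetical helper, and the slides remain the authoritative description of the format.

```python
import csv
import io

def check_submission(csv_text, num_classes=11):
    """Return a list of problems found; an empty list means the text looks fine."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or rows[0] != ["Id", "Category"]:
        return ["first row must be exactly 'Id,Category'"]
    problems = []
    for expected_id, row in enumerate(rows[1:]):
        if len(row) != 2:
            problems.append(f"row {expected_id}: expected 2 columns, got {len(row)}")
            continue
        image_id, category = row
        # Ids are assumed to be consecutive integers starting at 0.
        if image_id != str(expected_id):
            problems.append(f"row {expected_id}: id is {image_id!r}")
        # Categories are assumed to be class indices in [0, num_classes).
        if not category.isdecimal() or int(category) >= num_classes:
            problems.append(f"row {expected_id}: category {category!r} out of range")
    return problems

good = "Id,Category\n0,3\n1,10\n"
bad = "Id, Category\n0,3\n"
print(check_submission(good))  # []
print(check_submission(bad))   # ["first row must be exactly 'Id,Category'"]
```

To check the real file, pass its contents instead, for example `check_submission(open("predict_hw13.csv").read())`.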

## **Statistics**

|Baseline|Accuracy|Training Time|
|-|-|-|
|Simple Baseline |0.59856|2 Hours|
|Medium Baseline |0.65412|2 Hours|
|Strong Baseline |0.72819|4 Hours|
|Boss Baseline |0.81003|Unmeasurable|

## **Learning Curve**

![img](https://lh5.googleusercontent.com/amMLGa7dkqvXGmsJlrVN49VfSjClk5d-n7nCi_Y3ROK4himsBSHhB7SpdWe7Zm06ctRO77VdDkD9u_aKfAh1tMW-KcyYX7vF7LPlKqOo2fVtt3SyfsLv0KTYDB0YbAk6ZhyOIKT8Zfg)



## **Q&A**

If you have any questions about this colab, please send an email to ntu-ml-2021spring-ta@googlegroups.com

## **Backup Links**

In [None]:
# resnet_model 
# !gdown --id '1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m' --output resnet_model.ckpt
# !gdown --id '1VBIeQKH4xRHfToUxuDxtEPsqz0MHvrgd' --output resnet_model.ckpt
# !gdown --id '1Er2azErvXWS5m1jboKN7BLxNXnuAatYw' --output resnet_model.ckpt
# !gdown --id '1Qya0vmf3nRl11IyxxF7nudDpZI_Q4Amh' --output resnet_model.ckpt
# !gdown --id '1fGOOb5ndljraBIkRkLp3bW9orR4YN97U' --output resnet_model.ckpt
# !gdown --id '1apHLvZBZ3GYEMxXxToGKF7qDLn1XbOfJ' --output resnet_model.ckpt
# !gdown --id '1vsDylNsLaAqxonop7Mw3dBAig0EO7tlF' --output resnet_model.ckpt
# !gdown --id '1V_hXJM_V9-10i6wldRyl0SOiivPp4SNt' --output resnet_model.ckpt
# !gdown --id '11HzaJM2M2yg6KYhLaWpWy8WmPIIvJgnk' --output resnet_model.ckpt

# food-11
# !gdown --id '1qdyNN0Ek4S5yi-pAqHes1yjj5cNkENCc' --output food-11.zip
# !gdown --id '1c0Q1EP6yIx0O2rqVMIVInIt8wFjLxmRh' --output food-11.zip
# !gdown --id '1hKO054nT1R8egcXY2-tgQbwX4EjowRLz' --output food-11.zip
# !gdown --id '1_7_uC1WUvX6H51gQaYmI4q3AezdQJhud' --output food-11.zip
# !gdown --id '12bz82Zpx0_7BDGXq4nRt7E_fMFmILoc9' --output food-11.zip
# !gdown --id '1oiqRKrDQXVBM5y63MeEaHxFmCIzNXx1Q' --output food-11.zip
# !gdown --id '1qaL43sl4qUMeCT1OVpk4aOFycnLL5ZJX' --output food-11.zip

## **Reference**
* Knowledge distillation: https://github.com/peterliht/knowledge-distillation-pytorch/issues/10