<a href="https://colab.research.google.com/github/Offliners/OFF/blob/main/HW13/homework13.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Homework 13 - Network Compression
===

> Author: Arvin Liu (r09922071@ntu.edu.tw), this colab is modified from ML2021-HW3

If you have any questions, feel free to ask: ntu-ml-2021spring-ta@googlegroups.com

## **Intro**

HW13 is about network compression.

There are many approaches to network/model compression. Here we introduce two:
* Knowledge Distillation
* Architecture Design


This notebook proceeds as follows: <br/>
1. Introduce depthwise, pointwise, and group convolution as used in MobileNet.
2. Design the model for this colab.
3. Introduce knowledge distillation.
4. Set up TeacherNet, which helps during training.


In [1]:
!nvidia-smi

Wed Jun 23 02:37:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   63C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## **About the Dataset**  *(same as HW3)*

The dataset used here is food-11, a collection of food images in 11 classes.

To meet the homework requirements, the TAs slightly modified the data.
Please DO NOT access the original fully-labeled training data or testing labels.

Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden.

In [3]:
### This block is same as HW3 ###
# Download the dataset
# You may choose where to download the data.

# Google Drive
!gdown --id '157WYqfKxvr0IdomE-2RQVg9C29effzyH' --output food-11.zip
# If gdown fails, try another link. (A backup link is provided at the bottom of this colab tutorial.)

# Dropbox
# !wget https://www.dropbox.com/s/m9q6273jl3djall/food-11.zip -O food-11.zip

# MEGA
# !sudo apt install megatools
# !megadl "https://mega.nz/#!zt1TTIhK!ZuMbg5ZjGWzWX1I6nEUbfjMZgCmAgeqJlwDkqdIryfg"

# Unzip the dataset.
# This may take some time.
!unzip -q food-11.zip

Downloading...
From: https://drive.google.com/uc?id=157WYqfKxvr0IdomE-2RQVg9C29effzyH
To: /content/food-11.zip
963MB [00:10, 88.2MB/s]


## **Import Packages**  *(same as HW3)*

First, we need to import packages that will be used later.

In this homework, we rely heavily on **torchvision**, a library of PyTorch.

In [4]:
### This block is same as HW3 ###
# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.models as models
import math

from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.auto import tqdm

## **Dataset, Data Loader, and Transforms** *(similar to HW3)*

Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Here, since our data are stored in folders by class labels, we can directly apply **torchvision.datasets.DatasetFolder** for wrapping data without much effort.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms.

---
**The only difference from HW3 is that the transform functions are different.**

In [8]:
### This block is similar to HW3 ###
# It is important to do data augmentation in training.
# However, not every augmentation is useful.
# Please think about what kind of augmentation is helpful for food recognition.

train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 142)
    transforms.Resize((142, 142)),
    transforms.RandomRotation(30),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0),
    transforms.RandomResizedCrop(128,scale=(0.08, 1.0)),
    transforms.ToTensor(),
])

# We don't need augmentations in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 142)
    transforms.Resize((142, 142)),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
])

In [9]:
### This block is similar to HW3 ###
# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 64

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder("food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder("food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size * 4, shuffle=False)

# **Architecture / Model Design**
The following convolution layer designs use fewer parameters than a regular convolution.

## **Depthwise & Pointwise Convolution**
![](https://i.imgur.com/FBgcA0s.png)
> Blue: the connection between layers \
> Green: the expansion of **receptive field** \
> (reference: arxiv:1810.04231)

(a) Regular convolution layer: every output feature map is connected to every input feature map. It differs from a fully connected layer only in the operation applied on each connection (multiplication --> convolution).

(b) Depthwise convolution layer (DW): each feature map passes through its own filter. The result then passes through a pointwise convolution layer (PW), which combines the information across all feature maps at each pixel.


(c) Group convolution layer (GC): split the feature maps into groups. Each group passes through its own filters, and the results are concatenated. If the number of groups equals the number of input feature maps, GC becomes DW (channels are independent). If the number of groups is 1, GC becomes a regular convolution.

<img src="https://i.imgur.com/Hqhg0Q9.png" width="500px">


## **Implementation details**
```python
# Regular Convolution, # of params = in_chs * out_chs * kernel_size^2
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding)

# Group Convolution, "groups" controls the connections between inputs and
# outputs. in_chs and out_chs must both be divisible by groups.
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding, groups=groups)

# Depthwise Convolution, out_chs=in_chs=groups, # of params = in_chs * kernel_size^2
nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs)

# Pointwise Convolution, a.k.a 1 by 1 convolution, # of params = in_chs * out_chs
nn.Conv2d(in_chs, out_chs, 1)

# Merge Depthwise and Pointwise Convolution (without )
def dwpw_conv(in_chs, out_chs, kernel_size, stride, padding):
    return nn.Sequential(
        nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs),
        nn.Conv2d(in_chs, out_chs, 1),
    )
```
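
To make the parameter counts in the comments above concrete, here is a minimal sketch in plain Python (no PyTorch needed). The helpers `conv2d_params` and `dwpw_params` are hypothetical names introduced for this illustration. They assume square kernels and, by default, ignore the bias term. The 64-to-128-channel layer is an arbitrary example, not a layer from this notebook.

```python
def conv2d_params(in_chs, out_chs, kernel_size, groups=1, bias=False):
    """Count the weights of a 2D convolution with square kernels.

    Each output channel only sees in_chs // groups input channels,
    so the weight tensor has shape (out_chs, in_chs // groups, k, k).
    """
    if in_chs % groups != 0 or out_chs % groups != 0:
        raise ValueError("in_chs and out_chs must both be divisible by groups")
    n_weights = out_chs * (in_chs // groups) * kernel_size * kernel_size
    return n_weights + (out_chs if bias else 0)


def dwpw_params(in_chs, out_chs, kernel_size):
    """Depthwise (groups=in_chs) followed by pointwise (1x1) convolution."""
    depthwise = conv2d_params(in_chs, in_chs, kernel_size, groups=in_chs)
    pointwise = conv2d_params(in_chs, out_chs, 1)
    return depthwise + pointwise


# Arbitrary example: 64 -> 128 channels with a 3x3 kernel.
regular = conv2d_params(64, 128, 3)            # 64 * 128 * 9
grouped = conv2d_params(64, 128, 3, groups=4)  # one quarter of the regular count
separable = dwpw_params(64, 128, 3)            # 64 * 9 + 64 * 128

print(f"regular:   {regular}")
print(f"grouped:   {grouped}")
print(f"dw + pw:   {separable}")
print(f"reduction: {regular / separable:.1f}x")
```

In this example the depthwise + pointwise pair needs roughly 8 times fewer weights than the regular convolution, which is why it is attractive under a tight parameter budget.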

## **Model**

The basic model here is simply a stack of convolutional layers followed by some fully-connected layers. You can take advantage of depthwise & pointwise convolution to make your model deeper while still satisfying the size constraint.
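
When stacking layers, it helps to track how the spatial size of the feature map shrinks. The sketch below applies the standard output-size formula for convolution and pooling, `floor((n + 2 * padding - kernel_size) / stride) + 1`, to a stack of unpadded 3x3 convolutions and 2x2 max-poolings like the baseline `StudentNet` in this notebook, assuming a 128x128 input. `out_size` is a hypothetical helper written for this illustration.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (n + 2 * padding - kernel_size) // stride + 1


# (name, kernel_size, stride, padding) for each spatial layer of the baseline stack.
layers = [
    ("conv 3x3", 3, 1, 0),
    ("conv 3x3", 3, 1, 0),
    ("maxpool 2x2", 2, 2, 0),
    ("conv 3x3", 3, 1, 0),
    ("maxpool 2x2", 2, 2, 0),
    ("conv 3x3", 3, 1, 0),
    ("maxpool 2x2", 2, 2, 0),
]

size = 128  # assumed input resolution (images are cropped to 128x128)
trace = [size]
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    trace.append(size)
    print(f"{name:<12} -> {size}x{size}")
```

Whatever spatial size remains after the last pooling layer is reduced to 1x1 by global average pooling, so the classifier input size depends only on the channel count.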

In [10]:
class StudentNet(nn.Module):
    def __init__(self):
      super(StudentNet, self).__init__()

      # ---------- TODO ----------
      # Modify your model architecture

      self.cnn = nn.Sequential(
        nn.Conv2d(3, 32, 3), 
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.Conv2d(32, 32, 3),  
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),     

        nn.Conv2d(32, 64, 3), 
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),     

        nn.Conv2d(64, 100, 3), 
        nn.BatchNorm2d(100),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),
        
        # Here we adopt Global Average Pooling for various input size.
        nn.AdaptiveAvgPool2d((1, 1)),
      )
      self.fc = nn.Sequential(
        nn.Linear(100, 11),
      )
      
    def forward(self, x):
      out = self.cnn(x)
      out = out.view(out.size()[0], -1)
      return self.fc(out)

def conv_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def make_divisible(x, divisible_by=8):
    import numpy as np
    return int(np.ceil(x * 1. / divisible_by) * divisible_by)


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(inp * expand_ratio)
        self.use_res_connect = self.stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, n_class=11, input_size=128, width_mult=1.):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 3
        last_channel = 32
        interverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 32, 2, 2],
            [6, 64, 2, 2],
        ]

        # building first layer
        assert input_size % 32 == 0
        # input_channel = make_divisible(input_channel * width_mult)  # first channel is always 32!
        self.last_channel = make_divisible(last_channel * width_mult) if width_mult > 1.0 else last_channel
        self.features = [conv_bn(3, input_channel, 2)]
        # building inverted residual blocks
        for t, c, n, s in interverted_residual_setting:
            output_channel = make_divisible(c * width_mult) if t > 1 else c
            for i in range(n):
                if i == 0:
                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
                else:
                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)

        # building classifier
        self.classifier = nn.Linear(self.last_channel, n_class)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def mobilenet_v2():
    model = MobileNetV2(width_mult=1)

    return model


## **Model Analysis**

Use `torchsummary` to get your model architecture (a screenshot or pasted text is fine) and its number of parameters. Both pieces of information should be submitted to your NTU Cool questions.

Note that the number of parameters **should not be greater than 100,000**, or you will be penalized in this homework.
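
Before running `torchsummary`, you can estimate the budget by hand. The sketch below re-derives the parameter count of the `MobileNetV2` variant defined above in plain Python, assuming bias-free convolutions, two parameters per BatchNorm channel, and `width_mult=1`. The helper names are hypothetical, and the result is an estimate that should be checked against the `torchsummary` output.

```python
def conv(in_chs, out_chs, k, groups=1):
    """Weights of a bias-free convolution with a k x k kernel."""
    return out_chs * (in_chs // groups) * k * k


def bn(chs):
    """BatchNorm2d has one scale and one shift per channel."""
    return 2 * chs


def inverted_residual(inp, oup, expand_ratio):
    """Parameters of one InvertedResidual block (the stride does not matter)."""
    hidden = inp * expand_ratio
    total = 0
    if expand_ratio != 1:
        total += conv(inp, hidden, 1) + bn(hidden)                # pw (expansion)
    total += conv(hidden, hidden, 3, groups=hidden) + bn(hidden)  # dw
    total += conv(hidden, oup, 1) + bn(oup)                       # pw-linear
    return total


# (t, c, n, s) as in the notebook: expansion, output channels, repeats, stride.
settings = [(1, 16, 1, 1), (6, 32, 2, 2), (6, 64, 2, 2)]

in_chs, last_chs, n_class = 3, 32, 11
total = conv(3, in_chs, 3) + bn(in_chs)            # first conv_bn layer
for t, c, n, s in settings:
    for _ in range(n):
        total += inverted_residual(in_chs, c, t)
        in_chs = c
total += conv(in_chs, last_chs, 1) + bn(last_chs)  # conv_1x1_bn
total += last_chs * n_class + n_class              # linear classifier with bias

print(f"estimated parameters: {total:,}")
print("within budget" if total <= 100_000 else "over budget")
```

Most of the budget goes to the last inverted residual block, so changing its expansion ratio or channel count is the quickest way to trade capacity against the limit.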


In [11]:
from torchsummary import summary

student_net = MobileNetV2() # StudentNet()
summary(student_net, (3, 128, 128), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 3, 64, 64]              81
       BatchNorm2d-2            [-1, 3, 64, 64]               6
             ReLU6-3            [-1, 3, 64, 64]               0
            Conv2d-4            [-1, 3, 64, 64]              27
       BatchNorm2d-5            [-1, 3, 64, 64]               6
             ReLU6-6            [-1, 3, 64, 64]               0
            Conv2d-7           [-1, 16, 64, 64]              48
       BatchNorm2d-8           [-1, 16, 64, 64]              32
  InvertedResidual-9           [-1, 16, 64, 64]               0
           Conv2d-10           [-1, 96, 64, 64]           1,536
      BatchNorm2d-11           [-1, 96, 64, 64]             192
            ReLU6-12           [-1, 96, 64, 64]               0
           Conv2d-13           [-1, 96, 32, 32]             864
      BatchNorm2d-14           [-1, 96,

## **Knowledge Distillation**

<img src="https://i.imgur.com/H2aF7Rv.png=100x" width="500px">

Since we already have a trained big model, let it teach the small model. In implementation, the training target includes the prediction of the big model rather than only the ground truth.

## **Why does it work?**
* If the data is not clean, the prediction of the big model can ignore the noise from wrongly labeled data.
* The labels might be related to each other. For example, the digit 8 is more similar to 6, 9, 0 than to 1, 7.


## **How to implement it?**
* $Loss = \alpha T^2 \times KL(\text{softmax}(\frac{\text{Teacher's Logits}}{T}) || \text{softmax}(\frac{\text{Student's Logits}}{T})) + (1-\alpha)(\text{Original Loss})$
* Note that the temperature-scaled logits must pass through softmax before the KL divergence is computed.
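
To see what the temperature $T$ does, here is a minimal single-sample sketch of the loss in plain Python. It is only an illustration with made-up logits, and the helper names (`softmax`, `kl_div`, `kd_loss`) are introduced here; the notebook itself uses PyTorch on whole batches.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits, teacher_logits, label, T=10.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    hard = -math.log(softmax(student_logits)[label])
    soft = kl_div(softmax(teacher_logits, T), softmax(student_logits, T))
    return alpha * T ** 2 * soft + (1.0 - alpha) * hard


# Made-up logits for a 3-class problem.
teacher = [8.0, 2.0, 1.0]
student = [3.0, 2.5, 0.5]

sharp_probs = softmax(teacher, T=1.0)   # almost one-hot
soft_probs = softmax(teacher, T=10.0)   # keeps information about the other classes
print("teacher, T=1: ", [round(p, 3) for p in sharp_probs])
print("teacher, T=10:", [round(p, 3) for p in soft_probs])
print("KD loss:", round(kd_loss(student, teacher, label=0), 4))
```

A higher $T$ flattens the teacher distribution, so the student also learns from the relative scores of the non-target classes. The $T^2$ factor compensates for the gradients of the soft term shrinking roughly as $1/T^2$.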

In [12]:
def loss_fn_kd(outputs, labels, teacher_outputs, T=10, alpha=0.5):
    hard_loss = F.cross_entropy(outputs, labels) * (1. - alpha) 
    # ---------- TODO ----------
    # Complete soft loss in knowledge distillation
    soft_loss = F.kl_div(F.log_softmax(outputs / T , dim=1) , F.softmax(teacher_outputs / T, dim=1), reduction='batchmean') * alpha * (T ** 2)
    return hard_loss + soft_loss

## **Teacher Model Setting**
We provide a well-trained teacher model to help you distill knowledge into the student model.
Note that if you want to change the transform function, you should consider whether it is still suitable for this well-trained teacher model.
* If gdown fails, you can switch to another link. (A backup link is provided at the bottom of this colab tutorial.)


In [13]:
# Download teacherNet
!gdown --id '1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m' --output teacher_net.ckpt
# Load teacherNet
teacher_net = torch.load('./teacher_net.ckpt')
teacher_net.eval()

Downloading...
From: https://drive.google.com/uc?id=1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m
To: /content/teacher_net.ckpt
44.8MB [00:00, 209MB/s]


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

## **Generate Pseudo Labels for Unlabeled Data**

Since we have a well-trained model, we can use it to predict pseudo-labels for the unlabeled data and help the student network train well. Note that you **CANNOT** use the well-trained model to pseudo-label the test data.


---

**AGAIN, DO NOT USE TEST DATA FOR ANY PURPOSE OTHER THAN INFERENCE**

* If you use the teacher network to predict pseudo-labels for the test data, the student network can simply overfit these pseudo-labels without using the training or unlabeled data. Your Kaggle accuracy would then be as high as the teacher network's, but you would merely have overfit the test data, and your true testing accuracy would be very low.
* This contradicts the purpose of this assignment (network compression); therefore, you should not misuse the test data.
* If you have any concerns, you can email us.


In [14]:
# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint_path = './drive/MyDrive/HW13/best_checkpoint.pt'
# student_net = MobileNetV2().to(device)
# student_net.load_state_dict(torch.load(checkpoint_path))

# Initialize a model, and put it on the device specified.
student_net = student_net.to(device)
teacher_net = teacher_net.to(device)

# Whether to do pseudo label.
do_semi = True

def get_pseudo_labels(dataset, model):
    loader = DataLoader(dataset, batch_size=batch_size*3, shuffle=False, pin_memory=True)
    pseudo_labels = []
    for batch in tqdm(loader):
        # A batch consists of image data and corresponding labels.
        img, _ = batch

        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))
            # Take the class with the highest logit as the pseudo label.
            pseudo_labels.append(logits.argmax(dim=-1).detach().cpu())
    pseudo_labels = torch.cat(pseudo_labels)
    # Update the labels by replacing with pseudo labels.
    for idx, ((img, _), pseudo_label) in enumerate(zip(dataset.samples, pseudo_labels)):
        dataset.samples[idx] = (img, pseudo_label.item())
    return dataset

if do_semi:
    # Generate new trainloader with unlabeled set.
    unlabeled_set = get_pseudo_labels(unlabeled_set, teacher_net)
    concat_dataset = ConcatDataset([train_set, unlabeled_set])
    train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, pin_memory=True, drop_last=True)









## **Training** *(similar to HW3)*

You can finish supervised learning by simply running the provided code without any modification.

The function "get_pseudo_labels" is used for semi-supervised learning.
It is expected to get better performance if you use unlabeled data for semi-supervised learning.
However, you have to implement the function on your own and need to adjust several hyperparameters manually.

For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).

Again, please notice that utilizing external data (or pre-trained model) for training is **prohibited**.

---
**The only difference from HW3 is that you should use the knowledge distillation loss.**




In [15]:
# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(student_net.parameters(), lr=3e-4, weight_decay=4e-5)

# The number of training epochs.
n_epochs = 200
best_val_acc = 0.0

for epoch in range(n_epochs):
    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    student_net.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    # Iterate the training set by batches.
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # Forward the data. (Make sure data and model are on the same device.)
        logits = student_net(imgs.to(device))
        # The teacher net will not be updated, so we use torch.no_grad()
        # to tell torch not to retain the intermediate values
        # (which are needed for backpropagation) and save memory.
        with torch.no_grad():
          soft_labels = teacher_net(imgs.to(device))
        
        # Calculate the loss in knowledge distillation method.
        loss = loss_fn_kd(logits, labels.to(device), soft_labels)

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(student_net.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
    student_net.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
          logits = student_net(imgs.to(device))
          soft_labels = teacher_net(imgs.to(device))
        # We can still compute the loss (but not the gradient).
        loss = loss_fn_kd(logits, labels.to(device), soft_labels)

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().detach().cpu().view(-1).numpy()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs += list(acc)

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

    if valid_acc > best_val_acc:
        torch.save(student_net.state_dict(), checkpoint_path)
        best_val_acc = valid_acc
        print(f'Epoch {epoch} best model saved')

HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 001/200 ] loss = 13.81563, acc = 0.17675


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 001/200 ] loss = 31.93811, acc = 0.13636
Epoch 0 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 002/200 ] loss = 13.33714, acc = 0.20049


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 002/200 ] loss = 31.71351, acc = 0.13788
Epoch 1 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 003/200 ] loss = 12.80368, acc = 0.21550


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 003/200 ] loss = 31.28620, acc = 0.14848
Epoch 2 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 004/200 ] loss = 12.91875, acc = 0.22646


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 004/200 ] loss = 31.29252, acc = 0.15303
Epoch 3 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 005/200 ] loss = 12.65554, acc = 0.22656


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 005/200 ] loss = 30.65164, acc = 0.16061
Epoch 4 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 006/200 ] loss = 12.32576, acc = 0.23316


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 006/200 ] loss = 30.70699, acc = 0.19545
Epoch 5 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 007/200 ] loss = 12.20227, acc = 0.26004


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 007/200 ] loss = 30.26270, acc = 0.23788
Epoch 6 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 008/200 ] loss = 12.03579, acc = 0.27628


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 008/200 ] loss = 29.93737, acc = 0.23939
Epoch 7 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 009/200 ] loss = 11.89588, acc = 0.28987


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 009/200 ] loss = 29.94607, acc = 0.25909
Epoch 8 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 010/200 ] loss = 11.71809, acc = 0.30580


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 010/200 ] loss = 29.13926, acc = 0.27879
Epoch 9 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 011/200 ] loss = 11.50467, acc = 0.31291


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 011/200 ] loss = 28.75597, acc = 0.28788
Epoch 10 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 012/200 ] loss = 11.53919, acc = 0.31940


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 012/200 ] loss = 28.35567, acc = 0.28939
Epoch 11 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 013/200 ] loss = 11.29374, acc = 0.31595


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 013/200 ] loss = 28.14498, acc = 0.29848
Epoch 12 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 014/200 ] loss = 11.10959, acc = 0.32965


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 014/200 ] loss = 27.72941, acc = 0.31364
Epoch 13 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 015/200 ] loss = 10.95544, acc = 0.33604


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 015/200 ] loss = 27.53317, acc = 0.31667
Epoch 14 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 016/200 ] loss = 10.89573, acc = 0.34304


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 016/200 ] loss = 27.73643, acc = 0.32121
Epoch 15 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 017/200 ] loss = 10.69930, acc = 0.34619


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 017/200 ] loss = 26.86513, acc = 0.33636
Epoch 16 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 018/200 ] loss = 10.68527, acc = 0.35095


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 018/200 ] loss = 26.52449, acc = 0.34394
Epoch 17 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 019/200 ] loss = 10.61941, acc = 0.35775


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 019/200 ] loss = 26.33722, acc = 0.34545
Epoch 18 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 020/200 ] loss = 10.23419, acc = 0.36414


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 020/200 ] loss = 26.01598, acc = 0.34091


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 021/200 ] loss = 10.32082, acc = 0.36841


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 021/200 ] loss = 26.06076, acc = 0.34697
Epoch 20 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 022/200 ] loss = 10.08038, acc = 0.37480


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 022/200 ] loss = 25.69187, acc = 0.35909
Epoch 21 best model saved


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 023/200 ] loss = 10.07085, acc = 0.37865


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 023/200 ] loss = 25.53484, acc = 0.35909


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 024/200 ] loss = 9.89886, acc = 0.38220


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 024/200 ] loss = 25.71729, acc = 0.35606


[ Train | 025/200 ] loss = 9.93174, acc = 0.38596


[ Valid | 025/200 ] loss = 25.08952, acc = 0.36212
Epoch 24 best model saved


[ Train | 026/200 ] loss = 9.78649, acc = 0.39509


[ Valid | 026/200 ] loss = 24.88600, acc = 0.37576
Epoch 25 best model saved


[ Train | 027/200 ] loss = 9.62292, acc = 0.39184


[ Valid | 027/200 ] loss = 24.77532, acc = 0.39545
Epoch 26 best model saved


[ Train | 028/200 ] loss = 9.68047, acc = 0.40219


[ Valid | 028/200 ] loss = 23.89557, acc = 0.40758
Epoch 27 best model saved


[ Train | 029/200 ] loss = 9.50118, acc = 0.40361


[ Valid | 029/200 ] loss = 24.11163, acc = 0.40000


[ Train | 030/200 ] loss = 9.43479, acc = 0.41031


[ Valid | 030/200 ] loss = 23.51587, acc = 0.42576
Epoch 29 best model saved


[ Train | 031/200 ] loss = 9.36782, acc = 0.41903


[ Valid | 031/200 ] loss = 23.38419, acc = 0.42727
Epoch 30 best model saved


[ Train | 032/200 ] loss = 9.34499, acc = 0.41386


[ Valid | 032/200 ] loss = 23.62839, acc = 0.40303


[ Train | 033/200 ] loss = 9.17064, acc = 0.42218


[ Valid | 033/200 ] loss = 23.36186, acc = 0.42727


[ Train | 034/200 ] loss = 9.09241, acc = 0.42269


[ Valid | 034/200 ] loss = 22.56250, acc = 0.43636
Epoch 33 best model saved


[ Train | 035/200 ] loss = 9.02290, acc = 0.43141


[ Valid | 035/200 ] loss = 23.05430, acc = 0.42879


[ Train | 036/200 ] loss = 9.00087, acc = 0.43750


[ Valid | 036/200 ] loss = 22.34011, acc = 0.45303
Epoch 35 best model saved


[ Train | 037/200 ] loss = 8.77777, acc = 0.43364


[ Valid | 037/200 ] loss = 21.90754, acc = 0.46061
Epoch 36 best model saved


[ Train | 038/200 ] loss = 8.73304, acc = 0.44014


[ Valid | 038/200 ] loss = 21.89287, acc = 0.45909


[ Train | 039/200 ] loss = 8.58795, acc = 0.44450


[ Valid | 039/200 ] loss = 21.60919, acc = 0.48182
Epoch 38 best model saved


[ Train | 040/200 ] loss = 8.55913, acc = 0.45282


[ Valid | 040/200 ] loss = 21.59572, acc = 0.47576


[ Train | 041/200 ] loss = 8.49746, acc = 0.44602


[ Valid | 041/200 ] loss = 21.58029, acc = 0.46970


[ Train | 042/200 ] loss = 8.40985, acc = 0.45444


[ Valid | 042/200 ] loss = 21.28141, acc = 0.46364


[ Train | 043/200 ] loss = 8.39673, acc = 0.45556


[ Valid | 043/200 ] loss = 20.94252, acc = 0.49091
Epoch 42 best model saved


[ Train | 044/200 ] loss = 8.19542, acc = 0.46489


[ Valid | 044/200 ] loss = 21.73819, acc = 0.47424


[ Train | 045/200 ] loss = 8.15493, acc = 0.46165


[ Valid | 045/200 ] loss = 21.01223, acc = 0.50152
Epoch 44 best model saved


[ Train | 046/200 ] loss = 8.19848, acc = 0.47240


[ Valid | 046/200 ] loss = 20.77423, acc = 0.49848


[ Train | 047/200 ] loss = 7.95505, acc = 0.47494


[ Valid | 047/200 ] loss = 20.75598, acc = 0.49697


[ Train | 048/200 ] loss = 7.99646, acc = 0.47616


[ Valid | 048/200 ] loss = 20.43019, acc = 0.50303
Epoch 47 best model saved


[ Train | 049/200 ] loss = 7.87589, acc = 0.48153


[ Valid | 049/200 ] loss = 20.31121, acc = 0.50455
Epoch 48 best model saved


[ Train | 050/200 ] loss = 7.92546, acc = 0.48133


[ Valid | 050/200 ] loss = 20.44912, acc = 0.51970
Epoch 49 best model saved


[ Train | 051/200 ] loss = 7.74733, acc = 0.48275


[ Valid | 051/200 ] loss = 19.78526, acc = 0.50152


[ Train | 052/200 ] loss = 7.63517, acc = 0.48640


[ Valid | 052/200 ] loss = 20.09698, acc = 0.50152


[ Train | 053/200 ] loss = 7.70491, acc = 0.48711


[ Valid | 053/200 ] loss = 20.47323, acc = 0.49242


[ Train | 054/200 ] loss = 7.55958, acc = 0.48996


[ Valid | 054/200 ] loss = 19.82178, acc = 0.51364


[ Train | 055/200 ] loss = 7.55652, acc = 0.49219


[ Valid | 055/200 ] loss = 19.48368, acc = 0.52273
Epoch 54 best model saved


[ Train | 056/200 ] loss = 7.50983, acc = 0.49746


[ Valid | 056/200 ] loss = 19.44760, acc = 0.51970


[ Train | 057/200 ] loss = 7.47087, acc = 0.50264


[ Valid | 057/200 ] loss = 19.29672, acc = 0.53636
Epoch 56 best model saved


[ Train | 058/200 ] loss = 7.46206, acc = 0.50436


[ Valid | 058/200 ] loss = 19.70797, acc = 0.52879


[ Train | 059/200 ] loss = 7.41531, acc = 0.50538


[ Valid | 059/200 ] loss = 18.98883, acc = 0.54242
Epoch 58 best model saved


[ Train | 060/200 ] loss = 7.23211, acc = 0.50254


[ Valid | 060/200 ] loss = 19.30022, acc = 0.51061


[ Train | 061/200 ] loss = 7.18904, acc = 0.51491


[ Valid | 061/200 ] loss = 18.99325, acc = 0.53485


[ Train | 062/200 ] loss = 7.07936, acc = 0.51136


[ Valid | 062/200 ] loss = 18.59830, acc = 0.55606
Epoch 61 best model saved


[ Train | 063/200 ] loss = 7.18309, acc = 0.51268


[ Valid | 063/200 ] loss = 18.80184, acc = 0.53182


[ Train | 064/200 ] loss = 7.10745, acc = 0.51765


[ Valid | 064/200 ] loss = 18.70473, acc = 0.55152


[ Train | 065/200 ] loss = 7.07871, acc = 0.50984


[ Valid | 065/200 ] loss = 18.25337, acc = 0.54545


[ Train | 066/200 ] loss = 7.09491, acc = 0.51948


[ Valid | 066/200 ] loss = 18.56781, acc = 0.53030


[ Train | 067/200 ] loss = 6.97434, acc = 0.52080


[ Valid | 067/200 ] loss = 18.43131, acc = 0.54848


[ Train | 068/200 ] loss = 7.00512, acc = 0.51928


[ Valid | 068/200 ] loss = 18.56200, acc = 0.53182


[ Train | 069/200 ] loss = 6.91270, acc = 0.51826


[ Valid | 069/200 ] loss = 18.02642, acc = 0.57727
Epoch 68 best model saved


[ Train | 070/200 ] loss = 6.96291, acc = 0.52912


[ Valid | 070/200 ] loss = 17.98725, acc = 0.58333
Epoch 69 best model saved


[ Train | 071/200 ] loss = 6.85272, acc = 0.52232


[ Valid | 071/200 ] loss = 17.92630, acc = 0.55000


[ Train | 072/200 ] loss = 6.85930, acc = 0.52963


[ Valid | 072/200 ] loss = 17.69447, acc = 0.56970


[ Train | 073/200 ] loss = 6.81968, acc = 0.53369


[ Valid | 073/200 ] loss = 18.28237, acc = 0.54697


[ Train | 074/200 ] loss = 6.71055, acc = 0.53957


[ Valid | 074/200 ] loss = 17.65816, acc = 0.55758


[ Train | 075/200 ] loss = 6.76345, acc = 0.53642


[ Valid | 075/200 ] loss = 17.82791, acc = 0.58333


[ Train | 076/200 ] loss = 6.72109, acc = 0.53734


[ Valid | 076/200 ] loss = 17.61384, acc = 0.57727


[ Train | 077/200 ] loss = 6.67863, acc = 0.53328


[ Valid | 077/200 ] loss = 17.82589, acc = 0.55909


[ Train | 078/200 ] loss = 6.61221, acc = 0.54302


[ Valid | 078/200 ] loss = 17.26549, acc = 0.58788
Epoch 77 best model saved


[ Train | 079/200 ] loss = 6.55423, acc = 0.54556


[ Valid | 079/200 ] loss = 17.30106, acc = 0.58333


[ Train | 080/200 ] loss = 6.60667, acc = 0.54525


[ Valid | 080/200 ] loss = 17.06910, acc = 0.57727


[ Train | 081/200 ] loss = 6.42157, acc = 0.54485


[ Valid | 081/200 ] loss = 17.03265, acc = 0.59242
Epoch 80 best model saved


[ Train | 082/200 ] loss = 6.42111, acc = 0.54150


[ Valid | 082/200 ] loss = 17.05379, acc = 0.58788


[ Train | 083/200 ] loss = 6.46413, acc = 0.54870


[ Valid | 083/200 ] loss = 17.10537, acc = 0.59394
Epoch 82 best model saved


[ Train | 084/200 ] loss = 6.46190, acc = 0.54931


[ Valid | 084/200 ] loss = 16.60949, acc = 0.60455
Epoch 83 best model saved


[ Train | 085/200 ] loss = 6.35734, acc = 0.55408


[ Valid | 085/200 ] loss = 16.62921, acc = 0.61515
Epoch 84 best model saved


[ Train | 086/200 ] loss = 6.35762, acc = 0.55367


[ Valid | 086/200 ] loss = 16.68615, acc = 0.59848


[ Train | 087/200 ] loss = 6.22324, acc = 0.55063


[ Valid | 087/200 ] loss = 17.05492, acc = 0.58485


[ Train | 088/200 ] loss = 6.24989, acc = 0.55509


[ Valid | 088/200 ] loss = 16.71880, acc = 0.59848


[ Train | 089/200 ] loss = 6.25317, acc = 0.55469


[ Valid | 089/200 ] loss = 16.70331, acc = 0.60758


[ Train | 090/200 ] loss = 6.31353, acc = 0.55672


[ Valid | 090/200 ] loss = 16.65375, acc = 0.58788


[ Train | 091/200 ] loss = 6.19045, acc = 0.55793


[ Valid | 091/200 ] loss = 16.34930, acc = 0.61061


[ Train | 092/200 ] loss = 6.22000, acc = 0.55357


[ Valid | 092/200 ] loss = 16.33125, acc = 0.60303


[ Train | 093/200 ] loss = 6.19263, acc = 0.55438


[ Valid | 093/200 ] loss = 16.74522, acc = 0.57121


[ Train | 094/200 ] loss = 6.10101, acc = 0.55601


[ Valid | 094/200 ] loss = 16.39650, acc = 0.58333


[ Train | 095/200 ] loss = 6.12003, acc = 0.55905


[ Valid | 095/200 ] loss = 15.92114, acc = 0.62424
Epoch 94 best model saved


[ Train | 096/200 ] loss = 6.07027, acc = 0.56149


[ Valid | 096/200 ] loss = 16.15055, acc = 0.62273


[ Train | 097/200 ] loss = 6.07644, acc = 0.56869


[ Valid | 097/200 ] loss = 16.03347, acc = 0.62576
Epoch 96 best model saved


[ Train | 098/200 ] loss = 6.07659, acc = 0.56646


[ Valid | 098/200 ] loss = 16.23115, acc = 0.61364


[ Train | 099/200 ] loss = 6.05205, acc = 0.56778


[ Valid | 099/200 ] loss = 15.76151, acc = 0.61970


[ Train | 100/200 ] loss = 6.03649, acc = 0.57092


[ Valid | 100/200 ] loss = 15.55874, acc = 0.61515


[ Train | 101/200 ] loss = 6.04396, acc = 0.56595


[ Valid | 101/200 ] loss = 15.80420, acc = 0.60000


[ Train | 102/200 ] loss = 6.01167, acc = 0.56849


[ Valid | 102/200 ] loss = 15.49300, acc = 0.61970


[ Train | 103/200 ] loss = 5.93463, acc = 0.57681


[ Valid | 103/200 ] loss = 15.60380, acc = 0.62576


[ Train | 104/200 ] loss = 6.03393, acc = 0.57143


[ Valid | 104/200 ] loss = 15.76044, acc = 0.62273


[ Train | 105/200 ] loss = 5.86592, acc = 0.56686


[ Valid | 105/200 ] loss = 15.56286, acc = 0.63485
Epoch 104 best model saved


[ Train | 106/200 ] loss = 5.87753, acc = 0.57214


[ Valid | 106/200 ] loss = 15.20915, acc = 0.65152
Epoch 105 best model saved


[ Train | 107/200 ] loss = 5.89701, acc = 0.57873


[ Valid | 107/200 ] loss = 15.10942, acc = 0.63030


[ Train | 108/200 ] loss = 5.77903, acc = 0.57904


[ Valid | 108/200 ] loss = 15.46531, acc = 0.63485


[ Train | 109/200 ] loss = 5.86131, acc = 0.57792


[ Valid | 109/200 ] loss = 15.52824, acc = 0.63333


[ Train | 110/200 ] loss = 5.75990, acc = 0.57894


[ Valid | 110/200 ] loss = 15.39885, acc = 0.61515


[ Train | 111/200 ] loss = 5.74130, acc = 0.58502


[ Valid | 111/200 ] loss = 15.07677, acc = 0.64697


[ Train | 112/200 ] loss = 5.79691, acc = 0.57833


[ Valid | 112/200 ] loss = 15.48546, acc = 0.65758
Epoch 111 best model saved


[ Train | 113/200 ] loss = 5.73024, acc = 0.57681


[ Valid | 113/200 ] loss = 14.62700, acc = 0.67424
Epoch 112 best model saved


[ Train | 114/200 ] loss = 5.71164, acc = 0.57386


[ Valid | 114/200 ] loss = 16.19969, acc = 0.62273


[ Train | 115/200 ] loss = 5.71788, acc = 0.57802


[ Valid | 115/200 ] loss = 15.18177, acc = 0.60909


[ Train | 116/200 ] loss = 5.71819, acc = 0.58178


[ Valid | 116/200 ] loss = 15.32658, acc = 0.63485


[ Train | 117/200 ] loss = 5.65203, acc = 0.57914


[ Valid | 117/200 ] loss = 14.86746, acc = 0.65455


[ Train | 118/200 ] loss = 5.70511, acc = 0.57944


[ Valid | 118/200 ] loss = 14.76451, acc = 0.65152


[ Train | 119/200 ] loss = 5.60986, acc = 0.58563


[ Valid | 119/200 ] loss = 14.99991, acc = 0.67121


[ Train | 120/200 ] loss = 5.64359, acc = 0.58969


[ Valid | 120/200 ] loss = 14.56952, acc = 0.66970


[ Train | 121/200 ] loss = 5.66755, acc = 0.58228


[ Valid | 121/200 ] loss = 15.37289, acc = 0.63788


[ Train | 122/200 ] loss = 5.58797, acc = 0.58817


[ Valid | 122/200 ] loss = 14.82926, acc = 0.65606


[ Train | 123/200 ] loss = 5.59436, acc = 0.59060


[ Valid | 123/200 ] loss = 15.01317, acc = 0.66061


[ Train | 124/200 ] loss = 5.54672, acc = 0.59091


[ Valid | 124/200 ] loss = 14.63032, acc = 0.64242


[ Train | 125/200 ] loss = 5.53914, acc = 0.58513


[ Valid | 125/200 ] loss = 14.51241, acc = 0.64697


[ Train | 126/200 ] loss = 5.45985, acc = 0.59142


[ Valid | 126/200 ] loss = 14.43281, acc = 0.66970


[ Train | 127/200 ] loss = 5.48304, acc = 0.59436


[ Valid | 127/200 ] loss = 14.59051, acc = 0.66515


[ Train | 128/200 ] loss = 5.53057, acc = 0.59375


[ Valid | 128/200 ] loss = 14.92528, acc = 0.66970


[ Train | 129/200 ] loss = 5.40071, acc = 0.60562


[ Valid | 129/200 ] loss = 14.41019, acc = 0.65909


[ Train | 130/200 ] loss = 5.44421, acc = 0.59294


[ Valid | 130/200 ] loss = 14.65279, acc = 0.67424


[ Train | 131/200 ] loss = 5.51993, acc = 0.59679


[ Valid | 131/200 ] loss = 14.41950, acc = 0.68788
Epoch 130 best model saved


[ Train | 132/200 ] loss = 5.40629, acc = 0.59558


[ Valid | 132/200 ] loss = 13.91082, acc = 0.68939
Epoch 131 best model saved


[ Train | 133/200 ] loss = 5.40618, acc = 0.59720


[ Valid | 133/200 ] loss = 14.33512, acc = 0.67879


[ Train | 134/200 ] loss = 5.44066, acc = 0.59750


[ Valid | 134/200 ] loss = 14.13072, acc = 0.67273


[ Train | 135/200 ] loss = 5.45410, acc = 0.59517


[ Valid | 135/200 ] loss = 14.07194, acc = 0.67424


[ Train | 136/200 ] loss = 5.36252, acc = 0.59984


[ Valid | 136/200 ] loss = 14.21922, acc = 0.67121


[ Train | 137/200 ] loss = 5.38206, acc = 0.60055


[ Valid | 137/200 ] loss = 14.18509, acc = 0.66667


[ Train | 138/200 ] loss = 5.30881, acc = 0.60055


[ Valid | 138/200 ] loss = 13.93392, acc = 0.66818


[ Train | 139/200 ] loss = 5.37484, acc = 0.59334


[ Valid | 139/200 ] loss = 13.92347, acc = 0.67727


[ Train | 140/200 ] loss = 5.31365, acc = 0.60319


[ Valid | 140/200 ] loss = 14.03453, acc = 0.66061


[ Train | 141/200 ] loss = 5.35485, acc = 0.59862


[ Valid | 141/200 ] loss = 14.11514, acc = 0.67424


[ Train | 142/200 ] loss = 5.30108, acc = 0.59537


[ Valid | 142/200 ] loss = 13.80971, acc = 0.69242
Epoch 141 best model saved


[ Train | 143/200 ] loss = 5.34368, acc = 0.59903


[ Valid | 143/200 ] loss = 14.02450, acc = 0.67727


[ Train | 144/200 ] loss = 5.30825, acc = 0.59537


[ Valid | 144/200 ] loss = 14.02399, acc = 0.68333


[ Train | 145/200 ] loss = 5.29057, acc = 0.60004


[ Valid | 145/200 ] loss = 13.80012, acc = 0.69394
Epoch 144 best model saved


[ Train | 146/200 ] loss = 5.20807, acc = 0.59933


[ Valid | 146/200 ] loss = 14.07657, acc = 0.68939


[ Train | 147/200 ] loss = 5.24003, acc = 0.60735


[ Valid | 147/200 ] loss = 13.93129, acc = 0.68182


[ Train | 148/200 ] loss = 5.28651, acc = 0.60034


[ Valid | 148/200 ] loss = 13.91020, acc = 0.67273


[ Train | 149/200 ] loss = 5.24715, acc = 0.60329


[ Valid | 149/200 ] loss = 13.22658, acc = 0.70606
Epoch 148 best model saved


[ Train | 150/200 ] loss = 5.15664, acc = 0.59892


[ Valid | 150/200 ] loss = 13.46604, acc = 0.70758
Epoch 149 best model saved


[ Train | 151/200 ] loss = 5.22086, acc = 0.60846


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 151/200 ] loss = 13.69291, acc = 0.70152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 152/200 ] loss = 5.26082, acc = 0.60045


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 152/200 ] loss = 13.55588, acc = 0.70152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 153/200 ] loss = 5.14668, acc = 0.61171


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 153/200 ] loss = 13.53198, acc = 0.69394


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 154/200 ] loss = 5.23025, acc = 0.60288


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 154/200 ] loss = 13.36925, acc = 0.69545


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 155/200 ] loss = 5.15629, acc = 0.60501


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 155/200 ] loss = 14.52064, acc = 0.64697


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 156/200 ] loss = 5.17828, acc = 0.60278


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 156/200 ] loss = 13.29509, acc = 0.70303


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 157/200 ] loss = 5.15423, acc = 0.59547


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 157/200 ] loss = 13.45424, acc = 0.70152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 158/200 ] loss = 5.13481, acc = 0.60684


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 158/200 ] loss = 14.03643, acc = 0.66364


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 159/200 ] loss = 5.21642, acc = 0.60664


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 159/200 ] loss = 13.55756, acc = 0.68788


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 160/200 ] loss = 5.14472, acc = 0.60298


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 160/200 ] loss = 13.19149, acc = 0.68939


[ Train | 161/200 ] loss = 5.14312, acc = 0.60116
[ Valid | 161/200 ] loss = 12.70502, acc = 0.71970
Epoch 160 best model saved

[ Train | 162/200 ] loss = 5.06631, acc = 0.60917
[ Valid | 162/200 ] loss = 13.67865, acc = 0.69091

[ Train | 163/200 ] loss = 5.02808, acc = 0.61597
[ Valid | 163/200 ] loss = 13.03989, acc = 0.70000

[ Train | 164/200 ] loss = 5.05567, acc = 0.61800
[ Valid | 164/200 ] loss = 13.03633, acc = 0.72424
Epoch 163 best model saved

[ Train | 165/200 ] loss = 5.08782, acc = 0.61536
[ Valid | 165/200 ] loss = 13.13757, acc = 0.68636

[ Train | 166/200 ] loss = 5.06920, acc = 0.60795
[ Valid | 166/200 ] loss = 12.80100, acc = 0.71061

[ Train | 167/200 ] loss = 5.04950, acc = 0.61425
[ Valid | 167/200 ] loss = 13.33731, acc = 0.68788

[ Train | 168/200 ] loss = 5.03809, acc = 0.60745
[ Valid | 168/200 ] loss = 13.22328, acc = 0.70758

[ Train | 169/200 ] loss = 5.05076, acc = 0.61698
[ Valid | 169/200 ] loss = 13.24164, acc = 0.71364

[ Train | 170/200 ] loss = 4.99158, acc = 0.61445
[ Valid | 170/200 ] loss = 12.68369, acc = 0.71364


[ Train | 171/200 ] loss = 5.01366, acc = 0.60806
[ Valid | 171/200 ] loss = 12.96272, acc = 0.71061

[ Train | 172/200 ] loss = 5.01051, acc = 0.61262
[ Valid | 172/200 ] loss = 13.30583, acc = 0.68030

[ Train | 173/200 ] loss = 5.03035, acc = 0.62125
[ Valid | 173/200 ] loss = 13.58913, acc = 0.68636

[ Train | 174/200 ] loss = 5.03743, acc = 0.61790
[ Valid | 174/200 ] loss = 13.40107, acc = 0.69545

[ Train | 175/200 ] loss = 5.02441, acc = 0.61820
[ Valid | 175/200 ] loss = 12.87092, acc = 0.72576
Epoch 174 best model saved

[ Train | 176/200 ] loss = 4.98187, acc = 0.61810
[ Valid | 176/200 ] loss = 13.08617, acc = 0.71667

[ Train | 177/200 ] loss = 5.00251, acc = 0.61567
[ Valid | 177/200 ] loss = 12.71270, acc = 0.71061

[ Train | 178/200 ] loss = 4.93678, acc = 0.61374
[ Valid | 178/200 ] loss = 13.21001, acc = 0.69545

[ Train | 179/200 ] loss = 4.93702, acc = 0.61739
[ Valid | 179/200 ] loss = 12.61175, acc = 0.71212

[ Train | 180/200 ] loss = 4.98528, acc = 0.61496
[ Valid | 180/200 ] loss = 13.00833, acc = 0.72273


[ Train | 181/200 ] loss = 4.93366, acc = 0.61445
[ Valid | 181/200 ] loss = 12.99395, acc = 0.72121

[ Train | 182/200 ] loss = 4.85797, acc = 0.61516
[ Valid | 182/200 ] loss = 13.27327, acc = 0.69394

[ Train | 183/200 ] loss = 4.93731, acc = 0.61790
[ Valid | 183/200 ] loss = 12.56473, acc = 0.71212

[ Train | 184/200 ] loss = 4.87848, acc = 0.61983
[ Valid | 184/200 ] loss = 12.82412, acc = 0.69848

[ Train | 185/200 ] loss = 4.92371, acc = 0.61820
[ Valid | 185/200 ] loss = 12.71064, acc = 0.69697

[ Train | 186/200 ] loss = 4.87298, acc = 0.61830
[ Valid | 186/200 ] loss = 12.69317, acc = 0.72727
Epoch 185 best model saved

[ Train | 187/200 ] loss = 4.93821, acc = 0.61485
[ Valid | 187/200 ] loss = 12.31422, acc = 0.71212

[ Train | 188/200 ] loss = 4.85142, acc = 0.61830
[ Valid | 188/200 ] loss = 12.32946, acc = 0.71970

[ Train | 189/200 ] loss = 4.95232, acc = 0.61922
[ Valid | 189/200 ] loss = 12.24084, acc = 0.73485
Epoch 188 best model saved

[ Train | 190/200 ] loss = 4.88091, acc = 0.61861
[ Valid | 190/200 ] loss = 12.22804, acc = 0.71364


[ Train | 191/200 ] loss = 4.82493, acc = 0.62196
[ Valid | 191/200 ] loss = 12.72500, acc = 0.71515

[ Train | 192/200 ] loss = 4.92082, acc = 0.61323
[ Valid | 192/200 ] loss = 12.70427, acc = 0.74848
Epoch 191 best model saved

[ Train | 193/200 ] loss = 4.92397, acc = 0.61922
[ Valid | 193/200 ] loss = 11.94412, acc = 0.73030

[ Train | 194/200 ] loss = 4.83572, acc = 0.62064
[ Valid | 194/200 ] loss = 12.75335, acc = 0.73485

[ Train | 195/200 ] loss = 4.82289, acc = 0.61780
[ Valid | 195/200 ] loss = 12.47206, acc = 0.70758

[ Train | 196/200 ] loss = 4.83512, acc = 0.61901
[ Valid | 196/200 ] loss = 13.03742, acc = 0.71515

[ Train | 197/200 ] loss = 4.86948, acc = 0.62926
[ Valid | 197/200 ] loss = 12.09535, acc = 0.71212

[ Train | 198/200 ] loss = 4.75476, acc = 0.62358
[ Valid | 198/200 ] loss = 12.71453, acc = 0.71364

[ Train | 199/200 ] loss = 4.85777, acc = 0.62449
[ Valid | 199/200 ] loss = 12.97639, acc = 0.71818

[ Train | 200/200 ] loss = 4.81580, acc = 0.62256
[ Valid | 200/200 ] loss = 12.55282, acc = 0.70606
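
A long log like this one is easier to summarize with a short script. The following is a minimal sketch, assuming the log text is available as a string; the helper name `best_valid_epoch` is hypothetical, and the sample lines are copied from the output above.

```python
import re

# Matches lines such as "[ Valid | 192/200 ] loss = 12.70427, acc = 0.74848".
VALID_LINE = re.compile(
    r"\[ Valid \| (\d+)/(\d+) \] loss = ([\d.]+), acc = ([\d.]+)"
)


def best_valid_epoch(log_text):
    """Return (epoch, acc) for the highest validation accuracy, or None."""
    best = None
    for match in VALID_LINE.finditer(log_text):
        epoch = int(match.group(1))
        acc = float(match.group(4))
        # Keep the first epoch that reaches the highest accuracy.
        if best is None or acc > best[1]:
            best = (epoch, acc)
    return best


# A few lines copied from the training output above.
sample_log = """
[ Train | 191/200 ] loss = 4.82493, acc = 0.62196
[ Valid | 191/200 ] loss = 12.72500, acc = 0.71515
[ Train | 192/200 ] loss = 4.92082, acc = 0.61323
[ Valid | 192/200 ] loss = 12.70427, acc = 0.74848
[ Valid | 193/200 ] loss = 11.94412, acc = 0.73030
"""

print(best_valid_epoch(sample_log))  # (192, 0.74848)
```

Note that the "best model saved" messages appear to use a zero-based epoch index, so the checkpoint reported as "Epoch 191" lines up with the `192/200` validation line.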


## **Testing** *(same as HW3)*

For inference, make sure the model is in eval mode and that the test set is not shuffled (`shuffle=False` in `test_loader`), because each prediction is written to the CSV under its position in the dataset.

Last but not least, don't forget to save the predictions into a single CSV file.
The format of the CSV file should follow the rules mentioned in the slides.
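
To see why eval mode matters, here is a minimal sketch using a standalone `nn.Dropout` layer rather than the student network from this notebook. The input tensor and the dropout probability are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# Training mode: random elements are zeroed and the survivors are
# scaled by 1 / (1 - p) = 2, so the output changes from call to call.
layer.train()
train_out = layer(x)

# Eval mode: dropout is a no-op, so the output equals the input.
layer.eval()
with torch.no_grad():
    eval_out = layer(x)

print(train_out)
print(eval_out)
```

A model left in training mode at inference time would therefore produce noisy predictions. BatchNorm layers switch behavior in a similar way: they use batch statistics in training mode and their running statistics in eval mode.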

### **WARNING -- Keep in Mind**

Cheating includes but is not limited to:
1.   using testing labels,
2.   submitting results to previous Kaggle competitions,
3.   sharing predictions with others,
4.   copying code from any creature on Earth,
5.   asking other people to do it for you.

Any violation will be penalized, ranging from a deduction on the final grade to failing the course.

It is your responsibility to check whether your code violates the rules.
When citing code from the Internet, you should know exactly what that code does.
Breaking the rules will **NOT** be tolerated, even if you claim you don't know what the code does.


In [16]:
### This block is the same as HW3 ###
# Make sure the model is in eval mode.
# Some modules, such as Dropout and BatchNorm, behave differently in training mode.
student_net = MobileNetV2().to(device)
student_net.load_state_dict(torch.load(checkpoint_path))
student_net.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate over the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # Here, however, "labels" is unused because the test set has no ground truth.
    # If you print the labels, you will find that they are always 0.
    # The wrapper (DatasetFolder) returns images and labels for each batch,
    # so fake labels are created to make it work normally.
    imgs, labels = batch

    # We don't need gradients in testing, and we don't even have labels to compute the loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = student_net(imgs.to(device))

    # Take the class with the greatest logit as the prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())



In [17]:
### This block is the same as HW3 ###
# Save predictions into the file.
with open("predict.csv", "w") as f:

    # The first row must be "Id,Category".
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")
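
Before uploading, it can help to read the file back and check its shape. The following is a minimal sketch using only the standard library. The helper name `check_submission` and the demo file `predict_demo.csv` are hypothetical, and the checks only mirror what the cell above writes (an `Id,Category` header followed by consecutive ids starting at 0); the authoritative format rules are the ones in the slides.

```python
import csv


def check_submission(path, expected_rows=None):
    """Return a list of problems found in a prediction CSV (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    # The header must match what the writing cell produces.
    if not rows or rows[0] != ["Id", "Category"]:
        problems.append("header must be exactly 'Id,Category'")

    body = rows[1:]
    for i, row in enumerate(body):
        # Each row should be "<consecutive id>,<non-negative integer class>".
        if len(row) != 2 or row[0] != str(i) or not row[1].isdigit():
            problems.append(f"unexpected row {i}: {row}")

    if expected_rows is not None and len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    return problems


# Usage with a tiny hand-written file (hypothetical predictions).
with open("predict_demo.csv", "w") as f:
    f.write("Id,Category\n0,3\n1,10\n2,0\n")

print(check_submission("predict_demo.csv", expected_rows=3))  # []
```

For the real file, `expected_rows` could be set to `len(predictions)` so that a truncated write is caught as well.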

In [18]:
from google.colab import files
files.download("predict.csv")


## **Statistics**

|Baseline|Accuracy|Training Time|
|-|-|-|
|Simple Baseline |0.59856|2 Hours|
|Medium Baseline |0.65412|2 Hours|
|Strong Baseline |0.72819|4 Hours|
|Boss Baseline |0.81003|Unmeasurable|

## **Learning Curve**

![img](https://lh5.googleusercontent.com/amMLGa7dkqvXGmsJlrVN49VfSjClk5d-n7nCi_Y3ROK4himsBSHhB7SpdWe7Zm06ctRO77VdDkD9u_aKfAh1tMW-KcyYX7vF7LPlKqOo2fVtt3SyfsLv0KTYDB0YbAk6ZhyOIKT8Zfg)



## **Q&A**

If you have any questions about this colab, please send an email to ntu-ml-2021spring-ta@googlegroups.com

## **Backup Links**

In [19]:
# resnet_model 
# !gdown --id '1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m' --output resnet_model.ckpt
# !gdown --id '1VBIeQKH4xRHfToUxuDxtEPsqz0MHvrgd' --output resnet_model.ckpt
# !gdown --id '1Er2azErvXWS5m1jboKN7BLxNXnuAatYw' --output resnet_model.ckpt
# !gdown --id '1Qya0vmf3nRl11IyxxF7nudDpZI_Q4Amh' --output resnet_model.ckpt
# !gdown --id '1fGOOb5ndljraBIkRkLp3bW9orR4YN97U' --output resnet_model.ckpt
# !gdown --id '1apHLvZBZ3GYEMxXxToGKF7qDLn1XbOfJ' --output resnet_model.ckpt
# !gdown --id '1vsDylNsLaAqxonop7Mw3dBAig0EO7tlF' --output resnet_model.ckpt
# !gdown --id '1V_hXJM_V9-10i6wldRyl0SOiivPp4SNt' --output resnet_model.ckpt
# !gdown --id '11HzaJM2M2yg6KYhLaWpWy8WmPIIvJgnk' --output resnet_model.ckpt

# food-11
# !gdown --id '1qdyNN0Ek4S5yi-pAqHes1yjj5cNkENCc' --output food-11.zip
# !gdown --id '1c0Q1EP6yIx0O2rqVMIVInIt8wFjLxmRh' --output food-11.zip
# !gdown --id '1hKO054nT1R8egcXY2-tgQbwX4EjowRLz' --output food-11.zip
# !gdown --id '1_7_uC1WUvX6H51gQaYmI4q3AezdQJhud' --output food-11.zip
# !gdown --id '12bz82Zpx0_7BDGXq4nRt7E_fMFmILoc9' --output food-11.zip
# !gdown --id '1oiqRKrDQXVBM5y63MeEaHxFmCIzNXx1Q' --output food-11.zip
# !gdown --id '1qaL43sl4qUMeCT1OVpk4aOFycnLL5ZJX' --output food-11.zip