# **Homework 3 - Convolutional Neural Network**

This is the example code for Homework 3 of the machine learning course taught by Prof. Hung-yi Lee.

In this homework, you are required to build a convolutional neural network for image classification, optionally using some advanced training techniques.


There are three levels here:

**Easy**: Build a simple convolutional neural network as the baseline. (2 pts)

**Medium**: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)

**Hard**: Utilize provided unlabeled data to obtain better results. (2 pts)

## **About the Dataset**

The dataset used here is food-11, a collection of food images in 11 classes.

To fit the requirements of this homework, the TAs slightly modified the data.
Please DO NOT access the original fully-labeled training data or testing labels.

Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden.

In [1]:
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '1'

In [2]:
# Download the dataset
# You may choose where to download the data.

# Google Drive
# !gdown --id '1awF7pZ9Dz7X1jn1_QAiKN-_v56veCEKy' --output food-11.zip

# Dropbox
# !wget https://www.dropbox.com/s/m9q6273jl3djall/food-11.zip -O food-11.zip

# Unzip the dataset.
# This may take some time.
# !unzip -q food-11.zip

## **Import Packages**

First, we need to import packages that will be used later.

In this homework, we rely heavily on **torchvision**, a library in the PyTorch ecosystem.

In [3]:
# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.notebook import tqdm

## **Dataset, Data Loader, and Transforms**

Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Here, since our data are stored in folders by class labels, we can directly apply **torchvision.datasets.DatasetFolder** for wrapping data without much effort.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms.
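
As a rough illustration of the folder-per-class layout mentioned above, the sketch below builds a tiny throwaway directory tree and derives a class-to-index mapping from the sorted subfolder names. The helpers `find_classes_sketch` and `list_samples_sketch` and the folder names `00`, `01`, `10` are hypothetical; this only mimics the convention in plain Python and is not torchvision's actual implementation.

```python
import os
import tempfile


def find_classes_sketch(root):
    """Map each immediate subfolder of `root` to an integer label (sorted by name)."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


def list_samples_sketch(root, extension=".jpg"):
    """Return (path, label) pairs for every matching file under each class folder."""
    class_to_idx = find_classes_sketch(root)
    samples = []
    for name, index in class_to_idx.items():
        folder = os.path.join(root, name)
        for filename in sorted(os.listdir(folder)):
            if filename.lower().endswith(extension):
                samples.append((os.path.join(folder, filename), index))
    return samples


# Build a tiny fake dataset: three class folders holding empty placeholder files.
demo_root = tempfile.mkdtemp()
for class_name, count in [("10", 1), ("00", 2), ("01", 1)]:
    os.makedirs(os.path.join(demo_root, class_name))
    for i in range(count):
        open(os.path.join(demo_root, class_name, f"{i}.jpg"), "w").close()

demo_classes = find_classes_sketch(demo_root)
demo_samples = list_samples_sketch(demo_root)
print(demo_classes)
print(len(demo_samples))
```

Because labels come from folder names, a folder of unlabeled images still yields a placeholder label for every image. Such labels carry no information and should be ignored.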

In [4]:
# It is important to do data augmentation in training.
# However, not every augmentation is useful.
# Please think about what kind of augmentation is helpful for food recognition.
train_tfm = transforms.Compose([
    # Resize the shorter side to 224, then center-crop to a fixed shape (height = width = 224)
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.RandomRotation(60),
    transforms.ColorJitter(brightness=0.2, contrast=0.25, saturation=.15, hue=0.05),
    transforms.RandomVerticalFlip(p=0.1),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.05, 0.15), ratio=(0.5, 1.5), value=0, inplace=False),
    transforms.Normalize(mean = (0.5, 0.5, 0.5), std = (0.25, 0.25, 0.25)),
])

# We don't need augmentations in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean = (0.5, 0.5, 0.5), std = (0.25, 0.25, 0.25)),
])


In [5]:
# Batch size for training, validation, and testing.
# A larger batch size usually gives a more stable gradient.
# However, GPU memory is limited, so please adjust it carefully.
batch_size = 128

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
data_folder = "/data/ML2021/hw3/"

train_set = DatasetFolder(data_folder+"food-11/training/labeled", loader=lambda x: Image.open(x),extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder(data_folder+"food-11/validation", loader=lambda x: Image.open(x),extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder(data_folder+"food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder(data_folder+"food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
unlab_loader = DataLoader(unlabeled_set, batch_size=batch_size, shuffle=False, num_workers=8, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

## **Model**

The basic model here is simply a stack of convolutional layers followed by some fully-connected layers.

Since there are three channels for a color image (RGB), the input channels of the network must be three.
In each convolutional layer, the number of channels typically grows, while the height and width shrink (or remain unchanged, depending on hyperparameters such as stride and padding).

Before being fed into the fully-connected layers, the feature map must be flattened into a single one-dimensional vector (for each image).
These features are then transformed by the fully-connected layers, and finally, we obtain the "logits" for each class.
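
To make the shape bookkeeping concrete, here is a minimal sketch that traces height and width through a stack of same-padded convolutions followed by max-pooling, then computes the flattened feature size. The helpers `conv_out` and `trace_shapes` are hypothetical names introduced for illustration, and the layer configuration (five 3x3 convolutions, pooling factors 1, 2, 1, 2, 4, input size 224) is an assumed example rather than a required design.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size, channels, kernels, pools):
    """Follow (channels, height, width) through conv (padding = k // 2) + max-pool stages."""
    shapes = []
    for ch_out, k, p in zip(channels, kernels, pools):
        size = conv_out(size, k, stride=1, padding=k // 2)  # odd k keeps the size
        size = conv_out(size, p, stride=p)                   # pooling divides it by p
        shapes.append((ch_out, size, size))
    return shapes


# Assumed example: 224x224 input, five 3x3 convolutions, pooling 1, 2, 1, 2, 4.
shapes = trace_shapes(224, [64, 64, 128, 128, 256], [3] * 5, [1, 2, 1, 2, 4])
final_channels, final_h, final_w = shapes[-1]
flat_features = final_channels * final_h * final_w
print(shapes)
print(flat_features)
```

The flattened size is the value that the first fully-connected layer's input dimension has to match, so it needs recomputing whenever the input resolution or the pooling factors change.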

### **WARNING -- You Must Know**
You are free to modify the model architecture here for further improvement.
However, if you want to use some well-known architectures such as ResNet50, please make sure **NOT** to load the pre-trained weights.
Using such pre-trained models is considered cheating and therefore you will be punished.
Similarly, it is your responsibility to make sure no pre-trained weights are used if you use **torch.hub** to load any modules.

For example, if you use ResNet-18 as your model:

model = torchvision.models.resnet18(pretrained=**False**) → This is fine.

model = torchvision.models.resnet18(pretrained=**True**)  → This is **NOT** allowed.

In [6]:
class BlockConv2d(nn.Module):
    def __init__(self, ch_in, ch_out, k, stride=1,
            act=None, pooling=None, use_bn=True, drop=0, is_dconv=False, is_pad=True):

        super(BlockConv2d, self).__init__()
        pad = k//2 if is_pad else 0
        if is_dconv:
            conv = nn.ConvTranspose2d(ch_in, ch_out, k, padding=pad, stride=stride, output_padding=stride-1)
        else:
            conv = nn.Conv2d(ch_in, ch_out, k, padding=pad, stride=stride)
            
        layers = [
            conv
        ]
        if use_bn: layers.append(nn.BatchNorm2d(ch_out))
        if act: layers.append(act)
        if pooling: layers.append(pooling)
        if drop > 0: layers.append(nn.Dropout2d(drop))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class BlockLinear(nn.Module):
    def __init__(self, ch_in, ch_out, act=None, use_bn=True, drop=0):
        super(BlockLinear, self).__init__()
        layers = [
            nn.Linear(ch_in, ch_out)
        ]
        if use_bn: layers.append(nn.BatchNorm1d(ch_out))
        if act: layers.append(act)
        if drop > 0: layers.append(nn.Dropout(drop))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

In [7]:
class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.act = nn.ReLU()
        # 
        self.n_0 = [3] + [64, 256]
        self.k_0 = [5]*2
        self.p_0 = [4]*2
        self.l_0 = len(self.k_0)
        self.encoder = [
            BlockConv2d(
                self.n_0[i], self.n_0[i+1], self.k_0[i], 
                act=self.act, pooling=nn.MaxPool2d(self.p_0[i], self.p_0[i]),
                use_bn=True, drop=0.3)
            for i in range(self.l_0)]
        self.encoder = nn.Sequential(*self.encoder)

        # self.n_1 = self.n_0[::-1]
        self.n_1 = [256, 64]
        self.k_1 = [self.k_0[-1]]
        self.p_1 = [self.p_0[-1]]
        self.l_1 = len(self.k_1)

        self.decoder = [
            BlockConv2d(self.n_1[i], self.n_1[i+1], self.k_1[i], stride=self.p_1[i],
            act=self.act, use_bn=True, drop=0.1, is_dconv=True)
            for i in range(self.l_1)]
        # self.decoder = nn.ModuleList(self.decoder)
        self.decoder = nn.Sequential(*self.decoder)
        self.out = nn.ConvTranspose2d(self.n_1[-1], 3, 5, padding=5//2, stride=4, output_padding=3)
        
        
    def forward(self, x):
        x = self.encoder(x)
        # print(x.shape)
        x = self.decoder(x)
        # for d in self.decoder:
        #     x = d(x)
        #     print(x.shape)

        x = self.out(x)
        # print(x.shape)
        return x
        


In [8]:
# m = torchvision.models.resnet18(pretrained=False)
# for i, j in (m.state_dict().items()):
    # print(i, "\t", j.shape)
# m.fc

In [9]:
class Classifier(nn.Module):
    def __init__(self, encoder=None):
        super(Classifier, self).__init__()
        # The arguments for commonly used modules:
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)

        # input image size: [3, 224, 224]
        self.act = nn.ReLU()
        if encoder:
            self.encoder = torchvision.models.resnet34(pretrained=False)
            self.encoder.fc = BlockLinear(512, 128, act=nn.ReLU(), use_bn=True, drop=0.2)
        else:
        # 
            self.n_0 = [3] + [64]*2 + [128]*2 + [256]
            self.k_0 = [3]*5
            self.p_0 = [1, 2]*2+[4]
            self.l_0 = len(self.k_0)
            self.encoder = [
                BlockConv2d(
                    self.n_0[i], self.n_0[i+1], self.k_0[i], 
                    act=self.act, pooling=nn.MaxPool2d(self.p_0[i], self.p_0[i]),
                    use_bn=True, drop=0.2)
                for i in range(self.l_0)]
            self.encoder = nn.Sequential(*self.encoder)

        # 512x8x8
        
        # self.n = [256*8*8] + [512, 128]
        # self.n = [1024] + [256]
        # self.l = len(self.n)
        # self.fc = [
            
        # ]
        # self.fc += [
        #     BlockLinear(self.n[i], self.n[i+1], act=nn.ReLU(), use_bn=True, drop=0.2)
        #     for i in range(self.l-1)
        # ]
        # self.fc = nn.Sequential(*self.fc)
        self.out = nn.Linear(128, 11)

    def forward(self, x):
        # input (x): [batch_size, 3, 224, 224]
        # output: [batch_size, 11]
        x = self.encoder(x)
        # x = x.flatten(1)
        # print(x.shape)
        # x = self.fc(x)
        x = self.out(x)
        return x

## **Training**

You can finish supervised learning by simply running the provided code without any modification.

The function "get_pseudo_labels" is used for semi-supervised learning.
You can expect better performance if you use the unlabeled data for semi-supervised learning.
However, you have to implement the function on your own and need to adjust several hyperparameters manually.

For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).

Again, please note that using external data (or a pre-trained model) for training is **prohibited**.

In [10]:
def get_pseudo_labels(dataset, model, threshold=0.5):
    # This function generates pseudo-labels for a dataset using the given model.
    # It returns a Subset of the dataset containing the images whose prediction confidence exceeds the given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)
    idx = []
    targets = []
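    # Iterate over the dataset by batches.
    # NOTE: this uses the global unlab_loader (shuffle=False) and the global batch_size,
    # so "dataset" must be unlabeled_set for the indices computed below to line up.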
    for i, batch in tqdm(enumerate(unlab_loader), leave=False, desc='PseudoLabels'):
        img, _ = batch
        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)
        st = torch.topk(probs, 2, dim=1)
        
        # ---------- TODO ----------
        # Filter the data and construct a new dataset.
        probs1 = st[0][:, 0]
        # probs2 = st[0][:, 1]
        # select = (probs1 > threshold) & ((probs1-probs2) > .25)
        select = (probs1 > threshold)
        targets += st[1][select, 0].tolist()
        idx += (torch.where(select)[0] + batch_size*i).tolist()

    new = Subset(dataset, idx)
    new.targets = targets
    # Turn off the eval mode.
    model.train()
    return new
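
One caveat with the cell above: **Subset** returns whatever the wrapped dataset returns for each index, so assigning `new.targets` does not change the labels seen during training. A minimal sketch of one possible workaround is a thin wrapper that substitutes the pseudo-label on access. The names `PseudoLabeledSubset` and `select_confident` and the toy list dataset are hypothetical; in the notebook you would probably subclass **torch.utils.data.Dataset** and wrap `unlabeled_set` instead.

```python
class PseudoLabeledSubset:
    """Map-style dataset exposing selected items of `dataset` with replaced labels."""

    def __init__(self, dataset, indices, targets):
        if len(indices) != len(targets):
            raise ValueError("indices and targets must have the same length")
        self.dataset = dataset
        self.indices = list(indices)
        self.targets = list(targets)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Keep the image, discard the placeholder label, return the pseudo-label.
        image, _ = self.dataset[self.indices[i]]
        return image, self.targets[i]


def select_confident(probs, threshold):
    """Return (indices, labels) of rows whose top probability exceeds `threshold`."""
    indices, labels = [], []
    for i, row in enumerate(probs):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] > threshold:
            indices.append(i)
            labels.append(best)
    return indices, labels


# Toy stand-ins: every "image" carries the placeholder label 0.
toy_dataset = [("img_a", 0), ("img_b", 0), ("img_c", 0)]
toy_probs = [[0.05, 0.95], [0.60, 0.40], [0.92, 0.08]]

keep, pseudo = select_confident(toy_probs, threshold=0.9)
toy_pseudo_set = PseudoLabeledSubset(toy_dataset, keep, pseudo)
print(len(toy_pseudo_set), [toy_pseudo_set[i] for i in range(len(toy_pseudo_set))])
```

Any object with `__len__` and `__getitem__` behaves as a map-style dataset, so a wrapper along these lines should be usable wherever the pseudo-labeled subset is consumed.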

In [11]:
# device = "cuda" if torch.cuda.is_available() else "cpu"

# ae_model = AE().to(device)
# ae_criterion = nn.MSELoss(reduction='mean')
# ae_optimizer = torch.optim.Adam(ae_model.parameters(), lr=5e-4, weight_decay=1e-7)

# n_epochs = 20
# ae_loader = DataLoader(ConcatDataset([train_set, unlabeled_set]), batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)
# for epoch in range(n_epochs):
#     ae_model.train()
#     ae_loss = []
#     # for batch in train_loader:
#     for batch in tqdm(ae_loader, desc='AutoEncoder', leave=False):
#         imgs = batch[0].to(device)
#         out = ae_model(imgs)
#         # print(imgs.shape, out.shape)
#         loss = ae_criterion(out, imgs)
#         ae_optimizer.zero_grad()
#         loss.backward()
#         ae_optimizer.step()
#         ae_loss.append(loss.item())
#     # 
#     ae_loss = np.mean(ae_loss)
#     print(f"[ AE\t| {epoch + 1:03d}/{n_epochs:03d} ] loss = {ae_loss:.5f}")


In [12]:
log_path = 'log'
if log_path in os.listdir():
    os.remove(log_path)
    print('remove log')
log = open(log_path, 'a')

remove log


In [13]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "./model.ckpt"

try:
    model = Classifier(1).to(device)
except NameError:
    print("Use new encoder")
    model = Classifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=3e-3)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.8, patience=8, threshold=3e-4, min_lr=5e-5, verbose=True
)


n_epochs = 200
best_acc = 0
valid_acc = 0
# Whether to do semi-supervised learning.
do_semi = True
semi_flg = False
semi_count = 0
loader = train_loader
for epoch in range(n_epochs):
    # ---------- TODO ----------
    # In each epoch, relabel the unlabeled dataset for semi-supervised learning.
    # Then you can combine the labeled dataset and pseudo-labeled dataset for the training.
    # if (not semi_flg) and valid_acc >= .5:
    if False:
        # Obtain pseudo-labels for unlabeled data using trained model.
        pseudo_set = get_pseudo_labels(unlabeled_set, model, threshold=0.9)

        # Construct a new dataset and a data loader for training.
        # This is used in semi-supervised learning only.
        concat_dataset = ConcatDataset([train_set, pseudo_set])
        loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)
        semi_flg = True
        log_str = f"Use pseudo label: {len(concat_dataset)}"
        print(log_str)
        log.write(log_str+"\n")

    if semi_count >= 2:
        loader = train_loader
        semi_flg = False
        semi_count = 0
        log_str = "Abort pseudo label"
        print(log_str)
        log.write(log_str+"\n")

    # ---------- Training ----------
    model.train()

    train_loss = []
    train_accs = []

    # for batch in train_loader:
    for batch in tqdm(loader, desc='Train', leave=False):

        imgs, labels = batch
        logits = model(imgs.to(device))
        loss = criterion(logits, labels.to(device))
        optimizer.zero_grad()
        loss.backward()
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
        optimizer.step()

        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()
        train_loss.append(loss.item())
        train_accs.append(acc)

    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    log_str1 = (f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")
    # print(log)
    # ---------- Validation ----------
    model.eval()
    
    valid_loss = []
    valid_accs = []

    # for batch in valid_loader:
    for batch in tqdm(valid_loader, desc='Valid', leave=False):
        imgs, labels = batch
        imgs = imgs.to(device)
        with torch.no_grad():
            logits = model(imgs)

        loss = criterion(logits, labels.to(device))
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()
        valid_loss.append(loss.item())
        valid_accs.append(acc)

    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)
    if valid_acc >= best_acc:
        best_acc = valid_acc
        best_model = model.state_dict()
        torch.save(model.state_dict(), model_path)
        log_str = ('saving model with acc {:.3f}'.format(best_acc))
        print(log_str)
        log.write(log_str+"\n")
        semi_count = 0
    else:
        if semi_flg: semi_count += 1

    log_str1 += f"\t[ Valid\t| {epoch + 1:03d}/{n_epochs:03d} ]  loss = {valid_loss:.5f}, acc = {valid_acc:.5f}"
    print(log_str1)
    log.write(log_str1+"\n")
    log.flush()
    lr_scheduler.step(valid_acc)
    # print(f"[ Valid\t| {epoch + 1:03d}/{n_epochs:03d} ] ae_loss = {valid_ae:.5f}, loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

saving model with acc 0.134
[ Train | 001/200 ] loss = 2.26348, acc = 0.18781	[ Valid	| 001/200 ]  loss = 3.89402, acc = 0.13359


saving model with acc 0.247
[ Train | 002/200 ] loss = 2.09713, acc = 0.25062	[ Valid	| 002/200 ]  loss = 2.33481, acc = 0.24740


[ Train | 003/200 ] loss = 2.07886, acc = 0.27656	[ Valid	| 003/200 ]  loss = 2.53025, acc = 0.19062


saving model with acc 0.254
[ Train | 004/200 ] loss = 2.04318, acc = 0.27687	[ Valid	| 004/200 ]  loss = 2.22737, acc = 0.25365


[ Train | 005/200 ] loss = 1.99389, acc = 0.29688	[ Valid	| 005/200 ]  loss = 2.64558, acc = 0.22865


[ Train | 006/200 ] loss = 2.00670, acc = 0.28813	[ Valid	| 006/200 ]  loss = 2.34388, acc = 0.23203


saving model with acc 0.276
[ Train | 007/200 ] loss = 1.94840, acc = 0.31344	[ Valid	| 007/200 ]  loss = 1.95279, acc = 0.27552


saving model with acc 0.281
[ Train | 008/200 ] loss = 1.96620, acc = 0.30719	[ Valid	| 008/200 ]  loss = 2.20801, acc = 0.28099


saving model with acc 0.336
[ Train | 009/200 ] loss = 1.95945, acc = 0.30531	[ Valid	| 009/200 ]  loss = 2.10767, acc = 0.33594


saving model with acc 0.365
[ Train | 010/200 ] loss = 1.88202, acc = 0.32031	[ Valid	| 010/200 ]  loss = 1.83375, acc = 0.36536


[ Train | 011/200 ] loss = 1.92827, acc = 0.32156	[ Valid	| 011/200 ]  loss = 3.11779, acc = 0.19401


[ Train | 012/200 ] loss = 1.86487, acc = 0.35313	[ Valid	| 012/200 ]  loss = 1.96257, acc = 0.33880


[ Train | 013/200 ] loss = 1.89368, acc = 0.32375	[ Valid	| 013/200 ]  loss = 1.87740, acc = 0.36458


saving model with acc 0.410
[ Train | 014/200 ] loss = 1.81273, acc = 0.36344	[ Valid	| 014/200 ]  loss = 1.70303, acc = 0.41042


[ Train | 015/200 ] loss = 1.82284, acc = 0.36937	[ Valid	| 015/200 ]  loss = 1.86521, acc = 0.37839


[ Train | 016/200 ] loss = 1.80496, acc = 0.36937	[ Valid	| 016/200 ]  loss = 1.79525, acc = 0.36380


[ Train | 017/200 ] loss = 1.77741, acc = 0.38094	[ Valid	| 017/200 ]  loss = 1.96582, acc = 0.34141


[ Train | 018/200 ] loss = 1.81020, acc = 0.36500	[ Valid	| 018/200 ]  loss = 1.90764, acc = 0.38099


[ Train | 019/200 ] loss = 1.73691, acc = 0.38844	[ Valid	| 019/200 ]  loss = 1.75664, acc = 0.39323


[ Train | 020/200 ] loss = 1.80752, acc = 0.37250	[ Valid	| 020/200 ]  loss = 1.90547, acc = 0.31589


saving model with acc 0.430
[ Train | 021/200 ] loss = 1.72983, acc = 0.39656	[ Valid	| 021/200 ]  loss = 1.76406, acc = 0.43021


[ Train | 022/200 ] loss = 1.72911, acc = 0.39281	[ Valid	| 022/200 ]  loss = 1.77344, acc = 0.36589


[ Train | 023/200 ] loss = 1.65939, acc = 0.42063	[ Valid	| 023/200 ]  loss = 1.83242, acc = 0.35391


[ Train | 024/200 ] loss = 1.67513, acc = 0.41219	[ Valid	| 024/200 ]  loss = 1.88612, acc = 0.38385


[ Train | 025/200 ] loss = 1.65718, acc = 0.42031	[ Valid	| 025/200 ]  loss = 1.85097, acc = 0.38880


[ Train | 026/200 ] loss = 1.68789, acc = 0.41375	[ Valid	| 026/200 ]  loss = 1.90403, acc = 0.36953


[ Train | 027/200 ] loss = 1.69011, acc = 0.40031	[ Valid	| 027/200 ]  loss = 1.89747, acc = 0.37656


[ Train | 028/200 ] loss = 1.61741, acc = 0.43406	[ Valid	| 028/200 ]  loss = 1.70541, acc = 0.41432


[ Train | 029/200 ] loss = 1.63699, acc = 0.42594	[ Valid	| 029/200 ]  loss = 1.73000, acc = 0.40911


[ Train | 030/200 ] loss = 1.66601, acc = 0.42344	[ Valid	| 030/200 ]  loss = 1.97355, acc = 0.33255
Epoch    30: reducing learning rate of group 0 to 4.0000e-04.


[ Train | 031/200 ] loss = 1.65080, acc = 0.41813	[ Valid	| 031/200 ]  loss = 1.85748, acc = 0.37995


saving model with acc 0.443
[ Train | 032/200 ] loss = 1.61232, acc = 0.44156	[ Valid	| 032/200 ]  loss = 1.69228, acc = 0.44271


saving model with acc 0.446
[ Train | 033/200 ] loss = 1.57943, acc = 0.44281	[ Valid	| 033/200 ]  loss = 1.76581, acc = 0.44557


saving model with acc 0.462
[ Train | 034/200 ] loss = 1.55657, acc = 0.45344	[ Valid	| 034/200 ]  loss = 1.57364, acc = 0.46172


[ Train | 035/200 ] loss = 1.52770, acc = 0.46469	[ Valid	| 035/200 ]  loss = 1.76005, acc = 0.42083


[ Train | 036/200 ] loss = 1.55138, acc = 0.45812	[ Valid	| 036/200 ]  loss = 1.82036, acc = 0.38411


[ Train | 037/200 ] loss = 1.57448, acc = 0.45312	[ Valid	| 037/200 ]  loss = 1.66569, acc = 0.44453


[ Train | 038/200 ] loss = 1.52271, acc = 0.45750	[ Valid	| 038/200 ]  loss = 1.81614, acc = 0.38828


[ Train | 039/200 ] loss = 1.49654, acc = 0.48219	[ Valid	| 039/200 ]  loss = 1.70460, acc = 0.41380


[ Train | 040/200 ] loss = 1.53632, acc = 0.45906	[ Valid	| 040/200 ]  loss = 1.58103, acc = 0.44583


[ Train | 041/200 ] loss = 1.53105, acc = 0.46562	[ Valid	| 041/200 ]  loss = 1.63714, acc = 0.44089


[ Train | 042/200 ] loss = 1.47397, acc = 0.49031	[ Valid	| 042/200 ]  loss = 1.86523, acc = 0.38854


[ Train | 043/200 ] loss = 1.49185, acc = 0.47969	[ Valid	| 043/200 ]  loss = 1.85085, acc = 0.44063
Epoch    43: reducing learning rate of group 0 to 3.2000e-04.


saving model with acc 0.483
[ Train | 044/200 ] loss = 1.44835, acc = 0.49531	[ Valid	| 044/200 ]  loss = 1.53510, acc = 0.48307


[ Train | 045/200 ] loss = 1.36718, acc = 0.52938	[ Valid	| 045/200 ]  loss = 1.74350, acc = 0.44635


[ Train | 046/200 ] loss = 1.39701, acc = 0.52281	[ Valid	| 046/200 ]  loss = 1.66795, acc = 0.42292


[ Train | 047/200 ] loss = 1.35329, acc = 0.52344	[ Valid	| 047/200 ]  loss = 1.53802, acc = 0.45469


[ Train | 048/200 ] loss = 1.37247, acc = 0.53031	[ Valid	| 048/200 ]  loss = 1.65700, acc = 0.47396


[ Train | 049/200 ] loss = 1.34793, acc = 0.53812	[ Valid	| 049/200 ]  loss = 1.63217, acc = 0.43828


saving model with acc 0.496
[ Train | 050/200 ] loss = 1.36568, acc = 0.52969	[ Valid	| 050/200 ]  loss = 1.49054, acc = 0.49635


saving model with acc 0.499
[ Train | 051/200 ] loss = 1.32529, acc = 0.54219	[ Valid	| 051/200 ]  loss = 1.54018, acc = 0.49922


saving model with acc 0.501
[ Train | 052/200 ] loss = 1.30450, acc = 0.55156	[ Valid	| 052/200 ]  loss = 1.58071, acc = 0.50052


[ Train | 053/200 ] loss = 1.32946, acc = 0.55750	[ Valid	| 053/200 ]  loss = 1.70952, acc = 0.45937


saving model with acc 0.503
[ Train | 054/200 ] loss = 1.27544, acc = 0.56000	[ Valid	| 054/200 ]  loss = 1.58427, acc = 0.50313


[ Train | 055/200 ] loss = 1.28687, acc = 0.56906	[ Valid	| 055/200 ]  loss = 1.63089, acc = 0.48750


saving model with acc 0.537
[ Train | 056/200 ] loss = 1.28243, acc = 0.56375	[ Valid	| 056/200 ]  loss = 1.39257, acc = 0.53698


saving model with acc 0.553
[ Train | 057/200 ] loss = 1.20019, acc = 0.60344	[ Valid	| 057/200 ]  loss = 1.34946, acc = 0.55286


[ Train | 058/200 ] loss = 1.28908, acc = 0.56156	[ Valid	| 058/200 ]  loss = 1.69916, acc = 0.48229


saving model with acc 0.553
[ Train | 059/200 ] loss = 1.20973, acc = 0.58750	[ Valid	| 059/200 ]  loss = 1.39038, acc = 0.55313


[ Train | 060/200 ] loss = 1.17614, acc = 0.58906	[ Valid	| 060/200 ]  loss = 1.43604, acc = 0.54922


saving model with acc 0.567
[ Train | 061/200 ] loss = 1.16984, acc = 0.60656	[ Valid	| 061/200 ]  loss = 1.28221, acc = 0.56745


[ Train | 062/200 ] loss = 1.23110, acc = 0.58125	[ Valid	| 062/200 ]  loss = 1.36281, acc = 0.54115


[ Train | 063/200 ] loss = 1.21135, acc = 0.58562	[ Valid	| 063/200 ]  loss = 1.66026, acc = 0.49219


[ Train | 064/200 ] loss = 1.11276, acc = 0.63187	[ Valid	| 064/200 ]  loss = 1.53174, acc = 0.54870


[ Train | 065/200 ] loss = 1.14539, acc = 0.60531	[ Valid	| 065/200 ]  loss = 1.48880, acc = 0.54531


[ Train | 066/200 ] loss = 1.13766, acc = 0.61187	[ Valid	| 066/200 ]  loss = 1.53921, acc = 0.52891


[ Train | 067/200 ] loss = 1.11842, acc = 0.62250	[ Valid	| 067/200 ]  loss = 1.67257, acc = 0.51302


[ Train | 068/200 ] loss = 1.10558, acc = 0.62937	[ Valid	| 068/200 ]  loss = 1.49071, acc = 0.54453


[ Train | 069/200 ] loss = 1.10142, acc = 0.63063	[ Valid	| 069/200 ]  loss = 1.32063, acc = 0.53854


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 070/200 ] loss = 1.07967, acc = 0.63781	[ Valid	| 070/200 ]  loss = 1.45827, acc = 0.54297
Epoch    70: reducing learning rate of group 0 to 2.5600e-04.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 071/200 ] loss = 1.03595, acc = 0.65187	[ Valid	| 071/200 ]  loss = 1.36775, acc = 0.55677


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.612
[ Train | 072/200 ] loss = 0.98573, acc = 0.66812	[ Valid	| 072/200 ]  loss = 1.17330, acc = 0.61172


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 073/200 ] loss = 0.98996, acc = 0.66781	[ Valid	| 073/200 ]  loss = 1.22192, acc = 0.60573


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 074/200 ] loss = 1.02459, acc = 0.65031	[ Valid	| 074/200 ]  loss = 1.33973, acc = 0.55729


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 075/200 ] loss = 0.98411, acc = 0.68437	[ Valid	| 075/200 ]  loss = 1.46476, acc = 0.52813


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 076/200 ] loss = 1.00163, acc = 0.66406	[ Valid	| 076/200 ]  loss = 1.31998, acc = 0.57708


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.630
[ Train | 077/200 ] loss = 0.92490, acc = 0.69281	[ Valid	| 077/200 ]  loss = 1.20204, acc = 0.63021


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 078/200 ] loss = 0.92447, acc = 0.69281	[ Valid	| 078/200 ]  loss = 1.34514, acc = 0.56484


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.653
[ Train | 079/200 ] loss = 0.94537, acc = 0.68062	[ Valid	| 079/200 ]  loss = 1.14973, acc = 0.65260


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 080/200 ] loss = 0.91412, acc = 0.69437	[ Valid	| 080/200 ]  loss = 1.10665, acc = 0.62865


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 081/200 ] loss = 0.92955, acc = 0.68844	[ Valid	| 081/200 ]  loss = 1.43103, acc = 0.55859


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 082/200 ] loss = 0.91531, acc = 0.70125	[ Valid	| 082/200 ]  loss = 1.41391, acc = 0.55078


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 083/200 ] loss = 0.91764, acc = 0.68969	[ Valid	| 083/200 ]  loss = 1.25689, acc = 0.59271


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 084/200 ] loss = 0.91337, acc = 0.68937	[ Valid	| 084/200 ]  loss = 1.33300, acc = 0.58255


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 085/200 ] loss = 0.88918, acc = 0.69969	[ Valid	| 085/200 ]  loss = 1.11114, acc = 0.64323


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 086/200 ] loss = 0.86396, acc = 0.72062	[ Valid	| 086/200 ]  loss = 1.10421, acc = 0.63958


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 087/200 ] loss = 0.84764, acc = 0.71406	[ Valid	| 087/200 ]  loss = 1.33145, acc = 0.58203


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 088/200 ] loss = 0.83334, acc = 0.71219	[ Valid	| 088/200 ]  loss = 1.31851, acc = 0.58854
Epoch    88: reducing learning rate of group 0 to 2.0480e-04.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 089/200 ] loss = 0.81532, acc = 0.71969	[ Valid	| 089/200 ]  loss = 1.10171, acc = 0.65156


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.669
[ Train | 090/200 ] loss = 0.81630, acc = 0.72969	[ Valid	| 090/200 ]  loss = 1.08548, acc = 0.66927


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 091/200 ] loss = 0.83405, acc = 0.72281	[ Valid	| 091/200 ]  loss = 1.23182, acc = 0.61563


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.703
[ Train | 092/200 ] loss = 0.75303, acc = 0.74937	[ Valid	| 092/200 ]  loss = 0.97792, acc = 0.70312


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 093/200 ] loss = 0.74957, acc = 0.74656	[ Valid	| 093/200 ]  loss = 1.23373, acc = 0.63229


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 094/200 ] loss = 0.78468, acc = 0.73594	[ Valid	| 094/200 ]  loss = 1.53423, acc = 0.55938


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 095/200 ] loss = 0.76956, acc = 0.74625	[ Valid	| 095/200 ]  loss = 1.27343, acc = 0.61328


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 096/200 ] loss = 0.76042, acc = 0.76094	[ Valid	| 096/200 ]  loss = 1.20286, acc = 0.62839


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 097/200 ] loss = 0.74789, acc = 0.75500	[ Valid	| 097/200 ]  loss = 1.32656, acc = 0.60703


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 098/200 ] loss = 0.71735, acc = 0.77344	[ Valid	| 098/200 ]  loss = 1.18708, acc = 0.62943


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 099/200 ] loss = 0.73760, acc = 0.74437	[ Valid	| 099/200 ]  loss = 1.20297, acc = 0.63594


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 100/200 ] loss = 0.73319, acc = 0.76656	[ Valid	| 100/200 ]  loss = 1.27638, acc = 0.61823


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 101/200 ] loss = 0.69195, acc = 0.77250	[ Valid	| 101/200 ]  loss = 1.05980, acc = 0.67448
Epoch   101: reducing learning rate of group 0 to 1.6384e-04.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 102/200 ] loss = 0.64436, acc = 0.78875	[ Valid	| 102/200 ]  loss = 1.26642, acc = 0.61745


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 103/200 ] loss = 0.63856, acc = 0.78187	[ Valid	| 103/200 ]  loss = 1.19908, acc = 0.61328


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 104/200 ] loss = 0.61058, acc = 0.80750	[ Valid	| 104/200 ]  loss = 1.04593, acc = 0.68385


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 105/200 ] loss = 0.61499, acc = 0.80187	[ Valid	| 105/200 ]  loss = 1.05912, acc = 0.67891


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 106/200 ] loss = 0.65804, acc = 0.79281	[ Valid	| 106/200 ]  loss = 1.34542, acc = 0.64193


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 107/200 ] loss = 0.65174, acc = 0.78688	[ Valid	| 107/200 ]  loss = 1.02832, acc = 0.67240


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 108/200 ] loss = 0.61454, acc = 0.79188	[ Valid	| 108/200 ]  loss = 1.05834, acc = 0.65339


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 109/200 ] loss = 0.60459, acc = 0.80375	[ Valid	| 109/200 ]  loss = 1.08489, acc = 0.64792


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 110/200 ] loss = 0.61066, acc = 0.80125	[ Valid	| 110/200 ]  loss = 1.05155, acc = 0.65469
Epoch   110: reducing learning rate of group 0 to 1.3107e-04.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 111/200 ] loss = 0.60666, acc = 0.81250	[ Valid	| 111/200 ]  loss = 1.29898, acc = 0.60807


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.715
[ Train | 112/200 ] loss = 0.57641, acc = 0.81219	[ Valid	| 112/200 ]  loss = 0.98106, acc = 0.71484


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 113/200 ] loss = 0.55122, acc = 0.82687	[ Valid	| 113/200 ]  loss = 1.11660, acc = 0.67891


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 114/200 ] loss = 0.57339, acc = 0.81969	[ Valid	| 114/200 ]  loss = 1.24728, acc = 0.63698


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 115/200 ] loss = 0.55441, acc = 0.81625	[ Valid	| 115/200 ]  loss = 1.24131, acc = 0.65391


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.730
[ Train | 116/200 ] loss = 0.50573, acc = 0.83656	[ Valid	| 116/200 ]  loss = 0.87107, acc = 0.73021


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 117/200 ] loss = 0.52855, acc = 0.82875	[ Valid	| 117/200 ]  loss = 1.00162, acc = 0.69740


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 118/200 ] loss = 0.53450, acc = 0.82406	[ Valid	| 118/200 ]  loss = 1.20643, acc = 0.63047


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.730
[ Train | 119/200 ] loss = 0.52888, acc = 0.83594	[ Valid	| 119/200 ]  loss = 0.90110, acc = 0.73021


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 120/200 ] loss = 0.51212, acc = 0.83219	[ Valid	| 120/200 ]  loss = 0.91383, acc = 0.72266


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 121/200 ] loss = 0.47068, acc = 0.85125	[ Valid	| 121/200 ]  loss = 1.13240, acc = 0.68411


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 122/200 ] loss = 0.52604, acc = 0.82094	[ Valid	| 122/200 ]  loss = 1.21107, acc = 0.65026


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 123/200 ] loss = 0.54366, acc = 0.82406	[ Valid	| 123/200 ]  loss = 0.95158, acc = 0.71016


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 124/200 ] loss = 0.49926, acc = 0.84000	[ Valid	| 124/200 ]  loss = 0.96508, acc = 0.72526


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.734
[ Train | 125/200 ] loss = 0.49124, acc = 0.83969	[ Valid	| 125/200 ]  loss = 0.94052, acc = 0.73359


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 126/200 ] loss = 0.49403, acc = 0.83750	[ Valid	| 126/200 ]  loss = 1.21364, acc = 0.65703


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 127/200 ] loss = 0.52901, acc = 0.83562	[ Valid	| 127/200 ]  loss = 1.08979, acc = 0.69609


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 128/200 ] loss = 0.50102, acc = 0.84031	[ Valid	| 128/200 ]  loss = 1.10171, acc = 0.68151


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 129/200 ] loss = 0.51352, acc = 0.84187	[ Valid	| 129/200 ]  loss = 1.26348, acc = 0.66615


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 130/200 ] loss = 0.43577, acc = 0.85469	[ Valid	| 130/200 ]  loss = 1.11097, acc = 0.66016


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 131/200 ] loss = 0.48244, acc = 0.83687	[ Valid	| 131/200 ]  loss = 1.16585, acc = 0.66979


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.735
[ Train | 132/200 ] loss = 0.46509, acc = 0.85250	[ Valid	| 132/200 ]  loss = 0.95806, acc = 0.73490


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 133/200 ] loss = 0.47926, acc = 0.85031	[ Valid	| 133/200 ]  loss = 1.14625, acc = 0.69922


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 134/200 ] loss = 0.45032, acc = 0.85125	[ Valid	| 134/200 ]  loss = 0.93489, acc = 0.69688


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 135/200 ] loss = 0.47345, acc = 0.84719	[ Valid	| 135/200 ]  loss = 1.16689, acc = 0.67526


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 136/200 ] loss = 0.45733, acc = 0.85531	[ Valid	| 136/200 ]  loss = 1.18278, acc = 0.65521


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 137/200 ] loss = 0.42947, acc = 0.86437	[ Valid	| 137/200 ]  loss = 1.09067, acc = 0.68672


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 138/200 ] loss = 0.43271, acc = 0.85250	[ Valid	| 138/200 ]  loss = 1.02806, acc = 0.70000


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 139/200 ] loss = 0.45496, acc = 0.84906	[ Valid	| 139/200 ]  loss = 0.98271, acc = 0.72109


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 140/200 ] loss = 0.45249, acc = 0.85000	[ Valid	| 140/200 ]  loss = 1.27861, acc = 0.66484


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 141/200 ] loss = 0.38303, acc = 0.87594	[ Valid	| 141/200 ]  loss = 1.22535, acc = 0.68307
Epoch   141: reducing learning rate of group 0 to 1.0486e-04.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 142/200 ] loss = 0.39445, acc = 0.87531	[ Valid	| 142/200 ]  loss = 1.12873, acc = 0.68099


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 143/200 ] loss = 0.39812, acc = 0.87125	[ Valid	| 143/200 ]  loss = 1.09736, acc = 0.67708


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 144/200 ] loss = 0.37008, acc = 0.88781	[ Valid	| 144/200 ]  loss = 1.14098, acc = 0.68359


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 145/200 ] loss = 0.36515, acc = 0.88750	[ Valid	| 145/200 ]  loss = 1.26768, acc = 0.65208


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 146/200 ] loss = 0.36404, acc = 0.88000	[ Valid	| 146/200 ]  loss = 0.99142, acc = 0.71276


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 147/200 ] loss = 0.35925, acc = 0.88781	[ Valid	| 147/200 ]  loss = 1.02094, acc = 0.68516


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 148/200 ] loss = 0.39238, acc = 0.87406	[ Valid	| 148/200 ]  loss = 1.06571, acc = 0.70156


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 149/200 ] loss = 0.38131, acc = 0.87656	[ Valid	| 149/200 ]  loss = 1.06757, acc = 0.70573


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 150/200 ] loss = 0.37716, acc = 0.88500	[ Valid	| 150/200 ]  loss = 1.09171, acc = 0.67630
Epoch   150: reducing learning rate of group 0 to 8.3886e-05.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 151/200 ] loss = 0.32058, acc = 0.90219	[ Valid	| 151/200 ]  loss = 1.07469, acc = 0.69870


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 152/200 ] loss = 0.33116, acc = 0.89594	[ Valid	| 152/200 ]  loss = 1.05432, acc = 0.68411


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 153/200 ] loss = 0.29133, acc = 0.90531	[ Valid	| 153/200 ]  loss = 1.07289, acc = 0.72005


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 154/200 ] loss = 0.34309, acc = 0.89219	[ Valid	| 154/200 ]  loss = 1.09035, acc = 0.69401


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 155/200 ] loss = 0.33986, acc = 0.89344	[ Valid	| 155/200 ]  loss = 1.00168, acc = 0.69844


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.740
[ Train | 156/200 ] loss = 0.31507, acc = 0.89937	[ Valid	| 156/200 ]  loss = 0.97289, acc = 0.73958


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 157/200 ] loss = 0.29541, acc = 0.91469	[ Valid	| 157/200 ]  loss = 1.01911, acc = 0.72266


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 158/200 ] loss = 0.27873, acc = 0.91781	[ Valid	| 158/200 ]  loss = 1.05813, acc = 0.69323


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 159/200 ] loss = 0.31314, acc = 0.90594	[ Valid	| 159/200 ]  loss = 1.01659, acc = 0.73724


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 160/200 ] loss = 0.29892, acc = 0.90937	[ Valid	| 160/200 ]  loss = 1.01083, acc = 0.71302


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 161/200 ] loss = 0.31422, acc = 0.89781	[ Valid	| 161/200 ]  loss = 1.07604, acc = 0.70000


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 162/200 ] loss = 0.31445, acc = 0.89375	[ Valid	| 162/200 ]  loss = 0.98522, acc = 0.71927


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 163/200 ] loss = 0.31527, acc = 0.90156	[ Valid	| 163/200 ]  loss = 0.94568, acc = 0.72708


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 164/200 ] loss = 0.31391, acc = 0.89969	[ Valid	| 164/200 ]  loss = 1.24313, acc = 0.65417


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 165/200 ] loss = 0.29694, acc = 0.91094	[ Valid	| 165/200 ]  loss = 1.10274, acc = 0.70547
Epoch   165: reducing learning rate of group 0 to 6.7109e-05.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 166/200 ] loss = 0.26572, acc = 0.90562	[ Valid	| 166/200 ]  loss = 0.95366, acc = 0.73438


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.745
[ Train | 167/200 ] loss = 0.28676, acc = 0.91906	[ Valid	| 167/200 ]  loss = 0.92954, acc = 0.74531


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 168/200 ] loss = 0.24735, acc = 0.92531	[ Valid	| 168/200 ]  loss = 0.99552, acc = 0.72708


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 169/200 ] loss = 0.26229, acc = 0.92125	[ Valid	| 169/200 ]  loss = 1.07626, acc = 0.69557


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 170/200 ] loss = 0.25072, acc = 0.91594	[ Valid	| 170/200 ]  loss = 0.95704, acc = 0.73073


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 171/200 ] loss = 0.25696, acc = 0.92687	[ Valid	| 171/200 ]  loss = 0.91250, acc = 0.73411


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 172/200 ] loss = 0.26655, acc = 0.92312	[ Valid	| 172/200 ]  loss = 0.99417, acc = 0.72760


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 173/200 ] loss = 0.27994, acc = 0.91656	[ Valid	| 173/200 ]  loss = 0.98818, acc = 0.73646


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 174/200 ] loss = 0.27834, acc = 0.92219	[ Valid	| 174/200 ]  loss = 0.93842, acc = 0.72656


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 175/200 ] loss = 0.23983, acc = 0.92656	[ Valid	| 175/200 ]  loss = 1.04833, acc = 0.72786


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 176/200 ] loss = 0.25858, acc = 0.91625	[ Valid	| 176/200 ]  loss = 0.99287, acc = 0.70911
Epoch   176: reducing learning rate of group 0 to 5.3687e-05.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.751
[ Train | 177/200 ] loss = 0.28930, acc = 0.91969	[ Valid	| 177/200 ]  loss = 0.95705, acc = 0.75052


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 178/200 ] loss = 0.25882, acc = 0.92844	[ Valid	| 178/200 ]  loss = 0.94625, acc = 0.72292


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.763
[ Train | 179/200 ] loss = 0.20018, acc = 0.94500	[ Valid	| 179/200 ]  loss = 0.85966, acc = 0.76328


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 180/200 ] loss = 0.26476, acc = 0.92625	[ Valid	| 180/200 ]  loss = 0.93347, acc = 0.74792


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 181/200 ] loss = 0.24245, acc = 0.93281	[ Valid	| 181/200 ]  loss = 0.93766, acc = 0.73802


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 182/200 ] loss = 0.20953, acc = 0.94000	[ Valid	| 182/200 ]  loss = 0.97799, acc = 0.75130


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.784
[ Train | 183/200 ] loss = 0.23504, acc = 0.92844	[ Valid	| 183/200 ]  loss = 0.82828, acc = 0.78411


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 184/200 ] loss = 0.22581, acc = 0.93031	[ Valid	| 184/200 ]  loss = 1.01972, acc = 0.70547


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 185/200 ] loss = 0.20898, acc = 0.93750	[ Valid	| 185/200 ]  loss = 0.98282, acc = 0.72786


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 186/200 ] loss = 0.19404, acc = 0.94437	[ Valid	| 186/200 ]  loss = 0.88946, acc = 0.76406


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 187/200 ] loss = 0.18966, acc = 0.94656	[ Valid	| 187/200 ]  loss = 0.95093, acc = 0.74479


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 188/200 ] loss = 0.24105, acc = 0.92312	[ Valid	| 188/200 ]  loss = 1.01512, acc = 0.73932


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 189/200 ] loss = 0.19564, acc = 0.94500	[ Valid	| 189/200 ]  loss = 0.85594, acc = 0.75885


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 190/200 ] loss = 0.25362, acc = 0.92500	[ Valid	| 190/200 ]  loss = 1.01721, acc = 0.72943


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 191/200 ] loss = 0.20116, acc = 0.93875	[ Valid	| 191/200 ]  loss = 1.04398, acc = 0.72135


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 192/200 ] loss = 0.19863, acc = 0.93750	[ Valid	| 192/200 ]  loss = 1.00214, acc = 0.76693
Epoch   192: reducing learning rate of group 0 to 5.0000e-05.


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 193/200 ] loss = 0.21238, acc = 0.93656	[ Valid	| 193/200 ]  loss = 0.90697, acc = 0.74792


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 194/200 ] loss = 0.18139, acc = 0.94406	[ Valid	| 194/200 ]  loss = 0.94516, acc = 0.73307


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 195/200 ] loss = 0.18256, acc = 0.95406	[ Valid	| 195/200 ]  loss = 0.92801, acc = 0.76146


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 196/200 ] loss = 0.17491, acc = 0.94969	[ Valid	| 196/200 ]  loss = 1.07709, acc = 0.74141


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 197/200 ] loss = 0.20162, acc = 0.93906	[ Valid	| 197/200 ]  loss = 1.11974, acc = 0.72786


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 198/200 ] loss = 0.23276, acc = 0.92469	[ Valid	| 198/200 ]  loss = 0.95614, acc = 0.75495


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 199/200 ] loss = 0.18374, acc = 0.94437	[ Valid	| 199/200 ]  loss = 0.96522, acc = 0.73099


Train:   0%|          | 0/25 [00:00<?, ?it/s]

Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Train | 200/200 ] loss = 0.19482, acc = 0.93937	[ Valid	| 200/200 ]  loss = 0.99726, acc = 0.73490
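
The "reducing learning rate" messages in the log above have the form printed by a plateau-based scheduler such as `torch.optim.lr_scheduler.ReduceLROnPlateau`. The scheduler's configuration is not shown in this part of the notebook, so the decay factor (0.8) and lower bound (5e-5) below are assumptions inferred from the logged values, not settings taken from the training code. This minimal sketch only checks that the logged learning rates are consistent with those two assumed values.

```python
# Learning rates reported by the scheduler messages in the training log.
logged = [2.5600e-04, 2.0480e-04, 1.6384e-04, 1.3107e-04,
          1.0486e-04, 8.3886e-05, 6.7109e-05, 5.3687e-05, 5.0000e-05]

FACTOR = 0.8   # assumed decay factor, inferred from the ratios in the log
MIN_LR = 5e-5  # assumed lower bound, inferred from the final message


def next_lr(lr, factor=FACTOR, min_lr=MIN_LR):
    """Apply one plateau-style reduction: scale by `factor`, never below `min_lr`."""
    return max(lr * factor, min_lr)


# Replay the reductions starting from the first logged value.
predicted = [logged[0]]
for _ in range(len(logged) - 1):
    predicted.append(next_lr(predicted[-1]))

for got, want in zip(predicted, logged):
    # The log rounds to four decimals in scientific notation, so compare loosely.
    assert abs(got - want) / want < 1e-3
    print(f"{got:.4e}")
```

Under these assumptions, the last reduction (at epoch 192) is clipped to the lower bound instead of being scaled by the factor again.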


## **Testing**

For inference, we need to make sure the model is in eval mode, and the dataset must not be shuffled ("shuffle=False" in test_loader) so that each prediction stays aligned with its image id.
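
To illustrate why eval mode matters, here is a minimal sketch using a standalone `nn.Dropout` layer. The layer, the input tensor, and the seed are illustrative assumptions and are not part of the homework model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only to make the demo repeatable

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

# In training mode, Dropout zeroes elements at random and rescales the
# survivors by 1 / (1 - p), so the output changes from call to call.
layer.train()
train_out = layer(x)

# In eval mode, Dropout is the identity, so the output is deterministic.
layer.eval()
eval_out = layer(x)

print("train mode, zeroed elements:", int((train_out == 0).sum()))
print("eval mode, output equals input:", torch.equal(eval_out, x))
```

Calling `model.eval()` on a full network switches every such submodule at once, which is why the testing cell calls it before predicting.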

Last but not least, don't forget to save the predictions into a single CSV file.
The format of the CSV file should follow the rules given in the slides.

### **WARNING -- Keep in Mind**

Cheating includes, but is not limited to:
1.   using testing labels,
2.   submitting results to previous Kaggle competitions,
3.   sharing predictions with others,
4.   copying code from any creature on Earth,
5.   asking other people to do it for you.

Any violation will be penalized, with punishments ranging from a deduction on the final grade to failing the course.

It is your responsibility to check whether your code violates the rules.
When citing code from the Internet, you should know exactly what it does.
You will **NOT** be excused if you break the rules and then claim you did not know what the code does.


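import os

import cv2

# Reconstruct one test batch with the autoencoder and save the first image pair for inspection.
ae_model.eval()
with torch.no_grad():
    inputs = next(iter(test_loader))[0].to(device)
    outputs = ae_model(inputs)

# Convert from (N, C, H, W) tensors to (N, H, W, C) arrays for OpenCV.
inputs = inputs.cpu().numpy().transpose([0, 2, 3, 1])
outputs = outputs.cpu().numpy().transpose([0, 2, 3, 1])

def norm(img):
    # Min-max scale each channel to [0, 255], working on a copy of the image.
    img = img.copy()
    for c in range(3):
        ch = img[:, :, c]
        lo, hi = ch.min(), ch.max()
        # Guard against division by zero when a channel is constant.
        img[:, :, c] = (ch - lo) / max(hi - lo, 1e-8)
    return (img * 255).astype('uint8')

os.makedirs("img", exist_ok=True)
# Save only the first reconstruction ("0.png") and its input ("0_.png").
# Note: cv2.imwrite assumes BGR channel order; if the tensors are RGB, the saved colors are swapped.
cv2.imwrite("img/0.png", norm(outputs[0]))
cv2.imwrite("img/0_.png", norm(inputs[0]))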

In [14]:
# Make sure the model is in eval mode.
# Some modules like Dropout or BatchNorm behave differently when the model is in training mode.
model = Classifier(1).to(device)
model.load_state_dict(torch.load(model_path))
model.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # Here the variable "labels" is useless since we do not have the ground truth.
    # If you print the labels, you will find that they are always 0.
    # This is because the wrapper (DatasetFolder) returns images and labels for each batch,
    # so we have to create fake labels to make it work normally.
    imgs, labels = batch

    # We don't need gradient in testing, and we don't even have labels to compute loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = model(imgs.to(device))

    # Take the class with the greatest logit as the prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())

  0%|          | 0/27 [00:00<?, ?it/s]

In [15]:
# Save predictions into the file.
with open("predict.csv", "w") as f:

    # The first row must be "Id,Category" (no space after the comma).
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")
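
Before submitting, it can help to read the file back and check its layout. The sketch below is a hypothetical helper, not part of the homework code: `check_submission` and the demo file name are made up for illustration, and it assumes that ids run consecutively from 0 (as the cell above writes them) and that class indices lie in 0 to 10 for the 11 food-11 classes. The authoritative format is the one given in the slides.

```python
import csv
import os
import tempfile

NUM_CLASSES = 11  # food-11 has 11 classes; the index range 0..10 is an assumption


def check_submission(path, num_classes=NUM_CLASSES):
    """Return the number of prediction rows; raise ValueError on a malformed file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows or rows[0] != ["Id", "Category"]:
        raise ValueError("first row must be exactly 'Id,Category'")
    for expected_id, row in enumerate(rows[1:]):
        if len(row) != 2:
            raise ValueError(f"row {expected_id} does not have two fields")
        if int(row[0]) != expected_id:
            raise ValueError(f"row {expected_id} has id {row[0]}")
        if not 0 <= int(row[1]) < num_classes:
            raise ValueError(f"row {expected_id} has class {row[1]} out of range")
    return len(rows) - 1


# Demo on a small hypothetical file written the same way as the cell above.
demo_path = os.path.join(tempfile.mkdtemp(), "predict_demo.csv")
with open(demo_path, "w") as f:
    f.write("Id,Category\n")
    for i, pred in enumerate([3, 0, 10]):
        f.write(f"{i},{pred}\n")

num_rows = check_submission(demo_path)
print(f"{num_rows} prediction rows look well-formed")
```

On the real file, `check_submission("predict.csv")` should return the number of test images.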