# **Homework 3 - Convolutional Neural Network**

This is the example code for Homework 3 of the machine learning course taught by Prof. Hung-yi Lee.

In this homework, you are required to build a convolutional neural network for image classification, optionally using some advanced training techniques.


There are three levels here:

**Easy**: Build a simple convolutional neural network as the baseline. (2 pts)

**Medium**: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)

**Hard**: Utilize provided unlabeled data to obtain better results. (2 pts)

## **About the Dataset**

The dataset used here is food-11, a collection of food images in 11 classes.

To meet the homework requirements, the TAs slightly modified the data.
Please DO NOT access the original fully-labeled training data or testing labels.

Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden.

In [1]:
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

In [2]:
# Download the dataset
# You may choose where to download the data.

# Google Drive
# !gdown --id '1awF7pZ9Dz7X1jn1_QAiKN-_v56veCEKy' --output food-11.zip

# Dropbox
# !wget https://www.dropbox.com/s/m9q6273jl3djall/food-11.zip -O food-11.zip

# Unzip the dataset.
# This may take some time.
# !unzip -q food-11.zip

## **Import Packages**

First, we need to import packages that will be used later.

In this homework, we rely heavily on **torchvision**, a library of PyTorch.

In [3]:
# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.notebook import tqdm

## **Dataset, Data Loader, and Transforms**

Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Here, since our data are stored in folders by class labels, we can directly apply **torchvision.datasets.DatasetFolder** for wrapping data without much effort.

Please refer to the [official PyTorch documentation](https://pytorch.org/vision/stable/transforms.html) for details about the different transforms.

In [4]:
# It is important to do data augmentation in training.
# However, not every augmentation is useful.
# Please think about what kind of augmentation is helpful for food recognition.
train_tfm = transforms.Compose([
    # The output has a fixed shape (height = width = 224), set by RandomResizedCrop below.
    # transforms.Resize(256),
    transforms.RandomRotation(180),
    transforms.RandomResizedCrop(224),
    # transforms.RandomCrop(224),
    # transforms.CenterCrop(224),
    # transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=.2, hue=0.08),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomApply(
                [transforms.ColorJitter(brightness=0.4, contrast=0.4,saturation=0.2, hue=0.1)],
                p=0.8
    ),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.05, 0.2), ratio=(0.5, 1.5), value=0, inplace=False),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# We don't need augmentations in testing and validation.
# All we need here is to resize and center-crop the PIL image, convert it to a tensor, and normalize it.
test_tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


In [5]:
class RandomTransformDataset:
    def __init__(self, dataset, transform, random_time):
        self.dataset = dataset
        self.transform = transform
        self.random_time = random_time
    
    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        jpg, ans = self.dataset[idx]
        imgs = [self.transform(jpg) for i in range(self.random_time)]
        # (RANDOM_NUM, 3, shape0, shape1)
        return torch.stack(imgs, 0)
    
    def merge_batch(self, imgs, ans=None):
        # imgs: (batch, RANDOM_NUM, 3, shape0, shape1) -> (batch*RANDOM_NUM, ...)
        # ans:  (batch) -> (batch*RANDOM_NUM)
        # if ans:
        #     return imgs.reshape(-1, *imgs.shape[2:]), torch.repeat_interleave(ans, self.random_time)
        # else:
        return imgs.reshape(-1, *imgs.shape[2:])
    
    def merge_predict(self, predicts):
        # predicts: (batch*RANDOM_NUM, 11)
        res = []
        labels = torch.argmax(predicts, dim=1)
        # labels: (batch*RANDOM_NUM)
        for views in torch.split(labels, self.random_time):
            # views: (RANDOM_NUM) predicted labels for one image
            # majority vote
            res.append(torch.argmax(torch.bincount(views)))
        # list of length batch
        return res

class CLRDataset:
    def __init__(self, dataset, transformA , transformB):
        self.dataset = dataset
        self.transformA = transformA
        self.transformB = transformB
    
    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        jpg, ans = self.dataset[idx]
        imgs = [self.transformA(jpg), self.transformB(jpg)]
        # (2, 3, shape0, shape1): two augmented views of the same image
        return torch.stack(imgs, 0), ans

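`RandomTransformDataset.merge_predict` performs a majority vote over several randomly augmented views of each test image. The following is a minimal sketch of that voting step on made-up logits; the helper name `vote` and the tensor values are illustrative assumptions, not part of the notebook.

```python
import torch

def vote(logits: torch.Tensor, views_per_image: int) -> torch.Tensor:
    """Majority vote over consecutive groups of `views_per_image` predictions."""
    labels = logits.argmax(dim=1)               # (n_images * views_per_image,)
    groups = labels.split(views_per_image)      # one 1-D tensor per image
    return torch.stack([torch.bincount(g).argmax() for g in groups])

# Two images, three augmented views each, three classes (toy values).
logits = torch.tensor([
    [0.1, 2.0, 0.3],   # image 0, view 0 -> class 1
    [0.2, 1.5, 0.1],   # image 0, view 1 -> class 1
    [3.0, 0.1, 0.2],   # image 0, view 2 -> class 0
    [0.1, 0.2, 4.0],   # image 1, view 0 -> class 2
    [0.3, 0.1, 2.5],   # image 1, view 1 -> class 2
    [0.2, 0.4, 1.0],   # image 1, view 2 -> class 2
])
predictions = vote(logits, views_per_image=3)
print(predictions.tolist())  # [1, 2]
```

Image 0 receives two votes for class 1 and one for class 0, so class 1 wins; all three views of image 1 agree on class 2.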

In [6]:
# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 128

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
data_folder = "/data/ML2021/hw3/"

train_set = DatasetFolder(data_folder+"food-11/training/labeled", loader=lambda x: Image.open(x),extensions="jpg", transform=train_tfm)
# clr_train_set = CLRDataset(train_set, train_tfm, train_tfm)
valid_set = DatasetFolder(data_folder+"food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
# test_set = DatasetFolder(data_folder+"food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

unlabeled_set = DatasetFolder(data_folder+"food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=None)
unlabeled_set.classes = train_set.classes
unlabeled_set.class_to_idx = train_set.class_to_idx
clr_unlab_set = CLRDataset(unlabeled_set, train_tfm, train_tfm)

# Construct data loaders.
# TODO: CLR on labeled data
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True, drop_last=True)
unlab_loader = DataLoader(clr_unlab_set, batch_size=batch_size//2, shuffle=True, num_workers=16, pin_memory=True, drop_last=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
# test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

In [7]:
RANDOM_NUM = 16
test_set = DatasetFolder(data_folder+"food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=None)
rd_test_set = RandomTransformDataset(test_set, train_tfm, random_time=RANDOM_NUM)
rd_test_loader = DataLoader(rd_test_set, batch_size=batch_size//RANDOM_NUM*2, shuffle=False, num_workers=16)
test_loader = DataLoader(DatasetFolder(data_folder+"food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm), batch_size=batch_size, shuffle=False, num_workers=16)


## **Model**

The basic model here is simply a stack of convolutional layers followed by some fully-connected layers.

Since there are three channels for a color image (RGB), the input channels of the network must be three.
In each convolutional layer, the number of channels typically grows, while the height and width shrink (or remain unchanged, depending on hyperparameters such as stride and padding).

Before being fed into the fully-connected layers, the feature map must be flattened into a one-dimensional vector (for each image).
These features are then transformed by the fully-connected layers, and finally, we obtain the "logits" for each class.
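To make the shape changes concrete, here is a small sketch of a toy network. The layer sizes and the 32×32 input are arbitrary assumptions chosen only to illustrate how channels, height, and width evolve; this is not the model used in this notebook.

```python
import torch
import torch.nn as nn

# Toy convolutional stack; sizes are illustrative only.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),   # channels 3 -> 16, H and W unchanged
    nn.ReLU(),
    nn.MaxPool2d(2),                                        # H and W halved: 32 -> 16
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # channels 16 -> 32, H and W: 16 -> 8
    nn.ReLU(),
)
head = nn.Sequential(
    nn.Flatten(),                # (N, 32, 8, 8) -> (N, 2048)
    nn.Linear(32 * 8 * 8, 11),   # one logit per class
)

x = torch.randn(4, 3, 32, 32)    # a batch of 4 RGB images
feature_map = features(x)
logits = head(feature_map)
print(feature_map.shape, logits.shape)
```

The first convolution keeps the spatial size because `padding=1` compensates for the 3×3 kernel, while the pooling layer and the stride-2 convolution each halve it.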

### **WARNING -- You Must Know**
You are free to modify the model architecture here for further improvement.
However, if you want to use some well-known architectures such as ResNet50, please make sure **NOT** to load the pre-trained weights.
Using such pre-trained models is considered cheating and therefore you will be punished.
Similarly, it is your responsibility to make sure no pre-trained weights are used if you use **torch.hub** to load any modules.

For example, if you use ResNet-18 as your model:

model = torchvision.models.resnet18(pretrained=**False**) → This is fine.

model = torchvision.models.resnet18(pretrained=**True**)  → This is **NOT** allowed.

In [8]:
class BlockConv2d(nn.Module):
    def __init__(self, ch_in, ch_out, k, stride=1,
            act=None, pooling=None, use_bn=True, drop=0, is_dconv=False, is_pad=True):

        super(BlockConv2d, self).__init__()
        pad = k//2 if is_pad else 0
        if is_dconv:
            conv = nn.ConvTranspose2d(ch_in, ch_out, k, padding=pad, stride=stride, output_padding=stride-1)
        else:
            conv = nn.Conv2d(ch_in, ch_out, k, padding=pad, stride=stride)
            
        # Avoid shadowing the built-in `list`.
        layers = [conv]
        if use_bn: layers.append(nn.BatchNorm2d(ch_out))
        if act: layers.append(act)
        if pooling: layers.append(pooling)
        if drop > 0: layers.append(nn.Dropout2d(drop))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class BlockLinear(nn.Module):
    def __init__(self, ch_in, ch_out, act=None, use_bn=True, drop=0, bias=True):
        super(BlockLinear, self).__init__()
        # Avoid shadowing the built-in `list`.
        layers = [nn.Linear(ch_in, ch_out, bias=bias)]
        if use_bn: layers.append(nn.BatchNorm1d(ch_out))
        if act: layers.append(act)
        if drop > 0: layers.append(nn.Dropout(drop))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

In [9]:
# m = torchvision.models.resnext50_32x4d(pretrained=False)
# for i, j in (m.state_dict().items()):
    # print(i, "\t", j.shape)
# m.fc

In [10]:
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # input image size: [3, 224, 224]
        self.dim = 2048

        self.encoder = torchvision.models.resnext50_32x4d(pretrained=False, zero_init_residual=True)
        # self.encoder.fc = BlockLinear(2048, self.dim, act=nn.ReLU(), use_bn=True, drop=.5)
        # self.encoder.fc = nn.Identity()
        
        self.act = nn.ReLU()
        self.n = [self.dim, self.dim*2, self.dim*2]
        self.l = len(self.n)
        fc = [
            BlockLinear(self.n[i], self.n[i+1], act=nn.ReLU(), use_bn=True, drop=0, bias=False)
            for i in range(self.l-2)
        ]
        fc += [nn.Linear(self.n[-2], self.n[-1], bias=False)]
        self.encoder.fc = nn.Sequential(*fc)

        self.n2 = [self.n[-1], 1024, 256]
        self.l2 = len(self.n2)
        fc2 = [
            BlockLinear(self.n2[i], self.n2[i+1], act=nn.ReLU(), use_bn=True, drop=0)
            for i in range(self.l2-1)
        ]
        self.classifier = nn.Sequential(*fc2)

        self.out = nn.Linear(self.n2[-1], 11)

    def forward(self, x):
        # input (x): [batch_size, 3, 224, 224]
        # output: [batch_size, 11]
        x = self.encoder(x)
        x = self.classifier(x)
        x = self.out(x)
        return x
    
    def predict(self, x):
        """ Predict class by representation """
        return self.out(self.classifier(x))
    
    def repr(self, x):
        """ Encode to the representation """
        return self.encoder(x)
    

## **Training**

You can finish supervised learning by simply running the provided code without any modification.

The function "get_pseudo_labels" is used for semi-supervised learning.
It is expected to get better performance if you use unlabeled data for semi-supervised learning.
However, you have to implement the function on your own and need to adjust several hyperparameters manually.

For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).

Again, please notice that utilizing external data (or pre-trained model) for training is **prohibited**.
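As a rough illustration of confidence-based pseudo-labeling, the sketch below keeps only the samples whose top-1 softmax probability exceeds a threshold. The helper name `select_confident`, the toy logits, and the threshold of 0.9 are assumptions made for this example.

```python
import torch

def select_confident(logits: torch.Tensor, threshold: float):
    """Return (indices, pseudo_labels) of samples whose top-1 probability exceeds `threshold`."""
    probs = torch.softmax(logits, dim=-1)
    confidence, labels = probs.max(dim=-1)
    keep = confidence > threshold
    return torch.nonzero(keep, as_tuple=True)[0], labels[keep]

# Toy logits for three unlabeled samples and three classes.
logits = torch.tensor([
    [4.0, 0.0, 0.0],   # confident: class 0
    [0.1, 0.2, 0.1],   # nearly uniform: rejected
    [0.0, 0.0, 5.0],   # confident: class 2
])
indices, pseudo_labels = select_confident(logits, threshold=0.9)
print(indices.tolist(), pseudo_labels.tolist())  # [0, 2] [0, 2]
```

A higher threshold yields fewer but cleaner pseudo-labels; a lower one adds more data at the cost of more label noise.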

In [11]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [12]:
# Monkey-patch Subset so that it stores explicit pseudo-labels: Subset(dataset, labels, indices).
def get_pseudosample(self, idx):
    return self.dataset[self.indices[idx]][0], self.labels[idx]

def init_pseudolabel(self, dataset, labels, indices):
    self.dataset = dataset
    self.labels = labels
    self.indices = indices
Subset.__getitem__ = get_pseudosample
Subset.__init__ = init_pseudolabel
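Overwriting `Subset.__init__` and `Subset.__getitem__` changes the behavior of every `Subset` in the process. One possible alternative, sketched here under the assumption that a separate class is acceptable, is a small dataset wrapper. The class name `PseudoLabeledSubset` is hypothetical and the toy base dataset exists only for demonstration.

```python
import torch
from torch.utils.data import Dataset

class PseudoLabeledSubset(Dataset):
    """View of `dataset` restricted to `indices`, with labels replaced by `labels`."""

    def __init__(self, dataset, indices, labels):
        assert len(indices) == len(labels)
        self.dataset = dataset
        self.indices = list(indices)
        self.labels = list(labels)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        sample, _ = self.dataset[self.indices[idx]]  # discard the original label
        return sample, self.labels[idx]

# Toy base dataset: (tensor, placeholder label) pairs.
base = [(torch.full((2,), float(i)), -1) for i in range(5)]
subset = PseudoLabeledSubset(base, indices=[1, 3], labels=[7, 9])
sample, label = subset[1]
print(len(subset), sample.tolist(), label)  # 2 [3.0, 3.0] 9
```

Such a wrapper can be combined with the labeled set through `ConcatDataset` without affecting other uses of `Subset`.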

In [13]:
def get_pseudo_labels(dataset, model, threshold=0.5):
    # This function generates pseudo-labels for a dataset using the given model.
    # It returns a (patched) Subset containing the images whose prediction confidence exceeds the threshold.
    # `dataset` must yield (tensor, label) pairs, so it needs a tensor-producing transform such as test_tfm.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)
    idx = []
    targets = []
    # Iterate over `dataset` in order (shuffle=False) so that batch positions map back to dataset indices.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    offset = 0
    for img, _ in tqdm(loader, leave=False, desc='PseudoLabels'):
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)
        st = torch.topk(probs, 2, dim=1)

        # ---------- TODO ----------
        # Filter the data and construct a new dataset.
        probs1 = st[0][:, 0]
        # probs2 = st[0][:, 1]
        # select = (probs1 > threshold) & ((probs1-probs2) > .25)
        select = (probs1 > threshold)
        targets += st[1][select, 0].tolist()
        # Convert positions within the batch into indices of `dataset`.
        idx += (torch.where(select)[0] + offset).tolist()
        offset += len(img)

    # Custom Subset (patched above): arguments are (dataset, labels, indices).
    new = Subset(dataset, targets, idx)
    model.train()
    return new

In [14]:
log_path = 'log'
if log_path in os.listdir():
    os.remove(log_path)
    print('remove log')
log = open(log_path, 'a')

remove log


In [15]:
def train(imgs, is_supervised=False, ans=None, criterion=None, clr_lambda=1):
    imgs = imgs.to(device)
    reprA = model.repr(imgs[:, 0])
    reprB = model.repr(imgs[:, 1])

    reprA_norm = (reprA - reprA.mean(0)) / (reprA.std(0)+1e-7)
    reprB_norm = (reprB - reprB.mean(0)) / (reprB.std(0)+1e-7)

    # cross-correlation matrix between the two views, shape (dim, dim)
    c = torch.mm(reprA_norm.T, reprB_norm) / len(reprB_norm)
    # loss: offdiag -> 0 (upper triangle only; clr_reg is a global set in the training cell)
    c_offdiag = clr_reg*(torch.triu(c, diagonal=1)**2).sum()
    # loss: diag -> 1
    c_diag = ((c.diag()-1)**2).sum()
    loss = clr_lambda*(c_offdiag+c_diag)
    
    if is_supervised:
        pred = model.predict(torch.cat([reprA, reprB], 0))
        ans = ans.repeat(2)
        # loss = criterion(pred, ans.to(device))
        loss += criterion(pred, ans.to(device))
        return loss, pred, ans, c_diag.item(), c_offdiag.item()
    return loss, c_diag.item(), c_offdiag.item()
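The unsupervised part of `train` pushes the cross-correlation matrix of two augmented views toward the identity, which resembles the Barlow Twins objective. The sketch below shows the idea on random features. The function name `cross_correlation_loss` and the toy data are assumptions, and unlike the notebook's `train`, it sums over all off-diagonal entries rather than only the upper triangle.

```python
import torch

def cross_correlation_loss(z_a, z_b, offdiag_weight=1e-3, eps=1e-7):
    """Penalize deviation of the cross-correlation matrix of two views from the identity."""
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + eps)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + eps)
    c = z_a.T @ z_b / z_a.shape[0]                      # (dim, dim)
    diag_term = (torch.diagonal(c) - 1).pow(2).sum()    # diagonal -> 1
    offdiag = c - torch.diag(torch.diagonal(c))
    offdiag_term = offdiag.pow(2).sum()                 # off-diagonal -> 0
    return diag_term + offdiag_weight * offdiag_term, c

torch.manual_seed(0)
z = torch.randn(256, 8)
loss_same, c_same = cross_correlation_loss(z, z)               # identical views
loss_diff, _ = cross_correlation_loss(z, torch.randn(256, 8))  # unrelated views
print(loss_same.item(), loss_diff.item())
```

Identical views give a diagonal close to 1 and therefore a small loss, whereas unrelated views give a diagonal close to 0 and a much larger loss.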

In [16]:
model_path = "./model.ckpt"

model = Classifier().to(device)
# model.load_state_dict(torch.load(model_path))

lr = 1e-3
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=5e-4)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5, threshold=2e-4, min_lr=1e-5, verbose=True,
)



n_epochs = 200
# classifier | CLR | classifier
n_phase1 = 50
n_phase2 = 100
best_acc = 0
valid_acc = 0

best_semi = 0
semi_flg = False
semi_count = 0
clr_reg = 1e-3
# clr_reg = 2/(model.dim*(model.dim-1))
clr_lambda = 1/1000
loader = train_loader
for epoch in range(n_epochs):
#     if epoch == n_pretrain:
#         optimizer.param_groups[0]['lr'] = lr
#         lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
#     optimizer, mode='min', factor=0.8, patience=5, threshold=2e-4, min_lr=1e-5, verbose=True
# )

    model.train()

    # ---------- CLR ----------
    # if False:
    if n_phase1 <= epoch < n_phase2:
        train_loss = []
        diag_loss = []
        offdiag_loss = []
        for batch in tqdm(unlab_loader, desc='CLR', leave=False):
            loss, c_diag, c_offdiag = train(batch[0])
            optimizer.zero_grad()
            loss.backward()
            # grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
            optimizer.step()
            train_loss.append(loss.item())
            diag_loss.append(c_diag)
            offdiag_loss.append(c_offdiag)

        train_loss = np.mean(train_loss)
        diag_loss = np.mean(diag_loss)
        offdiag_loss = np.mean(offdiag_loss)

        log_str = (f"[ CLR   | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, diag = {diag_loss:5.2f}, offdiag = {offdiag_loss:5.2f}")
        log.write(log_str+'\n')
        log.flush()
        print(log_str)

    # ---------- Training ----------

    if not(n_phase1 <= epoch < n_phase2):
        train_loss = []
        train_accs = []
        # diag_loss = []
        # offdiag_loss = []
        # # for batch in train_loader:
        for batch in tqdm(train_loader, desc='Train', leave=False):

            imgs, labels = batch
            # loss, logits, labels, c_diag, c_offdiag = train(
            #     imgs, 
            #     is_supervised=True, ans=labels, criterion=criterion, clr_lambda=clr_lambda
            # )
            logits = model(imgs.to(device))
            loss = criterion(logits, labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
            optimizer.step()

            acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()
            train_loss.append(loss.item())
            train_accs.append(acc)
            # diag_loss.append(c_diag)
            # offdiag_loss.append(c_offdiag)

        train_loss = sum(train_loss) / len(train_loss)
        train_acc = sum(train_accs) / len(train_accs)
        # diag_loss = np.mean(diag_loss)
        # offdiag_loss = np.mean(offdiag_loss)
        # train_loss = 1
        # train_acc = 1

        # log_str = (f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}, diag = {diag_loss:5.2f}, offdiag = {offdiag_loss:5.2f}")
        log_str = (f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")
        log.write(log_str+'\n')
        print(log_str)
    
    # ---------- Validation ----------
    model.eval()
    
    valid_loss = []
    valid_accs = []

    # for batch in valid_loader:
    for batch in tqdm(valid_loader, desc='Valid', leave=False):
        imgs, labels = batch

        with torch.no_grad():
            logits = model(imgs.to(device))

        loss = criterion(logits, labels.to(device))
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()
        valid_loss.append(loss.item())
        valid_accs.append(acc)

    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)
    if valid_acc >= best_acc:
        best_acc = valid_acc
        best_model = model.state_dict()
        torch.save(model.state_dict(), model_path)
        log_str = ('saving model with acc {:.3f}'.format(best_acc))
        print(log_str)
        log.write(log_str+"\n")
        semi_count = 0
    
    # semi early stop
    # if semi_flg:
    #     if valid_acc > best_semi:
    #         best_semi = valid_acc
    #         semi_count = 0
    #     else:
    #         semi_count += 1

    log_str = f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ]  loss = {valid_loss:.5f}, acc = {valid_acc:.5f}"
    log.write(log_str+'\n')
    print(log_str)
    # print(log_str1)
    # log.write(log_str1+"\n")
    log.flush()
    if not(n_phase1 <= epoch < n_phase2):
        lr_scheduler.step(valid_loss)
    # print(f"[ Valid\t| {epoch + 1:03d}/{n_epochs:03d} ] ae_loss = {valid_ae:.5f}, loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

[ Train | 001/200 ] loss = 2.35270, acc = 0.16569
saving model with acc 0.179
[ Valid | 001/200 ]  loss = 2.25937, acc = 0.17943
[ Train | 002/200 ] loss = 2.25558, acc = 0.19564
saving model with acc 0.204
[ Valid | 002/200 ]  loss = 2.22573, acc = 0.20443
[ Train | 003/200 ] loss = 2.22479, acc = 0.20638
saving model with acc 0.275
[ Valid | 003/200 ]  loss = 2.05140, acc = 0.27500
[ Train | 004/200 ] loss = 2.19729, acc = 0.21322
[ Valid | 004/200 ]  loss = 2.11687, acc = 0.21927
[ Train | 005/200 ] loss = 2.17012, acc = 0.23177
[ Valid | 005/200 ]  loss = 2.06884, acc = 0.25469
[ Train | 006/200 ] loss = 2.15348, acc = 0.23763
[ Valid | 006/200 ]  loss = 2.08035, acc = 0.25182
[ Train | 007/200 ] loss = 2.15867, acc = 0.22656
[ Valid | 007/200 ]  loss = 2.07372, acc = 0.25104
[ Train | 008/200 ] loss = 2.13031, acc = 0.24870
saving model with acc 0.296
[ Valid | 008/200 ]  loss = 2.05065, acc = 0.29557
[ Train | 009/200 ] loss = 2.13813, acc = 0.24382
[ Valid | 009/200 ]  loss = 2.06969, acc = 0.29115
[ Train | 010/200 ] loss = 2.08962, acc = 0.26172
saving model with acc 0.322
[ Valid | 010/200 ]  loss = 1.92897, acc = 0.32214
[ Train | 011/200 ] loss = 2.09354, acc = 0.26237
[ Valid | 011/200 ]  loss = 2.16252, acc = 0.23724
[ Train | 012/200 ] loss = 2.08441, acc = 0.26628
[ Valid | 012/200 ]  loss = 1.99582, acc = 0.29583
[ Train | 013/200 ] loss = 2.07129, acc = 0.27181
[ Valid | 013/200 ]  loss = 2.11548, acc = 0.27344
[ Train | 014/200 ] loss = 2.06618, acc = 0.26725
[ Valid | 014/200 ]  loss = 2.02011, acc = 0.28932
[ Train | 015/200 ] loss = 2.04852, acc = 0.28190
[ Valid | 015/200 ]  loss = 2.13890, acc = 0.28385
[ Train | 016/200 ] loss = 2.02059, acc = 0.28516
[ Valid | 016/200 ]  loss = 2.15934, acc = 0.27630
Epoch    16: reducing learning rate of group 0 to 5.0000e-04.
[ Train | 017/200 ] loss = 1.98770, acc = 0.30143
saving model with acc 0.345
[ Valid | 017/200 ]  loss = 1.97346, acc = 0.34505
[ Train | 018/200 ] loss = 1.96951, acc = 0.31283
saving model with acc 0.399
[ Valid | 018/200 ]  loss = 1.79299, acc = 0.39870
[ Train | 019/200 ] loss = 1.95084, acc = 0.29818
[ Valid | 019/200 ]  loss = 2.11764, acc = 0.31745
[ Train | 020/200 ] loss = 1.96135, acc = 0.29948
[ Valid | 020/200 ]  loss = 1.89362, acc = 0.34193
[ Train | 021/200 ] loss = 1.95765, acc = 0.30990
[ Valid | 021/200 ]  loss = 1.75641, acc = 0.39740
[ Train | 022/200 ] loss = 1.94724, acc = 0.31771
[ Valid | 022/200 ]  loss = 1.82799, acc = 0.35521
[ Train | 023/200 ] loss = 1.94815, acc = 0.30892
[ Valid | 023/200 ]  loss = 1.90413, acc = 0.32839
[ Train | 024/200 ] loss = 1.93472, acc = 0.29948
[ Valid | 024/200 ]  loss = 1.82578, acc = 0.36276
[ Train | 025/200 ] loss = 1.92686, acc = 0.32259
[ Valid | 025/200 ]  loss = 1.85782, acc = 0.35208
[ Train | 026/200 ] loss = 1.90990, acc = 0.32292
saving model with acc 0.404
[ Valid | 026/200 ]  loss = 1.77490, acc = 0.40443
[ Train | 027/200 ] loss = 1.89340, acc = 0.31966
[ Valid | 027/200 ]  loss = 1.91138, acc = 0.32422
Epoch    27: reducing learning rate of group 0 to 2.5000e-04.
[ Train | 028/200 ] loss = 1.85731, acc = 0.35254
[ Valid | 028/200 ]  loss = 1.79365, acc = 0.40000
[ Train | 029/200 ] loss = 1.85617, acc = 0.33561
[ Valid | 029/200 ]  loss = 1.75426, acc = 0.39427
[ Train | 030/200 ] loss = 1.86535, acc = 0.34180
[ Valid | 030/200 ]  loss = 1.78923, acc = 0.39557
[ Train | 031/200 ] loss = 1.85841, acc = 0.34766
[ Valid | 031/200 ]  loss = 1.74579, acc = 0.40156
[ Train | 032/200 ] loss = 1.83644, acc = 0.35059
saving model with acc 0.438
[ Valid | 032/200 ]  loss = 1.69977, acc = 0.43750
[ Train | 033/200 ] loss = 1.81497, acc = 0.36556
[ Valid | 033/200 ]  loss = 1.73136, acc = 0.41693
[ Train | 034/200 ] loss = 1.80893, acc = 0.36230
[ Valid | 034/200 ]  loss = 1.75819, acc = 0.42188
[ Train | 035/200 ] loss = 1.82903, acc = 0.34538
[ Valid | 035/200 ]  loss = 1.79833, acc = 0.34896
[ Train | 036/200 ] loss = 1.81789, acc = 0.35645
saving model with acc 0.444
[ Valid | 036/200 ]  loss = 1.72159, acc = 0.44401
[ Train | 037/200 ] loss = 1.80820, acc = 0.35645
saving model with acc 0.453
[ Valid | 037/200 ]  loss = 1.68149, acc = 0.45286
[ Train | 038/200 ] loss = 1.80531, acc = 0.36621
[ Valid | 038/200 ]  loss = 1.63152, acc = 0.42031
[ Train | 039/200 ] loss = 1.81169, acc = 0.35026
[ Valid | 039/200 ]  loss = 1.67289, acc = 0.40391
[ Train | 040/200 ] loss = 1.79449, acc = 0.36589
[ Valid | 040/200 ]  loss = 1.73123, acc = 0.40990
[ Train | 041/200 ] loss = 1.78375, acc = 0.37598
[ Valid | 041/200 ]  loss = 1.73485, acc = 0.38984
[ Train | 042/200 ] loss = 1.78709, acc = 0.37500
[ Valid | 042/200 ]  loss = 1.63602, acc = 0.43984
[ Train | 043/200 ] loss = 1.79713, acc = 0.36556
[ Valid | 043/200 ]  loss = 1.66369, acc = 0.42604
[ Train | 044/200 ] loss = 1.77348, acc = 0.37305
[ Valid | 044/200 ]  loss = 1.71602, acc = 0.40703
Epoch    44: reducing learning rate of group 0 to 1.2500e-04.
[ Train | 045/200 ] loss = 1.76549, acc = 0.37760
saving model with acc 0.454
[ Valid | 045/200 ]  loss = 1.70048, acc = 0.45443
[ Train | 046/200 ] loss = 1.73797, acc = 0.39225
saving model with acc 0.468
[ Valid | 046/200 ]  loss = 1.57395, acc = 0.46849
[ Train | 047/200 ] loss = 1.73040, acc = 0.39681
[ Valid | 047/200 ]  loss = 1.62369, acc = 0.44401
[ Train | 048/200 ] loss = 1.72791, acc = 0.38379
[ Valid | 048/200 ]  loss = 1.66751, acc = 0.43646
[ Train | 049/200 ] loss = 1.73392, acc = 0.38379
[ Valid | 049/200 ]  loss = 1.58684, acc = 0.44115
[ Train | 050/200 ] loss = 1.73019, acc = 0.38835
[ Valid | 050/200 ]  loss = 1.64891, acc = 0.44245
[ CLR   | 051/200 ] loss = 1299.57213, diag = 733.04, offdiag = 566.54
[ Valid | 051/200 ]  loss = 2.05946, acc = 0.37734
[ CLR   | 052/200 ] loss = 1152.45454, diag = 566.86, offdiag = 585.59
[ Valid | 052/200 ]  loss = 2.15668, acc = 0.30911
[ CLR   | 053/200 ] loss = 1124.44308, diag = 575.96, offdiag = 548.48
[ Valid | 053/200 ]  loss = 2.09657, acc = 0.33906
[ CLR   | 054/200 ] loss = 1074.09199, diag = 570.45, offdiag = 503.64
[ Valid | 054/200 ]  loss = 2.04260, acc = 0.34167
[ CLR   | 055/200 ] loss = 1044.66660, diag = 553.05, offdiag = 491.61
[ Valid | 055/200 ]  loss = 2.05816, acc = 0.29505
[ CLR   | 056/200 ] loss = 1031.50400, diag = 553.90, offdiag = 477.60
[ Valid | 056/200 ]  loss = 2.00430, acc = 0.33177
[ CLR   | 057/200 ] loss = 1000.25784, diag = 526.69, offdiag = 473.57
[ Valid | 057/200 ]  loss = 2.09278, acc = 0.30026
[ CLR   | 058/200 ] loss = 997.62654, diag = 534.06, offdiag = 463.57
[ Valid | 058/200 ]  loss = 2.04326, acc = 0.29818
[ CLR   | 059/200 ] loss = 974.01709, diag = 518.03, offdiag = 455.99
[ Valid | 059/200 ]  loss = 1.99155, acc = 0.32786
[ CLR   | 060/200 ] loss = 972.64100, diag = 522.02, offdiag = 450.62
[ Valid | 060/200 ]  loss = 2.00844, acc = 0.33594
[ CLR   | 061/200 ] loss = 953.84502, diag = 511.45, offdiag = 442.40
[ Valid | 061/200 ]  loss = 2.00264, acc = 0.34479
[ CLR   | 062/200 ] loss = 940.91424, diag = 508.00, offdiag = 432.92
[ Valid | 062/200 ]  loss = 1.92762, acc = 0.34453
[ CLR   | 063/200 ] loss = 932.12771, diag = 508.49, offdiag = 423.64
[ Valid | 063/200 ]  loss = 1.98035, acc = 0.33646
[ CLR   | 064/200 ] loss = 915.62306, diag = 494.22, offdiag = 421.41
[ Valid | 064/200 ]  loss = 2.01016, acc = 0.33750
[ CLR   | 065/200 ] loss = 930.58063, diag = 514.36, offdiag = 416.22


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 065/200 ]  loss = 1.84480, acc = 0.36432


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 066/200 ] loss = 901.73420, diag = 488.20, offdiag = 413.53


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 066/200 ]  loss = 2.02405, acc = 0.29635


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 067/200 ] loss = 893.89436, diag = 480.90, offdiag = 413.00


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 067/200 ]  loss = 2.00840, acc = 0.31901


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 068/200 ] loss = 885.93114, diag = 483.36, offdiag = 402.57


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 068/200 ]  loss = 1.93325, acc = 0.34531


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 069/200 ] loss = 880.69956, diag = 473.05, offdiag = 407.65


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 069/200 ]  loss = 1.93839, acc = 0.33229


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 070/200 ] loss = 886.69329, diag = 486.44, offdiag = 400.25


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 070/200 ]  loss = 1.99108, acc = 0.31823


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 071/200 ] loss = 875.14375, diag = 477.33, offdiag = 397.82


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 071/200 ]  loss = 2.03377, acc = 0.28724


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 072/200 ] loss = 871.72366, diag = 475.39, offdiag = 396.34


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 072/200 ]  loss = 1.93105, acc = 0.33568


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 073/200 ] loss = 850.37652, diag = 451.16, offdiag = 399.21


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 073/200 ]  loss = 1.89222, acc = 0.34635


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 074/200 ] loss = 857.51899, diag = 465.84, offdiag = 391.68


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 074/200 ]  loss = 1.92933, acc = 0.32005


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 075/200 ] loss = 851.62565, diag = 462.37, offdiag = 389.26


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 075/200 ]  loss = 1.97205, acc = 0.31745


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 076/200 ] loss = 849.42780, diag = 460.47, offdiag = 388.96


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 076/200 ]  loss = 1.96457, acc = 0.31380


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 077/200 ] loss = 853.45629, diag = 462.75, offdiag = 390.70


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 077/200 ]  loss = 1.95883, acc = 0.34635


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 078/200 ] loss = 838.09476, diag = 453.45, offdiag = 384.65


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 078/200 ]  loss = 2.00112, acc = 0.31276


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 079/200 ] loss = 837.57045, diag = 454.85, offdiag = 382.72


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 079/200 ]  loss = 2.04868, acc = 0.29271


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 080/200 ] loss = 822.99588, diag = 438.89, offdiag = 384.11


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 080/200 ]  loss = 2.04095, acc = 0.29505


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 081/200 ] loss = 812.82477, diag = 427.45, offdiag = 385.37


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 081/200 ]  loss = 2.01148, acc = 0.30182


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 082/200 ] loss = 812.17648, diag = 432.66, offdiag = 379.52


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 082/200 ]  loss = 2.00931, acc = 0.30573


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 083/200 ] loss = 829.42645, diag = 448.77, offdiag = 380.65


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 083/200 ]  loss = 2.02148, acc = 0.31276


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 084/200 ] loss = 818.61355, diag = 441.71, offdiag = 376.90


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 084/200 ]  loss = 2.05974, acc = 0.28594


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 085/200 ] loss = 799.53615, diag = 421.41, offdiag = 378.13


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 085/200 ]  loss = 2.01960, acc = 0.29453


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 086/200 ] loss = 812.88785, diag = 441.22, offdiag = 371.67


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 086/200 ]  loss = 2.07796, acc = 0.30052


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 087/200 ] loss = 804.84828, diag = 430.08, offdiag = 374.77


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 087/200 ]  loss = 2.04740, acc = 0.28255


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 088/200 ] loss = 802.00587, diag = 428.96, offdiag = 373.05


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 088/200 ]  loss = 2.10314, acc = 0.27266


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 089/200 ] loss = 774.29032, diag = 403.21, offdiag = 371.08


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 089/200 ]  loss = 2.02662, acc = 0.28099


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 090/200 ] loss = 780.52997, diag = 410.00, offdiag = 370.53


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 090/200 ]  loss = 2.08752, acc = 0.28177


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 091/200 ] loss = 784.00342, diag = 418.41, offdiag = 365.60


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 091/200 ]  loss = 2.14696, acc = 0.25781


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 092/200 ] loss = 792.28977, diag = 423.17, offdiag = 369.12


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 092/200 ]  loss = 2.09403, acc = 0.30104


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 093/200 ] loss = 780.10362, diag = 413.88, offdiag = 366.23


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 093/200 ]  loss = 2.05508, acc = 0.27969


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 094/200 ] loss = 772.89297, diag = 407.75, offdiag = 365.14


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 094/200 ]  loss = 2.07030, acc = 0.27995


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 095/200 ] loss = 787.93619, diag = 422.29, offdiag = 365.64


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 095/200 ]  loss = 2.08835, acc = 0.27526


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 096/200 ] loss = 777.84758, diag = 415.57, offdiag = 362.27


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 096/200 ]  loss = 2.03287, acc = 0.28099


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 097/200 ] loss = 762.29678, diag = 397.68, offdiag = 364.62


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 097/200 ]  loss = 2.12110, acc = 0.25938


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 098/200 ] loss = 757.89802, diag = 399.38, offdiag = 358.52


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 098/200 ]  loss = 2.10669, acc = 0.28047


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 099/200 ] loss = 751.78918, diag = 394.34, offdiag = 357.45


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 099/200 ]  loss = 2.16016, acc = 0.25964


CLR:   0%|          | 0/106 [00:00<?, ?it/s]

[ CLR   | 100/200 ] loss = 758.93176, diag = 403.68, offdiag = 355.25


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 100/200 ]  loss = 2.10020, acc = 0.26771


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 101/200 ] loss = 1.80643, acc = 0.36589


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 101/200 ]  loss = 1.61287, acc = 0.46120


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 102/200 ] loss = 1.69806, acc = 0.40397


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 102/200 ]  loss = 1.56274, acc = 0.46068


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 103/200 ] loss = 1.67267, acc = 0.41276


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 103/200 ]  loss = 1.52816, acc = 0.46250


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 104/200 ] loss = 1.68255, acc = 0.40723


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.473
[ Valid | 104/200 ]  loss = 1.52485, acc = 0.47266


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 105/200 ] loss = 1.64589, acc = 0.42253


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 105/200 ]  loss = 1.52108, acc = 0.46250


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 106/200 ] loss = 1.60877, acc = 0.43392


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 106/200 ]  loss = 1.56035, acc = 0.46120


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 107/200 ] loss = 1.63101, acc = 0.42253


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.494
[ Valid | 107/200 ]  loss = 1.48280, acc = 0.49401


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 108/200 ] loss = 1.60844, acc = 0.43490


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.505
[ Valid | 108/200 ]  loss = 1.45665, acc = 0.50547


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 109/200 ] loss = 1.61107, acc = 0.43717


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 109/200 ]  loss = 1.46837, acc = 0.50104


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 110/200 ] loss = 1.59631, acc = 0.43717


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 110/200 ]  loss = 1.49975, acc = 0.49271


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 111/200 ] loss = 1.60177, acc = 0.43262


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 111/200 ]  loss = 1.44746, acc = 0.48958


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 112/200 ] loss = 1.59572, acc = 0.43262


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 112/200 ]  loss = 1.46468, acc = 0.48568


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 113/200 ] loss = 1.60038, acc = 0.42741


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 113/200 ]  loss = 1.46643, acc = 0.49089


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 114/200 ] loss = 1.59821, acc = 0.43392


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 114/200 ]  loss = 1.49328, acc = 0.45365


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 115/200 ] loss = 1.59101, acc = 0.43848


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 115/200 ]  loss = 1.46759, acc = 0.47474


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 116/200 ] loss = 1.56719, acc = 0.44076


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 116/200 ]  loss = 1.47486, acc = 0.47865


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 117/200 ] loss = 1.55081, acc = 0.45833


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 117/200 ]  loss = 1.47490, acc = 0.47500
Epoch    67: reducing learning rate of group 0 to 6.2500e-05.


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 118/200 ] loss = 1.54539, acc = 0.45085


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 118/200 ]  loss = 1.49589, acc = 0.49609


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 119/200 ] loss = 1.55992, acc = 0.44531


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 119/200 ]  loss = 1.44488, acc = 0.48385


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 120/200 ] loss = 1.54308, acc = 0.45182


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.510
[ Valid | 120/200 ]  loss = 1.42623, acc = 0.51016


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 121/200 ] loss = 1.55639, acc = 0.45703


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 121/200 ]  loss = 1.48851, acc = 0.47891


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 122/200 ] loss = 1.54077, acc = 0.45150


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 122/200 ]  loss = 1.46639, acc = 0.49297


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 123/200 ] loss = 1.55002, acc = 0.45964


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 123/200 ]  loss = 1.45197, acc = 0.49557


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 124/200 ] loss = 1.52967, acc = 0.45671


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 124/200 ]  loss = 1.42288, acc = 0.49740


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 125/200 ] loss = 1.53338, acc = 0.45117


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.512
[ Valid | 125/200 ]  loss = 1.45558, acc = 0.51224


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 126/200 ] loss = 1.51087, acc = 0.45898


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 126/200 ]  loss = 1.45841, acc = 0.48776


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 127/200 ] loss = 1.55391, acc = 0.45898


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.527
[ Valid | 127/200 ]  loss = 1.39631, acc = 0.52682


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 128/200 ] loss = 1.52640, acc = 0.46973


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 128/200 ]  loss = 1.43088, acc = 0.50573


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 129/200 ] loss = 1.53773, acc = 0.46257


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 129/200 ]  loss = 1.48886, acc = 0.47630


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 130/200 ] loss = 1.50973, acc = 0.46615


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 130/200 ]  loss = 1.51801, acc = 0.49688


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 131/200 ] loss = 1.52922, acc = 0.45410


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 131/200 ]  loss = 1.45065, acc = 0.49297


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 132/200 ] loss = 1.51965, acc = 0.46517


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 132/200 ]  loss = 1.42820, acc = 0.50078


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 133/200 ] loss = 1.52699, acc = 0.48730


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 133/200 ]  loss = 1.49929, acc = 0.49505
Epoch    83: reducing learning rate of group 0 to 3.1250e-05.


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 134/200 ] loss = 1.50884, acc = 0.46517


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 134/200 ]  loss = 1.46650, acc = 0.50208


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 135/200 ] loss = 1.53002, acc = 0.46191


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 135/200 ]  loss = 1.44525, acc = 0.51927


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 136/200 ] loss = 1.50762, acc = 0.45638


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 136/200 ]  loss = 1.38509, acc = 0.52109


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 137/200 ] loss = 1.49703, acc = 0.48633


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 137/200 ]  loss = 1.48624, acc = 0.48854


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 138/200 ] loss = 1.50464, acc = 0.47266


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 138/200 ]  loss = 1.43516, acc = 0.50391


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 139/200 ] loss = 1.50458, acc = 0.47038


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 139/200 ]  loss = 1.48079, acc = 0.51224


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 140/200 ] loss = 1.51373, acc = 0.46875


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 140/200 ]  loss = 1.47598, acc = 0.48984


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 141/200 ] loss = 1.50660, acc = 0.47363


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 141/200 ]  loss = 1.49408, acc = 0.49115


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 142/200 ] loss = 1.51411, acc = 0.46712


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 142/200 ]  loss = 1.46391, acc = 0.48411
Epoch    92: reducing learning rate of group 0 to 1.5625e-05.


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 143/200 ] loss = 1.51972, acc = 0.46484


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 143/200 ]  loss = 1.46727, acc = 0.51484


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 144/200 ] loss = 1.51808, acc = 0.46777


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 144/200 ]  loss = 1.44294, acc = 0.50651


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 145/200 ] loss = 1.49068, acc = 0.47754


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 145/200 ]  loss = 1.48950, acc = 0.48802


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 146/200 ] loss = 1.50028, acc = 0.46973


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 146/200 ]  loss = 1.46109, acc = 0.47708


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 147/200 ] loss = 1.48931, acc = 0.46322


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 147/200 ]  loss = 1.49143, acc = 0.48802


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 148/200 ] loss = 1.47455, acc = 0.48796


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 148/200 ]  loss = 1.48368, acc = 0.49115
Epoch    98: reducing learning rate of group 0 to 1.0000e-05.


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 149/200 ] loss = 1.51581, acc = 0.45964


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 149/200 ]  loss = 1.45147, acc = 0.49688


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 150/200 ] loss = 1.48236, acc = 0.48047


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 150/200 ]  loss = 1.49854, acc = 0.49427


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 151/200 ] loss = 1.51033, acc = 0.45736


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 151/200 ]  loss = 1.44798, acc = 0.51172


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 152/200 ] loss = 1.49935, acc = 0.46680


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 152/200 ]  loss = 1.37540, acc = 0.52318


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 153/200 ] loss = 1.48344, acc = 0.47917


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 153/200 ]  loss = 1.46030, acc = 0.49557


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 154/200 ] loss = 1.48124, acc = 0.48372


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 154/200 ]  loss = 1.40370, acc = 0.52500


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 155/200 ] loss = 1.47972, acc = 0.48535


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 155/200 ]  loss = 1.45819, acc = 0.49818


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 156/200 ] loss = 1.48962, acc = 0.47656


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 156/200 ]  loss = 1.48039, acc = 0.50208


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 157/200 ] loss = 1.47056, acc = 0.48079


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 157/200 ]  loss = 1.47410, acc = 0.50391


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 158/200 ] loss = 1.51137, acc = 0.46875


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 158/200 ]  loss = 1.43838, acc = 0.50651


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 159/200 ] loss = 1.49775, acc = 0.47038


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 159/200 ]  loss = 1.42553, acc = 0.52188


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 160/200 ] loss = 1.49796, acc = 0.47103


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 160/200 ]  loss = 1.48730, acc = 0.48229


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 161/200 ] loss = 1.48073, acc = 0.47591


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 161/200 ]  loss = 1.47395, acc = 0.49375


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 162/200 ] loss = 1.44575, acc = 0.49349


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 162/200 ]  loss = 1.37809, acc = 0.52448


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 163/200 ] loss = 1.47995, acc = 0.48014


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 163/200 ]  loss = 1.45384, acc = 0.48854


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 164/200 ] loss = 1.48957, acc = 0.46419


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 164/200 ]  loss = 1.48734, acc = 0.49115


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 165/200 ] loss = 1.48990, acc = 0.46647


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 165/200 ]  loss = 1.38773, acc = 0.52500


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 166/200 ] loss = 1.49112, acc = 0.47038


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 166/200 ]  loss = 1.48852, acc = 0.47969


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 167/200 ] loss = 1.48392, acc = 0.47689


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 167/200 ]  loss = 1.44329, acc = 0.49505


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 168/200 ] loss = 1.47600, acc = 0.47396


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 168/200 ]  loss = 1.45850, acc = 0.51224


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 169/200 ] loss = 1.49846, acc = 0.46973


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 169/200 ]  loss = 1.45692, acc = 0.49505


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 170/200 ] loss = 1.46083, acc = 0.48242


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 170/200 ]  loss = 1.48858, acc = 0.49688


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 171/200 ] loss = 1.48419, acc = 0.48535


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 171/200 ]  loss = 1.43526, acc = 0.49688


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 172/200 ] loss = 1.47115, acc = 0.47070


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 172/200 ]  loss = 1.36582, acc = 0.52370


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 173/200 ] loss = 1.48802, acc = 0.49251


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 173/200 ]  loss = 1.44035, acc = 0.52318


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 174/200 ] loss = 1.48279, acc = 0.47461


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 174/200 ]  loss = 1.47713, acc = 0.50990


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 175/200 ] loss = 1.50625, acc = 0.47396


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 175/200 ]  loss = 1.40204, acc = 0.49948


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 176/200 ] loss = 1.46617, acc = 0.49219


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

saving model with acc 0.530
[ Valid | 176/200 ]  loss = 1.37222, acc = 0.53021


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 177/200 ] loss = 1.50581, acc = 0.46484


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 177/200 ]  loss = 1.47756, acc = 0.49193


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 178/200 ] loss = 1.47629, acc = 0.48438


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 178/200 ]  loss = 1.40791, acc = 0.52318


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 179/200 ] loss = 1.49242, acc = 0.47559


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 179/200 ]  loss = 1.45962, acc = 0.50208


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 180/200 ] loss = 1.52552, acc = 0.46159


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 180/200 ]  loss = 1.42698, acc = 0.51094


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 181/200 ] loss = 1.46563, acc = 0.48275


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 181/200 ]  loss = 1.40353, acc = 0.52448


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 182/200 ] loss = 1.46621, acc = 0.49674


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 182/200 ]  loss = 1.49328, acc = 0.48359


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 183/200 ] loss = 1.47987, acc = 0.47396


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 183/200 ]  loss = 1.47198, acc = 0.49505


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 184/200 ] loss = 1.47989, acc = 0.47201


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 184/200 ]  loss = 1.44169, acc = 0.50911


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 185/200 ] loss = 1.49143, acc = 0.47786


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 185/200 ]  loss = 1.51021, acc = 0.48594


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 186/200 ] loss = 1.47728, acc = 0.47721


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 186/200 ]  loss = 1.45030, acc = 0.51354


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 187/200 ] loss = 1.47630, acc = 0.47819


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 187/200 ]  loss = 1.37887, acc = 0.52760


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 188/200 ] loss = 1.46783, acc = 0.48600


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 188/200 ]  loss = 1.43938, acc = 0.48984


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 189/200 ] loss = 1.49799, acc = 0.47363


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 189/200 ]  loss = 1.41973, acc = 0.52057


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 190/200 ] loss = 1.48808, acc = 0.47917


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 190/200 ]  loss = 1.42296, acc = 0.51875


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 191/200 ] loss = 1.49718, acc = 0.46647


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 191/200 ]  loss = 1.42828, acc = 0.50781


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 192/200 ] loss = 1.48418, acc = 0.47135


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 192/200 ]  loss = 1.42616, acc = 0.51042


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 193/200 ] loss = 1.47970, acc = 0.47135


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 193/200 ]  loss = 1.41710, acc = 0.52500


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 194/200 ] loss = 1.49214, acc = 0.46549


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 194/200 ]  loss = 1.39718, acc = 0.51927


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 195/200 ] loss = 1.47774, acc = 0.47721


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 195/200 ]  loss = 1.42703, acc = 0.50651


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 196/200 ] loss = 1.46690, acc = 0.47298


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 196/200 ]  loss = 1.47113, acc = 0.50208


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 197/200 ] loss = 1.48292, acc = 0.47363


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 197/200 ]  loss = 1.43842, acc = 0.49245


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 198/200 ] loss = 1.48321, acc = 0.48112


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 198/200 ]  loss = 1.40105, acc = 0.51797


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 199/200 ] loss = 1.49292, acc = 0.47103


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 199/200 ]  loss = 1.41726, acc = 0.49688


Train:   0%|          | 0/24 [00:00<?, ?it/s]

[ Train | 200/200 ] loss = 1.48912, acc = 0.47233


Valid:   0%|          | 0/6 [00:00<?, ?it/s]

[ Valid | 200/200 ]  loss = 1.42137, acc = 0.50391


## **Testing**

For inference, we need to make sure the model is in eval mode, and the order of the dataset should not be shuffled ("shuffle=False" in test_loader).

Last but not least, don't forget to save the predictions into a single CSV file.
The format of the CSV file should follow the rules mentioned in the slides.
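
As a minimal sketch of why eval mode matters, the toy example below uses a standalone `nn.Dropout` layer (a hypothetical stand-in, not the homework model). In training mode dropout zeroes entries at random, so predictions would change from run to run; in eval mode it passes the input through unchanged.

```python
import torch
import torch.nn as nn

# Hypothetical toy layer, used only to illustrate train vs. eval behavior.
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: entries are zeroed at random and the survivors are
# scaled by 1 / (1 - p) = 2.
drop.train()
y_train = drop(x)

# Eval mode: dropout is a no-op, so the output equals the input.
drop.eval()
with torch.no_grad():
    y_eval = drop(x)

zeroed_in_train = int((y_train == 0).sum())
identical_in_eval = torch.equal(y_eval, x)
```

BatchNorm changes in a similar way: in eval mode it uses its stored running statistics instead of the statistics of the current batch.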

### **WARNING -- Keep in Mind**

Cheating includes but is not limited to:
1.   using testing labels,
2.   submitting results to previous Kaggle competitions,
3.   sharing predictions with others,
4.   copying code from any creature on Earth,
5.   asking other people to do it for you.

Any violation will be punished, with penalties ranging from a deduction on the final grade to failing the course.

It is your responsibility to check whether your code violates the rules.
When citing code from the Internet, you should know exactly what that code does.
Breaking the rules will **NOT** be tolerated, even if you claim you did not know what the code does.


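import os

import cv2

# Reconstruct one test batch with the autoencoder to inspect its output.
ae_model.eval()
with torch.no_grad():
    a = next(iter(test_loader))[0].to(device)
    b = ae_model(a)

# Convert (N, C, H, W) tensors to (N, H, W, C) NumPy arrays for OpenCV.
a = a.cpu().numpy().transpose([0, 2, 3, 1])
b = b.cpu().numpy().transpose([0, 2, 3, 1])

def norm(img):
    # Min-max scale each channel to [0, 1], then convert to 8-bit.
    img = img.copy()
    for c in range(3):
        ch = img[:, :, c]
        lo, hi = ch.min(), ch.max()
        # Guard against a constant channel to avoid dividing by zero.
        img[:, :, c] = (ch - lo) / max(hi - lo, 1e-8)
    return (img * 255).astype('uint8')

# Save only the first pair: "{i}.png" is the reconstruction, "{i}_.png" the input.
# Note: cv2.imwrite expects BGR, so colors are swapped if the tensors are RGB.
os.makedirs("img", exist_ok=True)
for i, img in enumerate(b[:1]):
    cv2.imwrite(f"img/{i}.png", norm(img))
    cv2.imwrite(f"img/{i}_.png", norm(a[i]))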

In [17]:
# Make sure the model is in eval mode.
# Some modules, such as Dropout and BatchNorm, behave differently in training mode.
model_path = './model.ckpt'
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Classifier().to(device)
# map_location lets a checkpoint saved on GPU load on a CPU-only machine.
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate the testing set by batches.
for batch in tqdm(rd_test_loader):
    # imgs, _ = batch
    imgs = rd_test_set.merge_batch(batch)

    # We don't need gradient in testing, and we don't even have labels to compute loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = model(imgs.to(device))
    
    # Take the class with greatest logit as prediction and record it.
    # predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())
    predictions.extend(rd_test_set.merge_predict(logits))

  0%|          | 0/210 [00:00<?, ?it/s]
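
The helpers `rd_test_set.merge_batch` and `rd_test_set.merge_predict` are defined elsewhere in the notebook and are not shown in this section. One plausible reading, offered purely as an assumption, is test-time augmentation: each test image is presented as several augmented views, the views are flattened into a single batch for the forward pass, and the logits are averaged per image before taking the argmax. The sketch below uses hypothetical stand-in functions with the same names and a toy linear model; it is not the notebook's implementation.

```python
import torch

# Hypothetical helpers: one possible reading of merge_batch / merge_predict,
# NOT the notebook's actual implementation.
def merge_batch(views):
    """Flatten (batch, n_views, C, H, W) into (batch * n_views, C, H, W)."""
    b, v = views.shape[:2]
    return views.reshape(b * v, *views.shape[2:]), v

def merge_predict(logits, n_views):
    """Average logits over the views of each image and return class indices."""
    per_image = logits.reshape(-1, n_views, logits.shape[-1]).mean(dim=1)
    return per_image.argmax(dim=-1).tolist()

# Toy usage: 4 images, 3 augmented views each, and a stand-in model
# that maps every view to 11 logits (one per food-11 class).
torch.manual_seed(0)
views = torch.randn(4, 3, 3, 8, 8)
flat, n_views = merge_batch(views)
toy_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 8 * 8, 11),
)
with torch.no_grad():
    preds = merge_predict(toy_model(flat), n_views)
```

Averaging logits is only one way to combine views; voting on per-view predictions or averaging softmax probabilities are common alternatives.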

In [18]:
# Save predictions into the file.
with open("predict.csv", "w") as f:

    # The first row must be "Id,Category" (no space after the comma).
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")
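
Before submitting, it can help to read the file back and check its layout. The sketch below is a hedged example: it writes a few hypothetical predictions to an in-memory buffer instead of `predict.csv`, then checks the header, the consecutive ids, and that every class falls in 0..10. That range is an assumption based on the 11 classes of food-11.

```python
import csv
import io

# Hypothetical predictions standing in for the list built during inference.
predictions = [3, 0, 10, 7]

# Write in the same "Id,Category" layout (to an in-memory buffer here).
buf = io.StringIO()
buf.write("Id,Category\n")
for i, pred in enumerate(predictions):
    buf.write(f"{i},{pred}\n")

# Read it back and check the header, the ids, and the class range.
buf.seek(0)
rows = list(csv.reader(buf))
header, body = rows[0], rows[1:]
ok = (
    header == ["Id", "Category"]
    and [int(r[0]) for r in body] == list(range(len(predictions)))
    and all(0 <= int(r[1]) <= 10 for r in body)  # assumed class range
)
```

To check the real file, replace the buffer with `open("predict.csv", newline="")`.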