## Deep Learning Coding Project 2: Image Classification

Before we start, please put your **Chinese** name and student ID in following format:

Name, 0000000000 // e.g.) 傅炜, 2021123123

赵瀚宏, 2023040163

## Introduction

We will use Python 3, [NumPy](https://numpy.org/), and [PyTorch](https://pytorch.org/) for this coding project. The example code has been tested under the latest stable release version.

### Task

In this notebook, you need to train a model to classify images. Given an image, you need to distinguish its category,
e.g., whether it is a horse or an automobile. There are total 10 classes:
airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. We
release 40,000 images for training, 10,000 images for validation. Each image has
a shape of (3, 128, 128). We will evaluate your model in 10,000 images on the test set.

Download the dataset from [here](https://cloud.tsinghua.edu.cn/d/00e0704738e04d32978b/) and organize them into a folder named "cifar_10_4x".

<!-- Images can be classified as "No Finding" or **one or more types**. In the basic task, given an image, you only need to tell whether the X-ray indicates "Infiltration". In the bonus task, you need to tell whether *each* of the diseases exists.

Images are taken from the [ChestX-ray14 dataset](https://www.kaggle.com/nih-chest-xrays/data) and downsampled to (256, 256). We release 44872 gray scale images for training and validation. We will evaluate your model on 10285 images in the test set. The dataset is available [here](https://cloud.tsinghua.edu.cn/d/16d06a89c5b4459db703/) and organized as follows: `train` directory includes all images for training and validation, and each line of `train.txt` records the labels separated by "|". -->

### Coding

We provide a code template. You can add new cells and modify our example to train your own model. To run this code, you should:

+ implement your model (named `Net`) in `model.py`.
+ implement your training loop in this notebook

Your final submitted model should not be larger than **20M**. **Using any pretrained model is NOT permitted**.
Besides, before you submit your result, **make sure you can test your model using our evaluation cell.** Name your best model "cifar10_4x_best.pth".

### Report & Submission

Your report should include:

1. the details of your model
2. all the hyper-parameters
3. all the tricks or training techniques you use
4. the training curve of your submitted model.

Reporting additional ablation studies and how you improve your model are also encouraged.

You should submit:

+ all codes
+ the model checkpoint (only "cifar10_4x_best.pth")
+ your report (a separate "pdf")

to web learning. We will use the evaluation code in this notebook to evaluate your model on the test set.

### Grading

We will grade this coding project based on the performance of your model (70%) and your report (30%). Regarding the evaluation metric of your model, assume your test accuracy is $X$, then your score is

$\frac{min(X,H)−0.6}{H−0.6}×7$

where $H$ is accuracy of the model trained by TAs and $H=0.9$, i.e., you will get the full score if your test accuracy is above 90%.

**Bonus**: The best submission with the highest testing accuracy will get 1 bonus point for the final course grade.

**Avoid plagiarism! Any student who violates academic integrity will be seriously dealt with and receive an F for the course.**

## Code Template

We have masked the the training loop in this notebook for you to complete. You should also overwrite "model.py" and implement your own model.

In [2]:
%load_ext autoreload
%autoreload 2

### Setup Code

If you use Colab in this coding project, please uncomment the code, fill the `GOOGLE_DRIVE_PATH_AFTER_MYDRIVE` and run the following cells to mount your Google drive. Then, the notebook can find the required file. If you run the notebook locally, you can skip the following cells.

In [3]:
# from google.colab import drive
# drive.mount('/content/drive')

In [4]:
# import os

# # TODO: Fill in the Google Drive path where you uploaded the assignment
# # Example: If you create a 2022SP folder and put all the files under CP1 folder, then '2022SP/CP1'
# # GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = '2022SP/CP1'
# GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = None 
# GOOGLE_DRIVE_PATH = os.path.join('drive', 'MyDrive', GOOGLE_DRIVE_PATH_AFTER_MYDRIVE)
# print(os.listdir(GOOGLE_DRIVE_PATH))

In [5]:
# import sys
# sys.path.append(GOOGLE_DRIVE_PATH)

In [6]:
from dataset import CIFAR10_4x
from evaluation import evaluation

from model_dropout import Net  # this should be implemented by yourself

### Enjoy Your Coding Time!

In [7]:
import math
import os
import random
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

from torchvision import transforms
from PIL import Image


def set_seed(seed):
    seed = int(seed)
    if seed < 0 or seed > (2**32 - 1):
        raise ValueError("Seed must be between 0 and 2**32 - 1")
    else:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.backends.cudnn.deterministic = True


device = 'cuda' if torch.cuda.is_available() else 'cpu'
set_seed(16)

In [8]:
data_root_dir = '.'

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([125 / 255, 124 / 255, 115 / 255],
                         [60 / 255, 59 / 255, 64 / 255])
    
])
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(128,padding=4),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize([125 / 255, 124 / 255, 115 / 255],
                         [60 / 255, 59 / 255, 64 / 255])
])


In [9]:
# Re-run for new dataset

NUMOFWORKERS = 2
trainset = CIFAR10_4x(root=data_root_dir,
                      split="train", transform=train_transform)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=32, shuffle=True, num_workers=NUMOFWORKERS, pin_memory=True)

validset = CIFAR10_4x(root=data_root_dir,
                      split='valid', transform=transform)
validloader = torch.utils.data.DataLoader(
    validset, batch_size=32, shuffle=False, num_workers=NUMOFWORKERS)

In [10]:
####################################
#                                  #
#                                  #
#                                  #
#                                  #
#        RUN THIS FOR NEW MODEL    #
#                                  #
#                                  #
#                                  # 
#                                  #
####################################
net = Net()
print("number of trained parameters: %d" % (
    sum([param.nelement() for param in net.parameters() if param.requires_grad])))
print("number of total parameters: %d" %
      (sum([param.nelement() for param in net.parameters()])))

criterion = nn.CrossEntropyLoss()
train_history = []
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

net.to(device)

number of trained parameters: 2727626
number of total parameters: 2727626


In [10]:
# # NO EXECUTION
# model_dir = '.'
# if not os.path.exists(model_dir):
#     os.makedirs(model_dir)
# torch.save(net, os.path.join(model_dir, 'cifar10_4x_0.pth'))

# # check the model size
# os.system(' '.join(['du', '-h', os.path.join(model_dir, 'cifar10_4x_0.pth')]))

In [11]:
# assert(False)

def train(epochs:int):
    ##############################################################################
    #                  TODO: You need to complete the code here                  #
    ##############################################################################
    # YOUR CODE HERE
    # print(optimizer.param_groups[0])
    trainset = CIFAR10_4x(root=data_root_dir,
                      split="train", transform=train_transform)
    trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=32, shuffle=True, num_workers=NUMOFWORKERS, pin_memory=True)
    results = []
    from tqdm import tqdm
    from evaluation import evaluation
    for epoch in range(epochs):
        net.train()
        with tqdm(trainloader) as bar:
            nums = trues = 0
            losses = 0
            for i,(xb,yb) in enumerate(bar):
                xb,yb = xb.to(device),yb.to(device)
                optimizer.zero_grad()
                preds = net(xb)
                with torch.no_grad():
                    _, answer = torch.max(preds, 1)
                    trues += (answer==yb).sum().item()
                    nums += len(yb)
                loss = criterion(preds,yb)
                losses += loss
                loss.backward()
                optimizer.step()
                if(i%100==0):
                    train_loss = losses/nums
                    bar.set_description(f'Epoch {epoch}/{epochs}, accuracy {100*trues/nums}, train loss {train_loss}')
        train_loss = losses/nums
        try:
            train_loss = train_loss.sum().item()
        except:
            pass
        train_acc = 100*trues/nums
        print(f'train accuracy: {train_acc}%')
        net.eval()
        nums = loss = 0
        with tqdm(validloader) as bar:
            with torch.no_grad():
                for i,(xb,yb) in enumerate(bar):
                    xb,yb = xb.to(device),yb.to(device)
                    preds = net(xb)
                    loss += criterion(preds,yb)
                    nums += len(yb)
        valid_loss = loss/nums
        print(f'valid loss: {valid_loss}')
        valid_acc =evaluation(net,validloader,device)
        print(f'valid accuracy: {valid_acc}%')
        results.append({'train accuracy':train_acc,'valid accuracy':valid_acc,'loss':(train_loss),'valid loss':(valid_loss).sum().item()})
    train_history.append({
        'optimizer':{k:v for k,v in optimizer.param_groups[0].items() if k!='params'},
        'epochs':epochs,
        'results':results
    })
    return train_acc,valid_acc
    # raise NotImplementedError()
    ##############################################################################
    #                              END OF YOUR CODE                              #
    ##############################################################################

In [12]:
torch.backends.cudnn.enabled = True

In [12]:
net = torch.load('./models/cifar10_4x_03191512_acc92.8.pth')
import json
past = json.load(open('./models/cifar10_4x_03191512_acc92.8.json','r'))
train_history = past['train_history']
print(past['test_acc'])

90.12999999999998


In [13]:
from model import Net
net_ori = Net()
net_ori.load_state_dict(net.state_dict())

<All keys matched successfully>

In [18]:
net_ori.to(device)

In [19]:
net_ori.state_dict()

OrderedDict([('conv1.0.weight',
              tensor([[[[-1.7440e-02, -8.1656e-03, -7.5222e-03,  ..., -1.2233e-02,
                         -1.5194e-02, -1.3172e-02],
                        [-8.8385e-03, -5.5660e-03, -5.4423e-03,  ..., -7.8221e-03,
                         -9.8965e-03, -1.1722e-02],
                        [-8.0159e-03, -8.4541e-03, -9.1748e-03,  ..., -9.3273e-03,
                         -7.5122e-03, -6.4457e-03],
                        ...,
                        [-9.2361e-03, -1.1890e-02, -1.1282e-02,  ..., -1.1115e-02,
                         -8.9423e-03, -8.9093e-03],
                        [-7.9806e-03, -1.0732e-02, -8.8167e-03,  ..., -8.0452e-03,
                         -8.4357e-03, -7.5814e-03],
                        [-9.4170e-03, -1.1087e-02, -7.4050e-03,  ..., -7.7279e-03,
                         -9.0887e-03, -1.3012e-02]],
              
                       [[ 2.1310e-03,  9.4590e-03,  6.0972e-03,  ...,  7.6739e-04,
                          1.69

In [17]:
torch.save(net_ori.state_dict(),'./models/cifar10_4x_03191512_acc92.8_newmodel_state_dict.pt')

In [38]:
optimizer = torch.optim.Adam(net.parameters(),lr=3e-5,weight_decay=2e-2) 
tacc,vacc = train(3)

Epoch 0/3, accuracy 94.75697335553706, train loss 0.006143929436802864: 100%|██████████| 1250/1250 [00:27<00:00, 44.97it/s]


train accuracy: 94.755%


100%|██████████| 313/313 [00:02<00:00, 107.41it/s]

valid loss: 0.009316935203969479





Accuracy of the network on the valid images: 91 %
valid accuracy: 91.12%


Epoch 1/3, accuracy 94.63207743547044, train loss 0.006257453002035618: 100%|██████████| 1250/1250 [00:28<00:00, 43.84it/s]


train accuracy: 94.58%


100%|██████████| 313/313 [00:02<00:00, 110.48it/s]

valid loss: 0.010325686074793339





Accuracy of the network on the valid images: 90 %
valid accuracy: 90.13%


Epoch 2/3, accuracy 94.51758950874272, train loss 0.006417965516448021: 100%|██████████| 1250/1250 [00:29<00:00, 42.34it/s]


train accuracy: 94.5275%


100%|██████████| 313/313 [00:02<00:00, 113.21it/s]

valid loss: 0.008552640676498413





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.08%


In [39]:
optimizer = torch.optim.Adam(net.parameters(),lr=1.2e-5,weight_decay=2.2e-2) 
tacc,vacc = train(10)

Epoch 0/10, accuracy 95.72751873438801, train loss 0.00534795830026269: 100%|██████████| 1250/1250 [00:28<00:00, 43.66it/s] 


train accuracy: 95.74%


100%|██████████| 313/313 [00:02<00:00, 108.79it/s]

valid loss: 0.008120805025100708





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.36%


Epoch 1/10, accuracy 96.35199833472106, train loss 0.004792569670826197: 100%|██████████| 1250/1250 [00:29<00:00, 42.82it/s]


train accuracy: 96.345%


100%|██████████| 313/313 [00:02<00:00, 112.48it/s]

valid loss: 0.007896470837295055





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.52%


Epoch 2/10, accuracy 96.22450041631973, train loss 0.00494211632758379: 100%|██████████| 1250/1250 [00:29<00:00, 42.05it/s] 


train accuracy: 96.23%


100%|██████████| 313/313 [00:03<00:00, 103.57it/s]

valid loss: 0.008097859099507332





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.6%


Epoch 3/10, accuracy 96.37021232306411, train loss 0.0047032940201461315: 100%|██████████| 1250/1250 [00:29<00:00, 42.58it/s]


train accuracy: 96.33%


100%|██████████| 313/313 [00:02<00:00, 110.73it/s]

valid loss: 0.008344592526555061





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.06%


Epoch 4/10, accuracy 96.46128226477936, train loss 0.004728394560515881: 100%|██████████| 1250/1250 [00:29<00:00, 42.61it/s]


train accuracy: 96.455%


100%|██████████| 313/313 [00:02<00:00, 111.61it/s]

valid loss: 0.008223753422498703





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.36%


Epoch 5/10, accuracy 96.4560782681099, train loss 0.004760431125760078: 100%|██████████| 1250/1250 [00:29<00:00, 42.84it/s] 


train accuracy: 96.445%


100%|██████████| 313/313 [00:02<00:00, 108.60it/s]

valid loss: 0.008273315615952015





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.25%


Epoch 6/10, accuracy 96.52633222314738, train loss 0.004672984592616558: 100%|██████████| 1250/1250 [00:29<00:00, 42.72it/s]


train accuracy: 96.54%


100%|██████████| 313/313 [00:02<00:00, 104.67it/s]

valid loss: 0.007878397591412067





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.7%


Epoch 7/10, accuracy 96.67204412989176, train loss 0.004619973711669445: 100%|██████████| 1250/1250 [00:28<00:00, 43.22it/s]


train accuracy: 96.6825%


100%|██████████| 313/313 [00:02<00:00, 109.93it/s]

valid loss: 0.008238489739596844





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.36%


Epoch 8/10, accuracy 96.55755620316403, train loss 0.004694092553108931: 100%|██████████| 1250/1250 [00:29<00:00, 42.99it/s] 


train accuracy: 96.5375%


100%|██████████| 313/313 [00:02<00:00, 113.93it/s]

valid loss: 0.008294926024973392





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.52%


Epoch 9/10, accuracy 96.5419442131557, train loss 0.004611068870872259: 100%|██████████| 1250/1250 [00:29<00:00, 42.95it/s] 


train accuracy: 96.5375%


100%|██████████| 313/313 [00:02<00:00, 112.70it/s]

valid loss: 0.0080189174041152





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.8%


In [21]:
# for th in train_history[1:]:
#     result = th['results']
#     for res in result:
#         try:
#             del res['loss']
#         except KeyError:
#             pass

# MANUAL

In [40]:
from test import fulltest
import time

DO_TEST = True
test_acc = None
current_time = time.strftime("%m%d%H%M")
model_dir = '.'
try:
    model_pth = os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}.pth')
    torch.save(net, model_pth)
except: 
    torch.save(net.state_dict(),os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}_dict.pt'))
    print('only state dict saved!!')
else: 
    if DO_TEST:
        test_acc = fulltest(model_pth)
model_src = ''
with open('model.py','r') as f:
    model_src=f.read()
with open(os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}.json'),'w') as f:
    import json
    json.dump({
        'model_type':model_src,
        'train_history':train_history,
        'description':"""
    Don't overfit.
                """,
        'test_acc':None if test_acc is None else test_acc
    },f,indent=4)

The accuracy of model is 90.12999999999998 %


In [37]:
def iterate(j,lyer=0):
    if not isinstance(j,(dict,list,tuple)):
        if isinstance(j,torch.Tensor):
            print(j)
            j=j.item()
            # return True
    elif isinstance(j,dict):
        for i,(x,y) in enumerate(j.items()):
            # print('  '*lyer,y)
            b = iterate(y,lyer+1)
            if b and not isinstance(y,(dict,list,tuple)):
                j[x]=y.item()
                print(f'layer {lyer},find at entry {i}')
                # return True
        # return False
    else:
        for i,x in enumerate(j):
            # print('  '*lyer,x)
            b = iterate(x,lyer+1)
            if b and not isinstance(x,(dict,list,tuple)):
                j[i]=x.item()
                print(f'layer {lyer},find at entry {i}')
                # return True
        # return False
iterate(train_history)

tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0040, device='cuda:0', grad_fn=<DivBackward0>)


In [38]:
newdic = {}
def replace(iterable):
    if isinstance(iterable,dict):
        for j in iterable:
            iterable[j]=replace(iterable)
    elif isinstance(iterable,list):
        for i in range(len(iterable)):
            iterable[i]=replace(iterable[i])
    else:
        return iterable.item() if isinstance(iterable,torch.Tensor) else iterable
replace(train_history)


RecursionError: maximum recursion depth exceeded while calling a Python object

# AUTOMIZATION

In [23]:
def load(vacc,test_acc):
    with open('model.py','r') as f:
        model_src=f.read()
    current_time = time.strftime("%m%d%H%M")
    model_dir = '.'
    with open(os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}.json'),'w') as f:
        import json
        json.dump({
            'model_type':model_src,
            'train_history':train_history,
            'description':"""Automatically ran by machine.
                    """,
            'test_acc':None if test_acc is None else test_acc
    },f,indent=4)
    return current_time

In [24]:
# # automatically try hyperparams
# from test import fulltest
# import time
# hp_lr = range(1e-4,2e-4,1e-5)
# hp_wd = range(1e-3,2e-3,1e-4)
# best_time = '03182306'
# best_acc = '92.36'
# best_test_acc = 0
# for lr in hp_lr:
#     for wd in hp_wd:
#         net = torch.load(f'./models/cifar10_4x_{best_time}_acc{best_acc}.pth')
#         import json
#         past = json.load(open(f'./models/cifar10_4x_{best_time}_acc{best_acc}.json','r'))
#         train_history = past['train_history']
#         print(past['test_acc'])
#         optimizer = torch.optim.Adam(net.parameters(),lr=1e-4,weight_decay=1.5e-3) 
#         for _ in range(2):
#             tacc,vacc = train(15)
#             test_acc = 0
#             if vacc<90: # hyper param too bad
#                 break
#             else:
#                 test_acc = fulltest(model_pth)
#                 if test_acc > best_test_acc:
#                     # load model
#                     best_test_acc = test_acc
#                     ctime = load(vacc,test_acc)
#                     best_acc = vacc
#                     best_time = ctime
     


In [26]:
# automization
# this code has bug
lr = 1e-4
weight_decay = 1.5e-3
cnt = 0
model_dir = '.'
best_test_acc = 90.00
best_file_name = 'cifar10_4x_03182306_acc92.39'
net = torch.load(os.path.join('./models',f'{best_file_name}.pth'))
import json
train_history = json.load(open(os.path.join('./models',f'{best_file_name}.json'),'r'))
while cnt<99:
    optimizer = torch.optim.Adam(net.parameters(),lr=lr,weight_decay=weight_decay) 
    tacc,vacc = train(10)
    import time
    current_time = time.strftime("%m%d%H%M")
    try:
        torch.save(net, os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}.pth'))
    except:
        torch.save(net.state_dict(),os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}_dict.pt'))
    test_acc = 0
    if vacc>=90:
        test_acc = fulltest(model_pth)
    model_src = ''
    with open('model.py','r') as f:
        model_src=f.read()
    with open(os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{vacc}.json'),'w') as f:
        import json
        json.dump({
            'model_type':model_src,
            'train_history':train_history,
            'description':'Auto-run by machine',
            'test_acc':test_acc
        },f,indent=4)
    if test_acc > best_test_acc:
        best_test_acc = test_acc
        best_file_name = f'cifar10_4x_{current_time}_acc{vacc}'
    elif test_acc < best_test_acc -1:
        net = torch.load(os.path.join('./models',f'{best_file_name}.pth'))
        train_history = torch.load(open(os.path.join('./models',f'{best_file_name}.json'),'r'))
    else:
        if tacc-vacc > 1.2:
            if lr > 3e-5:
                lr -= 0.5e-5
            if weight_decay < 2e-3: 
                weight_decay += 1e-4
    cnt+=1

Epoch 0/10, accuracy 95.05620316402998, train loss 0.004494397900998592: 100%|██████████| 1250/1250 [00:29<00:00, 42.45it/s]


train accuracy: 95.065%


100%|██████████| 313/313 [00:02<00:00, 118.78it/s]

valid loss: 0.007777239196002483





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.19%


Epoch 1/10, accuracy 95.12645711906744, train loss 0.004572172649204731: 100%|██████████| 1250/1250 [00:29<00:00, 42.35it/s]


train accuracy: 95.0725%


100%|██████████| 313/313 [00:02<00:00, 114.09it/s]

valid loss: 0.00754680298268795





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.55%


Epoch 2/10, accuracy 95.11865112406328, train loss 0.004561053588986397: 100%|██████████| 1250/1250 [00:29<00:00, 42.77it/s]


train accuracy: 95.1525%


100%|██████████| 313/313 [00:02<00:00, 115.38it/s]

valid loss: 0.007818366400897503





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.25%


Epoch 3/10, accuracy 94.98855120732723, train loss 0.004580140579491854: 100%|██████████| 1250/1250 [00:28<00:00, 43.59it/s]


train accuracy: 94.955%


100%|██████████| 313/313 [00:02<00:00, 115.32it/s]

valid loss: 0.008238258771598339





Accuracy of the network on the valid images: 91 %
valid accuracy: 91.81%


Epoch 4/10, accuracy 94.90008326394671, train loss 0.004754957277327776: 100%|██████████| 1250/1250 [00:29<00:00, 42.54it/s]


train accuracy: 94.8625%


100%|██████████| 313/313 [00:02<00:00, 114.80it/s]

valid loss: 0.007548181805759668





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.32%


Epoch 5/10, accuracy 94.8376353039134, train loss 0.004798813723027706: 100%|██████████| 1250/1250 [00:29<00:00, 42.56it/s]  


train accuracy: 94.8625%


100%|██████████| 313/313 [00:02<00:00, 116.87it/s]

valid loss: 0.007914402522146702





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.11%


Epoch 6/10, accuracy 95.02497918401332, train loss 0.0046312701888382435: 100%|██████████| 1250/1250 [00:29<00:00, 42.99it/s]


train accuracy: 95.06%


100%|██████████| 313/313 [00:02<00:00, 114.20it/s]

valid loss: 0.007885714061558247





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.11%


Epoch 7/10, accuracy 94.95732722731057, train loss 0.004652521573007107: 100%|██████████| 1250/1250 [00:28<00:00, 43.44it/s] 


train accuracy: 94.965%


100%|██████████| 313/313 [00:02<00:00, 113.59it/s]

valid loss: 0.007862955331802368





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.37%


Epoch 8/10, accuracy 95.02237718567861, train loss 0.00464416341856122: 100%|██████████| 1250/1250 [00:29<00:00, 42.61it/s] 


train accuracy: 95.0325%


100%|██████████| 313/313 [00:02<00:00, 117.61it/s]

valid loss: 0.008387304842472076





Accuracy of the network on the valid images: 91 %
valid accuracy: 91.92%


Epoch 9/10, accuracy 94.81161532056619, train loss 0.0048386831767857075: 100%|██████████| 1250/1250 [00:28<00:00, 43.70it/s]


train accuracy: 94.78%


100%|██████████| 313/313 [00:02<00:00, 116.75it/s]

valid loss: 0.007918139919638634





Accuracy of the network on the valid images: 92 %
valid accuracy: 92.18%


AttributeError: 'dict' object has no attribute 'append'

In [None]:
# state_dicts = []
# for _ in range(10):
#     train(10)
#     state_dicts.append(net.state_dict())

In [None]:
from evaluation import evaluation
evaluation(net,validloader,device)

Accuracy of the network on the valid images: 78 %


78.22

In [None]:
# print(current_time)
model_path = os.path.join(model_dir,'models', f'cifar10_4x_best.pth')
torch.save(net, model_path)
assert(False)
try:
    torch.save(net, model_path)
except:
    torch.save(net.state_dict(),model_path)
    print('Saving has some errors')
# torch.save(net, os.path.join(model_dir, f'cifar10_4x_{time.localtime()}.pth'))

AssertionError: 

## Evaluation

Before submission, please run the following cell to make sure your model can be correctly graded.

In [None]:
!python evaluation.py

number of trained parameters: 2276170
number of total parameters: 2276170
can't load test set because [Errno 2] No such file or directory: '/root/DL/cifar_10_4x/test', load valid set now
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
 ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
 Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/opt/conda/lib/python3.10/multiprocessing/queues.py", line 113, in get
    if not self._poll(timeout):
  File "/opt/conda/lib/python3.10/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/opt/conda/lib/python3.10/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/opt/conda/lib/python3.10/multiprocessing/connectio

In [None]:
# model save
acc = 91
import time
current_time = time.strftime("%m%d%H%M")
os.rename(model_path,os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{acc}.pth'))
model_src = ''

with open('model.py','r') as f:
    model_src=f.read()

with open(os.path.join(model_dir,'models', f'cifar10_4x_{current_time}_acc{acc}.json'),'w') as f:
    import json
    json.dump({
        'model_type':model_src,
        'train_history':train_history,
        'description':"""
New net with smaller size
    """
    },f,indent=4)