### ctxt Packing
Joonwoo Lee et al 2021 paces a channel of an image in a ctxt, 
using only 1024 slots out of 16384. (sparse packing)

### AvgPool
마지막에 AvgPool 하나와 FC가 하나 있음. 8x8 이미지를 8x8 kernel로 AvgPool해서 1 x 64-channel 이 됨. 
그 다음에 64개의 ctxt가 하나의 ctxt로 합쳐짐.  -- 어떻게 잘 합칠까? 

### Softmax
Approximate softmax는 계산량이 상당히 많음. 
그러다고 softmax 없이 training을 할 수는 없음 (argmax는 differentiable하지 않으므로)
1. softmax로 훈련한 뒤에 argmax로 교체해서 evaluate하거나 
2.그 전에 decrypt해서 plain text에 softmax 계산하거나


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import matplotlib.pyplot as plt 
import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
from torch import nn
import torch.nn.functional as F

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
    device = "cpu"
else:
    print('CUDA is available!  Training on GPU ...')
    device = "cuda"

CUDA is available!  Training on GPU ...


Prepare Train / test data sets

In [3]:
num_workers = 0
batch_size = 32
valid_size = 0.2


## Scale 
transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (2.5, 2.5, 2.5))
     ])

train_data = datasets.CIFAR10('data', train=True,
                              download=True,
                              transform=transform
                             )
test_data = datasets.CIFAR10('data', train=False,
                             download=True, 
                             transform=transform
                            )

num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
    num_workers=num_workers)

# specify the image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

Files already downloaded and verified
Files already downloaded and verified


## For generating Netron diagram 

In [7]:
import torch.onnx

model = ConvNeuralNet(num_classes=10, activation = F.relu)
model.to(device)
torch.onnx.export(model, 
                  data, 
                  "Simple_CNN7.onnx",
                  input_names=["input"],
                  output_names=["output"],
                  opset_version=12)

model

## SIMPLE CNN MODEL with 3 activations

In [4]:
from fase.nn.models import ConvNeuralNet

In [1]:
from torch import nn

In [2]:
nn.Softmax?

In [5]:
def run_test():
    # track test loss
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, 
        num_workers=num_workers)

    test_loss = 0.0
    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))

    model.eval()
    # iterate over test data
    for data, target in test_loader:
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        output = model(data)
        loss = criterion(output, target)
        test_loss += loss.item()*data.size(0)
        _, pred = torch.max(output, 1)    
        correct_tensor = pred.eq(target.data.view_as(pred))
        correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())

        for i in range(len(data)):
            label = target.data[i]
            class_correct[label] += correct[i].item()
            class_total[label] += 1

    # average test loss
    test_loss = test_loss/len(test_loader.dataset)
    print('Test Loss: {:.6f}\n'.format(test_loss))

    for i in range(10):
        if class_total[i] > 0:
            print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
                classes[i], 100 * class_correct[i] / class_total[i],
                np.sum(class_correct[i]), np.sum(class_total[i])))
        else:
            print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

    print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)\n' % (
        100. * np.sum(class_correct) / np.sum(class_total),
        np.sum(class_correct), np.sum(class_total)))

In [31]:
from approximate import approx_sign
from approximate import approx_relu

xfactor = 40

activation = lambda x : xfactor * approx_relu(x/xfactor, degree = 5, repeat=4)

import torch.optim as optim
model = ConvNeuralNet(num_classes=10, activation = activation)
model.to(device)
# Set Loss function with criterion
criterion = nn.CrossEntropyLoss()

# Set optimizer with optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay = 0.005, momentum = 0.9)  
valid_loss_min = np.Inf 

## Training

In [32]:
n_epochs = 40
train_losslist=[]

for epoch in range(1, n_epochs+1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    
    model.to(device)
    model.train()
    
    for data, target in train_loader:
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()*data.size(0)
        
    model.eval()
    for data, target in valid_loader:
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        #print("VALID-------------------------------------------------------------------")
        output = model(data)
        loss = criterion(output, target)
        valid_loss += loss.item()*data.size(0)
    
    # calculate average losses
    train_loss = train_loss/len(train_loader.dataset)
    valid_loss = valid_loss/len(valid_loader.dataset)
    train_losslist.append(train_loss)
        
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))
    
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
        valid_loss_min,
        valid_loss))
        torch.save(model.state_dict(), 'SimpleCNN_ReLU_minimax_v2_max.pt')
        valid_loss_min = valid_loss
        # 
        run_test()

Epoch: 1 	Training Loss: 1.140274 	Validation Loss: 0.269360
Validation loss decreased (inf --> 0.269360).  Saving model ...
Test Loss: 1.360481

Test Accuracy of airplane: 50% (506/1000)
Test Accuracy of automobile: 44% (444/1000)
Test Accuracy of  bird: 14% (142/1000)
Test Accuracy of   cat: 52% (521/1000)
Test Accuracy of  deer: 65% (653/1000)
Test Accuracy of   dog: 44% (441/1000)
Test Accuracy of  frog: 85% (857/1000)
Test Accuracy of horse: 59% (593/1000)
Test Accuracy of  ship: 57% (572/1000)
Test Accuracy of truck: 59% (596/1000)

Test Accuracy (Overall): 53% (5325/10000)

Epoch: 2 	Training Loss: 0.803178 	Validation Loss: 0.189093
Validation loss decreased (0.269360 --> 0.189093).  Saving model ...
Test Loss: 0.964806

Test Accuracy of airplane: 62% (623/1000)
Test Accuracy of automobile: 65% (655/1000)
Test Accuracy of  bird: 38% (380/1000)
Test Accuracy of   cat: 48% (486/1000)
Test Accuracy of  deer: 76% (766/1000)
Test Accuracy of   dog: 62% (620/1000)
Test Accuracy of  f

In [21]:
# Load the best one
model.load_state_dict(torch.load("SimpleCNN_ReLU_minimax_2.pt"))

<All keys matched successfully>

In [25]:
#apprx_swish = chebyshev.Chebyshev.fit(xx, swish(xx), deg=42)
model = ConvNeuralNet(num_classes=10, activation = F.relu)
model.to(device)
model.load_state_dict(torch.load("SimpleCNN_ReLU_minimax_2.pt"))
run_test()

# 16 degree 정도로 비슷한 성능이 유지되는 모델과 apprx 조합을 찾아야함. 

Test Loss: 1.967733

Test Accuracy of airplane: 91% (916/1000)
Test Accuracy of automobile: 45% (457/1000)
Test Accuracy of  bird: 15% (151/1000)
Test Accuracy of   cat: 61% (616/1000)
Test Accuracy of  deer:  3% (33/1000)
Test Accuracy of   dog: 11% (113/1000)
Test Accuracy of  frog:  1% (13/1000)
Test Accuracy of horse: 36% (361/1000)
Test Accuracy of  ship: 30% (308/1000)
Test Accuracy of truck: 85% (859/1000)

Test Accuracy (Overall): 38% (3827/10000)



In [81]:
run_test()

Test Loss: 1.076213

Test Accuracy of airplane: 72% (724/1000)
Test Accuracy of automobile: 73% (738/1000)
Test Accuracy of  bird: 50% (507/1000)
Test Accuracy of   cat: 59% (594/1000)
Test Accuracy of  deer: 36% (365/1000)
Test Accuracy of   dog: 54% (548/1000)
Test Accuracy of  frog: 78% (788/1000)
Test Accuracy of horse: 59% (598/1000)
Test Accuracy of  ship: 55% (555/1000)
Test Accuracy of truck: 80% (802/1000)

Test Accuracy (Overall): 62% (6219/10000)


Test Loss: 0.744811

Test Accuracy of airplane: 83% (836/1000)
Test Accuracy of automobile: 84% (848/1000)
Test Accuracy of  bird: 59% (593/1000)
Test Accuracy of   cat: 58% (586/1000)
Test Accuracy of  deer: 64% (641/1000)
Test Accuracy of   dog: 68% (688/1000)
Test Accuracy of  frog: 84% (848/1000)
Test Accuracy of horse: 77% (771/1000)
Test Accuracy of  ship: 81% (812/1000)
Test Accuracy of truck: 81% (814/1000)

Test Accuracy (Overall): 74% (7437/10000)

## 성능 

relu + maxpool: ~62%

relu + avgpool: ~58% -- OK, maxpool -> avgpool은 큰 문제 없음. 

approx. relu + avgpool: 52% !! 

approx. relu + avgpool + BN (2Conv + 3FC, 20 epoch): 58% 정도? 

approx. relu + avgpool + BN (2Conv + 2FC, 50 epoch: 59% 