## Part A
We train a 3-layer MLP to classify the data in `ctg_data`.

#### Starter
We assume that you have activated the correct conda environment.  

First, let's check PyTorch.

In [1]:
import torch
print(torch.__version__)
torch.cuda.is_available()

1.1.0


True

We are using torch>=1.0. If you have a GPU, you should also see `True` above.

### 1. Train an MLP
We train an MLP to perform a classification task. Assume a learning rate `lr = 0.01`, L2 regularization with weight decay parameter `weight_decay = 1e-6`, and `batch_size = 32`.
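
As a short worked note on what these two hyperparameters do together: assuming plain SGD without momentum (the default of `torch.optim.SGD`), the weight decay term is added to the gradient before the step. The symbols $\eta$ (learning rate), $\lambda$ (weight decay) and $\mathcal{L}$ (unregularized loss) are notation introduced here for illustration.

$$
w \leftarrow w - \eta \left( \nabla_w \mathcal{L}(w) + \lambda w \right)
= (1 - \eta\lambda)\, w - \eta \nabla_w \mathcal{L}(w)
$$

This corresponds to minimizing $\mathcal{L}(w) + \frac{\lambda}{2}\lVert w \rVert_2^2$. With $\eta = 0.01$ and $\lambda = 10^{-6}$, each step shrinks the weights by a factor of $1 - \eta\lambda = 1 - 10^{-8}$, so the regularization here is very mild.
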
#### Import libs

In [2]:
from train import train
from models.seq_net import SeqNet
from utils.data import split_test_data
from dataset import simple_dataset, cla

Using device: cuda


#### Define constants
Because our dataset is fairly small, each epoch is fast (around 10 epochs per second), so we train for a large number of epochs. At that speed, saving the weights too frequently could become a throughput bottleneck, so we save them only once every several hundred epochs.

In [3]:
# epoch number
epoch = 10000
# batch size
batch = 32
# learning rate
lr = 0.01
# weight decay rate
weight_decay = 1e-6
# number of epochs between weight saves
save_epoch = 500
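
To make the saving interval concrete, here is a minimal sketch of how a training loop might decide when to write a checkpoint. The helper `should_save` is hypothetical and is not taken from the `train` module, whose actual saving logic is not shown in this notebook.

```python
# Hypothetical sketch: which epochs would trigger a weight save.
epoch = 10000       # total number of epochs (same value as the constant above)
save_epoch = 500    # save weights once every `save_epoch` epochs

def should_save(epoch_idx, save_epoch):
    """Return True when the 1-based epoch index is a multiple of save_epoch."""
    return epoch_idx % save_epoch == 0

saved_epochs = [e for e in range(1, epoch + 1) if should_save(e, save_epoch)]
print(len(saved_epochs), saved_epochs[:3])  # 20 [500, 1000, 1500]
```

With these values only 20 checkpoints are written over the whole run, instead of 10000 if we saved after every epoch.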

#### Create dataset

In [6]:
dataset = simple_dataset.SimpleDataset('data/ctg_data_cleaned.csv', cla.cla_preprocessor)
train_dataset, test_dataset = split_test_data(dataset)

RuntimeError: invalid argument 3: Index tensor must either be empty or have same dimensions as output tensor at /opt/conda/conda-bld/pytorch_1556653183467/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:533

#### Create model and optimizer

In [None]:
model = SeqNet(10)
optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)

#### Start training
We use 5-fold cross-validation, which yields 5 models. After training, we collect the per-epoch training and validation accuracies of all 5 models.

In [None]:
train_accs, val_accs = train(model=model, optimizer=optimizer, dataset=train_dataset,
                             save_dir='output', save_epoch=save_epoch, name='seqnet', log_dir='log',
                             epoch=epoch, batch=batch, device='cuda', fold_num=5)
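
For readers unfamiliar with k-fold cross-validation, the sketch below shows one common way to generate the per-fold index splits. It is an illustration only: `k_fold_indices` is a hypothetical helper, and the way `train` actually splits the data for `fold_num=5` is an assumption, since its source is not shown here.

```python
import numpy as np

def k_fold_indices(n_samples, fold_num=5, seed=0):
    """Yield (train_idx, val_idx) index arrays for k-fold cross-validation."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_samples)          # shuffle sample indices once
    folds = np.array_split(perm, fold_num)     # nearly equal-sized folds
    for i in range(fold_num):
        val_idx = folds[i]                     # fold i is held out for validation
        train_idx = np.concatenate(
            [folds[j] for j in range(fold_num) if j != i]
        )                                      # remaining folds are used for training
        yield train_idx, val_idx

# Toy example with 23 samples (hypothetical size, not the real dataset size)
splits = list(k_fold_indices(n_samples=23, fold_num=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(fold, len(train_idx), len(val_idx))
```

Each sample appears in exactly one validation fold, so every model is validated on data it did not see during training.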

#### Save Accuracy
We save the accuracies to disk so that we can analyze them later. This is worthwhile because training may take a long time: with the results saved, the training cells do not need to be re-run every time we reopen the notebook.

In [None]:
import pickle

with open('train_accs.pickle', 'wb') as f:
    pickle.dump(train_accs, f)

with open('val_accs.pickle', 'wb') as f:
    pickle.dump(val_accs, f)
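
When reopening the notebook, the saved accuracies can be read back instead of retraining. The following is a minimal round-trip sketch; `load_pickle` is a hypothetical helper and the toy data stands in for the real accuracy lists. In the notebook, the paths would be `'train_accs.pickle'` and `'val_accs.pickle'`.

```python
import os
import pickle
import tempfile

def load_pickle(path):
    """Load and return a pickled object from `path`."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Round-trip demo with toy data in a temporary directory
toy_accs = [[0.5, 0.6], [0.55, 0.65]]   # stand-in for per-fold accuracy lists
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'train_accs.pickle')
    with open(path, 'wb') as f:
        pickle.dump(toy_accs, f)
    restored = load_pickle(path)

print(restored == toy_accs)  # True
```

Only unpickle files that you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.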

### 2. Plot Graph

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# x axis: 1-based epoch indices
x = np.arange(1, len(train_accs[0]) + 1)

def trim_axs(axs, N):
    """little helper to massage the axs list to have correct length..."""
    axs = axs.flat
    for ax in axs[N:]:
        ax.remove()
    return axs[:N]

Plot the training and validation accuracies against epochs for the 5 models produced by 5-fold cross-validation.

In [None]:
figsize = (10, 8)
fig, axs = plt.subplots(2, 3, figsize=figsize)
# axs[1][2].legend(bbox_to_anchor=(1.05, 1), loc='lower right', borderaxespad=0.)
axs = trim_axs(axs, 5)
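t_l, v_l = None, None
for idx, ax in enumerate(axs, 0):
    ax.set_title('fold {}'.format(idx))
    # ax.plot returns a list of Line2D objects; unpack the single line to use as a legend handle
    t_l, = ax.plot(x, train_accs[idx], ls='-', ms=4)
    v_l, = ax.plot(x, val_accs[idx], ls='-', ms=4)
# pass handles and labels explicitly as keyword arguments
plt.figlegend(handles=[t_l, v_l], labels=['train accuracy', 'validation accuracy'], loc=(0.75, 0.4))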
plt.show()

### 3. Find Optimal Batch Size