This tutorial walks through the YAMLF pipeline using the CIFAR10 dataset. First, let's enable automatic module reloading in the Jupyter notebook.

In [1]:
# To automatically reload the functions
%load_ext autoreload
%autoreload 2

Install YAMLF with `pip install yamlf`:

In [10]:
!pip install yamlf --upgrade

Collecting yamlf
  Downloading yamlf-0.1.4.tar.gz (10 kB)
Building wheels for collected packages: yamlf
  Building wheel for yamlf (setup.py) ... done
  Created wheel for yamlf: filename=yamlf-0.1.4-py3-none-any.whl size=12042 sha256=bacc89f63282fbe46bd5834ba85c637d382f2050c352135437c9364172f0c7b0
  Stored in directory: /home/virk/.cache/pip/wheels/d2/0a/fa/d5e5a5cc8262c9788ece5f64a84cc6251267dcc4d686ba51fb
Successfully built yamlf
Installing collected packages: yamlf
  Attempting uninstall: yamlf
    Found existing installation: yamlf 0.1.3
    Uninstalling yamlf-0.1.3:
      Successfully uninstalled yamlf-0.1.3
Successfully installed yamlf-0.1.4


# Setting Hyperparameters
There are two ways to set hyperparameters:
 1. Create a Python dict:
 `{"batchsize": 64, "device": "cuda", "chkptdir": "chkpts"}`
 2. Use the `DefaultSettings` class from `yamlf.default_settings`:
        from yamlf.default_settings import DefaultSettings
        defs = DefaultSettings.init("data", "chkpts")

In [25]:
from yamlf.default_settings import DefaultSettings

defs = DefaultSettings.init()
defs

{'low_storage': False,
 'DN': 'inputs',
 'LN': 'targets',
 'scpc': None,
 'ltoc': None,
 'ctol': None,
 'datadir': PosixPath('data'),
 'chkptdir': PosixPath('chkpts'),
 'chkpt_filename': 'chkpt.pt',
 'wts_filename': 'wts.pt',
 'tbwriter': <torch.utils.tensorboard.writer.SummaryWriter at 0x7f3ac9760370>,
 'num_folds': 5,
 'batchsize': 64,
 'num_workers': 8,
 'epochs': 5,
 'init_epoch': 0,
 'lr': 0.001,
 'moms': (0.95, 0.85),
 'wd': 0.0,
 'dropout': 0.1,
 'train_iters': None,
 'val_iters': None,
 'device': device(type='cuda')}
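
Since the settings print as a plain dict, one way to combine the two approaches is to start from the defaults and override only the values that change for an experiment. The sketch below is a minimal illustration, not YAMLF code: the `defaults` dict is a hand-copied subset of the output above, and the override values are arbitrary assumptions.

```python
# Hand-copied subset of the defaults printed above (stand-in for
# the dict returned by DefaultSettings.init()).
defaults = {
    "batchsize": 64,
    "num_workers": 8,
    "epochs": 5,
    "lr": 0.001,
    "wd": 0.0,
    "dropout": 0.1,
}

# Hypothetical overrides for one experiment.
overrides = {"batchsize": 128, "lr": 3e-4}

# Guard against typos: every override must name a known key.
unknown = set(overrides) - set(defaults)
if unknown:
    raise KeyError(f"unknown hyperparameter(s): {sorted(unknown)}")

# Later keys win, so overrides replace the matching defaults.
settings = {**defaults, **overrides}
print(settings)
```

Checking for unknown keys up front catches misspelled names such as `batch_size` before they are silently ignored.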

# Load Data
The next step is to load the data and set up the dataloaders. This tutorial uses the CIFAR10 dataset that ships with torchvision, so there is no need to define a dataset class.

The `yamlf.vision` module contains classes and functions for computer vision tasks such as classification, localization, and segmentation.
There are two ways to define the dataloaders that are passed to the trainer:
 1. create a dict of dataloaders like
 `dls = {
    "train": torch.utils.data.DataLoader(trainset, ...),
    "val": torch.utils.data.DataLoader(valset, ...),
    "test": torch.utils.data.DataLoader(testset, ...)
    }`
 2. use `yamlf.vision.LoadData` class as given below.
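
The first option can be sketched with plain PyTorch as follows. This is a minimal sketch under stated assumptions: the random tensors stand in for a real dataset, the 70/15/15 split and the batch size of 16 are arbitrary, and only the `"train"`, `"val"`, and `"test"` keys come from the text above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in data shaped like CIFAR10 images (3x32x32) with 10 classes;
# swap in real datasets such as torchvision.datasets.CIFAR10.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
full = TensorDataset(images, labels)

# Arbitrary split sizes chosen for illustration.
trainset, valset, testset = random_split(full, [70, 15, 15])

batchsize = 16
dls = {
    # Shuffle only the training data.
    "train": DataLoader(trainset, batch_size=batchsize, shuffle=True),
    "val": DataLoader(valset, batch_size=batchsize, shuffle=False),
    "test": DataLoader(testset, batch_size=batchsize, shuffle=False),
}

# Pull one batch to confirm the shapes.
xb, yb = next(iter(dls["train"]))
print(xb.shape, yb.shape)
```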

In [12]:
from yamlf.vision import *

In [13]:
train_tsfms = tv.transforms.Compose([
    tv.transforms.ToTensor(),
])
test_tsfms  = tv.transforms.Compose([
    tv.transforms.ToTensor(),
])

trainset = tv.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_tsfms)
testset  = tv.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_tsfms)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

defs["num_cls"] = len(classes)

Files already downloaded and verified
Files already downloaded and verified


In [14]:
# defs is the default settings dict, which contains batchsize,
# num_workers, and other parameters
dls = LoadData(trainset, testset, defs=defs)
dls

yamlf.vision.LoadData class
trainset -> num samples: 50000, num batches: 782
valset -> num samples: 10000, num batches: 157
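
Note that the output lists the 10,000-sample test split as `valset`. The batch counts are consistent with ceiling division of the number of samples by the batch size of 64, which is what a PyTorch `DataLoader` yields when the last partial batch is kept (`drop_last=False`, its default). The helper `num_batches` below is a hypothetical name used only to check that arithmetic.

```python
import math

def num_batches(num_samples: int, batchsize: int, drop_last: bool = False) -> int:
    """Number of batches a loader yields per epoch."""
    if drop_last:
        # The final partial batch is discarded.
        return num_samples // batchsize
    # The final partial batch is kept.
    return math.ceil(num_samples / batchsize)

train_batches = num_batches(50000, 64)
val_batches = num_batches(10000, 64)
print(train_batches, val_batches)
```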

# Model or Network
Define a model to train.

`yamlf.utils.net_stats` is a useful function that prints a brief summary of the network.

In [15]:
from yamlf.utils import net_stats

In [16]:
# By convention, this framework names the model `net`. You can also use `model` if you like.
net = tv.models.resnet18(num_classes=10)
net_stats(net)

NETWORK STATS:
20 convs
20 batchnorms
1 dense
# parameters: 11.182M


# Training and validation
The *Trainer* class is inspired by fast.ai.

Just import the Trainer and provide the data, the network, and the loss. Other settings, such as the optimizer and the learning rate scheduler, have default values.

In [17]:
from yamlf.nn_trainer import Trainer
import torch.nn as nn

Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex


In [18]:
model = Trainer(dls, net, nn.CrossEntropyLoss(), metrics='acc')

After initializing the Trainer class, we can check that everything works by training for a few iterations rather than running a whole epoch of data. This is a quick way to confirm that the code runs end to end before starting full training. This can be achieved by setting `model.fit(..., train_iters=5, val_iters=5)` as shown below:

In [19]:
model.fit(2, 1e-3, train_iters=5, val_iters=5)

epoch,train_loss,train_acc,val_loss,val_acc,time
0,2.50695,0.09375,2.32304,0.05625,0.9179391860961914 secs
1,2.4672,0.125,2.34151,0.05625,0.588057279586792 secs
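
The idea of capping the number of iterations can be sketched in plain Python with `itertools.islice`. This illustrates the technique only; how YAMLF implements `train_iters` and `val_iters` internally is not shown in this tutorial, and `run_epoch` is a hypothetical helper.

```python
from itertools import islice

def run_epoch(batches, step, max_iters=None):
    """Call step(batch) on each batch, stopping after max_iters if given."""
    seen = 0
    # islice(batches, None) iterates over every batch.
    for batch in islice(batches, max_iters):
        step(batch)
        seen += 1
    return seen

# Stand-in for a dataloader with 782 batches.
fake_loader = range(782)

processed = []
n_smoke = run_epoch(fake_loader, processed.append, max_iters=5)  # quick check
n_full = run_epoch(fake_loader, lambda batch: None)              # whole epoch
print(n_smoke, n_full)
```

With only 5 batches of 64 samples per epoch, the reported loss and accuracy are too noisy to interpret; the run only confirms that the training and validation loops execute.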


After that, we can run full training as:

In [20]:
model.fit(epochs=5, lr=1e-3)

epoch,train_loss,train_acc,val_loss,val_acc,time
0,1.58919,0.42913,1.73872,0.3964,0:00:45
1,1.15279,0.59407,1.27447,0.57046,0:00:45
2,0.86077,0.69901,0.86003,0.70253,0:00:45
3,0.61575,0.78517,0.71177,0.75836,0:00:45
4,0.41157,0.85556,0.65688,0.78125,0:00:42


If `low_storage` is not True in the default settings dict, a checkpoint folder is created that contains the TensorBoard logs and the checkpoint from the last training epoch.
Model weights can also be saved manually by calling `model.save_weights()`. Similarly, `model.load_weights()` can be used to load model weights.

In [21]:
model.save_weights()

In [22]:
# Start new training and check weights
net = tv.models.resnet18(num_classes=10)
model = Trainer(dls, net, nn.CrossEntropyLoss(), metrics='acc')
model.test_dl = model.val_dl
model.test()

{'acc': 0.10101512738853503}

In [23]:
# Load saved weights and test
model.load_weights()
model.test_dl = model.val_dl
model.test()

{'acc': 0.78125}
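
The first result, about 0.10, is chance level for 10 classes, which is what an untrained network should give; after loading, the accuracy matches the last validation accuracy from training. In plain PyTorch, the same round trip is commonly done through `state_dict`. The sketch below shows that pattern on a tiny stand-in network; whether `save_weights` and `load_weights` work this way internally is an assumption, and the temporary file path is arbitrary.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_net() -> nn.Module:
    """Tiny stand-in network; any nn.Module works the same way."""
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Pretend this network has been trained, then save its weights.
trained = make_net()
path = os.path.join(tempfile.mkdtemp(), "wts.pt")
torch.save(trained.state_dict(), path)

# A freshly built network starts from different random weights...
fresh = make_net()
# ...until the saved weights are loaded into it.
fresh.load_state_dict(torch.load(path, map_location="cpu"))

# Both networks now produce identical outputs for the same input.
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(trained(x), fresh(x))
print(same)
```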

# Training visualizations
The loss, metrics, and learning rate recorded during training are accessible through TensorBoard as follows:

In [33]:
defs["tbwriter"].log_dir

PosixPath('chkpts/tblogs/30Jun2020-04:14:01')

In [36]:
# open the given link (http://localhost:6006/) in browser
!tensorboard --logdir=chkpts/tblogs

TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.2.1 at http://localhost:6006/ (Press CTRL+C to quit)
^C
