# 2. Creating Jobs

In [2]:
import pandas as pd

import tfcaidm
from tfcaidm import Jobs

## Setup

Recall that each configuration defines a set of runs (experiments). Creating jobs from a configuration therefore produces `n` runs, which are queued either on the cluster or on your local workstation.
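To make the idea of one configuration producing `n` runs concrete, here is a minimal sketch of a grid-style expansion. It is an illustration only, not how `tfcaidm` parses its YAML files: the `expand_grid` helper and the `grid` values are hypothetical, and the sketch naively treats every list as a set of choices (a real config would need to distinguish list-valued settings such as `kernel_size`).

```python
from itertools import product


def expand_grid(config):
    """Expand a dict of option lists into one dict per combination (hypothetical helper)."""
    keys = list(config)
    # Wrap scalars in a list so every key contributes at least one choice
    choices = [v if isinstance(v, list) else [v] for v in config.values()]
    # The Cartesian product of all choices gives one run per combination
    return [dict(zip(keys, combo)) for combo in product(*choices)]


# Hypothetical search space: 2 learning rates x 2 batch sizes x 1 model
grid = {"lr": [8e-5, 1e-4], "batch_size": [8, 16], "model": "unet"}
example_runs = expand_grid(grid)
print(len(example_runs))
```

Each element of `example_runs` plays the same role as one entry of `all_hyperparams` in the cells that follow.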

In [3]:
YAML_PATH = "/home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/pipeline.yml"

In [4]:
%%time

# --- Get hyperparameters
runs = Jobs(path=YAML_PATH)

# --- Hyperparameters for N runs
all_hyperparams = runs.get_params()

# --- Hyperparameters dataframe for visualization
df_hyperparams = pd.DataFrame(all_hyperparams)

CPU times: user 25.7 ms, sys: 4.91 ms, total: 30.6 ms
Wall time: 27.5 ms


In [5]:
df_hyperparams

| | env/path/root | env/path/name | env/path/client | model/model | model/conv_type | model/pool_type | model/eblock | model/elayer | model/dblock | model/depth | ... | train/trainer/seed | train/trainer/n_folds | train/trainer/batch_size | train/trainer/iters | train/trainer/steps | train/trainer/valid_freq | train/trainer/lr | train/trainer/lr_alpha | train/trainer/lr_decay | train/trainer/callbacks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | exp | xr_pna | /home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/... | unet | conv | conv | conv | 1 | conv | 4 | ... | 0 | 1 | 8 | 3000 | 100 | 5 | 8e-05 | 0.25 | 0.97 | [checkpoint, lr_scheduler, tensorboard] |


In [6]:
# ---- Hyperparameters for run #1
hyperparams = all_hyperparams[0]

### Dict tools

In [7]:
from tfcaidm.jobs import params

In [8]:
hp = params.HyperParameters(hyperparams)

`HyperParameters` is a helper object for converting between two formats:

* `csv` (flattened / unnested)
* `dict` (unflattened / nested)

This makes it easy to store results in tabular form while still accessing configuration variables through a nested structure.
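The conversion between the two formats can be sketched in plain Python. This is a minimal illustration under the assumption that nested keys are joined with `/`, as in the outputs shown in this section; the standalone `flatten` and `unflatten` functions here are hypothetical and are not the `tfcaidm` implementation.

```python
def flatten(nested, sep="/", prefix=""):
    """Turn a nested dict into a flat dict keyed by 'a/b/c' style paths."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts, carrying the path so far
            flat.update(flatten(value, sep, path))
        else:
            # Leaves (including lists and None) are stored unchanged
            flat[path] = value
    return flat


def unflatten(flat, sep="/"):
    """Rebuild the nested dict from 'a/b/c' style keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            # Create intermediate levels on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


example = {
    "model": {"depth": 4, "kernel_size": [3, 3, 3]},
    "train": {"trainer": {"lr": 8e-05}},
}
flat = flatten(example)
print(flat)
```

The flat form maps directly onto one row of a results table (one column per key), while the nested form is more convenient when passing a whole group of settings, such as everything under `model`, to a single component.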

In [9]:
hp.flatten(hyperparams)

{'env/path/root': 'exp',
 'env/path/name': 'xr_pna',
 'env/path/client': '/home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/client.yml',
 'model/model': 'unet',
 'model/conv_type': 'conv',
 'model/pool_type': 'conv',
 'model/eblock': 'conv',
 'model/elayer': 1,
 'model/dblock': 'conv',
 'model/depth': 4,
 'model/width': 32,
 'model/width_scaling': 1,
 'model/kernel_size': [3, 3, 3],
 'model/strides': [2, 2, 2],
 'model/bneck': 2,
 'model/branches': 4,
 'model/atrous_rate': 6,
 'model/order': 'rnc',
 'model/norm': 'bnorm',
 'model/activ': 'leaky',
 'model/attn_msk': 'softmax',
 'train/xs/dat': None,
 'train/ys/pna/mask_id': 'msk',
 'train/ys/pna/remove_bg': True,
 'train/ys/pna/mask_weight': 1,
 'train/ys/pna/output_weight': 5,
 'train/ys/pna/head': 'decoder_classifier',
 'train/ys/pna/n_classes': 2,
 'train/ys/pna/loss': 'sce',
 'train/ys/pna/metric': 'dice',
 'train/trainer/seed': 0,
 'train/trainer/n_folds': 1,
 'train/trainer/batch_size': 8,
 'train/trainer/iters': 3000,
 'train/trainer

In [10]:
hp.unflatten(hyperparams)

{'env': {'path': {'root': 'exp',
   'name': 'xr_pna',
   'client': '/home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/client.yml'}},
 'model': {'model': 'unet',
  'conv_type': 'conv',
  'pool_type': 'conv',
  'eblock': 'conv',
  'elayer': 1,
  'dblock': 'conv',
  'depth': 4,
  'width': 32,
  'width_scaling': 1,
  'kernel_size': [3, 3, 3],
  'strides': [2, 2, 2],
  'bneck': 2,
  'branches': 4,
  'atrous_rate': 6,
  'order': 'rnc',
  'norm': 'bnorm',
  'activ': 'leaky',
  'attn_msk': 'softmax'},
 'train': {'xs': {'dat': None},
  'ys': {'pna': {'mask_id': 'msk',
    'remove_bg': True,
    'mask_weight': 1,
    'output_weight': 5,
    'head': 'decoder_classifier',
    'n_classes': 2,
    'loss': 'sce',
    'metric': 'dice'}},
  'trainer': {'seed': 0,
   'n_folds': 1,
   'batch_size': 8,
   'iters': 3000,
   'steps': 100,
   'valid_freq': 5,
   'lr': 8e-05,
   'lr_alpha': 0.25,
   'lr_decay': 0.97,
   'callbacks': ['checkpoint', 'lr_scheduler', 'tensorboard']}}}

## Submitting training jobs

The sole purpose of creating jobs is to train deep learning models at scale. Jobs can run either locally or on the caidm compute clusters; both options are shown below.

<strong>The main thing to note is that the files used for job submission must be separate from the code that runs the training loop.</strong>

### Setup

```python
"""Training setup

Args:
    producer (__file__): must be set to __file__
    consumer (string): file path in the same dir as producer
    root (string): base dir of experiments
    name (string): name of experiment
    libraries (list of tuples (lib, version)): optional libs to pip install
"""

runs.setup(
    producer=__file__,
    consumer="__init__.py",
    root="nb",
    name="xr_pna",
    libraries=[],
)

```

### Local training

```python
"""Local training

Args:
    run (bool): flag to start training
"""

runs.train_local(run=False)

```

### Cluster training

```python
"""Cluster training

Args:
    gpu (string): name of gpu to run, based on regex matching
    num_gpus (int): number of gpus to use for training
    run (bool): flag to start training
"""

runs.train_cluster(gpu="titan|rtx", num_gpus=1, run=False)

```
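The `gpu` argument is described as a regex-matched name, so a pattern such as `"titan|rtx"` selects any GPU whose name contains either alternative. The sketch below illustrates that matching with Python's `re` module. The GPU names are hypothetical, and both the use of `re.search` and the case-insensitive flag are assumptions made for illustration, not confirmed behavior of `train_cluster`.

```python
import re


def match_gpus(pattern, gpu_names):
    """Return the GPU names matched by the pattern (hypothetical helper)."""
    # Assumption: matching is case-insensitive and may occur anywhere in the name
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return [name for name in gpu_names if regex.search(name)]


# Hypothetical list of GPU names reported by a cluster
available = ["TITAN Xp", "GeForce RTX 2080 Ti", "Tesla V100"]
selected = match_gpus("titan|rtx", available)
print(selected)
```

A narrower pattern such as `"rtx"` would restrict the job to a single GPU family under the same assumptions.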