# 3. Data client

This library extends the client defined in `jarvis-md`.

In [1]:
from __init__ import set_path

set_path()

In [16]:
import pandas as pd

from tfcaidm import Jobs
from tfcaidm import JClient
from tfcaidm import Dataset

## Setup

To initialize the client, a set of hyperparameters needs to be passed in.

In [3]:
YAML_PATH = "/home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/pipeline.yml"

In [4]:
# --- Get hyperparameters
runs = Jobs(path=YAML_PATH)

# --- Hyperparameters for N runs
all_hyperparams = runs.get_params()

# --- Hyperparameters for run #1
hyperparams = all_hyperparams[0]

The `jarvis client` is used as an interface to locate and preprocess a dataset. Its path is also stored in the hyperparams dict under `env/path/client`.

More information on the `jarvis client` can be viewed [here](https://github.com/peterchang77/dl_tutor/blob/master/jarvis/configs/client/client-use.ipynb).

### Jarvis client

In [11]:
client_path = runs.config["env"]["path"]["client"]
runs.load_yaml(client_path)

{'_id': {'project': 'xr/pna', 'version': None},
 '_db': '/data/ymls/db-sum-pub-01k-512.yml',
 'batch': {'fold': 0,
  'size': 8,
  'sampling': {'cohort-neg': 0.5, 'cohort-pna': 0.5}},
 'specs': {'xs': {'dat': {'dtype': 'float32',
    'loads': 'dat-512',
    'norms': {'shift': '@mean', 'scale': '@std'},
    'rands': {'shift': {'lower': 0.9, 'upper': 1.1},
     'scale': {'lower': 0.9, 'upper': 1.1}},
    'shape': [1, 512, 512, 1]},
   'msk': {'dtype': 'float32',
    'loads': 'lng-512',
    'norms': None,
    'shape': [1, 512, 512, 1]}},
  'ys': {'pna': {'dtype': 'uint8',
    'loads': 'pna-512',
    'norms': None,
    'shape': [1, 512, 512, 1]}},
  'load_kwargs': {'verbose': False}}}

Most importantly, the client uses a combination of the `_id` and `_db` fields to search for the actual dataset. The client is a high-level interface for data management and loading, not the actual location of the data!
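
As a rough illustration of how the fields in this configuration relate to each other, the sketch below copies part of the printed output into a plain dict and summarizes it. The helpers `sampling_total` and `loads_by_group` are hypothetical names used only for this example; they are not part of jarvis or tfcaidm.

```python
# Subset of the client configuration printed above, copied into a plain dict.
client_cfg = {
    "_id": {"project": "xr/pna", "version": None},
    "_db": "/data/ymls/db-sum-pub-01k-512.yml",
    "batch": {
        "fold": 0,
        "size": 8,
        "sampling": {"cohort-neg": 0.5, "cohort-pna": 0.5},
    },
    "specs": {
        "xs": {"dat": {"loads": "dat-512"}, "msk": {"loads": "lng-512"}},
        "ys": {"pna": {"loads": "pna-512"}},
    },
}


def sampling_total(cfg):
    """Sum of the per-cohort sampling rates (hypothetical helper)."""
    return sum(cfg["batch"]["sampling"].values())


def loads_by_group(cfg):
    """Map each array name in xs / ys to the `loads` key it reads from."""
    return {
        group: {name: spec["loads"] for name, spec in arrays.items()}
        for group, arrays in cfg["specs"].items()
        if group in ("xs", "ys")
    }


print(sampling_total(client_cfg))
print(loads_by_group(client_cfg))
```

The `loads` values (`dat-512`, `lng-512`, `pna-512`) match column names of `jclient.db.fnames` shown later in this section.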

---

**tfcaidm** uses a subclass of the jarvis client that keeps all the features of the original and adds a few extras.

In [21]:
# --- Create a jarvis client object
path = hyperparams["env/path/client"]
jclient = JClient(path, hyperparams=hyperparams)
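
The key `"env/path/client"` reads like a path into a nested dict (compare `runs.config["env"]["path"]["client"]` earlier). How tfcaidm resolves such keys is not shown in this section; a minimal sketch of one way to do it, using a hypothetical helper `get_by_path` and a plain dict standing in for the hyperparameters, could look like this:

```python
from functools import reduce


def get_by_path(nested, path, sep="/"):
    """Walk a nested dict with a separator-joined key (hypothetical helper)."""
    return reduce(lambda node, key: node[key], path.split(sep), nested)


# Plain dict standing in for the hyperparameters; values taken from this tutorial.
params = {
    "env": {
        "path": {
            "root": "exp",
            "name": "xr_pna",
            "client": "/home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/client.yml",
        }
    }
}

print(get_by_path(params, "env/path/client"))
print(get_by_path(params, "env/path/name"))
```

A missing key raises `KeyError`, the same as ordinary nested indexing would.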

A slightly less verbose alternative to `JClient` is `Dataset`, which handles all of the hyperparameter parsing requirements.

In [22]:
jclient = Dataset(hyperparams).get_client(fold=0)

## Inspecting

`jclient` provides several attributes and methods that expose the hyperparameter settings, dataset information, and dataset statistics.

### Hyperparameters

In [23]:
jclient.hyperparams

{'env': {'path': {'root': 'exp',
   'name': 'xr_pna',
   'client': '/home/brandon/tfcaidm-pkg/configs/ymls/xr_pna/client.yml'}},
 'model': {'model': 'unet',
  'conv_type': 'conv',
  'pool_type': 'conv',
  'eblock': 'conv',
  'elayer': 1,
  'dblock': 'conv',
  'depth': 4,
  'width': 32,
  'width_scaling': 1,
  'kernel_size': [3, 3, 3],
  'strides': [2, 2, 2],
  'bneck': 2,
  'branches': 4,
  'atrous_rate': 6,
  'order': 'rnc',
  'norm': 'bnorm',
  'activ': 'leaky',
  'attn_msk': 'softmax'},
 'train': {'xs': {'dat': None},
  'ys': {'pna': {'mask_id': 'msk',
    'remove_bg': True,
    'mask_weight': 1,
    'output_weight': 5,
    'head': 'decoder_classifier',
    'n_classes': 2,
    'loss': 'sce',
    'metric': 'dice'}},
  'trainer': {'seed': 0,
   'n_folds': 1,
   'batch_size': 8,
   'iters': 3000,
   'steps': 100,
   'valid_freq': 5,
   'lr': 8e-05,
   'lr_alpha': 0.25,
   'lr_decay': 0.97,
   'callbacks': ['checkpoint', 'lr_scheduler', 'tensorboard']}}}

### Dataset information

#### DataFrame

The client object has some built-in features to view dataset statistics in the form of pandas DataFrames and Python dictionaries.

In [24]:
jclient.db.header

| sid | class | area | cohort-neg | cohort-pna | valid |
|---|---|---|---|---|---|
| 000924cf-0f8d-42bd-9158-1af53881a557 | 0 |  | True | False | 2 |
| 0010f549-b242-4e94-87a8-57d79de215fc | 0 |  | True | False | 3 |
| 0022995a-45eb-4cfa-9a59-cd15f5196c64 | 0 |  | True | False | 4 |
| 0025d2de-bd78-4d36-9f72-e15a5e22ca82 | 0 |  | True | False | 1 |
| 00293de0-a530-41dc-9621-0b3def01d06d | 0 |  | True | False | 0 |
| ... | ... | ... | ... | ... | ... |
| 160786ef-dd0f-4c51-8268-b6faa3cfe59b | 2 |  | False | True | 0 |
| 160d4148-6c88-47fb-ad49-b4965eb8a931 | 2 |  | False | True | 0 |
| 16105557-552b-498e-af02-9f0285876567 | 2 |  | False | True | 2 |
| 1614564a-fe0d-43dc-87ae-299d859959f6 | 2 |  | False | True | 0 |


In [25]:
jclient.db.fnames

| sid | dat-512 | lng-512 | pna-512 | box-512 |
|---|---|---|---|---|
| 000924cf-0f8d-42bd-9158-1af53881a557 | /proc/512/000924cf-0f8d-42bd-9158-1af53881a557... | /proc/512/000924cf-0f8d-42bd-9158-1af53881a557... | /proc/512/000924cf-0f8d-42bd-9158-1af53881a557... | /proc/512/000924cf-0f8d-42bd-9158-1af53881a557... |
| 0010f549-b242-4e94-87a8-57d79de215fc | /proc/512/0010f549-b242-4e94-87a8-57d79de215fc... | /proc/512/0010f549-b242-4e94-87a8-57d79de215fc... | /proc/512/0010f549-b242-4e94-87a8-57d79de215fc... | /proc/512/0010f549-b242-4e94-87a8-57d79de215fc... |
| 0022995a-45eb-4cfa-9a59-cd15f5196c64 | /proc/512/0022995a-45eb-4cfa-9a59-cd15f5196c64... | /proc/512/0022995a-45eb-4cfa-9a59-cd15f5196c64... | /proc/512/0022995a-45eb-4cfa-9a59-cd15f5196c64... | /proc/512/0022995a-45eb-4cfa-9a59-cd15f5196c64... |
| 0025d2de-bd78-4d36-9f72-e15a5e22ca82 | /proc/512/0025d2de-bd78-4d36-9f72-e15a5e22ca82... | /proc/512/0025d2de-bd78-4d36-9f72-e15a5e22ca82... | /proc/512/0025d2de-bd78-4d36-9f72-e15a5e22ca82... | /proc/512/0025d2de-bd78-4d36-9f72-e15a5e22ca82... |
| 00293de0-a530-41dc-9621-0b3def01d06d | /proc/512/00293de0-a530-41dc-9621-0b3def01d06d... | /proc/512/00293de0-a530-41dc-9621-0b3def01d06d... | /proc/512/00293de0-a530-41dc-9621-0b3def01d06d... | /proc/512/00293de0-a530-41dc-9621-0b3def01d06d... |
| ... | ... | ... | ... | ... |
| 160786ef-dd0f-4c51-8268-b6faa3cfe59b | /proc/512/160786ef-dd0f-4c51-8268-b6faa3cfe59b... | /proc/512/160786ef-dd0f-4c51-8268-b6faa3cfe59b... | /proc/512/160786ef-dd0f-4c51-8268-b6faa3cfe59b... | /proc/512/160786ef-dd0f-4c51-8268-b6faa3cfe59b... |
| 160d4148-6c88-47fb-ad49-b4965eb8a931 | /proc/512/160d4148-6c88-47fb-ad49-b4965eb8a931... | /proc/512/160d4148-6c88-47fb-ad49-b4965eb8a931... | /proc/512/160d4148-6c88-47fb-ad49-b4965eb8a931... | /proc/512/160d4148-6c88-47fb-ad49-b4965eb8a931... |
| 16105557-552b-498e-af02-9f0285876567 | /proc/512/16105557-552b-498e-af02-9f0285876567... | /proc/512/16105557-552b-498e-af02-9f0285876567... | /proc/512/16105557-552b-498e-af02-9f0285876567... | /proc/512/16105557-552b-498e-af02-9f0285876567... |
| 1614564a-fe0d-43dc-87ae-299d859959f6 | /proc/512/1614564a-fe0d-43dc-87ae-299d859959f6... | /proc/512/1614564a-fe0d-43dc-87ae-299d859959f6... | /proc/512/1614564a-fe0d-43dc-87ae-299d859959f6... | /proc/512/1614564a-fe0d-43dc-87ae-299d859959f6... |


#### Dictionaries

Some additional features include viewing model input and output shapes as well as the dataset size. These methods are based on the `db.header` attribute, so for additional use-cases refer to `db.header`.

In [26]:
xs = jclient.get_input_shapes()
ys = jclient.get_output_shapes()

In [27]:
xs, ys

({'dat': [1, 512, 512, 1]}, {'pna': [1, 512, 512, 1], 'msk': [1, 512, 512, 1]})
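
These shape dictionaries are handy for quick sanity checks. As a small sketch, the snippet below counts the elements per sample for each array and estimates the memory of one input batch, assuming the `float32` dtype and batch size of 8 listed in the client configuration. The helper `n_elements` is a hypothetical name for this example.

```python
from math import prod

# Shapes as returned above, copied into plain dicts.
xs = {"dat": [1, 512, 512, 1]}
ys = {"pna": [1, 512, 512, 1], "msk": [1, 512, 512, 1]}


def n_elements(shapes):
    """Number of elements per sample for each named array (hypothetical helper)."""
    return {name: prod(shape) for name, shape in shapes.items()}


BYTES_PER_FLOAT32 = 4
BATCH_SIZE = 8  # assumed from `batch.size` in the client configuration

dat_elements = n_elements(xs)["dat"]
batch_mib = dat_elements * BYTES_PER_FLOAT32 * BATCH_SIZE / 2**20

print(n_elements(xs))  # {'dat': 262144}
print(n_elements(ys))
print(batch_mib)       # 8.0
```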

The `fold` argument accepts values in {-1, 0, 1, 2, 3, 4}, where -1 specifies training over the entire dataset.

In [28]:
jclient.dataset_size(fold=0)

{'train': '800', 'valid': '200'}
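
One way to picture the fold logic is sketched below. It assumes that the `valid` column of `db.header` holds a fold index per row, that rows whose index equals `fold` form the validation set, and that `fold=-1` leaves the validation set empty; none of this is confirmed by the library output shown here. The column is synthetic (1000 rows spread evenly over five folds, chosen to match the sizes above) and `split_sizes` is a hypothetical helper. It returns integers, whereas the client output above shows the counts as strings.

```python
from collections import Counter


def split_sizes(valid_column, fold):
    """Count train / valid rows for a fold index (hypothetical helper)."""
    if fold == -1:
        # Assumption: -1 puts every row in the training set.
        return {"train": len(valid_column), "valid": 0}
    counts = Counter("valid" if v == fold else "train" for v in valid_column)
    return {"train": counts["train"], "valid": counts["valid"]}


# Synthetic stand-in for the `valid` column: 1000 rows, folds 0..4, 200 rows each.
valid_column = [i % 5 for i in range(1000)]

print(split_sizes(valid_column, fold=0))   # {'train': 800, 'valid': 200}
print(split_sizes(valid_column, fold=-1))  # {'train': 1000, 'valid': 0}
```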

## Data access

The dataset is accessible as Python generators through the `create_generators` method.

In [29]:
gen_train, gen_valid = jclient.create_generators(test=True)

In [30]:
for i, (xs, ys) in enumerate(gen_train):
    if i == 10:
        break

[ 2021-11-19 13:46:48 ] [>...................] 1.375% : Iterating | 000011      
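
The loop above stops manually with `break`, which suggests the generator does not end on its own. An equivalent way to take a fixed number of batches is `itertools.islice`. The sketch below uses a hypothetical stand-in generator, `fake_batches`, that yields `(xs, ys)` dict pairs indefinitely, so it runs without the real client; the string payloads are placeholders for the actual arrays.

```python
from itertools import islice


def fake_batches():
    """Hypothetical stand-in for gen_train: yields (xs, ys) dicts indefinitely."""
    step = 0
    while True:
        xs = {"dat": f"dat-batch-{step}"}
        ys = {"pna": f"pna-batch-{step}"}
        yield xs, ys
        step += 1


# Take exactly 10 batches without an explicit counter or break.
batches = list(islice(fake_batches(), 10))

last_xs, last_ys = batches[-1]
print(len(batches))    # 10
print(last_xs["dat"])  # dat-batch-9
```

With the real generators, `islice(gen_train, 10)` would replace `islice(fake_batches(), 10)`.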

In [31]:
for k in xs:
    print(k, xs[k].shape)

dat (1, 1, 512, 512, 1)
msk (1, 1, 512, 512, 1)
pna (1, 1, 512, 512, 1)


In [32]:
for k in ys:
    print(k, ys[k].shape)