# Duqduq demo: large scale validation

This notebook shows how to use [duqtools](https://duqtools.readthedocs.org) to do large scale validation.

It goes over the steps required to do uncertainty quantification (UQ) for a sequence of data sets.

Where `duqtools` does UQ for a single data set, `duqduq` loops over multiple data sets to do UQ in sequence.

We define 2 directories:

- **duqduq directory**: where the duqtools template and UQ config reside. This is also the directory from which we run `duqduq`.
- **run directory**: a directory that slurm can access, where all the simulation files and data are stored.

In [1]:
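import os
from pathlib import Path

# duqduq directory: holds the duqtools template and UQ config
duqtools_dir = Path('/afs/eufus.eu/user/g/g2ssmee/duqduq_demo')
# Second duqduq directory, used below for the submit, status and merge steps
duqtools_dir_done = Path('/afs/eufus.eu/user/g/g2ssmee/duqduq_demo_done')
# Run directory: where the simulation files and data are stored
run_dir = Path('/afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long')

os.chdir(duqtools_dir)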

## `duqduq help`

The main interface for duqduq is the CLI. Run `duqduq --help` to list the available subcommands.

You will notice that the subcommands here mimic what is available in `duqtools`.

In [2]:
!duqduq --help

Usage: duqduq [OPTIONS] COMMAND [ARGS]...

  For more information, check out the documentation:

  https://duqtools.readthedocs.io/large_scale_validation

Options:
  --help  Show this message and exit.

Commands:
  create  Create data sets for large scale validation.
  merge   Merge data sets with error propagation.
  setup   Set up large scale validation.
  status  Check status large scale validation runs.
  submit  Submit large scale validation runs.


## `duqduq setup`

The starting point for `duqduq` is 2 files:

- `duqtools.template.yaml`, this is the template config that `duqduq setup` will use to generate the `duqtools.yaml`
- `data.csv`, each entry in this csv file corresponds to an IMAS data set

Below is an example `data.csv` file. This is how you tell `duqduq` which data to do UQ for.

In [3]:
%cat data.csv

,user,db,shot,run
data_01,g2aho,aug,36982,0002
data_02,g2aho,jet,75225,0002
data_03,g2aho,jet,90350,0002
data_04,g2aho,jet,92432,0002
data_05,g2aho,jet,94875,0001
data_06,g2aho,tcv,64958,0002
data_07,g2aho,west,54568,0001
data_08,g2aho,west,54728,0001
data_09,g2aho,west,55181,0001
data_10,g2aho,west,55525,0001
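
To make the structure of `data.csv` concrete, the sketch below parses a copy of its first rows with Python's standard `csv` module. The unnamed first column is the index that later becomes the run name. The dictionary layout and variable names are illustrative assumptions; this is not how `duqduq` reads the file internally.

```python
import csv
import io

# First rows of the data.csv shown above, inlined so the example is self-contained.
DATA_CSV = """\
,user,db,shot,run
data_01,g2aho,aug,36982,0002
data_02,g2aho,jet,75225,0002
"""

reader = csv.reader(io.StringIO(DATA_CSV))
header = next(reader)   # ['', 'user', 'db', 'shot', 'run']
fields = header[1:]     # column names after the unnamed index column

# Map each index (used as the run name) to the fields of its IMAS data set.
handles = {}
for row in reader:
    name, values = row[0], row[1:]
    handle = dict(zip(fields, values))
    handle['shot'] = int(handle['shot'])
    handle['run'] = int(handle['run'])  # '0002' becomes 2
    handles[name] = handle

for name, handle in handles.items():
    print(name, handle)
```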


Below is an example `duqtools.template.yaml`.

The index of each entry in the `data.csv` file will be used as the run name (`run.name`).

The details for each entry in `data.csv` will be written to the `template_data` section.

Machine- or dataset-specific parameters, such as the major radius or the start time, are read from the IDS.

For more information, see the [documentation](https://duqtools.readthedocs.io/en/latest/large_scale_validation/) for large scale validation.

In [4]:
%cat duqtools.template.yaml

tag: {{ run.name }}
create:
  runs_dir: /afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long/{{ run.name }}
  template: /afs/eufus.eu/user/g/g2ssmee/jetto_runs/interpretive_esco02
  template_data:
    user: {{ handle.user }}
    db: {{ handle.db }}
    shot: {{ handle.shot }}
    run: {{ handle.run }}
  operations:
    - variable: major_radius
      operator: copyto
      {# Convert units from IDS (m) to Jetto.jset (cm) -#}
      value: {{ (variables.major_radius * 100) | round(4) }}
    - variable: b_field
      operator: copyto
      value: {{ variables.b_field | round(4) }}
    - variable: t_start
      operator: copyto
      value: {{ variables.t_start | round(4) }}
    - variable: t_end
      operator: copyto
      value: {{ (variables.t_start + 1.0) | round(4) }}
  sampler:
    method: latin-hypercube
    n_samples: 9
  dimensions:
    - variable: zeff
      operator: multiply
      values: [0.8, 1.0, 1.2]
    - variable: t_e
      operator: multiply

Running `duqduq setup` will generate a new directory for each dataset in `data.csv`. Each directory is in itself a valid **duqtools directory**.

In [5]:
!duqduq setup --yes --force


[31m[1mOperations in the Queue:[0m
- [32mSetup run[0m : data_01
- [32mSetup run[0m : data_02
- [32mSetup run[0m : data_03
- [32mSetup run[0m : data_04
- [32mSetup run[0m : data_05
- [32mSetup run[0m : data_06
- [32mSetup run[0m : data_07
- [32mSetup run[0m : data_08
- [32mSetup run[0m : data_09
- [32mSetup run[0m : data_10
[31m[1mApplying Operations[0m
10

  0%|                                                    | 0/10 [00:00<?, ?it/s][A
[32mSetup run[0m : data_10:                            | 0/10 [00:00<?, ?it/s][A
Progress: 100%|████████████████████████████████| 10/10 [00:00<00:00, 618.00it/s]


This is what the directory looks like after setup.

In [6]:
!tree .

.
├── data_01
│   └── duqtools.yaml
├── data_02
│   └── duqtools.yaml
├── data_03
│   └── duqtools.yaml
├── data_04
│   └── duqtools.yaml
├── data_05
│   └── duqtools.yaml
├── data_06
│   └── duqtools.yaml
├── data_07
│   └── duqtools.yaml
├── data_08
│   └── duqtools.yaml
├── data_09
│   └── duqtools.yaml
├── data_10
│   └── duqtools.yaml
├── data.csv
├── duqtools.log
└── duqtools.template.yaml

10 directories, 13 files


`duqduq setup` creates a duqtools config in each of the subdirectories. At this stage you can modify any of the `duqtools.yaml` files if you wish. The config is no different from that of a single UQ run, so you could `cd data_01` and treat it as a single UQ run.

In [7]:
%cat data_01/duqtools.yaml

tag: data_01
create:
  runs_dir: /afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long/data_01
  template: /afs/eufus.eu/user/g/g2ssmee/jetto_runs/interpretive_esco02
  template_data:
    user: g2aho
    db: aug
    shot: 36982
    run: 2
  sampler:
    method: latin-hypercube
    n_samples: 9
  dimensions:
    - variable: zeff
      operator: multiply
      values: [0.8, 1.0, 1.2]
    - variable: t_e
      operator: multiply
      values: [0.8, 1.0, 1.2]
    - variable: major_radius
      operator: copyto
      values: [ 165.0 ]
    - variable: b_field
      operator: copyto
      values: [ -2.5725 ]
    - variable: t_start
      operator: copyto
      values: [ 2.875 ]
    - variable: t_end
      operator: copyto
      values: [ 3.875 ]
system: jetto-v220922
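
The `copyto` values in the generated config follow from the expressions in the template. The sketch below mirrors that arithmetic in plain Python. The input values (a major radius of 1.65 m and a start time of 2.875) are assumptions back-calculated from the `data_01` output above; they are not read from an IDS here.

```python
# Hypothetical IDS values, inferred from the generated data_01/duqtools.yaml above.
major_radius_m = 1.65
t_start = 2.875

# Same expressions as in the template: convert m to cm, and end 1.0 after the start.
major_radius_cm = round(major_radius_m * 100, 4)
t_end = round(t_start + 1.0, 4)

print(major_radius_cm, t_end)
```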

## Create runs using `duqduq create`

This is the equivalent of `duqtools create`, but for a large number of runs.

It will take each of the generated duqtools configs and set up the JETTO runs and IMAS data according to the specification.

Since this will take a long time, we will use the `--dry-run` option.

In [8]:
!duqduq create --force --dry-run


[31m[1mOperations in the Queue:[0m
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0000
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0001
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0002
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0003
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0004
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0005
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0006
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0007
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0008
- [32mCreating run[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_02/run_0000
- [32mCreating run[

## Submit to slurm using `duqduq submit`

Use `duqduq submit` to submit the jobs to slurm. This tool will find all jobs (`.llcmd` files in the subdirectories) and submit them to slurm.

Use the `--array` option to submit the jobs as a slurm array.
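
As an illustration of the job discovery described above, the sketch below walks a directory tree and collects every directory that contains a `.llcmd` file. The helper name `find_jobs` and the temporary directory layout are made up for this example; it is a minimal sketch of the idea, not the `duqduq submit` implementation.

```python
import tempfile
from pathlib import Path


def find_jobs(root):
    """Return the directories below `root` that contain a `.llcmd` file."""
    return sorted(p.parent for p in Path(root).rglob('.llcmd'))


# Build a throwaway tree that imitates a run directory with three jobs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ('data_01/run_0000', 'data_01/run_0001', 'data_02/run_0000'):
        run = root / rel
        run.mkdir(parents=True)
        (run / '.llcmd').write_text('#!/bin/sh\n')
    (root / 'logs').mkdir()  # contains no .llcmd, so it is not picked up

    jobs = [p.relative_to(root).as_posix() for p in find_jobs(root)]

print(jobs)
```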

In [9]:
os.chdir(duqtools_dir_done)
!duqduq submit --array --max_jobs 10 --force --dry-run


[31m[1mOperations in the Queue:[0m
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0000')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0001')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0002')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0003')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0004')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0005')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0006')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0007')
- [32mAdding to array[0m : Job('/gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0008')
- [32mAdding to array[0

## `duqduq status`

Query the status using `duqduq status`. This essentially parses all the `jetto.status` files in the run directory.

In [10]:
!duqduq status

Status codes:
_ : no status, . : completed, f : failed, r : running, s : submitted, u : unknown

data_01 (data_01): .........
data_02 (data_02): .........
data_03 (data_03): .........
data_04 (data_04): ...f..f.f
data_05 (data_05): ff....f..
data_07 (data_07): .........
data_08 (data_08): .f..f..f.
data_09 (data_09): ......
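
The status lines can be summarized programmatically. The sketch below counts the codes on one printed line, using the legend from the output above. It only parses the text that `duqduq status` prints (stripping terminal colour codes if present) and does not read the `jetto.status` files themselves; the helper name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Status codes as listed in the legend of the output above.
STATUS_CODES = {'_': 'no status', '.': 'completed', 'f': 'failed',
                'r': 'running', 's': 'submitted', 'u': 'unknown'}

ANSI_COLOR = re.compile(r'\x1b\[[0-9;]*m')  # terminal colour escape sequences


def summarize(line):
    """Split a status line into the entry name and a count per status."""
    text = ANSI_COLOR.sub('', line)
    name, _, codes = text.partition(':')
    counts = Counter(STATUS_CODES[c] for c in codes.strip())
    return name.split()[0], dict(counts)


name, counts = summarize('data_04 (data_04): ...f..f.f')
print(name, counts)
```

For `data_04` this gives six completed and three failed runs, which is consistent with the six datasets reported for `data_04` in the merge step further down.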

## Overview of LSV output directory

The output of `duqduq` differs from a single run in that there is an additional directory layer with the name of the data entry. The `logs` directory contains the slurm logs.

In [11]:
os.chdir(run_dir)
!tree -L 1

.
├── data_01
├── data_02
├── data_03
├── data_04
├── data_05
├── data_07
├── data_08
├── data_09
├── data_10
└── logs

10 directories, 0 files


Each directory is a run directory as you know it from a single UQ run.

In [12]:
!tree 'data_01' -L 1

data_01
├── duqtools.yaml
├── imasdb
├── run_0000
├── run_0001
├── run_0002
├── run_0003
├── run_0004
├── run_0005
├── run_0006
├── run_0007
├── run_0008
└── runs.yaml

10 directories, 2 files


## Merge data using `duqduq merge`

In [13]:
os.chdir(duqtools_dir_done)
!duqduq merge --force --dry-run


[31m[1mOperations in the Queue:[0m
- [34mMerging all known variables[0m
- [34mdata_01[0m : Merging 9 datasets
- [34mTemplate for merge[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_01/run_0000/imasdb/aug/36982/2
- [32mMerging to[0m : /afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long/data_01/imasdb/aug/36982/2
- [34mdata_02[0m : Merging 9 datasets
- [34mTemplate for merge[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_02/run_0000/imasdb/jet/75225/2
- [32mMerging to[0m : /afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long/data_02/imasdb/jet/75225/2
- [34mdata_03[0m : Merging 9 datasets
- [34mTemplate for merge[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/data_03/run_0000/imasdb/jet/90350/2
- [32mMerging to[0m : /afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long/data_03/imasdb/jet/90350/2
- [34mdata_04[0m : Merging 6 datasets
- [34mTemplate for merge[0m : /gss_efgw_work/work/g2ssmee/jetto/runs/duqduq_long/
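
The merge step combines the runs of each data entry into a single data set with error propagation. As a rough illustration of what that can mean, the sketch below takes the pointwise mean of a quantity over several runs and uses the standard deviation as the uncertainty. The profile values are made up, and the choice of mean and sample standard deviation is an assumption for illustration; it is not a description of the `duqduq merge` implementation.

```python
from statistics import mean, stdev

# Made-up profiles of one quantity (one list per run, all on the same grid).
runs = [
    [1.0, 0.8, 0.5],
    [1.2, 0.9, 0.6],
    [1.1, 1.0, 0.4],
]

# Transpose so that each tuple holds the values of all runs at one grid point.
per_point = list(zip(*runs))

merged_mean = [mean(values) for values in per_point]
merged_error = [stdev(values) for values in per_point]  # sample standard deviation

for m, e in zip(merged_mean, merged_error):
    print(f'{m:.3f} +/- {e:.3f}')
```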

Merged data are stored in a local imasdb for each data entry in the run directory.

In [14]:
os.chdir(run_dir)
!tree 'data_01/imasdb'

data_01/imasdb
└── aug
    └── 3
        └── 0
            ├── ids_369820002.characteristics
            ├── ids_369820002.datafile
            └── ids_369820002.tree

3 directories, 3 files
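
The file names in this listing encode the shot and run number of the merged data set. The sketch below rebuilds the stem `ids_369820002` from shot 36982 and run 2. The naming rule (shot followed by the run number zero-padded to four digits) is inferred from this single listing and should be read as an assumption, not as a specification of the IMAS backend; `ids_file_stem` is a hypothetical helper.

```python
def ids_file_stem(shot, run):
    """Build the file stem seen in the listing: shot followed by a 4-digit run."""
    return f'ids_{shot}{run:04d}'


stem = ids_file_stem(shot=36982, run=2)
for suffix in ('characteristics', 'datafile', 'tree'):
    print(f'{stem}.{suffix}')
```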


## Data exploration with `duqtools dash`

The IMAS handles for each merged data set are stored in `merge_data.csv`. They can be visualized using the duqtools dashboard.


In [15]:
os.chdir(duqtools_dir_done)
!duqtools dash


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://130.186.25.54:8501

^C
  Stopping...
