# Create ISA-API Investigation from Datascriptor Study Design configuration

In this notebook I will show you how to use a study design configuration in JSON format, as produced by datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to serialise it in JSON and tabular (ISA-Tab, i.e. tab-separated) format.

Our study design configuration consists of:
[missing]

## 1. Setup

Let's import all the required libraries.

In [1]:
from time import time
import os
import json

## ISA-API related imports
from isatools.model import Investigation, Study

## ISA-API create mode related imports
from isatools.create.models import StudyDesign
from isatools.create.connectors import generate_study_design_from_config

# serializer from ISA Investigation to JSON
from isatools.isajson import ISAJSONEncoder

# ISA-Tab serialisation
from isatools import isatab

LOG: <Logger isatools (DEBUG)>
/Users/philippe/.pyenv/versions/3.8.0/envs/isa-api-py38/src/isatools/isatools/net/resources/saxon9/saxon9he.jar


In [2]:
## ISA-API create mode related imports
from isatools.create import models
from isatools import isajson

## 2. Load the Study Design JSON configuration

First, we load the study design configuration file:

In [3]:
with open(os.path.abspath(os.path.join(
    "ds-study-design-config", "study-design-3-repeated-treatments-datascriptor.json"
)), "r") as config_file:
    study_design_config = json.load(config_file)
study_design_config

{'treatmentPlan': {'screen': {'name': 'screen',
   'duration': None,
   'durationUnit': ''},
  'runIn': {'name': 'run-in', 'duration': None, 'durationUnit': ''},
  'washout': {'name': 'washout', 'duration': '5', 'durationUnit': 'days'},
  'followUp': {'name': 'follow-up', 'duration': 60, 'durationUnit': 'days'},
  'treatments': [{'agent': 'ibuprofen',
    'intensity': 8,
    'intensityUnit': 'mg',
    'duration': 10,
    'durationUnit': 'days'},
   {'agent': 'aspirin',
    'intensity': 5,
    'intensityUnit': 'mg',
    'duration': 10,
    'durationUnit': 'days'},
   {'agent': 'paracetamol',
    'intensity': 10,
    'intensityUnit': 'mg',
    'duration': 10,
    'durationUnit': 'days'}],
  'elementParams': {'agents': [],
   'intensities': [],
   'intensityUnit': '',
   'durations': [],
   'durationUnit': ''}},
 'generatedStudyDesign': {'name': 'test study',
  'description': 'this is the verbose description',
  'type': {'term': 'crossover design',
   'id': 'OBI:0500003',
   'iri': 'http:
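As a quick sanity check of the configuration loaded above, the sketch below walks the `treatmentPlan` section and normalises the durations to numbers. Note that in the output above the washout duration is the string `'5'` while the follow-up duration is the integer `60`, so a cast is needed before doing any arithmetic on them. The helpers `as_number` and `summarise_plan` are hypothetical and not part of isatools; the dictionary is a partial copy of the output above, and in the notebook you would pass `study_design_config['treatmentPlan']` instead.

```python
# Partial copy of the 'treatmentPlan' section shown in the output above
treatment_plan = {
    "screen": {"name": "screen", "duration": None, "durationUnit": ""},
    "runIn": {"name": "run-in", "duration": None, "durationUnit": ""},
    "washout": {"name": "washout", "duration": "5", "durationUnit": "days"},
    "followUp": {"name": "follow-up", "duration": 60, "durationUnit": "days"},
    "treatments": [
        {"agent": "ibuprofen", "intensity": 8, "intensityUnit": "mg", "duration": 10, "durationUnit": "days"},
        {"agent": "aspirin", "intensity": 5, "intensityUnit": "mg", "duration": 10, "durationUnit": "days"},
        {"agent": "paracetamol", "intensity": 10, "intensityUnit": "mg", "duration": 10, "durationUnit": "days"},
    ],
}


def as_number(value):
    """Hypothetical helper: float for numbers or numeric strings, None for empty values."""
    if value is None or value == "":
        return None
    return float(value)


def summarise_plan(plan):
    """Hypothetical helper: return (non-treatment elements, treatments) as tuples."""
    elements = []
    for key in ("screen", "runIn", "washout", "followUp"):
        element = plan[key]
        elements.append((element["name"], as_number(element["duration"]), element["durationUnit"]))
    treatments = [
        (t["agent"], as_number(t["intensity"]), t["intensityUnit"],
         as_number(t["duration"]), t["durationUnit"])
        for t in plan["treatments"]
    ]
    return elements, treatments


elements, treatments = summarise_plan(treatment_plan)
for name, duration, unit in elements:
    print(f"{name}: {duration} {unit}".rstrip())
for agent, dose, dose_unit, duration, unit in treatments:
    print(f"{agent}: {dose} {dose_unit} for {duration} {unit}")
```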

## 3. Generate the ISA Study Design from the JSON configuration
To perform the conversion, we just need to call the function `generate_study_design_from_config` from `isatools.create.connectors`, which returns an ISA-API `StudyDesign` object (the function name may still be subject to change).

In [4]:
study_design = generate_study_design_from_config(study_design_config)
assert isinstance(study_design, StudyDesign)

## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation

The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object.

In [5]:
start = time()
study = study_design.generate_isa_study()
end = time()
print('The generation of the study design took {:.2f} s.'.format(end - start))
assert isinstance(study, Study)
investigation = Investigation(studies=[study])

The generation of the study design took 0.24 s.
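The timing cells in this notebook repeat the same `start = time()` / `end = time()` pattern. A minimal sketch of how it could be factored into a context manager follows; `timed` is a hypothetical helper, and it uses `time.perf_counter`, which is intended for measuring short intervals, rather than `time.time`.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label, results=None):
    """Time the enclosed block and print how long it took.

    If a dict is passed as `results`, the elapsed seconds are also stored
    under `label` so they can be inspected afterwards.
    """
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print("{} took {:.2f} s.".format(label, elapsed))


# Usage with a stand-in workload; in the notebook the body would be e.g.
# `study = study_design.generate_isa_study()`
timings = {}
with timed("The generation of the ISA study", timings):
    total = sum(range(100_000))
```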


## 5. Serialize and save the JSON representation of the generated ISA Investigation

In [6]:
start = time()
inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
end = time()
print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The JSON serialisation of the ISA investigation took 0.14 s.
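`json.dumps` hands objects it cannot serialise on its own to the encoder class passed as `cls`, which is how `ISAJSONEncoder` is plugged in above. The sketch below illustrates that standard-library mechanism with a hypothetical `Point` class and `PointEncoder`; it does not reproduce the internals of `ISAJSONEncoder`, which we assume follows the same pattern of overriding `JSONEncoder.default`.

```python
import json


class Point:
    """Hypothetical domain object that json.dumps cannot serialise by itself."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointEncoder(json.JSONEncoder):
    """Hypothetical encoder showing the `cls=` hook used with ISAJSONEncoder."""

    def default(self, o):
        if isinstance(o, Point):
            return {"x": o.x, "y": o.y}
        # Defer to the base class, which raises TypeError for unknown types
        return super().default(o)


payload = {"points": [Point(1, 2), Point(3, 4)]}
# Same formatting options as the notebook cell above
encoded = json.dumps(payload, cls=PointEncoder, sort_keys=True, indent=4, separators=(",", ": "))
print(encoded)
```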


In [7]:
directory = os.path.abspath('output')
os.makedirs(directory, exist_ok=True)
# inv_json already holds the serialised investigation, so write it out directly
with open(os.path.join(directory, 'isa-investigation-2-arms-nmr-ms.json'), 'w') as out_fp:
    out_fp.write(inv_json)

## 6. Dump the ISA Investigation to ISA-Tab

In [8]:
start = time()
isatab.dump(investigation, os.path.abspath(os.path.join('output')))
end = time()
print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The Tab serialisation of the ISA investigation took 3.17 s.
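By convention, ISA-Tab file names start with an `i_` (investigation), `s_` (study) or `a_` (assay) prefix, so the contents of the output directory can be checked by grouping files on that prefix. This is a hypothetical helper, demonstrated on a scratch directory with placeholder files; in the notebook you would call `list_isatab_files('output')` after `isatab.dump()`. The placeholder investigation file name `i_investigation.txt` is an assumption.

```python
import tempfile
from pathlib import Path


def list_isatab_files(directory):
    """Group the .txt files in `directory` by ISA-Tab prefix (i_, s_, a_)."""
    prefixes = {"i_": "investigation", "s_": "study", "a_": "assay"}
    groups = {"investigation": [], "study": [], "assay": []}
    for path in sorted(Path(directory).glob("*.txt")):
        kind = prefixes.get(path.name[:2])
        if kind is not None:  # ignore unrelated .txt files
            groups[kind].append(path.name)
    return groups


# Demonstration on a scratch directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("i_investigation.txt", "s_study_01.txt",
                 "a_A0E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt"):
        (Path(tmp) / name).touch()
    found = list_isatab_files(tmp)
print(found)
```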



To work with the tables inside the notebook, we can also dump them to pandas DataFrames by calling `isatab.dump_tables_to_dataframes` instead of `isatab.dump`.

In [10]:
dataframes = isatab.dump_tables_to_dataframes(investigation)

In [11]:
len(dataframes)

17

## 7. Check the correctness of the ISA-Tab DataFrames 

We have 1 study file and 16 assay files, all of them for metabolite profiling by mass spectrometry. Judging from the `A<n>E<m>` part of the file names, there is one assay file for each combination of:
* 4 arms (`A0` to `A3`)
* 4 epochs (`E0`, `E2`, `E4` and `E5`)

In [12]:
for key in dataframes.keys():
    display(key)

's_study_01.txt'

'a_A0E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A1E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A1E0_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A0E0_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A3E4_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A2E5_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A2E4_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A3E5_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A3E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A2E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A2E0_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A3E0_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A0E4_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A1E5_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A1E4_ASSAY0_metabolite-profiling_mass-spectrometry.txt'

'a_A0E5_ASSAY0_metabolite-profiling_mass-spectrometry.txt'
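To make the listing easier to read, the file names can be parsed into (arm, epoch) pairs. This is a minimal sketch that assumes the `A<n>E<m>` part of each assay file name encodes the arm and epoch indices; `index_assay_files` is a hypothetical helper, not an isatools function. In the notebook you would pass `dataframes.keys()`; here the names are copied from the output above.

```python
import re
from collections import defaultdict

# Assumption: 'A<n>' indexes the arm and 'E<m>' indexes the epoch
ASSAY_FILE_RE = re.compile(r"^a_A(\d+)E(\d+)_ASSAY(\d+)_(.+)\.txt$")


def index_assay_files(file_names):
    """Map each (arm, epoch) pair to the assay file names generated for it."""
    index = defaultdict(list)
    for file_name in file_names:
        match = ASSAY_FILE_RE.match(file_name)
        if match is None:  # e.g. the study file 's_study_01.txt'
            continue
        arm, epoch = int(match.group(1)), int(match.group(2))
        index[(arm, epoch)].append(file_name)
    return dict(index)


suffix = "_ASSAY0_metabolite-profiling_mass-spectrometry.txt"
names = ["s_study_01.txt"] + ["a_" + code + suffix for code in (
    "A0E2", "A1E2", "A1E0", "A0E0", "A3E4", "A2E5", "A2E4", "A3E5",
    "A3E2", "A2E2", "A2E0", "A3E0", "A0E4", "A1E5", "A1E4", "A0E5",
)]
by_arm_epoch = index_assay_files(names)
arms = sorted({arm for arm, _ in by_arm_epoch})
epochs = sorted({epoch for _, epoch in by_arm_epoch})
print("arms:", arms, "epochs:", epochs)
```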

We have 8 subjects in the control arm and 24 samples have been collected (3 blood samples per subject during the follow-up epoch)

We have 10 subjects in the treatment arm and 40 samples have been collected (3 blood samples per subject during the follow-up epoch and 1 liver sample per subject during the surgery epoch).

In [13]:
study_frame = dataframes['s_study_01.txt']
count_control_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'control' in el)])
count_treatment_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'treatment' in el)])
print("There are {} samples in the control arm (i.e. group)".format(count_control_samples))
print("There are {} samples in the treatment arm (i.e. group)".format(count_treatment_samples))

There are 0 samples in the control arm (i.e. group)
There are 0 samples in the treatment arm (i.e. group)
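The substring filter above finds no rows for either arm, because none of the generated Source Names contain the strings `'control'` or `'treatment'`. Judging from the Sample Name column of the assay table further down (e.g. `GRP0_SBJ1_A0E2_SMP-saliva-1`), generated names appear to start with a `GRP<n>_SBJ<m>` prefix. Under that assumption, a sketch that counts samples and distinct subjects per group could look like this; `count_by_group` is hypothetical, and in the notebook its input would be `study_frame['Sample Name']`.

```python
import re
from collections import Counter

# Assumption: names start with 'GRP<n>_SBJ<m>', as in the assay table
GROUP_SUBJECT_RE = re.compile(r"^(GRP\d+)_(SBJ\d+)")


def count_by_group(names):
    """Count names, and distinct subjects, per group prefix."""
    per_group = Counter()
    subjects = {}
    for name in names:
        match = GROUP_SUBJECT_RE.match(name)
        if match is None:  # skip names that do not follow the pattern
            continue
        group, subject = match.groups()
        per_group[group] += 1
        subjects.setdefault(group, set()).add(subject)
    return per_group, {group: len(s) for group, s in subjects.items()}


# Sample names copied from the assay table displayed further down
sample_names = [
    "GRP0_SBJ1_A0E2_SMP-saliva-1",
    "GRP0_SBJ2_A0E2_SMP-saliva-1",
    "GRP0_SBJ3_A0E2_SMP-saliva-1",
]
samples_per_group, subjects_per_group = count_by_group(sample_names)
print(samples_per_group, subjects_per_group)
```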


Each control sample is fractionated, and 2 labelled extracts are produced from it (i.e. biological replicates):

```
[
    "labelling",
    {
        "#replicates": 2
    }
]
```

Our configuration template specifies 2 possible combinations of mass spectrometry assay parameters, and 2 technical replicates are produced for each combination:

```
[
    "mass spectrometry",
    {
        "#replicates": 2,
        "instrument": [
            "Agilent QTQF 6510"
        ],
        "injection_mode": [
            "FIA",
            "LC"
        ],
        "acquisition_mode": [
            "positive mode"
        ]
    }
]
```
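Assuming the combinations are the Cartesian product of the values listed for each parameter (which agrees with the count of 2 stated above: 1 instrument, 2 injection modes and 1 acquisition mode), they can be enumerated with `itertools.product`. The variable names below are illustrative only.

```python
from itertools import product

# Mass spectrometry parameters as listed in the configuration fragment above
ms_params = {
    "instrument": ["Agilent QTQF 6510"],
    "injection_mode": ["FIA", "LC"],
    "acquisition_mode": ["positive mode"],
}
n_technical_replicates = 2

# One combination per choice of a value for every parameter (Cartesian product)
keys = list(ms_params)
combinations = [dict(zip(keys, values)) for values in product(*(ms_params[k] for k in keys))]
for combo in combinations:
    print(combo)

# Runs per labelled extract: every combination is replicated technically
n_runs_per_labelled_extract = len(combinations) * n_technical_replicates
```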

Each run produces two raw spectral data files:

```
[
    "raw spectral data file",
    [
        {
            "node_type": "data file",
            "size": 2,
            "is_input_to_next_protocols": false
        }
    ]
]
```

In total, we expect:

$$ N_{rows} = (N_{subjects} \times N_{samples}) \times (N_{biorepl} \times N_{combinations} \times N_{techrepl} \times N_{files}) = (8 \times 3) \times (2 \times 2 \times 2 \times 2) = 24 \times 16 = 384 $$

where $N_{files} = 2$ is the number of raw spectral data files produced per run.

We can verify this against the mass spectrometry assay file of the control group.
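The same arithmetic as a short Python check, using the factors stated in the text above. The commented-out line shows how one might compare the result against the generated assay table in the notebook.

```python
from math import prod

# Factors taken from the text above (control group, mass spectrometry assay)
factors = {
    "subjects": 8,
    "samples_per_subject": 3,
    "biological_replicates": 2,   # labelled extracts per sample
    "parameter_combinations": 2,  # mass spectrometry parameter combinations
    "technical_replicates": 2,    # runs per combination
    "raw_files_per_run": 2,       # raw spectral data files per run
}
expected_rows = prod(factors.values())
print(expected_rows)

# In the notebook:
# len(dataframes['a_A0E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt']) == expected_rows
```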

In [15]:
dataframes['a_A0E2_ASSAY0_metabolite-profiling_mass-spectrometry.txt']

| | Sample Name | Protocol REF | Performer | Extract Name | Characteristics[extract type] | Protocol REF.1 | Performer.1 | Labeled Extract Name | Label | Protocol REF.2 | Parameter Value[instrument] | Parameter Value[injection_mode] | Parameter Value[acquisition_mode] | MS Assay Name | Performer.2 | Raw Spectral Data File |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GRP0_SBJ1_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-3-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-3-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | negative mode | mass-spectrometry_A0E2_ASSAY0-3-8 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-3-9 |
| 1 | GRP0_SBJ1_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-3-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-3-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | positive mode | mass-spectrometry_A0E2_ASSAY0-3-4 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-3-5 |
| 2 | GRP0_SBJ1_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-3-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-3-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | negative mode | mass-spectrometry_A0E2_ASSAY0-3-10 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-3-11 |
| 3 | GRP0_SBJ1_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-3-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-3-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | positive mode | mass-spectrometry_A0E2_ASSAY0-3-6 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-3-7 |
| 4 | GRP0_SBJ2_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-4-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-4-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | negative mode | mass-spectrometry_A0E2_ASSAY0-4-8 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-4-9 |
| 5 | GRP0_SBJ2_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-4-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-4-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | positive mode | mass-spectrometry_A0E2_ASSAY0-4-6 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-4-7 |
| 6 | GRP0_SBJ2_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-4-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-4-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | positive mode | mass-spectrometry_A0E2_ASSAY0-4-4 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-4-5 |
| 7 | GRP0_SBJ2_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-4-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-4-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | negative mode | mass-spectrometry_A0E2_ASSAY0-4-10 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-4-11 |
| 8 | GRP0_SBJ3_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-2-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-2-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | positive mode | mass-spectrometry_A0E2_ASSAY0-2-6 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-2-7 |
| 9 | GRP0_SBJ3_A0E2_SMP-saliva-1 | extraction | Ellipsis | EXTR_A0E2_ASSAY0-2-1 | polar fraction | labeling | Ellipsis | LBLEXTR_A0E2_ASSAY0-2-3 | biotin | mass spectrometry | Agilent QTQF 6510 | LC | negative mode | mass-spectrometry_A0E2_ASSAY0-2-8 | Ellipsis | raw-spectral-data-file_A0E2_ASSAY0-2-9 |
