# Create ISA-API Investigation from Datascriptor Study Design configuration

In this notebook I will show how to use a study design configuration in JSON format, as produced by Datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to then serialise it in JSON and tabular (i.e. ISA-Tab) format.

Our study design configuration consists of:
[missing]

## 1. Setup

Let's import all the required libraries:

In [1]:
from time import time
import os
import json

## ISA-API related imports
from isatools.model import Investigation, Study

## ISA-API create mode related imports
from isatools.create.models import StudyDesign
from isatools.create.connectors import generate_study_design_from_config

# serializer from ISA Investigation to JSON
from isatools.isajson import ISAJSONEncoder

# ISA-Tab serialisation
from isatools import isatab


In [2]:
## ISA-API create mode related imports
from isatools.create import models
from isatools import isajson

## 2. Load the Study Design JSON configuration

First of all, we load the study design configuration:

In [3]:
with open(os.path.abspath(os.path.join(
    "config", "study-design-3-repeated-treatments-datascriptor.json"
)), "r") as config_file:
    study_design_config = json.load(config_file)
study_design_config

{'treatmentPlan': {'screen': {'name': 'screen',
   'duration': None,
   'durationUnit': ''},
  'runIn': {'name': 'run-in', 'duration': None, 'durationUnit': ''},
  'washout': {'name': 'washout', 'duration': '5', 'durationUnit': 'days'},
  'followUp': {'name': 'follow-up', 'duration': 60, 'durationUnit': 'days'},
  'treatments': [{'agent': 'ibuprofen',
    'intensity': 8,
    'intensityUnit': 'mg',
    'duration': 10,
    'durationUnit': 'days'},
   {'agent': 'hydroxy',
    'intensity': 5,
    'intensityUnit': 'mg',
    'duration': 10,
    'durationUnit': 'days'},
   {'agent': 'paracetamol',
    'intensity': 10,
    'intensityUnit': 'mg',
    'duration': 10,
    'durationUnit': 'days'}],
  'elementParams': {'agents': [],
   'intensities': [],
   'intensityUnit': '',
   'durations': [],
   'durationUnit': ''}},
 'generatedStudyDesign': {'name': 'test study',
  'description': 'this is the verbose description',
  'type': {'term': 'crossover design',
   'id': 'OBI:0500003',
   'iri': 'http:
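
To get a quick overview of the treatments without reading the whole dictionary, the entries under `treatmentPlan` can be summarised in a few lines. The sketch below is only illustrative: it works on a hand-copied excerpt of the output shown above (assigned to a hypothetical `plan` variable) rather than on the loaded file, and the helper `describe_treatments` is an assumption of this example, not part of ISA-API.

```python
# Excerpt of the 'treatments' list shown in the output above, copied by hand.
plan = {
    'treatments': [
        {'agent': 'ibuprofen', 'intensity': 8, 'intensityUnit': 'mg',
         'duration': 10, 'durationUnit': 'days'},
        {'agent': 'hydroxy', 'intensity': 5, 'intensityUnit': 'mg',
         'duration': 10, 'durationUnit': 'days'},
        {'agent': 'paracetamol', 'intensity': 10, 'intensityUnit': 'mg',
         'duration': 10, 'durationUnit': 'days'},
    ],
}


def describe_treatments(treatment_plan):
    """Return one readable line per treatment (hypothetical helper)."""
    return [
        '{agent}: {intensity} {intensityUnit} for {duration} {durationUnit}'.format(**t)
        for t in treatment_plan['treatments']
    ]


summary = describe_treatments(plan)
for line in summary:
    print(line)
```

On the real configuration the same helper would presumably be called as `describe_treatments(study_design_config['treatmentPlan'])`.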

## 3. Generate the ISA Study Design from the JSON configuration
To perform the conversion we just need to call the function `generate_study_design_from_config`, imported above from `isatools.create.connectors` (the name may be subject to change).

In [4]:
study_design = generate_study_design_from_config(study_design_config)
assert isinstance(study_design, StudyDesign)

## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation

The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object.

In [5]:
start = time()
study = study_design.generate_isa_study()
end = time()
print('The generation of the ISA study took {:.2f} s.'.format(end - start))
assert isinstance(study, Study)
investigation = Investigation(studies=[study])

The generation of the ISA study took 0.48 s.
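
The start/end timing pattern used in this cell recurs in the serialisation cells that follow. As an optional convenience, it could be wrapped in a small context manager. The sketch below is an assumption of this example (the name `timed` and its `results` argument are hypothetical, not part of ISA-API); it uses `time.perf_counter`, a monotonic clock intended for measuring intervals, and times a trivial stand-in computation.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took; optionally store the value."""
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print('{} took {:.2f} s.'.format(label, elapsed))


# Stand-in workload; any block of code can go inside the `with`.
timings = {}
with timed('Summing one million integers', timings):
    total = sum(range(1_000_000))
```

In this notebook the body of the `with` block would be a call such as `study = study_design.generate_isa_study()`.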


## 5. Serialize and save the JSON representation of the generated ISA Investigation

In [6]:
start = time()
inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
end = time()
print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The JSON serialisation of the ISA investigation took 0.26 s.
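
The `cls=` argument of `json.dumps` is what lets the standard library serialise objects it does not know about: whenever it meets one, it calls the encoder's `default()` method and serialises whatever that returns. The internals of `ISAJSONEncoder` are not shown in this notebook, so the sketch below illustrates only the mechanism, with a toy class and a toy encoder (`Sample` and `SampleEncoder` are hypothetical names, not ISA-API classes).

```python
import json


class Sample:
    """Toy stand-in for a model object; not an ISA-API class."""

    def __init__(self, name, organism):
        self.name = name
        self.organism = organism


class SampleEncoder(json.JSONEncoder):
    """Toy encoder: json calls default() for objects it cannot serialise."""

    def default(self, o):
        if isinstance(o, Sample):
            return {'name': o.name, 'organism': o.organism}
        return super().default(o)  # raises TypeError for unknown types


# Same keyword arguments as in the cell above, applied to the toy object.
text = json.dumps([Sample('s1', 'Homo sapiens')], cls=SampleEncoder,
                  sort_keys=True, indent=4, separators=(',', ': '))
print(text)
```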


In [7]:
directory = os.path.abspath('output')
os.makedirs(directory, exist_ok=True)
with open(os.path.join(directory, 'isa-investigation-2-arms-nmr-ms.json'), 'w') as out_fp:
    json.dump(json.loads(inv_json), out_fp)

## 6. Dump the ISA Investigation to ISA-Tab

In [8]:
start = time()
isatab.dump(investigation, os.path.abspath(os.path.join('output')))
end = time()
print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

2020-08-21 13:04:46,572 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 270 paths!
2020-08-21 13:04:46,586 [INFO]: isatab.py(write_study_table_files:1330) >> Rendered 270 paths
2020-08-21 13:04:46,592 [INFO]: isatab.py(write_study_table_files:1337) >> Writing 270 rows
2020-08-21 13:04:46,836 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:47,070 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:47,956 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 160 paths!
2020-08-21 13:04:48,200 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:48,433 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:49,308 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 160 paths!
2020-08-21 13:04:49,554 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!


The Tab serialisation of the ISA investigation took 4.52 s.
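
ISA-Tab tables are tab-delimited text files, so a written table can be spot-checked with the standard `csv` module. The sketch below is a minimal illustration under stated assumptions: it first writes a tiny stand-in file (the file name `s_example.txt` and its two columns are made up for this example) and then reads it back; it does not touch the files produced by `isatab.dump`.

```python
import csv
import os
import tempfile

# Stand-in for a table written by isatab.dump: a header and two data rows.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 's_example.txt')  # hypothetical file name
with open(path, 'w', newline='') as fp:
    writer = csv.writer(fp, delimiter='\t')
    writer.writerow(['Source Name', 'Sample Name'])
    writer.writerow(['source_1', 'sample_1'])
    writer.writerow(['source_1', 'sample_2'])

# Read it back: the first row is the header, the rest are data rows.
with open(path, newline='') as fp:
    rows = list(csv.reader(fp, delimiter='\t'))
header, data_rows = rows[0], rows[1:]
print(header, len(data_rows))
```

To inspect the real output, `path` would point at one of the files in the `output` directory instead.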


To work with the tables inside the notebook, we can instead dump them to pandas DataFrames, using the `dump_tables_to_dataframes` function rather than `dump`.

In [9]:
dataframes = isatab.dump_tables_to_dataframes(investigation)

2020-08-21 13:04:53,304 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 270 paths!
2020-08-21 13:04:53,314 [INFO]: isatab.py(write_study_table_files:1330) >> Rendered 270 paths
2020-08-21 13:04:53,318 [INFO]: isatab.py(write_study_table_files:1337) >> Writing 270 rows
2020-08-21 13:04:53,543 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:53,773 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:54,635 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 160 paths!
2020-08-21 13:04:54,875 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:55,100 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!
2020-08-21 13:04:55,956 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 160 paths!
2020-08-21 13:04:56,189 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 40 paths!


In [10]:
len(dataframes)

8

## 7. Check the correctness of the ISA-Tab DataFrames 

We have 1 study file and 7 assay files, which accounts for the 8 DataFrames counted above.

In [11]:
for key in dataframes.keys():
    display(key)

's_study_01.txt'

'a_CELL_Arm_0_5_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'

'a_CELL_Arm_0_2_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'

'a_CELL_Arm_0_4_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'

'a_CELL_Arm_3_2_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'

'a_CELL_Arm_2_5_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'

'a_CELL_Arm_2_4_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'

'a_CELL_Arm_2_2_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt'
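
ISA-Tab file names carry a prefix that identifies the table type (`s_` for study tables, `a_` for assay tables), so the keys can be split by prefix to count each kind. The sketch below copies the keys displayed above into a plain list (the variable `table_names` is introduced here for illustration); in the notebook the same comprehensions could be run over `dataframes.keys()`.

```python
# Keys displayed above, copied by hand.
table_names = [
    's_study_01.txt',
    'a_CELL_Arm_0_5_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
    'a_CELL_Arm_0_2_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
    'a_CELL_Arm_0_4_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
    'a_CELL_Arm_3_2_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
    'a_CELL_Arm_2_5_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
    'a_CELL_Arm_2_4_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
    'a_CELL_Arm_2_2_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt',
]

# Split the names by their ISA-Tab prefix.
study_files = [name for name in table_names if name.startswith('s_')]
assay_files = [name for name in table_names if name.startswith('a_')]
print('{} study file(s), {} assay file(s)'.format(len(study_files), len(assay_files)))
```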

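We have 10 subjects in each of the four selected arms (Arm_0, Arm_2, Arm_3, Arm_4). Counting the study-table rows whose source name contains each arm label shows how many samples were collected per arm: 70 for Arm_0, Arm_2 and Arm_4, and 60 for Arm_3 (270 in total, matching the rows written above).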

In [12]:
study_frame = dataframes['s_study_01.txt']
# Count the study-table rows whose source name contains each arm label
for arm in ('Arm_0', 'Arm_2', 'Arm_3', 'Arm_4'):
    count = len(study_frame[study_frame['Source Name'].apply(lambda el: arm in el)])
    print("There are {} samples in the {} arm (i.e. group)".format(count, arm))

There are 70 samples in the Arm_0 arm (i.e. group)
There are 70 samples in the Arm_2 arm (i.e. group)
There are 60 samples in the Arm_3 arm (i.e. group)
There are 70 samples in the Arm_4 arm (i.e. group)


In [14]:
dataframes['a_CELL_Arm_0_5_ASSAY_GRAPH_000_metabolite-profiling_mass-spectrometry.txt']

Unnamed: 0,Sample Name,Protocol REF,Performer,Extract Name,Characteristics[extract type],Protocol REF.1,Performer.1,Labeled Extract Name,Label,Protocol REF.2,Parameter Value[instrument],Parameter Value[injection_mode],Parameter Value[acquisition_mode],MS Assay Name,Performer.2,Raw Spectral Data File
0,GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.blood.1,extraction,Ellipsis,extract_4-1,polar fraction,labelling,Ellipsis,labelled extract_4-3,biotin,mass spectrometry,Agilent QTQF 6510,FIA,positive mode,mass spectrometry_4-8,Ellipsis,raw spectral data file_4-9
1,GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.blood.1,extraction,Ellipsis,extract_4-1,polar fraction,labelling,Ellipsis,labelled extract_4-3,biotin,mass spectrometry,Agilent QTQF 6510,LC,positive mode,mass spectrometry_4-4,Ellipsis,raw spectral data file_4-5
2,GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.blood.1,extraction,Ellipsis,extract_4-1,polar fraction,labelling,Ellipsis,labelled extract_4-3,biotin,mass spectrometry,Agilent QTQF 6510,LC,positive mode,mass spectrometry_4-6,Ellipsis,raw spectral data file_4-7
3,GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.blood.1,extraction,Ellipsis,extract_4-1,polar fraction,labelling,Ellipsis,labelled extract_4-3,biotin,mass spectrometry,Agilent QTQF 6510,FIA,positive mode,mass spectrometry_4-10,Ellipsis,raw spectral data file_4-11
4,GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.saliva.1,extraction,Ellipsis,extract_22-1,polar fraction,labelling,Ellipsis,labelled extract_22-3,biotin,mass spectrometry,Agilent QTQF 6510,LC,positive mode,mass spectrometry_22-6,Ellipsis,raw spectral data file_22-7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
155,GRP-Arm_0.SBJ-010.CEL-CELL_Arm_0_5.SMP-.saliva.2,extraction,Ellipsis,extract_17-1,polar fraction,labelling,Ellipsis,labelled extract_17-3,biotin,mass spectrometry,Agilent QTQF 6510,FIA,positive mode,mass spectrometry_17-10,Ellipsis,raw spectral data file_17-11
156,GRP-Arm_0.SBJ-010.CEL-CELL_Arm_0_5.SMP-.saliva.3,extraction,Ellipsis,extract_18-1,polar fraction,labelling,Ellipsis,labelled extract_18-3,biotin,mass spectrometry,Agilent QTQF 6510,LC,positive mode,mass spectrometry_18-4,Ellipsis,raw spectral data file_18-5
157,GRP-Arm_0.SBJ-010.CEL-CELL_Arm_0_5.SMP-.saliva.3,extraction,Ellipsis,extract_18-1,polar fraction,labelling,Ellipsis,labelled extract_18-3,biotin,mass spectrometry,Agilent QTQF 6510,FIA,positive mode,mass spectrometry_18-8,Ellipsis,raw spectral data file_18-9
158,GRP-Arm_0.SBJ-010.CEL-CELL_Arm_0_5.SMP-.saliva.3,extraction,Ellipsis,extract_18-1,polar fraction,labelling,Ellipsis,labelled extract_18-3,biotin,mass spectrometry,Agilent QTQF 6510,FIA,positive mode,mass spectrometry_18-10,Ellipsis,raw spectral data file_18-11
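
The sample names in the table above appear to encode the group, the subject, the cell and the sample type. A regular expression can pull these parts out, for example to count samples per type. The pattern below is inferred only from the rows displayed here, so it is an assumption about the naming scheme rather than a documented ISA-API format; `SAMPLE_NAME`, `names` and `per_type` are names introduced for this sketch.

```python
import re
from collections import Counter

# Layout inferred from the displayed rows:
# GRP-<group>.SBJ-<subject>.CEL-<cell>.SMP-.<sample type>.<index>
SAMPLE_NAME = re.compile(
    r'GRP-(?P<group>[^.]+)\.SBJ-(?P<subject>\d+)\.CEL-(?P<cell>[^.]+)'
    r'\.SMP-\.(?P<sample_type>[^.]+)\.(?P<index>\d+)$'
)

# A few distinct sample names copied from the table above.
names = [
    'GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.blood.1',
    'GRP-Arm_0.SBJ-001.CEL-CELL_Arm_0_5.SMP-.saliva.1',
    'GRP-Arm_0.SBJ-010.CEL-CELL_Arm_0_5.SMP-.saliva.2',
    'GRP-Arm_0.SBJ-010.CEL-CELL_Arm_0_5.SMP-.saliva.3',
]

parsed = [SAMPLE_NAME.match(name).groupdict() for name in names]
per_type = Counter(p['sample_type'] for p in parsed)
print(per_type)
```

In the notebook, `names` could instead be taken from the `Sample Name` column of the DataFrame after dropping duplicates.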
