# Create ISA-API Investigation from Datascriptor Study Design configuration

In this notebook I will show you how to use a study design configuration in JSON format, as produced by Datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to then serialise it in JSON and tabular (i.e. ISA-Tab) format.

Our study design configuration consists of:
- a 6-arm study design
- an observational factor, age group, with 3 values (young, middle-aged, elderly)
- a crossover of two treatments, a drug and a biological treatment
- three non-treatment phases: screen, washout and follow-up
- two sample types collected: blood and saliva
- two assay types: metabolite profiling through (1) mass spectrometry on the saliva samples and (2) NMR spectroscopy on the blood samples. A combination of technical replicates and protocol parameters is defined in the assay configuration

## 1. Setup

Let's import all the required libraries.

In [1]:
from time import time
import os
import json

In [2]:
## ISA-API related imports
from isatools.model import Investigation, Study

LOG: <Logger isatools (DEBUG)>


In [3]:
## ISA-API create mode related imports
from isatools.create.model import StudyDesign
from isatools.create.connectors import generate_study_design_from_config

# serializer from ISA Investigation to JSON
from isatools.isajson import ISAJSONEncoder

# ISA-Tab serialisation
from isatools import isatab

In [4]:
## ISA-API create mode related imports
from isatools.create import model
from isatools import isajson

## 2. Load the Study Design JSON configuration

First of all we load the study design configuration.

In [5]:
with open(os.path.abspath(os.path.join(
    "config", "study-design-crossover-onto-annotated-ms-and-nnmr.json"
)), "r") as config_file:
    study_design_config = json.load(config_file)
study_design_config

{'observationalFactors': [{'name': 'age group',
   'values': ['young', 'middle-aged', 'elderly'],
   'isQuantitative': False,
   'unit': None}],
 'generatedSubjectGroups': [{'name': 'SubjectGroup_0',
   'type': 'Homo sapiens',
   'characteristics': [{'name': 'age group',
     'value': 'young',
     'unit': None,
     'isQuantitative': False}]},
  {'name': 'SubjectGroup_1',
   'type': 'Homo sapiens',
   'characteristics': [{'name': 'age group',
     'value': 'middle-aged',
     'unit': None,
     'isQuantitative': False}]},
  {'name': 'SubjectGroup_2',
   'type': 'Homo sapiens',
   'characteristics': [{'name': 'age group',
     'value': 'elderly',
     'unit': None,
     'isQuantitative': False}]}],
 'selectedSubjectGroups': [{'name': 'SubjectGroup_0',
   'type': 'Homo sapiens',
   'characteristics': [{'name': 'age group',
     'value': 'young',
     'unit': None,
     'isQuantitative': False}]},
  {'name': 'SubjectGroup_1',
   'type': 'Homo sapiens',
   'characteristics': [{'name': 'ag
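
Printing the whole configuration produces a very long output. As a rough alternative, the sketch below summarises only the top-level keys of a loaded configuration. `summarise_config` and `toy_config` are hypothetical names introduced here for illustration; the toy dictionary only mimics two of the keys visible in the output above, and is not the real Datascriptor file.

```python
import json


def summarise_config(config):
    """Return {top-level key: short description} for a loaded JSON config."""
    summary = {}
    for key, value in config.items():
        if isinstance(value, (list, dict)):
            # For containers, report the type and the number of items
            summary[key] = "{} with {} item(s)".format(type(value).__name__, len(value))
        else:
            summary[key] = repr(value)
    return summary


# Toy stand-in for `study_design_config` (assumption: only the key names
# are taken from the output shown above)
toy_config = {
    "observationalFactors": [
        {"name": "age group", "values": ["young", "middle-aged", "elderly"]}
    ],
    "generatedSubjectGroups": [
        {"name": "SubjectGroup_0"},
        {"name": "SubjectGroup_1"},
        {"name": "SubjectGroup_2"},
    ],
}

config_summary = summarise_config(toy_config)
print(json.dumps(config_summary, indent=2))
```

In the notebook the same helper could be called as `summarise_config(study_design_config)`.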

## 3. Generate the ISA Study Design from the JSON configuration
To perform the conversion we just need to call the function `generate_study_design_from_config` (the name is possibly subject to change).

In [6]:
study_design = generate_study_design_from_config(study_design_config)
assert isinstance(study_design, StudyDesign)

## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation

The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object.

In [7]:
start = time()
study = study_design.generate_isa_study()
end = time()
print('The generation of the study design took {:.2f} s.'.format(end - start))
assert isinstance(study, Study)
investigation = Investigation(studies=[study])

The generation of the study design took 6.45 s.
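
The `start = time()` / `end = time()` pattern is repeated in several cells of this notebook. A small context manager can wrap it; the following is only a sketch, and `timed` and `timings` are hypothetical names that are not part of ISA-API. It uses `time.perf_counter`, which is better suited to measuring elapsed intervals than `time.time`.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label, results=None):
    """Print how long the wrapped block took; optionally store it in `results`."""
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print('{} took {:.2f} s.'.format(label, elapsed))


# Toy usage: time a cheap computation instead of the real study generation
timings = {}
with timed('A toy computation', timings):
    total = sum(range(100000))
```

In this notebook the wrapped statement would be, for instance, `study = study_design.generate_isa_study()`.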


## 5. Serialize and save the JSON representation of the generated ISA Investigation

In [8]:
start = time()
inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
end = time()
print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The JSON serialisation of the ISA investigation took 1.72 s.


In [9]:
directory = os.path.abspath('output')
# create the output directory if it does not exist yet
os.makedirs(directory, exist_ok=True)
with open(os.path.join(directory, 'isa-investigation-2-arms-nmr-ms.json'), 'w') as out_fp:
    json.dump(json.loads(inv_json), out_fp)
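
Note that `json.dump(json.loads(inv_json), out_fp)` re-serialises the data with the default settings, so the indentation requested in `json.dumps` is not carried over to the file. If a human-readable file is preferred, one option is to write the already serialised string directly. The sketch below shows this on a toy payload in a temporary directory; `toy_json` and the file name are placeholders, not the real investigation.

```python
import json
import os
import tempfile

# Toy stand-in for `inv_json` (assumption: any indented JSON string behaves the same)
toy_json = json.dumps({"studies": [{"identifier": "s_01"}]}, sort_keys=True, indent=4)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy-investigation.json")
    # Write the serialised string as-is, so the indentation is kept
    with open(out_path, "w") as out_fp:
        out_fp.write(toy_json)
    # Read it back to confirm the content is still valid JSON
    with open(out_path) as in_fp:
        written_text = in_fp.read()
    reloaded = json.loads(written_text)

line_count = len(written_text.splitlines())
print("The file has {} lines".format(line_count))
```

Either approach produces equivalent JSON; the difference is only in the layout of the file on disk.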

## 6. Dump the ISA Investigation to ISA-Tab

In [10]:
start = time()
isatab.dump(investigation, os.path.abspath(os.path.join('output')))
end = time()
print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The Tab serialisation of the ISA investigation took 96.57 s.


To work with the tables inside the notebook we can also dump them to pandas DataFrames, using the `dump_tables_to_dataframes` function rather than `dump`.

In [11]:
dataframes = isatab.dump_tables_to_dataframes(investigation)

In [12]:
len(dataframes)

3

## 7. Check the correctness of the ISA-Tab DataFrames 

We have 1 study file and 2 assay files, one per assay type, which accounts for the 3 DataFrames.

In [13]:
for key in dataframes.keys():
    display(key)

's_study_01.txt'

'a_AT2_metabolite-profiling_NMR-spectroscopy.txt'

'a_AT1_metabolite-profiling_mass-spectrometry.txt'
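
ISA-Tab file names carry a prefix for their type: `i_` for the investigation, `s_` for a study and `a_` for an assay. The following sketch groups the table names by that prefix, which gives a quick check of the file counts. `classify_isatab_files` is a hypothetical helper, and the list of names is copied from the output above.

```python
def classify_isatab_files(filenames):
    """Group ISA-Tab file names by their conventional two-character prefix."""
    prefixes = {"i_": "investigation", "s_": "study", "a_": "assay"}
    groups = {"investigation": [], "study": [], "assay": [], "other": []}
    for name in filenames:
        groups[prefixes.get(name[:2], "other")].append(name)
    return groups


# Names copied from the keys of `dataframes` shown above
table_names = [
    "s_study_01.txt",
    "a_AT2_metabolite-profiling_NMR-spectroscopy.txt",
    "a_AT1_metabolite-profiling_mass-spectrometry.txt",
]
file_groups = classify_isatab_files(table_names)
print({kind: len(names) for kind, names in file_groups.items()})
```

In the notebook, `classify_isatab_files(dataframes.keys())` would give the same grouping.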

We have 10 subjects in each of the four arms checked here (GRP0, GRP2, GRP3, GRP4). The study table holds 90 sample rows for each of these arms, as the counts below show.

In [14]:
study_frame = dataframes['s_study_01.txt']
count_arm0_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP0' in el)])
count_arm2_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP2' in el)])
count_arm3_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP3' in el)])
count_arm4_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP4' in el)])
print("There are {} samples in the GRP0 arm (i.e. group)".format(count_arm0_samples))
print("There are {} samples in the GRP2 arm (i.e. group)".format(count_arm2_samples))
print("There are {} samples in the GRP3 arm (i.e. group)".format(count_arm3_samples))
print("There are {} samples in the GRP4 arm (i.e. group)".format(count_arm4_samples))

There are 90 samples in the GRP0 arm (i.e. group)
There are 90 samples in the GRP2 arm (i.e. group)
There are 90 samples in the GRP3 arm (i.e. group)
There are 90 samples in the GRP4 arm (i.e. group)
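
The four counts above can also be computed in a single pass over the `Source Name` column. The sketch below is one possible way, under the assumption that each source name contains a group label of the form `GRP<number>`; the toy names are made up for illustration and `count_rows_per_group` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: the group label looks like 'GRP' followed by digits
GROUP_PATTERN = re.compile(r'GRP\d+')


def count_rows_per_group(source_names):
    """Count how many names belong to each GRP<number> label."""
    counts = Counter()
    for name in source_names:
        match = GROUP_PATTERN.search(name)
        counts[match.group(0) if match else 'unmatched'] += 1
    return counts


# Toy stand-in for study_frame['Source Name'] (made-up values)
toy_source_names = ['GRP0_SBJ01', 'GRP0_SBJ01', 'GRP0_SBJ02', 'GRP2_SBJ01', 'GRP10_SBJ01']
group_counts = count_rows_per_group(toy_source_names)
for group, count in sorted(group_counts.items()):
    print("There are {} samples in the {} arm (i.e. group)".format(count, group))
```

Matching the full label also avoids a pitfall of the substring test: `'GRP1' in el` would be true for a hypothetical `GRP10` as well. With six groups this does not matter here, but it would for larger designs.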


In [15]:
dataframes['a_AT1_metabolite-profiling_mass-spectrometry.txt']

Unnamed: 0,Sample Name,Protocol REF,Performer,Extract Name,Characteristics[extract type],Term Accession Number,Protocol REF.1,Performer.1,Labeled Extract Name,Label,Protocol REF.2,Parameter Value[instrument],Term Accession Number.1,Parameter Value[injection_mode],Term Accession Number.2,Parameter Value[acquisition_mode],Term Accession Number.3,MS Assay Name,Performer.2,Raw Spectral Data File
0,GRP0_SBJ01_A0E0_SMP-saliva-1,extraction,Unknown,AT1-6-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-6-LE3,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-6-<acquisition>10,Unknown,raw-spectral-data-file_AT1-6-10
1,GRP0_SBJ01_A0E0_SMP-saliva-1,extraction,Unknown,AT1-6-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-6-LE3,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-6-<acquisition>9,Unknown,raw-spectral-data-file_AT1-6-9
2,GRP0_SBJ01_A0E0_SMP-saliva-1,extraction,Unknown,AT1-6-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-6-LE4,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-6-<acquisition>16,Unknown,raw-spectral-data-file_AT1-6-16
3,GRP0_SBJ01_A0E0_SMP-saliva-1,extraction,Unknown,AT1-6-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-6-LE4,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-6-<acquisition>15,Unknown,raw-spectral-data-file_AT1-6-15
4,GRP0_SBJ01_A0E0_SMP-saliva-1,extraction,Unknown,AT1-6-Extract1,lipids,lipids,labeling,Unknown,AT1-6-LE1,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-6-<acquisition>2,Unknown,raw-spectral-data-file_AT1-6-2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3835,GRP5_SBJ10_A5E4_SMP-saliva-3,extraction,Unknown,AT1-233-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-233-LE4,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-233-<acquisition>13,Unknown,raw-spectral-data-file_AT1-233-13
3836,GRP5_SBJ10_A5E4_SMP-saliva-3,extraction,Unknown,AT1-233-Extract1,lipids,lipids,labeling,Unknown,AT1-233-LE2,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-233-<acquisition>6,Unknown,raw-spectral-data-file_AT1-233-6
3837,GRP5_SBJ10_A5E4_SMP-saliva-3,extraction,Unknown,AT1-233-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-233-LE4,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-233-<acquisition>16,Unknown,raw-spectral-data-file_AT1-233-16
3838,GRP5_SBJ10_A5E4_SMP-saliva-3,extraction,Unknown,AT1-233-Extract2,polar fraction,polar fraction,labeling,Unknown,AT1-233-LE3,label_0,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,mass-spectrometry_AT1-233-<acquisition>10,Unknown,raw-spectral-data-file_AT1-233-10
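
The generated sample names, such as `GRP5_SBJ10_A5E4_SMP-saliva-3`, follow a regular pattern that can be split into its parts for filtering or grouping. The sketch below assumes that `A<n>E<m>` stands for arm and epoch indices; this reading is inferred from the names in the table and is not confirmed by the ISA-API documentation. `parse_sample_name` is a hypothetical helper.

```python
import re

# Assumed layout: GRP<group>_SBJ<subject>_A<arm>E<epoch>_SMP-<sample type>-<index>
SAMPLE_NAME_PATTERN = re.compile(
    r'^(?P<group>GRP\d+)_(?P<subject>SBJ\d+)'
    r'_A(?P<arm>\d+)E(?P<epoch>\d+)'
    r'_SMP-(?P<sample_type>[^-]+)-(?P<index>\d+)$'
)


def parse_sample_name(name):
    """Return the named parts of a sample name, or None if it does not match."""
    match = SAMPLE_NAME_PATTERN.match(name)
    return match.groupdict() if match else None


parsed = parse_sample_name('GRP5_SBJ10_A5E4_SMP-saliva-3')
print(parsed)
```

All parts are returned as strings; convert `arm`, `epoch` and `index` with `int()` if numeric comparison is needed.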


In [16]:
dataframes['a_AT2_metabolite-profiling_NMR-spectroscopy.txt']

Unnamed: 0,Sample Name,Protocol REF,Performer,Extract Name,Characteristics[extract type],Protocol REF.1,Parameter Value[instrument],Parameter Value[acquisition_mode],Parameter Value[pulse_sequence],Performer.1,Free Induction Decay Data File
0,GRP0_SBJ01_A0E1_SMP-blood-1,extraction,Unknown,AT2-13-Extract1,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,TOCSY,Unknown,raw_spectral_data_file_AT2-13-2
1,GRP0_SBJ01_A0E1_SMP-blood-1,extraction,Unknown,AT2-12-Extract2,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,CPMG,Unknown,raw_spectral_data_file_AT2-12-7
2,GRP0_SBJ01_A0E1_SMP-blood-1,extraction,Unknown,AT2-12-Extract1,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,CPMG,Unknown,raw_spectral_data_file_AT2-12-4
3,GRP0_SBJ01_A0E1_SMP-blood-1,extraction,Unknown,AT2-13-Extract2,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,TOCSY,Unknown,raw_spectral_data_file_AT2-13-6
4,GRP0_SBJ01_A0E1_SMP-blood-1,extraction,Unknown,AT2-12-Extract2,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,CPMG,Unknown,raw_spectral_data_file_AT2-12-8
...,...,...,...,...,...,...,...,...,...,...,...
4795,GRP5_SBJ10_A5E4_SMP-blood-3,extraction,Unknown,AT2-587-Extract1,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,CPMG,Unknown,raw_spectral_data_file_AT2-587-3
4796,GRP5_SBJ10_A5E4_SMP-blood-3,extraction,Unknown,AT2-586-Extract2,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,CPMG,Unknown,raw_spectral_data_file_AT2-586-7
4797,GRP5_SBJ10_A5E4_SMP-blood-3,extraction,Unknown,AT2-586-Extract1,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,CPMG,Unknown,raw_spectral_data_file_AT2-586-3
4798,GRP5_SBJ10_A5E4_SMP-blood-3,extraction,Unknown,AT2-586-Extract1,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,1D 1H NMR,TOCSY,Unknown,raw_spectral_data_file_AT2-586-1


In [17]:
dataframes['a_AT2_metabolite-profiling_NMR-spectroscopy.txt'].shape

(4800, 11)