# Create ISA-API Investigation from Datascriptor Study Design configuration

# Factorial Study without observational factors

In this notebook I will show you how to use a study design configuration in JSON format, as produced by Datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to then serialise it in JSON and tabular (ISA-Tab) format.

Our study design configuration consists of:
- a 6-arm study design. Each arm has 12 subjects
- Patients were randomised to one of 4 groups – intravenous streptokinase, oral aspirin, both or neither – and mortality was measured. Treatments are given at two dosages (5, 10 mg/day) and for two durations (30, 60 days)
- a screen phase before treatment (7 days)
- a follow-up phase after treatment (90 days)
- two sample types collected: unspecified tissue, blood
- two assay types: 
    - Metabolite Profiling with Mass Spectrometry for unspecified tissue
    - Metabolite Profiling with NMR Spectroscopy for blood sample

## 1. Setup

Let's import all the required libraries

In [20]:
from time import time
import os
import json

In [21]:
## ISA-API related imports
from isatools.model import Investigation, Study

In [22]:
## ISA-API create mode related imports
from isatools.create.model import StudyDesign
from isatools.create.connectors import generate_study_design

# serializer from ISA Investigation to JSON
from isatools.isajson import ISAJSONEncoder

# ISA-Tab serialisation
from isatools import isatab

In [23]:
## ISA-API create mode related imports
from isatools.create import model
from isatools import isajson

## 2. Load the Study Design JSON configuration

First of all, we load the study design configuration with all the specs defined above.

In [24]:
with open(os.path.abspath(os.path.join(
    "config", "factorial-study-human-4-treatments.json"
)), "r") as config_file:
    study_design_config = json.load(config_file)
study_design_config

{'_id': '60d8b601b832df0008a19abc',
 'name': '4-treatments factorial study',
 'description': 'Patients were randomised to one of 4 groups – intravenous streptokinase, oral aspirin, both or neither – and mortality was measured. (Lancet 1988;ii:349-60 )',
 'design': {'subjectType': {'term': 'Homo sapiens',
   'iri': 'http://purl.obolibrary.org/obo/NCBITaxon_9606'},
  'subjectSize': 12,
  'designType': {'term': 'full factorial design',
   'id': 'STATO:0000270',
   'iri': 'http://purl.obolibrary.org/obo/STATO_0000270',
   'label': 'Study subjects receive a single treatment',
   'value': 'fullFactorial'},
  'observationalFactors': [],
  'subjectGroups': {'selected': [{'name': 'SubjectGroup_0',
     'type': {'term': 'Homo sapiens',
      'iri': 'http://purl.obolibrary.org/obo/NCBITaxon_9606'},
     'characteristics': []}],
   'unselected': []},
  'treatmentPlan': {'observationPeriod': {'name': 'observation period',
    'duration': None,
    'durationUnit': ''},
   'screen': {'selected': True
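
The full configuration is long, so it can help to look at its shape before reading the values. The sketch below is not part of Datascriptor or ISA-API: `summarise` is a hypothetical helper, and `example_config` is a small stand-in built from keys visible in the output above (the real file has many more).

```python
def summarise(node, depth=0, max_depth=2):
    """Return one line per key, indented by nesting level, down to max_depth."""
    lines = []
    if isinstance(node, dict) and depth < max_depth:
        for key, value in node.items():
            lines.append("{}{}: {}".format("  " * depth, key, type(value).__name__))
            lines.extend(summarise(value, depth + 1, max_depth))
    return lines


# Stand-in fragment copied from the output above; the real config has more keys.
example_config = {
    "name": "4-treatments factorial study",
    "design": {
        "subjectSize": 12,
        "observationalFactors": [],
        "designType": {"term": "full factorial design", "value": "fullFactorial"},
    },
}

summary = summarise(example_config)
print("\n".join(summary))
```

In the notebook you would call `summarise(study_design_config)` instead of passing the stand-in.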

## 3. Generate the ISA Study Design from the JSON configuration
To perform the conversion we just need to call the function `generate_study_design()`, imported above from `isatools.create.connectors` (the name may still change).

In [25]:
study_design = generate_study_design(study_design_config)
assert isinstance(study_design, StudyDesign)

## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation

The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object.

In [26]:
start = time()
study = study_design.generate_isa_study()
end = time()
print('The generation of the study design took {:.2f} s.'.format(end - start))
assert isinstance(study, Study)
investigation = Investigation(identifier='inv01', studies=[study])

The generation of the study design took 3.91 s.
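
The `start = time()` / `end = time()` pattern is repeated for every step in this notebook. As a possible alternative, here is a minimal sketch of a timing context manager; `timed` and the `timings` dictionary are hypothetical names, not part of ISA-API, and the summed range only stands in for a real workload.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took; optionally store it in `results`."""
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print("{} took {:.2f} s.".format(label, elapsed))


timings = {}
with timed("A stand-in computation", timings):
    total = sum(range(1000))
```

In the notebook the body of the `with` block would be, for example, `study = study_design.generate_isa_study()`.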


## 5. Serialize and save the JSON representation of the generated ISA Investigation

In [27]:
start = time()
inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
end = time()
print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The JSON serialisation of the ISA investigation took 0.96 s.


In [28]:
directory = os.path.abspath(os.path.join('output', 'factorial-study-human-4-treatments'))
os.makedirs(directory, exist_ok=True)
with open(os.path.abspath(os.path.join(directory, 'isa-investigation-human-factorial.json')), 'w') as out_fp:
    json.dump(json.loads(inv_json), out_fp)
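
Note that parsing `inv_json` and dumping it again with default settings writes the same data, but drops the indentation requested in the previous cell. The sketch below illustrates the difference with a small stand-in dictionary (the real investigation needs `cls=ISAJSONEncoder`) and temporary files, so nothing in `output/` is touched.

```python
import json
import os
import tempfile

# Stand-in for the investigation; the real object needs cls=ISAJSONEncoder.
payload = {"identifier": "inv01", "studies": []}
pretty = json.dumps(payload, sort_keys=True, indent=4, separators=(",", ": "))

with tempfile.TemporaryDirectory() as tmp_dir:
    reparsed_path = os.path.join(tmp_dir, "reparsed.json")
    direct_path = os.path.join(tmp_dir, "direct.json")

    # Pattern used above: parse the string again, then dump with default settings.
    with open(reparsed_path, "w") as out_fp:
        json.dump(json.loads(pretty), out_fp)

    # Alternative: write the already-serialised string as-is.
    with open(direct_path, "w") as out_fp:
        out_fp.write(pretty)

    with open(reparsed_path) as in_fp:
        reparsed_text = in_fp.read()
    with open(direct_path) as in_fp:
        direct_text = in_fp.read()

# Both files hold the same data; only the direct write keeps the line breaks.
same_content = json.loads(reparsed_text) == json.loads(direct_text)
keeps_indent = "\n" in direct_text and "\n" not in reparsed_text
print(same_content, keeps_indent)
```

Either form is valid JSON; the direct write is simply easier to read and diff.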

## 6. Dump the ISA Investigation to ISA-Tab

In [29]:
start = time()
isatab.dump(investigation, directory)
end = time()
print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

The Tab serialisation of the ISA investigation took 43.63 s.


To work with the tables inside the notebook, we can also convert them to pandas DataFrames with the `dump_tables_to_dataframes` function instead of writing them to disk with `dump`.

In [30]:
dataframes = isatab.dump_tables_to_dataframes(investigation)

In [31]:
len(dataframes)

3

## 7. Check the correctness of the ISA-Tab DataFrames 

We have 1 study file and 2 assay files (one for MS and one for NMR). Let's check the names:

In [32]:
for key in dataframes.keys():
    display(key)

's_study_01.txt'

'a_AT2_metabolite-profiling_NMR-spectroscopy.txt'

'a_AT1_metabolite-profiling_mass-spectrometry.txt'
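
ISA-Tab file names carry their role in the prefix: `i_` for the investigation, `s_` for a study, `a_` for an assay. A minimal sketch of grouping the names shown above by that prefix follows; `table_names` is typed in by hand here, whereas in the notebook it would be `list(dataframes.keys())`.

```python
from collections import defaultdict

# File names as displayed above.
table_names = [
    "s_study_01.txt",
    "a_AT2_metabolite-profiling_NMR-spectroscopy.txt",
    "a_AT1_metabolite-profiling_mass-spectrometry.txt",
]

# ISA-Tab prefix convention.
KINDS = {"i_": "investigation", "s_": "study", "a_": "assay"}

by_kind = defaultdict(list)
for name in table_names:
    by_kind[KINDS.get(name[:2], "other")].append(name)

for kind, names in sorted(by_kind.items()):
    print(kind, len(names))
```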

### 7.1 Count of subjects and samples

We have 12 subjects in each of the six arms, for a total of 72 subjects. 5 blood samples per subject are collected (1 in the screen phase, 1 in treatment, and 3 in the follow-up phase) for a total of 60 * 6 = 360 blood samples. These will undergo the NMR assay. We have 1 unspecified tissue sample per subject (collected during treatment) for a total of 12 * 6 = 72 unspecified tissue samples. These will undergo the mass spectrometry assay. Each arm therefore contributes 60 + 12 = 72 samples.
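
The expected counts can be written down as plain arithmetic before comparing them with the generated tables. The constants below are taken from the paragraph above; the variable names are only for illustration.

```python
ARMS = 6
SUBJECTS_PER_ARM = 12
BLOOD_PER_SUBJECT = 1 + 1 + 3  # screen + treatment + follow-up
TISSUE_PER_SUBJECT = 1         # treatment only

subjects = ARMS * SUBJECTS_PER_ARM
blood_samples = subjects * BLOOD_PER_SUBJECT
tissue_samples = subjects * TISSUE_PER_SUBJECT
samples_per_arm = SUBJECTS_PER_ARM * (BLOOD_PER_SUBJECT + TISSUE_PER_SUBJECT)

print(subjects, blood_samples, tissue_samples, samples_per_arm)
# prints: 72 360 72 72
```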

In [33]:
study_frame = dataframes['s_study_01.txt']
for arm in ('GRP2', 'GRP4', 'GRP5', 'GRP10', 'GRP12', 'GRP13'):
    # Source names look like 'GRP10_SBJ01', so match on the arm prefix plus '_'
    count_arm_samples = study_frame['Source Name'].str.startswith(arm + '_').sum()
    print("There are {} samples in the {} arm (i.e. group)".format(count_arm_samples, arm))


There are 72 samples in the GRP2 arm (i.e. group)
There are 72 samples in the GRP4 arm (i.e. group)
There are 72 samples in the GRP5 arm (i.e. group)
There are 72 samples in the GRP10 arm (i.e. group)
There are 72 samples in the GRP12 arm (i.e. group)
There are 72 samples in the GRP13 arm (i.e. group)
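
A substring test such as `'GRP2' in el` works for the six arm names used here, but it would over-count if one arm name were a prefix of another (for instance a hypothetical `GRP1` next to `GRP10`). A minimal sketch of counting by the exact arm prefix instead is shown below. The `source_names` list is a small stand-in that follows the `GRP<n>_SBJ<nn>` pattern visible in the study table, and `arm_of` is a hypothetical helper.

```python
from collections import Counter

# Stand-in values following the pattern of the 'Source Name' column.
source_names = ["GRP10_SBJ01", "GRP10_SBJ01", "GRP12_SBJ01", "GRP2_SBJ01"]


def arm_of(source_name):
    """Return the arm identifier, i.e. everything before the first underscore."""
    return source_name.split("_", 1)[0]


counts = Counter(arm_of(name) for name in source_names)

# For comparison: a substring test for 'GRP1' also matches GRP10 and GRP12.
substring_hits = sum("GRP1" in name for name in source_names)

print(counts["GRP10"], counts["GRP1"], substring_hits)
# prints: 2 0 3
```

With the real table, `study_frame['Source Name'].map(arm_of).value_counts()` should give the same per-arm totals in one call.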


### Study Table overview


In [34]:
study_frame

Unnamed: 0,Source Name,Characteristics[Study Subject],Term Accession Number,Protocol REF,Parameter Value[Sampling order],Parameter Value[Study cell],Date,Performer,Sample Name,Characteristics[organism part],Term Accession Number.1,Comment[study step with treatment],Factor Value[Sequence Order],Factor Value[INTENSITY],Unit,Factor Value[DURATION],Unit.1,Factor Value[AGENT]
0,GRP10_SBJ01,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,032,A3E1,2021-07-13,Unknown,GRP10_SBJ01_A3E1_SMP-Blood-Sample-1,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,YES,1,5.0,mg/day,30,days,intravenous streptokinase + oral aspirin
1,GRP10_SBJ01,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,060,A3E2,2021-07-13,Unknown,GRP10_SBJ01_A3E2_SMP-Blood-Sample-3,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,NO,2,,,90,days,
2,GRP10_SBJ01,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,059,A3E2,2021-07-13,Unknown,GRP10_SBJ01_A3E2_SMP-Blood-Sample-2,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,NO,2,,,90,days,
3,GRP10_SBJ01,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,008,A3E0,2021-07-13,Unknown,GRP10_SBJ01_A3E0_SMP-Blood-Sample-1,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,NO,0,,,7,days,
4,GRP10_SBJ01,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,058,A3E2,2021-07-13,Unknown,GRP10_SBJ01_A3E2_SMP-Blood-Sample-1,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,NO,2,,,90,days,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
427,GRP5_SBJ12,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,401,A1E2,2021-07-13,Unknown,GRP5_SBJ12_A1E2_SMP-Blood-Sample-2,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,NO,2,,,90,days,
428,GRP5_SBJ12,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,386,A1E1,2021-07-13,Unknown,GRP5_SBJ12_A1E1_SMP-Blood-Sample-1,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,YES,1,10.0,mg/day,60,days,oral aspirin
429,GRP5_SBJ12,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,374,A1E1,2021-07-13,Unknown,GRP5_SBJ12_A1E1_SMP-Unspecified-Tissue-1,Unspecified Tissue,http://purl.obolibrary.org/obo/NCIT_C132256,YES,1,10.0,mg/day,60,days,oral aspirin
430,GRP5_SBJ12,Homo sapiens,http://purl.obolibrary.org/obo/NCBITaxon_9606,sample collection,362,A1E0,2021-07-13,Unknown,GRP5_SBJ12_A1E0_SMP-Blood-Sample-1,Blood Sample,http://purl.obolibrary.org/obo/NCIT_C17610,NO,0,,,7,days,


### First Assay: Metabolite Profiling using NMR spectroscopy

This assay takes blood samples as input.

In [35]:
dataframes['a_AT2_metabolite-profiling_NMR-spectroscopy.txt']

Unnamed: 0,Sample Name,Comment[study step with treatment],Protocol REF,Performer,Extract Name,Characteristics[extract type],Protocol REF.1,Parameter Value[instrument],Parameter Value[acquisition_mode],Parameter Value[pulse_sequence],Performer.1,Free Induction Decay Data File
0,GRP10_SBJ01_A3E0_SMP-Blood-Sample-1,NO,extraction,Unknown,AT2-S15-Extract-R1,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,TOCSY,Unknown,AT2-S15-raw_spectral_data_file-R1.raw
1,GRP10_SBJ01_A3E0_SMP-Blood-Sample-1,NO,extraction,Unknown,AT2-S15-Extract-R2,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,watergate,Unknown,AT2-S15-raw_spectral_data_file-R4.raw
2,GRP10_SBJ01_A3E0_SMP-Blood-Sample-1,NO,extraction,Unknown,AT2-S15-Extract-R1,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,watergate,Unknown,AT2-S15-raw_spectral_data_file-R2.raw
3,GRP10_SBJ01_A3E0_SMP-Blood-Sample-1,NO,extraction,Unknown,AT2-S16-Extract-R2,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,watergate,Unknown,AT2-S16-raw_spectral_data_file-R4.raw
4,GRP10_SBJ01_A3E0_SMP-Blood-Sample-1,NO,extraction,Unknown,AT2-S15-Extract-R2,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,TOCSY,Unknown,AT2-S15-raw_spectral_data_file-R3.raw
...,...,...,...,...,...,...,...,...,...,...,...,...
2875,GRP5_SBJ12_A1E2_SMP-Blood-Sample-3,NO,extraction,Unknown,AT2-S659-Extract-R1,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,TOCSY,Unknown,AT2-S659-raw_spectral_data_file-R1.raw
2876,GRP5_SBJ12_A1E2_SMP-Blood-Sample-3,NO,extraction,Unknown,AT2-S660-Extract-R1,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,watergate,Unknown,AT2-S660-raw_spectral_data_file-R2.raw
2877,GRP5_SBJ12_A1E2_SMP-Blood-Sample-3,NO,extraction,Unknown,AT2-S659-Extract-R2,supernatant,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,watergate,Unknown,AT2-S659-raw_spectral_data_file-R4.raw
2878,GRP5_SBJ12_A1E2_SMP-Blood-Sample-3,NO,extraction,Unknown,AT2-S659-Extract-R1,pellet,nmr_spectroscopy,Bruker Avance II 1 GHz,2D 13C-13C NMR,watergate,Unknown,AT2-S659-raw_spectral_data_file-R2.raw


#### Overview of the NMR assay table

In [36]:
dataframes['a_AT2_metabolite-profiling_NMR-spectroscopy.txt'].nunique(axis=0, dropna=True)

Sample Name                            360
Comment[study step with treatment]       2
Protocol REF                             1
Performer                                1
Extract Name                          1440
Characteristics[extract type]            2
Protocol REF.1                           1
Parameter Value[instrument]              1
Parameter Value[acquisition_mode]        1
Parameter Value[pulse_sequence]          2
Performer.1                              1
Free Induction Decay Data File        2880
dtype: int64

### Second Assay: Metabolite Profiling using mass spectrometry

This assay takes unspecified tissue samples as input.

In [37]:
dataframes['a_AT1_metabolite-profiling_mass-spectrometry.txt']

Unnamed: 0,Sample Name,Comment[study step with treatment],Protocol REF,Performer,Extract Name,Characteristics[extract type],Term Accession Number,Protocol REF.1,Performer.1,Labeled Extract Name,...,Protocol REF.2,Parameter Value[instrument],Term Accession Number.1,Parameter Value[injection_mode],Term Accession Number.2,Parameter Value[acquisition_mode],Term Accession Number.3,MS Assay Name,Performer.2,Raw Spectral Data File
0,GRP10_SBJ01_A3E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S16-Extract-R2,lipids,lipids,labeling,Unknown,AT1-S16-LE-R2,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S16-mass-spectrometry-Acquisition-R6,Unknown,AT1-S16-raw-spectral-data-file-R6.raw
1,GRP10_SBJ01_A3E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S16-Extract-R1,polar fraction,polar fraction,labeling,Unknown,AT1-S16-LE-R1,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S16-mass-spectrometry-Acquisition-R4,Unknown,AT1-S16-raw-spectral-data-file-R4.raw
2,GRP10_SBJ01_A3E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S15-Extract-R2,lipids,lipids,labeling,Unknown,AT1-S15-LE-R2,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S15-mass-spectrometry-Acquisition-R6,Unknown,AT1-S15-raw-spectral-data-file-R6.raw
3,GRP10_SBJ01_A3E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S15-Extract-R1,polar fraction,polar fraction,labeling,Unknown,AT1-S15-LE-R1,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S15-mass-spectrometry-Acquisition-R3,Unknown,AT1-S15-raw-spectral-data-file-R3.raw
4,GRP10_SBJ01_A3E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S15-Extract-R2,lipids,lipids,labeling,Unknown,AT1-S15-LE-R2,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S15-mass-spectrometry-Acquisition-R7,Unknown,AT1-S15-raw-spectral-data-file-R7.raw
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1147,GRP5_SBJ12_A1E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S124-Extract-R1,polar fraction,polar fraction,labeling,Unknown,AT1-S124-LE-R1,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S124-mass-spectrometry-Acquisition-R4,Unknown,AT1-S124-raw-spectral-data-file-R4.raw
1148,GRP5_SBJ12_A1E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S123-Extract-R2,lipids,lipids,labeling,Unknown,AT1-S123-LE-R2,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S123-mass-spectrometry-Acquisition-R8,Unknown,AT1-S123-raw-spectral-data-file-R8.raw
1149,GRP5_SBJ12_A1E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S124-Extract-R2,lipids,lipids,labeling,Unknown,AT1-S124-LE-R2,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,FIA,http://purl.obolibrary.org/obo/MS_1000058,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S124-mass-spectrometry-Acquisition-R7,Unknown,AT1-S124-raw-spectral-data-file-R7.raw
1150,GRP5_SBJ12_A1E1_SMP-Unspecified-Tissue-1,YES,extraction,Unknown,AT1-S123-Extract-R1,polar fraction,polar fraction,labeling,Unknown,AT1-S123-LE-R1,...,mass spectrometry,Agilent QTQF 6510,http://purl.obolibrary.org/obo/MS_1000676,LC,,positive mode,http://purl.obolibrary.org/obo/MS_1002807,AT1-S123-mass-spectrometry-Acquisition-R4,Unknown,AT1-S123-raw-spectral-data-file-R4.raw


#### Overview of the MS assay table
 

In [38]:
dataframes['a_AT1_metabolite-profiling_mass-spectrometry.txt'].nunique(axis=0, dropna=True)

Sample Name                             72
Comment[study step with treatment]       1
Protocol REF                             1
Performer                                1
Extract Name                           288
Characteristics[extract type]            2
Term Accession Number                    2
Protocol REF.1                           1
Performer.1                              1
Labeled Extract Name                   288
Label                                    1
Protocol REF.2                           1
Parameter Value[instrument]              1
Term Accession Number.1                  1
Parameter Value[injection_mode]          2
Term Accession Number.2                  2
Parameter Value[acquisition_mode]        1
Term Accession Number.3                  1
MS Assay Name                         1152
Performer.2                              1
Raw Spectral Data File                1152
dtype: int64
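
The unique counts in the two assay overviews can be cross-checked against each other: each level of the assay workflow should be a whole multiple of the level before it. The sketch below copies the relevant numbers from the two `nunique` outputs above; `fan_out` is a hypothetical helper. What each factor means (for example extract types versus replicates) is not stated by the output, so it is left uninterpreted here.

```python
# Unique counts copied from the two nunique() outputs above.
nmr_counts = {
    "Sample Name": 360,
    "Extract Name": 1440,
    "Free Induction Decay Data File": 2880,
}
ms_counts = {
    "Sample Name": 72,
    "Extract Name": 288,
    "Labeled Extract Name": 288,
    "MS Assay Name": 1152,
    "Raw Spectral Data File": 1152,
}


def fan_out(counts, parent, child):
    """Return how many `child` items exist per `parent` item; fail if uneven."""
    quotient, remainder = divmod(counts[child], counts[parent])
    if remainder:
        raise ValueError("{} is not a multiple of {}".format(child, parent))
    return quotient


print(fan_out(nmr_counts, "Sample Name", "Extract Name"))                     # 4
print(fan_out(nmr_counts, "Extract Name", "Free Induction Decay Data File"))  # 2
print(fan_out(ms_counts, "Sample Name", "Extract Name"))                      # 4
print(fan_out(ms_counts, "Labeled Extract Name", "MS Assay Name"))            # 4
print(fan_out(ms_counts, "MS Assay Name", "Raw Spectral Data File"))          # 1
```

The sample counts (360 and 72) also match the expected numbers of blood and unspecified tissue samples from section 7.1.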