# ISA Create Mode example

## Abstract:
    
In this notebook, we'll show how to generate an ISA-Tab and an ISA JSON representation of a metabolomics study.
The study uses GC-MS and 13C NMR on 3 distinct sample types (liver, blood and heart) collected from study subjects assigned to 3 distinct study arms.

GC-MS acquisitions were carried out in duplicate: extracts were derivatized using BSA (bis(trimethylsilyl)acetamide) and acquired on an Agilent QTOF in both positive and negative modes.
13C NMR free induction decays were acquired in duplicate on a Bruker Avance, using CPMG and PSEQ pulse sequences.



### 1. Loading the ISA-API model and relevant libraries

In [30]:
# If executing this notebook on Google Colab, uncomment the following command
# and run it to install the required Python libraries. Also make the test datasets available.

# !pip install -r requirements.txt

In [31]:
from isatools import isatab
from isatools.isajson import ISAJSONEncoder
from collections import OrderedDict
from isatools.model import (
    Investigation,
    OntologySource,
    OntologyAnnotation,
    FactorValue,
    Characteristic
)
from isatools.create.model import (
    Treatment,
    NonTreatment,
    StudyCell,
    StudyArm,
    ProductNode,
    SampleAndAssayPlan,
    StudyDesign,
    QualityControl
)
from isatools.create.constants import (
    BASE_FACTORS,
    SCREEN,
    RUN_IN,
    WASHOUT,
    FOLLOW_UP,
    SAMPLE,
    EXTRACT,
    LABELED_EXTRACT,
    DATA_FILE
)
from isatools.isatab import dump_tables_to_dataframes as dumpdf
import os
import json

### 2. Setting variables:

In [32]:

# ontologies
ontologies = {
    "chebi": OntologySource(
        name = "CHEBI",
        description = "Chemical Entities of Biological Interest"),
    "chmo": OntologySource(
        name = "CHMO", 
        description = "Chemical Methods Ontology"),
    "msio": OntologySource(
        name = "MSIO",
        description = "Metabolite Standards Initiative Ontology"),
    "ncbitaxon": OntologySource(
        name = "NCBITAXON", 
        description = "NCBI organismal classification"),
    "ncit": OntologySource(
        name = "NCIT", 
        description = "NCI Thesaurus OBO Edition"),
    "obi": OntologySource(
        name = "OBI", 
        description = "Ontology for Biomedical Investigations"),
    "uo": OntologySource(
        name = "UO", 
        description = "UO - the Unit Ontology"),
    "pato": OntologySource(
        name = "PATO", 
        description = "PATO - the Phenotype And Trait Ontology"),
    "uberon": OntologySource(
        name = "UBERON", 
        description = "Uber-anatomy ontology")
}
# add ontologies to investigation

isa_investigation = Investigation()

for o in ontologies.values():
    isa_investigation.ontology_source_references.append(o)


NAME = 'name'
FACTORS_0_VALUE = OntologyAnnotation(term='cadmium chloride') #, term_accession='http://purl.obolibrary.org/obo/CHEBI_35456', term_source='CHEBI'
FACTORS_0_VALUE_ALT = OntologyAnnotation(term='ethoprophos') #, term_accession='http://purl.obolibrary.org/obo/CHEBI_38665', term_source='CHEBI'
FACTORS_0_VALUE_THIRD = OntologyAnnotation(term='pirinixic acid') #, term_accession='http://purl.obolibrary.org/obo/CHEBI_32509', term_source='CHEBI'

FACTORS_1_VALUE = 5
FACTORS_1_VALUE_ALT = "BMD"
FACTORS_1_VALUE_THIRD = "EC25"
FACTORS_1_UNIT = OntologyAnnotation(term='kg/m^3') # , term_accession='http://purl.obolibrary.org/obo/UO_0000083', term_source='UO'

FACTORS_2_VALUE = 0
FACTORS_2_VALUE_ALT = 4
FACTORS_2_VALUE_2 = 24
FACTORS_2_VALUE_3 = 48
FACTORS_2_UNIT = OntologyAnnotation(term='hr')

TEST_EPOCH_0_NAME = 'test epoch 0'
TEST_EPOCH_1_NAME = 'test epoch 1'
TEST_EPOCH_2_NAME = 'test epoch 2'

TEST_STUDY_ARM_NAME_00 = 'test arm'
TEST_STUDY_ARM_NAME_01 = 'another arm'
TEST_STUDY_ARM_NAME_02 = 'yet another arm'

TEST_STUDY_DESIGN_NAME = 'test study design'

TEST_EPOCH_0_RANK = 0

SCREEN_DURATION_VALUE = 100
FOLLOW_UP_DURATION_VALUE = 5*366
WASHOUT_DURATION_VALUE = 30
DURATION_UNIT = OntologyAnnotation(term='day') #, term_accession='http://purl.obolibrary.org/obo/UO_0000033', term_source='UO'
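
The constants above define three agents, one intensity and four durations. As a rough illustration of how such factor levels combine into candidate treatments, the plain-Python sketch below enumerates the full factorial set. The notebook itself only instantiates four of these combinations as `Treatment` objects. The list names are illustrative stand-ins and are not part of the ISA-API.

```python
from itertools import product

# Stand-in values mirroring the constants defined above (plain strings and
# numbers instead of OntologyAnnotation objects, so no ISA-API import is needed).
agents = ['cadmium chloride', 'ethoprophos', 'pirinixic acid']
intensities = [5]            # kg/m^3
durations = [0, 4, 24, 48]   # hr

# Full factorial design: every agent x intensity x duration combination.
all_combinations = list(product(agents, intensities, durations))

print(len(all_combinations))  # 12
for agent, intensity, duration in all_combinations[:3]:
    print(agent, intensity, duration)
```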



### 3. Declaration of ISA Sample / Biomaterial templates for liver, blood and heart

In [33]:
sample_list = [
        {
            'node_type': SAMPLE,
            'characteristics_category': OntologyAnnotation(term='organism part'),
            'characteristics_value': OntologyAnnotation(term='whole organism'), # term_accession='http://purl.obolibrary.org/obo/OBI_0100026', term_source='OBI'
            'size': 1,
            'technical_replicates': None,
            'is_input_to_next_protocols': True
        }
]

### 4. Declaration of ISA Assay templates as Python `OrderedDict`

In [34]:
# A Mass Spectrometry based metabolite profiling assay

ms_assay_dict = OrderedDict([
    ('measurement_type', OntologyAnnotation(term='metabolite profiling')),
    ('technology_type', OntologyAnnotation(term='mass spectrometry')),
    ('extraction', {}),
    ('extract', [
        {
            'node_type': EXTRACT,
            'characteristics_category': OntologyAnnotation(term='extract type'),
            'characteristics_value': OntologyAnnotation(term='polar fraction'),
            'size': 1,
            'is_input_to_next_protocols': True
        },
        {
            'node_type': EXTRACT,
            'characteristics_category': OntologyAnnotation(term='extract type'),
            'characteristics_value': OntologyAnnotation(term='lipids'),
            'size': 1,
            'is_input_to_next_protocols': True
        }
    ]),
    ('derivatization', {
        '#replicates': 1,
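        # NOTE: both entries use the same parameter term; only the second value
        # appears in the MS assay table generated later in this notebook.
        OntologyAnnotation(term='derivatization'): ['silylation'],
        OntologyAnnotation(term='derivatization'): ['bis(trimethylsilyl)acetamide'],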
    }),
    ('labeled extract', [
        {
            'node_type': LABELED_EXTRACT,
            'characteristics_category': OntologyAnnotation(term='labeled extract type'),
            'characteristics_value': '',
            'size': 1,
            'is_input_to_next_protocols': True
        }
    ]),
    ('mass spectrometry', {
        '#replicates': 2,
        OntologyAnnotation(term='instrument'): ['Agilent QTOF'],
        OntologyAnnotation(term='injection_mode'): ['GC'],
        OntologyAnnotation(term='acquisition_mode'): ['positive mode','negative mode']
    }),
    ('raw spectral data file', [
        {
            'node_type': DATA_FILE,
            'size': 1,
            'is_input_to_next_protocols': False
        }
    ])
])


# A high-throughput imaging-based phenotyping assay

phti_assay_dict = OrderedDict([
    ('measurement_type', OntologyAnnotation(term='phenotyping')),
    ('technology_type', OntologyAnnotation(term='high-throughput imaging')),
    ('extraction', {}),
    ('extract', [
        {
            'node_type': EXTRACT,
            'characteristics_category': OntologyAnnotation(term='extract type'),
            'characteristics_value': OntologyAnnotation(term='supernatant'),
            'size': 1,
            'technical_replicates': None,
            'is_input_to_next_protocols': True
        }
    ]),
    ('phenotyping by high throughput imaging', {
        'OntologyAnnotation(term=instrument)': ['lemnatech gigant'],
        'OntologyAnnotation(term=acquisition_mode)': ['UV light', 'near-IR light', 'far-IR light', 'visible light'],
        'OntologyAnnotation(term=camera position)': ['top', '120 degree', '240 degree', '360 degree'],
        'OntologyAnnotation(term=imaging daily schedule)': ['06.00', '19.00']
    }),
    ('raw_spectral_data_file', [
        {
            'node_type': DATA_FILE,
            'size': 1,
            'technical_replicates': 2,
            'is_input_to_next_protocols': False
        }
    ])
])

# An RNA-Seq based transcription profiling assay

rnaseq_assay_dict = OrderedDict([
    ('measurement_type', OntologyAnnotation(term='transcription profiling')), # term_source="OBI", term_accession="http://purl.obolibrary.org/obo/OBI_000066"
    ('technology_type', OntologyAnnotation(term='nucleotide sequencing')), #, term_source="OBI", term_accession="http://purl.obolibrary.org/obo/OBI_0000234"
    ('extraction', {}),
    ('extract', [
        {
            'node_type': EXTRACT,
            'characteristics_category': OntologyAnnotation(term='extract type'),
            'characteristics_value': OntologyAnnotation(term='mRNA'),
            'size': 1,
            'technical_replicates': None,
            'is_input_to_next_protocols': True
        }
    ]),
    ('library_preparation', {
        'OntologyAnnotation(term=library strategy)': ['RNA-SEQ'],
        'OntologyAnnotation(term=library layout)': ['PAIRED'],
        'OntologyAnnotation(term=size)': ['40'],
    }),
    ('nucleic acid sequencing', {
        'OntologyAnnotation(term=sequencing instrument)': ['DNBSEQ-T7']
    }),
    ('raw_data_file', [
        {
            'node_type': DATA_FILE,
            'size': 1,
            'technical_replicates': 1,
            'is_input_to_next_protocols': False
        }
    ])
])
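
As a rough sanity check on the MS template above, the number of raw spectral data files expected per sample can be estimated by multiplying the branching at each step. This is a plain-Python sketch that only mirrors the numbers in `ms_assay_dict`; it does not call the ISA-API, and it assumes that each listed option and each replicate multiplies the number of downstream nodes. That assumption is consistent with the `R1` to `R8` suffixes on the raw spectral data files in the MS assay table shown later in this notebook. The dictionary name and its keys are illustrative.

```python
from math import prod

# Numbers read off the MS assay template above; names are illustrative only.
ms_branching = {
    'extract types': 2,              # polar fraction, lipids
    'derivatization replicates': 1,
    'labeled extracts': 1,
    'acquisition modes': 2,          # positive mode, negative mode
    'MS replicates': 2,
}

# Each step multiplies the number of nodes produced by the previous one.
files_per_sample = prod(ms_branching.values())
print(files_per_sample)  # 8
```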



### 5. Declaring Study Design key elements in terms of Treatments and Non-Treatment elements, Study Cell & Arms

In [35]:
first_treatment = Treatment(factor_values=(
    FactorValue(factor_name=BASE_FACTORS[0], value=FACTORS_0_VALUE),
    FactorValue(factor_name=BASE_FACTORS[1], value=FACTORS_1_VALUE, unit=FACTORS_1_UNIT),
    FactorValue(factor_name=BASE_FACTORS[2], value=FACTORS_2_VALUE, unit=FACTORS_2_UNIT)
))
second_treatment = Treatment(factor_values=(
    FactorValue(factor_name=BASE_FACTORS[0], value=FACTORS_0_VALUE_ALT),
    FactorValue(factor_name=BASE_FACTORS[1], value=FACTORS_1_VALUE, unit=FACTORS_1_UNIT),
    FactorValue(factor_name=BASE_FACTORS[2], value=FACTORS_2_VALUE, unit=FACTORS_2_UNIT)
))
third_treatment = Treatment(factor_values=(
    FactorValue(factor_name=BASE_FACTORS[0], value=FACTORS_0_VALUE_ALT),
    FactorValue(factor_name=BASE_FACTORS[1], value=FACTORS_1_VALUE, unit=FACTORS_1_UNIT),
    FactorValue(factor_name=BASE_FACTORS[2], value=FACTORS_2_VALUE_ALT, unit=FACTORS_2_UNIT)
))
fourth_treatment = Treatment(factor_values=(
    FactorValue(factor_name=BASE_FACTORS[0], value=FACTORS_0_VALUE_THIRD),
    FactorValue(factor_name=BASE_FACTORS[1], value=FACTORS_1_VALUE, unit=FACTORS_1_UNIT),
    FactorValue(factor_name=BASE_FACTORS[2], value=FACTORS_2_VALUE, unit=FACTORS_2_UNIT)
))

#screen = NonTreatment(element_type=SCREEN, duration_value=SCREEN_DURATION_VALUE, duration_unit=DURATION_UNIT)
#run_in = NonTreatment(element_type=RUN_IN, duration_value=WASHOUT_DURATION_VALUE, duration_unit=DURATION_UNIT)
#washout = NonTreatment(element_type=WASHOUT, duration_value=WASHOUT_DURATION_VALUE, duration_unit=DURATION_UNIT)
#follow_up = NonTreatment(element_type=FOLLOW_UP, duration_value=FOLLOW_UP_DURATION_VALUE, duration_unit=DURATION_UNIT)
#potential_concomitant_washout = NonTreatment(element_type=WASHOUT, duration_value=FACTORS_2_VALUE,
#                                                          duration_unit=FACTORS_2_UNIT)
#cell_screen = StudyCell(SCREEN, elements=(screen,))
#cell_run_in = StudyCell(RUN_IN, elements=(run_in,))
#cell_other_run_in = StudyCell('OTHER RUN-IN', elements=(run_in,))
#cell_screen_and_run_in = StudyCell('SCREEN AND RUN-IN', elements=[screen, run_in])
#cell_concomitant_treatments = StudyCell('CONCOMITANT TREATMENTS',
#                                                     elements=([{second_treatment, fourth_treatment}]))
#cell_washout_00 = StudyCell(WASHOUT, elements=(washout,))
#cell_washout_01 = StudyCell('ANOTHER WASHOUT', elements=(washout,))
cell_single_treatment_00 = StudyCell('SINGLE TREATMENT FIRST', elements=[first_treatment])
cell_single_treatment_01 = StudyCell('SINGLE TREATMENT SECOND', elements=[second_treatment])
cell_single_treatment_02 = StudyCell('SINGLE TREATMENT THIRD', elements=[third_treatment])
#cell_multi_elements = StudyCell('MULTI ELEMENTS',
#                                             elements=[{first_treatment, second_treatment,
#                                                        fourth_treatment}, washout, second_treatment])
#cell_multi_elements_padded = StudyCell('MULTI ELEMENTS PADDED',
#                                                    elements=[first_treatment, washout, {
#                                                        second_treatment,
#                                                        fourth_treatment
#                                                    }, washout, third_treatment, washout])
#cell_follow_up = StudyCell(FOLLOW_UP, elements=(follow_up,))
#cell_follow_up_01 = StudyCell('ANOTHER FOLLOW_UP', elements=(follow_up,))
#qc = QualityControl()

ms_sample_assay_plan = SampleAndAssayPlan.from_sample_and_assay_plan_dict("ms_sap", sample_list, ms_assay_dict)
rnaseq_sample_assay_plan = SampleAndAssayPlan.from_sample_and_assay_plan_dict("rnaseq_sap", sample_list, rnaseq_assay_dict)
phti_sample_assay_plan = SampleAndAssayPlan.from_sample_and_assay_plan_dict("phti_sap", sample_list, phti_assay_dict)



first_arm = StudyArm(name=TEST_STUDY_ARM_NAME_00, group_size=3, arm_map=OrderedDict([
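    # (cell_screen, None), (cell_run_in, None),
    # NOTE: both entries below use the same StudyCell as key, so the OrderedDict
    # keeps only the last plan listed for that cell (here, the RNA-Seq plan).
    (cell_single_treatment_00, ms_sample_assay_plan),
    (cell_single_treatment_00, rnaseq_sample_assay_plan)
    # (cell_follow_up, ms_sample_assay_plan)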
]))
second_arm = StudyArm(name=TEST_STUDY_ARM_NAME_01, group_size=3, arm_map=OrderedDict([
   # (cell_screen, None), (cell_run_in, None),
    (cell_single_treatment_01, ms_sample_assay_plan),
    (cell_single_treatment_01, rnaseq_sample_assay_plan)
    #(cell_multi_elements, ms_sample_assay_plan),
   # (cell_follow_up, ms_sample_assay_plan)
]))
third_arm = StudyArm(name=TEST_STUDY_ARM_NAME_02, group_size=3, arm_map=OrderedDict([
   # (cell_screen, None), (cell_run_in, None),
    (cell_single_treatment_02, rnaseq_sample_assay_plan),
    (cell_single_treatment_02, ms_sample_assay_plan)

   # (cell_multi_elements_padded, ms_sample_assay_plan),
   # (cell_follow_up, ms_sample_assay_plan)
]))
#third_arm_no_run_in = StudyArm(name=TEST_STUDY_ARM_NAME_02, group_size=3, arm_map=OrderedDict([
   # (cell_screen, None),
 #   (cell_multi_elements_padded, ms_sample_assay_plan),
   # (cell_follow_up, ms_sample_assay_plan)
#]))
#arm_same_name_as_third = StudyArm(name=TEST_STUDY_ARM_NAME_02, group_size=3, arm_map=OrderedDict([
    #(cell_screen, None), (cell_run_in, None),
#    (cell_single_treatment_01, ms_sample_assay_plan),
    #(cell_follow_up, ms_sample_assay_plan)
#]))


# Sample QC (for mass spectroscopy and other)
#pre_run_sample_type = ProductNode(
#    id_='pre/00', node_type=SAMPLE, name='water', size=2, characteristics=(
#        Characteristic(category='dilution', value=10, unit='mg/L'),
#    )
#)
#post_run_sample_type = ProductNode(
#    id_='post/00', node_type=SAMPLE, name='ethanol', size=2, characteristics=(
#        Characteristic(category='dilution', value=1000, unit='mg/L'),
#        Characteristic(category='dilution', value=100, unit='mg/L'),
#        Characteristic(category='dilution', value=10, unit='mg/L'),
#        Characteristic(category='dilution', value=1, unit='mg/L'),
#        Characteristic(category='dilution', value=0.1, unit='mg/L')
#    ))
#dummy_sample_type = ProductNode(id_='dummy/01', node_type=SAMPLE, name='dummy')
#more_dummy_sample_type = ProductNode(id_='dummy/02', node_type=SAMPLE, name='more dummy')
#interspersed_sample_types = [(dummy_sample_type, 20)]

#qc = QualityControl(
#    interspersed_sample_type=interspersed_sample_types,
#    pre_run_sample_type=pre_run_sample_type,
#    post_run_sample_type=post_run_sample_type
#)

In [36]:
# single_arm = StudyArm(name=TEST_STUDY_ARM_NAME_00, group_size=10, arm_map=OrderedDict([
#    # (cell_screen, ms_sample_assay_plan), (cell_run_in,ms_sample_assay_plan),
#     (cell_single_treatment_00, rnaseq_sample_assay_plan),
#    # (cell_follow_up, rnaseq_sample_assay_plan)
# ]))
study_design = StudyDesign(study_arms=(first_arm, second_arm, third_arm))
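
Each `arm_map` above lists the same `StudyCell` twice, once per assay plan. Because an `OrderedDict` keeps a single value per key, only the last plan listed for a cell is retained. This matches the tables generated further down: the RNA-Seq rows belong to the first two arms, and the MS rows shown belong to the third. The sketch below reproduces the effect with placeholder strings standing in for the real `StudyCell` and `SampleAndAssayPlan` objects.

```python
from collections import OrderedDict

# Placeholders for a StudyCell and two SampleAndAssayPlan objects.
cell = 'SINGLE TREATMENT FIRST'
ms_plan, rnaseq_plan = 'ms_sap', 'rnaseq_sap'

# Same key twice: the second value replaces the first.
arm_map = OrderedDict([(cell, ms_plan), (cell, rnaseq_plan)])
print(len(arm_map))    # 1
print(arm_map[cell])   # rnaseq_sap

# Detecting a repeated key before building the mapping.
pairs = [(cell, ms_plan), (cell, rnaseq_plan)]
keys = [key for key, _ in pairs]
has_repeated_key = len(keys) != len(set(keys))
print(has_repeated_key)  # True
```

A check like `has_repeated_key` can be run on the list of pairs before it is passed to `OrderedDict`, so that a silently dropped plan is noticed early.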


### 6. Generating an ISA Study from the ISA Study Design object

In [37]:
study = study_design.generate_isa_study()

In [38]:
study

isatools.model.Study(filename='s_study_01.txt', identifier='s_01', title='Study Design', description='None', submission_date='', public_release_date='', contacts=[], design_descriptors=[], publications=[], factors=[isatools.model.StudyFactor(name='DURATION', factor_type=isatools.model.OntologyAnnotation(term='time', term_source=None, term_accession='', comments=[]), comments=[]), isatools.model.StudyFactor(name='INTENSITY', factor_type=isatools.model.OntologyAnnotation(term='intensity', term_source=None, term_accession='', comments=[]), comments=[]), isatools.model.StudyFactor(name='AGENT', factor_type=isatools.model.OntologyAnnotation(term='perturbation agent', term_source=None, term_accession='', comments=[]), comments=[]), isatools.model.StudyFactor(name='Sequence Order', factor_type=isatools.model.OntologyAnnotation(term='sequence order', term_source=None, term_accession='', comments=[]), comments=[])], protocols=[isatools.model.Protocol(name='sample collection', protocol_type=isat

In [39]:
treatment_assay = next(iter(study.assays))

In [40]:
treatment_assay.graph

<networkx.classes.digraph.DiGraph at 0x12a2bb1c0>

In [41]:
[(process.name, getattr(process.prev_process, 'name', None), getattr(process.next_process, 'name', None)) for process in treatment_assay.process_sequence]

[('AT0-S1-assay0---extraction-Acquisition-R1',
  None,
  'AT0-S1-assay0---library_preparation-Acquisition-R1'),
 ('AT0-S1-assay0---library_preparation-Acquisition-R1',
  'AT0-S1-assay0---extraction-Acquisition-R1',
  'AT0-S1-assay0---nucleic-acid-sequencing-Acquisition-R1'),
 ('AT0-S1-assay0---nucleic-acid-sequencing-Acquisition-R1',
  'AT0-S1-assay0---library_preparation-Acquisition-R1',
  None),
 ('AT0-S2-assay0---extraction-Acquisition-R1',
  None,
  'AT0-S2-assay0---library_preparation-Acquisition-R1'),
 ('AT0-S2-assay0---library_preparation-Acquisition-R1',
  'AT0-S2-assay0---extraction-Acquisition-R1',
  'AT0-S2-assay0---nucleic-acid-sequencing-Acquisition-R1'),
 ('AT0-S2-assay0---nucleic-acid-sequencing-Acquisition-R1',
  'AT0-S2-assay0---library_preparation-Acquisition-R1',
  None),
 ('AT0-S3-assay0---extraction-Acquisition-R1',
  None,
  'AT0-S3-assay0---library_preparation-Acquisition-R1'),
 ('AT0-S3-assay0---library_preparation-Acquisition-R1',
  'AT0-S3-assay0---extraction-
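
The list comprehension above relies on `getattr(obj, 'name', None)` to cope with the first and last process of each chain, whose `prev_process` or `next_process` is `None`. The sketch below shows the same pattern on a small hand-built chain. The `Step` class and `link` helper are hypothetical stand-ins written for this illustration and are not ISA-API classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Step:
    """Hypothetical stand-in for a process with prev/next links."""
    name: str
    prev_process: Optional['Step'] = None
    next_process: Optional['Step'] = None


def link(steps):
    """Connect consecutive steps in both directions and return the list."""
    for earlier, later in zip(steps, steps[1:]):
        earlier.next_process = later
        later.prev_process = earlier
    return steps


chain = link([
    Step('extraction'),
    Step('library_preparation'),
    Step('nucleic acid sequencing'),
])

# getattr(None, 'name', None) returns None, so chain ends need no special case.
triples = [
    (s.name, getattr(s.prev_process, 'name', None), getattr(s.next_process, 'name', None))
    for s in chain
]
for triple in triples:
    print(triple)
```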

In [42]:
a_graph = treatment_assay.graph

In [43]:
len(a_graph.nodes)

30

In [44]:
isa_investigation.studies=[study]

In [45]:
isa_tables = dumpdf(isa_investigation)

2021-12-04 10:35:40,294 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [120, 121, 122]
2021-12-04 10:35:40,296 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[120, 124, 123], [120, 126, 125], [120, 128, 127], [121, 130, 129], [121, 132, 131], [121, 134, 133], [122, 136, 135], [122, 138, 137], [122, 140, 139]]
2021-12-04 10:35:40,386 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [123, 125, 127, 129, 131, 133]
2021-12-04 10:35:40,387 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[123, 141, 142, 143, 144], [125, 146, 147, 148, 149], [127, 151, 152, 153, 154], [129, 156, 157, 158, 159], [131, 161, 162, 163, 164], [133, 166, 167, 168, 169]]
2021-12-04 10:35:40,388 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[123, 141, 142, 143, 144], [125, 146, 147, 148, 149], [127, 151, 152, 153, 154], [129, 156, 157, 158, 159], [131, 161, 162, 163, 164], [133, 166, 167, 168, 169]]
2021-12-04 10:35:40,442 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [135, 137, 139]
2021-12-04 10

In [46]:
#[type(x) for x in study.assays[0].graph.nodes()]

In [47]:
#[(getattr(el, 'name', None), type(el))for el in treatment_assay.graph.nodes()]

In [48]:
from isatools.model import _build_assay_graph

In [49]:
gph = _build_assay_graph(treatment_assay.process_sequence)

In [50]:
[key for key in isa_tables.keys()]

['s_study_01.txt',
 'a_AT0_transcription-profiling_nucleotide-sequencing.txt',
 'a_AT0_metabolite-profiling_mass-spectrometry.txt']
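
The keys follow the ISA-Tab file naming convention seen throughout this notebook: `s_` for study tables and `a_` for assay tables (the investigation file uses `i_`). A minimal sketch of splitting the tables by prefix, using a stand-in dictionary whose values would be pandas DataFrames in the real `isa_tables` mapping:

```python
# Stand-in for the `isa_tables` mapping; the real values are DataFrames.
tables = {
    's_study_01.txt': None,
    'a_AT0_transcription-profiling_nucleotide-sequencing.txt': None,
    'a_AT0_metabolite-profiling_mass-spectrometry.txt': None,
}

# Split table names by their ISA-Tab filename prefix.
study_tables = [name for name in tables if name.startswith('s_')]
assay_tables = [name for name in tables if name.startswith('a_')]

print(study_tables)
print(len(assay_tables))  # 2
```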

In [51]:
isa_tables['s_study_01.txt']

Unnamed: 0,Source Name,Characteristics[Study Subject],Term Source REF,Term Accession Number,Protocol REF,Parameter Value[Sampling order],Parameter Value[Study cell],Date,Performer,Sample Name,Characteristics[organism part],Comment[study step with treatment],Factor Value[Sequence Order],Factor Value[INTENSITY],Unit,Factor Value[AGENT],Factor Value[DURATION],Unit.1
0,GRP1_SBJ1,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,1,SINGLE TREATMENT SECOND,2021-12-04,Unknown,GRP1_SBJ2_SINGLE-TREATMENT-SECOND_SMP-whole-or...,whole organism,YES,0,5,kg/m^3,ethoprophos,100.0,s
1,GRP1_SBJ1,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,2,SINGLE TREATMENT SECOND,2021-12-04,Unknown,GRP1_SBJ3_SINGLE-TREATMENT-SECOND_SMP-whole-or...,whole organism,YES,0,5,kg/m^3,ethoprophos,100.0,s
2,GRP1_SBJ1,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,3,SINGLE TREATMENT SECOND,2021-12-04,Unknown,GRP1_SBJ1_SINGLE-TREATMENT-SECOND_SMP-whole-or...,whole organism,YES,0,5,kg/m^3,ethoprophos,100.0,s
3,GRP2_SBJ3,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,4,SINGLE TREATMENT FIRST,2021-12-04,Unknown,GRP2_SBJ2_SINGLE-TREATMENT-FIRST_SMP-whole-org...,whole organism,YES,0,5,kg/m^3,cadmium chloride,100.0,s
4,GRP2_SBJ3,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,5,SINGLE TREATMENT FIRST,2021-12-04,Unknown,GRP2_SBJ1_SINGLE-TREATMENT-FIRST_SMP-whole-org...,whole organism,YES,0,5,kg/m^3,cadmium chloride,100.0,s
5,GRP2_SBJ3,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,6,SINGLE TREATMENT FIRST,2021-12-04,Unknown,GRP2_SBJ3_SINGLE-TREATMENT-FIRST_SMP-whole-org...,whole organism,YES,0,5,kg/m^3,cadmium chloride,100.0,s
6,GRP3_SBJ2,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,7,SINGLE TREATMENT THIRD,2021-12-04,Unknown,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,whole organism,YES,0,5,kg/m^3,ethoprophos,50.0,s
7,GRP3_SBJ2,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,8,SINGLE TREATMENT THIRD,2021-12-04,Unknown,GRP3_SBJ3_SINGLE-TREATMENT-THIRD_SMP-whole-org...,whole organism,YES,0,5,kg/m^3,ethoprophos,50.0,s
8,GRP3_SBJ2,Human,NCIT,http://purl.obolibrary.org/obo/NCIT_C14225,sample collection,9,SINGLE TREATMENT THIRD,2021-12-04,Unknown,GRP3_SBJ2_SINGLE-TREATMENT-THIRD_SMP-whole-org...,whole organism,YES,0,5,kg/m^3,ethoprophos,50.0,s


In [53]:
isa_tables['a_AT0_transcription-profiling_nucleotide-sequencing.txt']

Unnamed: 0,Sample Name,Comment[study step with treatment],Protocol REF,Performer,Extract Name,Characteristics[extract type],Protocol REF.1,Parameter Value[OntologyAnnotation(term=library strategy)],Parameter Value[OntologyAnnotation(term=library layout)],Parameter Value[OntologyAnnotation(term=size)],Performer.1,Protocol REF.2,Parameter Value[OntologyAnnotation(term=sequencing instrument)],Performer.2,Raw Data File
0,GRP1_SBJ1_SINGLE-TREATMENT-SECOND_SMP-whole-or...,YES,assay0 - extraction,Unknown,AT0-S3-Extract-R1,mRNA,assay0 - library_preparation,RNA-SEQ,PAIRED,40,Unknown,assay0 - nucleic acid sequencing,DNBSEQ-T7,Unknown,AT0-S3-raw_data_file-R1-
1,GRP1_SBJ2_SINGLE-TREATMENT-SECOND_SMP-whole-or...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R1,mRNA,assay0 - library_preparation,RNA-SEQ,PAIRED,40,Unknown,assay0 - nucleic acid sequencing,DNBSEQ-T7,Unknown,AT0-S1-raw_data_file-R1-
2,GRP1_SBJ3_SINGLE-TREATMENT-SECOND_SMP-whole-or...,YES,assay0 - extraction,Unknown,AT0-S2-Extract-R1,mRNA,assay0 - library_preparation,RNA-SEQ,PAIRED,40,Unknown,assay0 - nucleic acid sequencing,DNBSEQ-T7,Unknown,AT0-S2-raw_data_file-R1-
3,GRP2_SBJ1_SINGLE-TREATMENT-FIRST_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S5-Extract-R1,mRNA,assay0 - library_preparation,RNA-SEQ,PAIRED,40,Unknown,assay0 - nucleic acid sequencing,DNBSEQ-T7,Unknown,AT0-S5-raw_data_file-R1-
4,GRP2_SBJ2_SINGLE-TREATMENT-FIRST_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S4-Extract-R1,mRNA,assay0 - library_preparation,RNA-SEQ,PAIRED,40,Unknown,assay0 - nucleic acid sequencing,DNBSEQ-T7,Unknown,AT0-S4-raw_data_file-R1-
5,GRP2_SBJ3_SINGLE-TREATMENT-FIRST_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S6-Extract-R1,mRNA,assay0 - library_preparation,RNA-SEQ,PAIRED,40,Unknown,assay0 - nucleic acid sequencing,DNBSEQ-T7,Unknown,AT0-S6-raw_data_file-R1-


In [54]:
isa_tables['a_AT0_metabolite-profiling_mass-spectrometry.txt']

Unnamed: 0,Sample Name,Comment[study step with treatment],Protocol REF,Performer,Extract Name,Characteristics[extract type],Protocol REF.1,Parameter Value[derivatization],Performer.1,Labeled Extract Name,Protocol REF.2,Parameter Value[instrument],Parameter Value[injection_mode],Parameter Value[acquisition_mode],Performer.2,Raw Spectral Data File
0,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R2,polar fraction,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R2,assay0 - mass spectrometry,Agilent QTOF,GC,positive mode,Unknown,AT0-S1-raw-spectral-data-file-R8
1,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R1,lipids,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R1,assay0 - mass spectrometry,Agilent QTOF,GC,positive mode,Unknown,AT0-S1-raw-spectral-data-file-R1
2,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R1,lipids,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R1,assay0 - mass spectrometry,Agilent QTOF,GC,positive mode,Unknown,AT0-S1-raw-spectral-data-file-R2
3,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R1,lipids,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R1,assay0 - mass spectrometry,Agilent QTOF,GC,negative mode,Unknown,AT0-S1-raw-spectral-data-file-R3
4,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R1,lipids,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R1,assay0 - mass spectrometry,Agilent QTOF,GC,negative mode,Unknown,AT0-S1-raw-spectral-data-file-R4
5,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R2,polar fraction,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R2,assay0 - mass spectrometry,Agilent QTOF,GC,negative mode,Unknown,AT0-S1-raw-spectral-data-file-R5
6,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R2,polar fraction,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R2,assay0 - mass spectrometry,Agilent QTOF,GC,negative mode,Unknown,AT0-S1-raw-spectral-data-file-R6
7,GRP3_SBJ1_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S1-Extract-R2,polar fraction,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S1-LE-R2,assay0 - mass spectrometry,Agilent QTOF,GC,positive mode,Unknown,AT0-S1-raw-spectral-data-file-R7
8,GRP3_SBJ2_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S3-Extract-R2,polar fraction,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S3-LE-R2,assay0 - mass spectrometry,Agilent QTOF,GC,negative mode,Unknown,AT0-S3-raw-spectral-data-file-R6
9,GRP3_SBJ2_SINGLE-TREATMENT-THIRD_SMP-whole-org...,YES,assay0 - extraction,Unknown,AT0-S3-Extract-R2,polar fraction,assay0 - derivatization,bis(trimethylsilyl)acetamide,Unknown,AT0-S3-LE-R2,assay0 - mass spectrometry,Agilent QTOF,GC,negative mode,Unknown,AT0-S3-raw-spectral-data-file-R5


In [55]:
final_dir = os.path.abspath(os.path.join('notebook-output', 'sd-test'))
os.makedirs(final_dir, exist_ok=True)  # make sure the output directory exists before writing to it

### 7. Serialization as ISA-JSON and ISA-Tab

In [56]:
isa_j = json.dumps(isa_investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
open(os.path.join(final_dir, "isa_as_json_from_dumps2.json"), "w").write(isa_j) # this call writes the string 'isa_j' to the file 'isa_as_json_from_dumps2.json'

199796
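
The number printed above is the return value of `write()`, that is, the count of characters written to the file. The sketch below shows an equivalent pattern that creates the output directory if needed and closes the file through a context manager. It uses a small stand-in dictionary and a temporary directory instead of the investigation object and `final_dir`, so the payload and file name are illustrative only.

```python
import json
import os
import tempfile

# Stand-in for the investigation; a plain dict needs no custom JSON encoder.
payload = {'identifier': 's_01', 'title': 'Study Design'}

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, 'notebook-output', 'sd-test')
    os.makedirs(out_dir, exist_ok=True)  # no error if the directory already exists
    out_path = os.path.join(out_dir, 'example.json')

    text = json.dumps(payload, sort_keys=True, indent=4, separators=(',', ': '))
    with open(out_path, 'w') as handle:
        written = handle.write(text)  # number of characters written

    # Read the file back to confirm the round trip.
    with open(out_path) as handle:
        round_trip = json.load(handle)

print(written == len(text))   # True
print(round_trip == payload)  # True
```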

In [57]:
isatab.dump(isa_obj=isa_investigation, output_path=final_dir)

2021-12-04 10:36:12,087 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [120, 121, 122]
2021-12-04 10:36:12,088 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[120, 124, 123], [120, 126, 125], [120, 128, 127], [121, 130, 129], [121, 132, 131], [121, 134, 133], [122, 136, 135], [122, 138, 137], [122, 140, 139]]
2021-12-04 10:36:12,150 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [123, 125, 127, 129, 131, 133]
2021-12-04 10:36:12,151 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[123, 141, 142, 143, 144], [125, 146, 147, 148, 149], [127, 151, 152, 153, 154], [129, 156, 157, 158, 159], [131, 161, 162, 163, 164], [133, 166, 167, 168, 169]]
2021-12-04 10:36:12,152 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[123, 141, 142, 143, 144], [125, 146, 147, 148, 149], [127, 151, 152, 153, 154], [129, 156, 157, 158, 159], [131, 161, 162, 163, 164], [133, 166, 167, 168, 169]]
2021-12-04 10:36:12,212 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [135, 137, 139]
2021-12-04 10

isatools.model.Investigation(identifier='', filename='', title='', submission_date='', public_release_date='', ontology_source_references=[isatools.model.OntologySource(name='CHEBI', file='', version='', description='Chemical Entities of Biological Interest', comments=[]), isatools.model.OntologySource(name='CHMO', file='', version='', description='Chemical Methods Ontology', comments=[]), isatools.model.OntologySource(name='MSIO', file='', version='', description='Metabolite Standards Initiative Ontology', comments=[]), isatools.model.OntologySource(name='NCBITAXON', file='', version='', description='NCBI organismal classification', comments=[]), isatools.model.OntologySource(name='NCIT', file='', version='', description='NCI Thesaurus OBO Edition', comments=[]), isatools.model.OntologySource(name='OBI', file='', version='', description='Ontology for Biomedical Investigations', comments=[]), isatools.model.OntologySource(name='UO', file='', version='', description='UO - the Unit Ontol

### 8. Performing syntactic validation by invoking ISA Validator

In [58]:
with open(os.path.join(final_dir, 'i_investigation.txt')) as isa:
    validation_report = isatab.validate(isa)

2021-12-04 10:36:13,541 [INFO]: isatab.py(validate:4216) >> Loading... /Users/philippe/Documents/git/isa-api2/isa-api/isa-cookbook/content/notebooks/notebook-output/sd-test/i_investigation.txt
2021-12-04 10:36:13,687 [INFO]: isatab.py(validate:4218) >> Running prechecks...
2021-12-04 10:36:14,248 [INFO]: isatab.py(validate:4239) >> Finished prechecks...
2021-12-04 10:36:14,249 [INFO]: isatab.py(validate:4240) >> Loading configurations found in /Users/philippe/.pyenv/versions/3.9.0/envs/isa-api-py39/src/isatools/isatools/resources/config/xml
2021-12-04 10:36:14,313 [INFO]: isatab.py(validate:4245) >> Using configurations found in /Users/philippe/.pyenv/versions/3.9.0/envs/isa-api-py39/src/isatools/isatools/resources/config/xml
2021-12-04 10:36:14,314 [INFO]: isatab.py(validate:4247) >> Checking investigation file against configuration...
2021-12-04 10:36:14,316 [INFO]: isatab.py(validate:4250) >> Finished checking investigation file
2021-12-04 10:36:14,317 [INFO]: isatab.py(validate:426

2021-12-04 10:36:15,176 [INFO]: utils.py(detect_isatab_process_pooling:85) >> Checking a_AT0_transcription-profiling_nucleotide-sequencing.txt
2021-12-04 10:36:15,178 [INFO]: utils.py(detect_isatab_process_pooling:85) >> Checking a_AT0_metabolite-profiling_mass-spectrometry.txt
2021-12-04 10:36:15,179 [INFO]: utils.py(detect_graph_process_pooling:57) >> Possible process pooling detected on:  assay0 - derivatization
2021-12-04 10:36:15,180 [INFO]: utils.py(detect_graph_process_pooling:57) >> Possible process pooling detected on:  assay0 - derivatization
2021-12-04 10:36:15,181 [INFO]: utils.py(detect_graph_process_pooling:57) >> Possible process pooling detected on:  assay0 - derivatization
2021-12-04 10:36:15,182 [INFO]: isatab.py(validate:4454) >> Finished validation...


In [59]:
validation_report["errors"]

[]
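
An empty `errors` list means the validator found no blocking problems. As a hedged sketch of summarising such a report, the helper below counts entries in a dictionary shaped like `validation_report`. Only the `"errors"` key is shown in this notebook; the `"warnings"` key and the structure of each entry are assumptions made for illustration, and `summarize_report` is a hypothetical helper, not an ISA-API function.

```python
def summarize_report(report):
    """Count errors and warnings in a validation-report-like dictionary."""
    errors = report.get('errors', [])
    warnings = report.get('warnings', [])  # assumed key, see the note above
    return {
        'n_errors': len(errors),
        'n_warnings': len(warnings),
        'is_valid': len(errors) == 0,
    }


# Stand-in report; the entry format is made up for this example.
example_report = {'errors': [], 'warnings': [{'message': 'example warning'}]}
summary = summarize_report(example_report)
print(summary)
```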

## Conclusion:

In this notebook, we have shown how to use study design information to generate a populated ISA Study object, serialize it as ISA-JSON and ISA-Tab, and validate the resulting files.



## About this notebook

- authors: philippe.rocca-serra@oerc.ox.ac.uk, massimiliano.izzo@oerc.ox.ac.uk
- license: CC-BY 4.0
- support: isatools@googlegroups.com
- issue tracker: https://github.com/ISA-tools/isa-api/issues