# Create ISA objects for the EATRIS-Plus multi-omics data set of a Czech population cohort

See example: https://isatools.readthedocs.io/en/latest/example-createSimpleISAtab.html

## Define ontology references

Ontologies in the [FAIR genomes metadata schema](https://github.com/fairgenomes/fairgenomes-semantic-model) include AFR, AFRL, DC (Dublin Core), DUO (Data Use Ontology), EDAM, EFO, FG, FIX, GAZ, GENEPIO, GO, GSSO, HANCESTRO, HGNC, HL7, HP, IAO (Information Artifact Ontology), ICO (Informed Consent Ontology), LOINC, NCIT, OBI, OMIABIS, Orphanet, PATO, RO (Relation Ontology), SIO (Semanticscience Integrated Ontology), SNOMEDCT, SWO, UATC.

We reuse these as much as possible.

In [2]:
from isatools.model import *
ontologies = {
    "BAO": OntologySource(
        name = "BAO - BioAssay Ontology", 
        file = "http://www.bioassayontology.org/bao/bao_complete.owl",
        description = "The BioAssay Ontology (BAO) describes biological screening assays and their results including high-throughput screening (HTS) data for the purpose of categorizing assays and data analysis. BAO is an extensible, knowledge-based, highly expressive (currently SHOIQ(D)) description of biological assays making use of descriptive logic based features of the Web Ontology Language (OWL). BAO currently has over 700 classes and also makes use of several other ontologies. It describes several concepts related to biological screening, including Perturbagen, Format, Meta Target, Design, Detection Technology, and Endpoint. Perturbagens are perturbing agents that are screened in an assay; they are mostly small molecules. Assay Meta Target describes what is known about the biological system and / or its components interrogated in the assay (and influenced by the Perturbagen). Meta target can be directly described as a molecular entity (e.g. a purified protein or a protein complex), or indirectly by a biological process or event (e.g. phosphorylation). Format describes the biological or chemical features common to each test condition in the assay and includes biochemical, cell-based, organism-based, and variations thereof. The assay Design describes the assay methodology and implementation of how the perturbation of the biological system is translated into a detectable signal. Detection Technology relates to the physical method and technical details to detect and record a signal. Endpoints are the final HTS results as they are usually published (such as IC50, percent inhibition, etc). BAO has been designed to accommodate multiplexed assays. All main BAO components include multiple levels of sub-categories and specification classes, which are linked via object property relationships forming an expressive knowledge-based representation."), 
    "CHEBI": OntologySource(
        name = "CHEBI - Chemical Entities of Biological Interest", 
        file = "http://purl.obolibrary.org/obo/chebi.owl",
        description = "A structured classification of molecular entities of biological interest focusing on 'small' chemical compounds."), 
    "CHMO": OntologySource(
        name = "CHMO - Chemical Methods Ontology", 
        file = "http://purl.obolibrary.org/obo/chmo.owl",
        description = "CHMO, the chemical methods ontology, describes methods used to collect data in chemical experiments, such as mass spectrometry and electron microscopy; to prepare and separate material for further analysis, such as sample ionisation, chromatography, and electrophoresis; and to synthesise materials, such as epitaxy and continuous vapour deposition. It also describes the instruments used in these experiments, such as mass spectrometers and chromatography columns. It is intended to be complementary to the Ontology for Biomedical Investigations (OBI)."), 
    "CRO": OntologySource(
        # The Contributor Role Ontology (CRO) is an extension of the CASRAI Contributor Roles Taxonomy (CRediT) and replaces the former Contribution Ontology.
        name = "CRO - Contributor Role Ontology",
        file = "http://purl.obolibrary.org/obo/cro.owl",
        description = "A classification of the diverse roles performed in the work leading to a published research output in the sciences. Its purpose is to provide transparency in contributions to scholarly published work, to enable improved systems of attribution, credit, and accountability."),
    "EDAM": OntologySource(
        name = "EDAM - EMBRACE Data and Methods", 
        file = "http://edamontology.org/EDAM.owl",
        description = "EDAM (EMBRACE Data and Methods) is an ontology of common bioinformatics operations, topics, types of data including identifiers, and formats. EDAM comprises common concepts (shared within the bioinformatics community) that apply to semantic annotation of resources."), 
    "EFO": OntologySource(
        name = "EFO - Experimental Factor Ontology", 
        file = "http://www.ebi.ac.uk/efo/efo.owl",
        description = "The Experimental Factor Ontology (EFO) provides a systematic description of many experimental variables available in EBI databases, and for external projects such as the NHGRI GWAS catalogue. It combines parts of several biological ontologies, such as anatomy, disease and chemical compounds. The scope of EFO is to support the annotation, analysis and visualization of data handled by many groups at the EBI and as the core ontology for OpenTargets.org"), 
    "GENEPIO": OntologySource(
        name = "GENEPIO - Genomic Epidemiology Ontology", 
        file = "http://purl.obolibrary.org/obo/genepio.owl",
        description = "The Genomic Epidemiology Ontology (GenEpiO) covers vocabulary necessary to identify, document and research foodborne pathogens and associated outbreaks."),
    "MI": OntologySource(
        name = "MI - Molecular Interactions Controlled Vocabulary", 
        file = "http://purl.obolibrary.org/obo/mi.owl",
        description = "A structured controlled vocabulary for the annotation of experiments concerned with protein-protein interactions."),
    "MMO": OntologySource(
        name = "MMO - Measurement method ontology", 
        file = "http://purl.obolibrary.org/obo/mmo.owl",
        description = "A representation of the variety of methods used to make clinical and phenotype measurements."),
    "MS": OntologySource(
        name = "MS - Mass spectrometry ontology",
        file = "http://purl.obolibrary.org/obo/ms.owl",
        description = "A structured controlled vocabulary for the annotation of experiments concerned with proteomics mass spectrometry."),
    "MSIO": OntologySource(
        name = "MSIO - Metabolomics Standards Initiative Ontology",
        file = "http://purl.obolibrary.org/obo/msio.owl",
        description = "MSIO aims to provide a single point of entry to support semantic markup of experiments making use of NMR and MS techniques to identify, measure and quantify small molecules known as metabolites. MSIO covers metabolite profiling, targeted or untargeted, tracer based applications. MSIO reuses a number of resources such as CHEBI, DUO, NMRCV, OBI, and STATO."),
    "NCBITAXON": OntologySource(
        name = "NCBI organismal classification", 
        file = "http://purl.obolibrary.org/obo/ncbitaxon.owl",
        description = "An ontology representation of the NCBI organismal taxonomy"),
    "NCIT": OntologySource(
        name = "NCI Thesaurus OBO Edition", 
        file = "http://purl.obolibrary.org/obo/ncit.owl",
        description = "The NCIt OBO Edition project aims to increase integration of the NCIt with OBO Library ontologies. NCIt is a reference terminology that includes broad coverage of the cancer domain, including cancer related diseases, findings and abnormalities. NCIt OBO Edition releases should be considered experimental."),
    "OBI": OntologySource(
        name = "OBI - Ontology for Biomedical Investigations", 
        file = "http://purl.obolibrary.org/obo/obi.owl",
        description = "An integrated ontology for the description of life-science and clinical investigations"),
    "OMIABIS": OntologySource(
        name = "Ontologized MIABIS", 
        file = "http://purl.obolibrary.org/obo/omiabis.owl",
        description = "An ontological version of MIABIS (Minimum Information About BIobank data Sharing)"),
    "ORNASEQ": OntologySource(
        name = "Ontology for RNA sequencing (ORNASEQ)", 
        file = "http://purl.obolibrary.org/obo/ornaseq.owl",
        description = "An application ontology designed to annotate next-generation sequencing experiments performed on RNA."),
    "PRIDE": OntologySource(
        name = "PRIDE Controlled Vocabulary",
        file = "http://purl.obolibrary.org/obo/pride_cv.obo",
        description = "The PRIDE PRoteomics IDEntifications (PRIDE) database is a centralized, standards compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence."),
    "STATO": OntologySource(
        name = "STATO: the statistical methods ontology",
        file = "http://purl.obolibrary.org/obo/stato.owl",
        description = "STATO is the statistical methods ontology. It contains concepts and properties related to statistical methods, probability distributions and other concepts related to statistical analysis, including relationships to study designs and plots."),
    "UBERON": OntologySource(
        name = "Uber-anatomy ontology",
        file = "http://purl.obolibrary.org/obo/uberon.owl",
        description = "Uberon is an integrated cross-species anatomy ontology representing a variety of entities classified according to traditional anatomical criteria such as structure, function and developmental lineage. The ontology includes comprehensive relationships to taxon-specific anatomical ontologies, allowing integration of functional, phenotype and expression data."),
    "tbd": OntologySource(
        name = "to be defined",
        file = "http://tbd.owl",
        description = "tbd")
}
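
The dictionary keys are reused later as `ontologies["..."]` lookups, so an entry attached to the wrong key is easy to miss. The following is a minimal sketch of a consistency check, not part of isatools: `Src` is a hypothetical stand-in for `OntologySource` so that the snippet runs without the library, and the helper names `file_stem` and `mismatched_keys` are assumptions. It relies on the convention, followed by all entries above, that the file name starts with the ontology abbreviation.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for isatools.model.OntologySource (same attribute names).
Src = namedtuple("Src", ["name", "file", "description"])

def file_stem(url):
    """Return the lower-cased file name up to the first '.' or '_'."""
    basename = url.rsplit("/", 1)[-1]
    return re.split(r"[._]", basename)[0].lower()

def mismatched_keys(sources):
    """Return the keys whose abbreviation differs from the stem of the file URL."""
    return sorted(
        key for key, src in sources.items()
        if file_stem(src.file) != key.lower()
    )

demo = {
    "MS": Src("MS - Mass spectrometry ontology",
              "http://purl.obolibrary.org/obo/ms.owl", ""),
    "PRIDE": Src("PRIDE Controlled Vocabulary",
                 "http://purl.obolibrary.org/obo/pride_cv.obo", ""),
    # Deliberately wrong entry to show what the check reports.
    "OBI": Src("OBI - Ontology for Biomedical Investigations",
               "http://purl.obolibrary.org/obo/chmo.owl", ""),
}

print(mismatched_keys(demo))  # → ['OBI']
```

Since the check only reads the `file` attribute, calling `mismatched_keys(ontologies)` on the dictionary defined above should work in the same way.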

## Create investigation

In [3]:
investigation_filename = "i_investigation.txt"
investigation = Investigation(
    filename = investigation_filename, 
    identifier = "", 
    title = "EATRIS-Plus - Flagship in Personalised Medicine",
    description = "EATRIS-Plus project aims to support the long-term sustainability of the European Research Infrastructure for Translational Medicine (EATRIS) by delivering innovative scientific tools to the research community, strengthening the EATRIS financial model, and reinforcing EATRIS' leadership in the European Research Area in the field of Personalised Medicine (PM).",
    submission_date = "",
    public_release_date = "",
    ontology_source_references = list(ontologies.values()),
    publications = None,
    contacts = [
        Person(
            last_name = "Keidong", 
            first_name = "Eliis",
            #mid_initials = "",
            affiliation = "EATRIS",
            roles = [
                OntologyAnnotation(
                    term = "project management role",
                    term_source = ontologies["CRO"], 
                    term_accession ="http://purl.obolibrary.org/obo/CRO_0000065")])],
    studies = None,
    comments = None)
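
`submission_date` and `public_release_date` are left empty here. The ISA specification expects dates in ISO 8601 format, so once the dates are known they can be produced with the standard library. A minimal sketch follows; `iso_date` is a hypothetical helper and the date shown is a placeholder, not the actual submission date of the project.

```python
from datetime import date

def iso_date(year, month, day):
    """Format a calendar date as an ISO 8601 string (YYYY-MM-DD)."""
    return date(year, month, day).isoformat()

# Placeholder value for illustration only.
print(iso_date(2000, 1, 31))  # → 2000-01-31
```

The resulting string can be passed directly, e.g. `submission_date = iso_date(...)`. Invalid dates such as month 13 raise a `ValueError` instead of being written to the ISA-Tab file.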

## Create study

In [4]:
study_filename = "s_study.txt"
cohort_study = Study(
    filename = study_filename, 
    identifier = "", 
    title = "Multi-omics data of a Czech population cohort",
    description = "Multi-omics data of a Czech population cohort", 
    submission_date = "", 
    public_release_date = "",
    contacts = [
        Person(
            last_name = "Hajduch", 
            first_name = "Marian",
            #mid_initials = "",
            affiliation = "Institute of Molecular and Translational Medicine (IMTM), Palacky University Olomouc")],
    design_descriptors = [
        OntologyAnnotation(
                term = "Multi-omics study",
                term_source = ontologies["PRIDE"],
                term_accession = "http://purl.obolibrary.org/obo/PRIDE_0000461"),
        OntologyAnnotation(
                term = "population based study design",
                term_source = ontologies["OMIABIS"],
                term_accession = "http://purl.obolibrary.org/obo/OMIABIS_0001022")], 
    factors = None, 
    protocols = None,
    assays = None,
    sources = None,
    samples = None,
    process_sequence = None,
    other_material = None,
    characteristic_categories = None,
    comments = None,
    units = None)
investigation.studies.append(cohort_study)

## Samples, protocols, assays

### Define protocols and protocol parameters (for all assays)

In [5]:
# protocol parameters 
protocol_params = {
    #"anatomical entity": ProtocolParameter(
    #    parameter_name = OntologyAnnotation(
    #        term = "anatomical entity",
    #        term_source = ontologies["UBERON"],
    #        term_accession = "http://purl.obolibrary.org/obo/UBERON_0001062")),
    "Post Extraction": ProtocolParameter(
        parameter_name = "Post Extraction"),
    "Derivatization": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "Derivatization",
            term_source = ontologies["MSIO"],
            term_accession = "http://purl.obolibrary.org/obo/MSIO_0000111")),
    "Chromatography Instrument": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "Chromatography Instrument",
            term_source = ontologies["OBI"],
            term_accession = "http://purl.obolibrary.org/obo/OBI_0000485")),
    "Column model": ProtocolParameter(
        parameter_name = "Column model"),
    "Column type": ProtocolParameter(
        parameter_name = "Column type"),
    "Scan polarity": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "scan polarity",
            term_source = ontologies["MS"],
            term_accession = "http://purl.obolibrary.org/obo/MS_1000465")),
        #values: 
        #negative scan http://purl.obolibrary.org/obo/MS_1000129
        #positive scan http://purl.obolibrary.org/obo/MS_1000130
    "Scan m/z range": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "Scan m/z range")),
            #term_source = ontologies[""],
            #term_accession = "")),
    "Instrument": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "Instrument",
            term_source = ontologies["MS"],
            term_accession = "http://purl.obolibrary.org/obo/MS_1000463")),
    "Ion source": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "Ion source",
            term_source = ontologies["CHMO"],
            term_accession = "http://purl.obolibrary.org/obo/CHMO_0000960")),
    "Mass analyzer": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "Mass analyzer",
            term_source = ontologies["MS"],
            term_accession = "http://purl.obolibrary.org/obo/MS_1000451")),
    "method reference": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "method reference",
            term_source = ontologies["MI"],
            term_accession = "http://purl.obolibrary.org/obo/MI_0357")),
    "technical replicate": ProtocolParameter(
        parameter_name = OntologyAnnotation(
            term = "technical replicate",
            term_source = ontologies["EFO"],
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0002090"))
}

# protocols
protocols = {
    "sample_collection": Protocol( 
        name = "sample_collection_protocol",
        protocol_type = OntologyAnnotation(
            term = "sample collection", # has to be "sample collection" based on ISA specification, but no term available
            term_source = None,  
            term_accession = None)),
        #parameters = [
        #    protocol_params["anatomical entity"]
            # additional parameters could include, e.g. collection or storage procedure]
    "dna_extraction": Protocol(
        name = "DNA extraction",
        protocol_type = OntologyAnnotation(
            term = "DNA extraction",
            term_source = ontologies["OBI"], 
            term_accession = "http://purl.obolibrary.org/obo/OBI_0000257")),
    "rna_extraction": Protocol(
        name = "RNA extraction",
        protocol_type = OntologyAnnotation(
            term = "RNA extraction",
            term_source = ontologies["OBI"], 
            term_accession = "http://purl.obolibrary.org/obo/OBI_0666666"),
        description = "RNA extraction is a nucleic acid extraction where the desired output material is RNA"),
    "Whole Genome Sequencing": Protocol(
        name = "Whole Genome Sequencing", 
        protocol_type = OntologyAnnotation(
            term = "Whole Genome Sequencing",
            term_source = ontologies["NCIT"],
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C101294")),
    "mRNA library preparation": Protocol(
        name = "mRNA library preparation", 
        protocol_type = OntologyAnnotation(
            term = "library preparation protocol",
            term_source = ontologies["ORNASEQ"],
            term_accession = "http://purl.obolibrary.org/obo/ORNASEQ_0000007"),
        description = "Library preparation from 800 ng of total RNA was performed according to Illumina stranded mRNA prep Reference Guide (Illumina, San Diego, CA, USA). Library quality check was performed using LabChip GX Touch HT High Sensitivity assay (PerkinElmer, USA). The libraries were quantified for sequencing using KAPA Library Quantification Kit (KAPA Biosystems, Wilmington, MA, USA).",
        uri = "https://www.illumina.com/products/by-type/sequencing-kits/library-prep-kits/stranded-mrna-prep.html"),
    "mRNA Sequencing": Protocol(
        name = "mRNA Sequencing", 
        protocol_type = OntologyAnnotation(
            term = "nucleic acid sequencing", #"mRNA Sequencing",
            term_source = ontologies["NCIT"],
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C18881"),
        description = "Sequencing was performed with Illumina NovaSeq system using S4 flow cell with lane divider (Illumina, San Diego, CA, USA). Read length for the paired-end run was 2x151 bp."),
    "microRNAseq": Protocol(
        name = "MicroRNA Sequencing", 
        protocol_type = OntologyAnnotation(
            term = "MicroRNA Sequencing",
            term_source = ontologies["NCIT"],
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C156057")),
    # microRNA qRT-PCR
    "Reverse Transcription": Protocol(
        name = "Reverse Transcription", 
        protocol_type = OntologyAnnotation(
            term = "artificially induced reverse transcription",
            term_source = ontologies["OBI"],
            term_accession = "http://purl.obolibrary.org/obo/OBI_0600028"),
        description = "A protocol with the objective to transcribe single-stranded RNA into complementary DNA (cDNA)"),
    "miRNA array": Protocol(
        name = "miRNA array", 
        protocol_type = OntologyAnnotation(
            term = "microRNA profiling by RT-PCR",
            term_source = ontologies["EFO"],
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0007687"),
        parameters = [protocol_params["technical replicate"]],
        description = "An assay in which a set of microRNAs of a biological sample is analysed by reverse transcription PCR (RT-PCR)"),
    "Data Normalization": Protocol(
        name = "Data Normalization", 
        protocol_type = OntologyAnnotation(
            term = "Constant Normalization",
            term_source = ontologies["NCIT"],
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C63786"),
        description = "Comparison of a variable factor to one with a fixed value (e.g., comparison of gene expression to a gene with constant expression)"),
    # metabolomics protocols
    "Extraction": Protocol(
        name = "Extraction", 
        protocol_type = OntologyAnnotation(
            term = "extraction",
            term_source = ontologies["OBI"],
            term_accession = "http://purl.obolibrary.org/obo/OBI_0302884"), #also in: MSIO, etc.
        parameters = [protocol_params["Post Extraction"],
                      protocol_params["Derivatization"]]),
        #description = "<i>Sample preparation:</i> To 50 uL plasma or serum were added 50 uL standard 1 (23.5 umol/L <sup>2</sup>H<sub>3</sub>-free carnitine in H<sub>2</sub>O) and 50 uL standard 2 (10 umol/L <sup>2</sup>H<sub>3</sub>-C<sub>2</sub>-, 2 umol/L <sup>2</sup>H<sub>3</sub>-C<sub>8</sub>- and 2 umol/L <sup>2</sup>H<sub>3</sub>-C<sub>16</sub>-carnitine in acetonitrile). Samples were mixed and subsequently deproteinized with 500 uL of acetonitrile and centrifuged. The resulting supernatant was dried under nitrogen at 45°C, and subsequently derivatized in 100 uL butanol-HCl for 15 min at 60°C. Samples were dried under nitrogen at 45°C and redissolved in 300 uL of acetonitrile. Prior to injection, 70 uL of the acetonitrile containing the acylcarnitines was mixed with 30 uL H<sub>2</sub>O.",
        #uri = "https://doi.org/10.1023/A:1005587617745"
        #parameters = [
         # control samples, standards, etc.   
        # Post Extraction
        # Derivatization]
    "Chromatography": Protocol(
        name = "Chromatography", 
        protocol_type = OntologyAnnotation(
            term = "chromatography",
            term_source = ontologies["CHMO"],
            term_accession = "http://purl.obolibrary.org/obo/CHMO_0001000"), #also in: PRIDE
        parameters = [protocol_params["Chromatography Instrument"],
                      protocol_params["Column model"],
                      protocol_params["Column type"]]),
      # parameters instrument, column, mobile phase, gradient, settings, injection volume
        # Chromatography Instrument
        # Column type
        # Column model
        # Guard column
        # Autosampler model
    "Mass spectrometry": Protocol(
        name = "Mass spectrometry", 
        protocol_type = OntologyAnnotation(
            term = "mass spectrometry",
            term_source = ontologies["CHMO"],
            term_accession = "http://purl.obolibrary.org/obo/CHMO_0000470"), #also in: MSIO, PRIDE
        #description = "<i>Sample introduction and ESI-MS/MS analysis:</i> Free carnitine and acylcarnitines were measured using scanning for precursor ions of mass 85 from 200 to 550 Da during 2 min on a Micromass Quattro II triple-quadrupole mass spectrometer, using a Gilson 231XL autosampler and a Hewlett-Packard HP-1100 HPLC pump, essentially as described previously (Rashed et al 1995a, 1997).",
        #uri = "https://doi.org/10.1023/A:1005587617745",
        parameters = [protocol_params["Scan polarity"],
                      protocol_params["Scan m/z range"],
                      protocol_params["Instrument"],
                      protocol_params["Ion source"],
                      protocol_params["Mass analyzer"],
                      protocol_params["method reference"]]),
        #  instrument used (make & manufacturer), ion source, ionisation mode (positive / negative), m/z range, and specific parameters such as temperatures, voltages, flow rates, scan rates
    "Data transformation_acylcarnitines": Protocol(
        name = "Data transformation - Plasma acylcarnitine analysis", 
        protocol_type = OntologyAnnotation(
            term = "data transformation",
            term_source = ontologies["OBI"],
            term_accession = "http://purl.obolibrary.org/obo/OBI_0200000"), #also in: EFO, GENEPIO, OMIABIS, STATO, etc.
        description = "<i>Calibration:</i> Calibration curves were obtained for free carnitine in the range 5-100 umol/L, for acetylcarnitine in the range 2-40 umol/L and for all other available acylcarnitines in the range 0.25-6 umol/L by adding standards to a normal plasma pool. All calibration curves were linear (r > 0.99, data not shown). For unsaturated and hydroxylated acylcarnitines an identical response to that of their saturated counterparts was assumed.",
        uri = "https://doi.org/10.1023/A:1005587617745"),
        # methods / pipelines and software used to transform the raw data
    "Metabolite identification_acylcarnitines": Protocol(
        name = "Metabolite identification - Plasma acylcarnitine analysis", 
        protocol_type = OntologyAnnotation(
            term = "metabolite identification",
            term_source = ontologies["MI"],
            term_accession = "http://purl.obolibrary.org/obo/MI_2131")),
    # details of methods / pipelines, reference databases and software used to identify features and/or annotate metabolites
    "EM-seq": Protocol(
        name = "EM-seq", 
        protocol_type = OntologyAnnotation(
            term = "methylation profiling by high throughput sequencing",
            term_source = ontologies["EFO"],
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0002761"),
        description = "NEBNext Enzymatic Methyl-seq (EM-seq)"),
}

# append to study protocols
for protocol in protocols.values():
    cohort_study.protocols.append(protocol)
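
Each annotation above pairs a `term_source` with a `term_accession`, and the two can drift apart when terms are copied between ontologies. The sketch below shows one way to cross-check them. It assumes that accession IRIs end in `PREFIX_LOCALID`, which holds for the OBO, EFO and BAO IRIs used in this notebook; `accession_prefix` is a hypothetical helper, not an isatools function.

```python
import re

def accession_prefix(iri):
    """Extract the upper-cased ontology prefix from a term accession IRI.

    Expects the IRI to end in PREFIX_LOCALID; returns None otherwise.
    """
    match = re.search(r"([A-Za-z]+)_[A-Za-z0-9]+$", iri)
    return match.group(1).upper() if match else None

# (ontology key, accession) pairs copied from the annotations above.
pairs = [
    ("MS", "http://purl.obolibrary.org/obo/MS_1000465"),
    ("CHMO", "http://purl.obolibrary.org/obo/CHMO_0000960"),
    ("EFO", "http://www.ebi.ac.uk/efo/EFO_0002090"),
    ("NCIT", "http://purl.obolibrary.org/obo/NCIT_C101294"),
]

# Collect every pair whose accession prefix disagrees with the ontology key.
mismatches = [(key, iri) for key, iri in pairs if accession_prefix(iri) != key]
print(mismatches)  # → []
```

Upper-casing the prefix lets accessions such as `NCBITaxon_9606` match the `NCBITAXON` key used in the `ontologies` dictionary.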

### Define sources and derived samples

In [6]:
# add dummy samples
for source_idx in range(1, 4):
    # create source (=individual)
    source_name = "individual_{0}".format(source_idx)
    source = Source(
        name = source_name,
        characteristics = [
            Characteristic(
                category = OntologyAnnotation(
                    term = "Organism",
                    term_source = ontologies["OBI"],
                    term_accession = "http://purl.obolibrary.org/obo/OBI_0100026"),
                value = OntologyAnnotation(
                    term = "Homo sapiens",
                    term_source = ontologies["NCBITAXON"],
                    term_accession = "http://purl.obolibrary.org/obo/NCBITaxon_9606"))])
    cohort_study.sources.append(source)
    # create sample
    sample_name = "sample_{0}".format(source_idx)
    sample = Sample(
        name = sample_name, 
        derives_from = [source])
    sample.characteristics.append(
        Characteristic(
            category = OntologyAnnotation(
                term = "anatomical entity",
                term_source = ontologies["UBERON"],
                term_accession = "http://purl.obolibrary.org/obo/UBERON_0001062"),
            value = OntologyAnnotation(
                term = "blood",
                term_source = ontologies["UBERON"],
                term_accession = "http://purl.obolibrary.org/obo/UBERON_0000178")))
    cohort_study.samples.append(sample)
    # sample collection process
    sample_collection_process = Process(
        name = "samplecollection_{0}".format(source_idx),
        executes_protocol = protocols["sample_collection"],
        #parameter_values = [
        #    ParameterValue(
        #        category = protocol_params["anatomical entity"], #ProtocolParameter 
        #        value = OntologyAnnotation(
        #            term = "blood",
        #            term_source = ontologies["UBERON"],
        #            term_accession = "http://purl.obolibrary.org/obo/UBERON_0000178"))],
        inputs = [source],
        outputs = [sample])
    cohort_study.process_sequence.append(sample_collection_process)
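
`range(1, 4)` yields the indices 1 to 3, so the loop creates three sources, three samples and three sample collection processes. The naming scheme can be previewed without isatools, as in this small sketch; `dummy_names` is a hypothetical helper that only mirrors the format strings used in the loop.

```python
def dummy_names(n):
    """Return (source, sample, process) names for indices 1..n."""
    return [
        (f"individual_{i}", f"sample_{i}", f"samplecollection_{i}")
        for i in range(1, n + 1)
    ]

# Print each chain in the order source -> process -> sample.
for source_name, sample_name, process_name in dummy_names(3):
    print(" -> ".join((source_name, process_name, sample_name)))
```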

### Define assays

See assay options: https://github.com/ISA-tools/isa-api/blob/master/isatools/resources/config/yaml/assay-options.yml

In [7]:
assays = {
    "genomics_imtm": Assay(
        # options from https://github.com/ISA-tools/isa-api/blob/master/isatools/resources/config/yaml/assay-options.yml
        # measurement type: genome sequencing
        # technology type: nucleic acid sequencing
        # protocol type: nucleic acid sequencing
        # raw data file: Raw Data File 
        # derived data file: Derived Data File
        filename = "a_assay_genomics_imtm.txt",
        measurement_type = OntologyAnnotation(
            term = "Whole Genome Sequencing", #"DNA Sequence",
            term_source = ontologies["NCIT"],
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C101294"), #"http://purl.obolibrary.org/obo/NCIT_C13299"),
        technology_type = OntologyAnnotation(
            term = "Nucleic Acid Sequencing", #"Whole Genome Sequencing",
            term_source = ontologies["NCIT"], 
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C18881"), #"http://purl.obolibrary.org/obo/NCIT_C101294"),
        technology_platform = OntologyAnnotation(
            term = "Illumina platform", # TODO is there a more specific term for the platform?
            term_source = ontologies["GENEPIO"], 
            term_accession = "http://purl.obolibrary.org/obo/GENEPIO_0001923")
    ), 
    "methylseq_uu": Assay(
        # measurement type: DNA methylation profiling
        # technology type: nucleic acid sequencing
        # protocol type: nucleic acid sequencing
        # raw data file: Raw Data File 
        # derived data file: Derived Data File
        filename = "a_assay_methylseq_uu.txt",
        measurement_type = OntologyAnnotation(
            term = "DNA methylation profiling by high throughput sequencing assay", #"methylation profiling by high throughput sequencing",
            term_source = ontologies["OBI"], #"EFO"],
            term_accession = "http://purl.obolibrary.org/obo/OBI_0001266"), #"http://www.ebi.ac.uk/efo/EFO_0002761")
        technology_type = OntologyAnnotation(
            term = "Nucleic Acid Sequencing", 
            term_source = ontologies["NCIT"], 
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C18881"), 
        #technology_type = "Enzymatic methylation sequencing",
        technology_platform = "NEBNext Enzymatic Methyl-seq (EM-seq)" # TODO check if this should be the sequencing platform instead
    ), 
    "rnaseq_fimm": Assay(
        # measurement type: transcription profiling
        # technology type: nucleic acid sequencing
        # protocol type: nucleic acid sequencing
        # raw data file: Raw Data File 
        # derived data file: Derived Data File
        filename = "a_assay_rnaseq_fimm.txt",
        measurement_type = OntologyAnnotation(
            term = "transcription profiling", #"cDNA Sequence"
            term_source = ontologies["EFO"],
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0001032"),
        technology_type = OntologyAnnotation(
            term = "Nucleic Acid Sequencing", #"mRNA Sequencing",
            term_source = ontologies["NCIT"], 
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C18881"),
        technology_platform = OntologyAnnotation(
            term = "Illumina NovaSeq 6000",
            term_source = ontologies["OBI"], 
            term_accession = "http://purl.obolibrary.org/obo/OBI_0002630")
    ),
    "mirnaseq_fimm": Assay(
        # measurement type: transcription profiling
        # technology type: nucleic acid sequencing
        # protocol type: nucleic acid sequencing
        # raw data file: Raw Data File 
        # derived data file: Derived Data File
        filename = "a_assay_mirnaseq_fimm.txt",
        measurement_type = OntologyAnnotation(
            term = "transcription profiling", #"cDNA Sequence"
            term_source = ontologies["EFO"],
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0001032"),
        technology_type = OntologyAnnotation(
            term = "Nucleic Acid Sequencing", #"microRNA Sequencing",
            term_source = ontologies["NCIT"], 
            term_accession = "http://purl.obolibrary.org/obo/NCIT_C18881"),
        technology_platform = OntologyAnnotation(
            term = "Illumina NovaSeq 6000",
            term_source = ontologies["OBI"], 
            term_accession = "http://purl.obolibrary.org/obo/OBI_0002630")
    ), 
    "mirnaqrtpcr_sermas": Assay(
        # measurement type: transcription profiling
        # technology type: RT-PCR
        # protocol type: RT-PCR
        # raw data file: Raw Data File 
        # derived data file: Derived Data File
        filename = "a_assay_mirnaqrtpcr_sermas.txt",
        measurement_type = OntologyAnnotation(
            #term = "transcription profiling by RT-PCR assay", 
            #term_source = ontologies["OBI"],
            #term_accession = "http://purl.obolibrary.org/obo/OBI_0001361"),
            term = "transcription profiling", 
            term_source = ontologies["EFO"],
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0001032"),
        technology_type = OntologyAnnotation(
            term = "microRNA profiling by RT-PCR", 
            term_source = ontologies["EFO"], 
            term_accession = "http://www.ebi.ac.uk/efo/EFO_0007687"),
        technology_platform = OntologyAnnotation(
            term = "LightCycler 480 Real-Time PCR detection instrument",
            term_source = ontologies["BAO"], 
            term_accession = "http://www.bioassayontology.org/bao#BAO_0150063")
    ),
#    "proteomics_imtm": Assay(
#        filename = "a_assay_proteomics_imtm.txt"), 
    "metabolomics_acylcarnitines_rumc": Assay(
        # measurement type: targeted metabolite profiling
        # technology type: mass spectrometry
        # protocol type: mass spectrometry
        # raw data file: Raw Spectral Data File
        # derived data file:
        #     - Derived Spectral Data File
        #     - Metabolite Assignment File
        # https://doi.org/10.1023/A:1005587617745
        filename = "a_assay_metabolomics_acylcarnitines_rumc.txt",
        measurement_type = OntologyAnnotation(
            term = "targeted metabolite profiling",
            term_source = ontologies["MSIO"],
            term_accession = "http://purl.obolibrary.org/obo/MSIO_0000100"),
        technology_type = OntologyAnnotation(
            term = "mass spectrometry", #"electrospray ionisation tandem mass spectrometry",
            term_source = ontologies["CHMO"], 
            term_accession = "http://purl.obolibrary.org/obo/CHMO_0000470"), #"http://purl.obolibrary.org/obo/CHMO_0000577"),
        technology_platform = "Micromass Quattro II triple-quadrupole mass spectrometer"
    ),
    # http://snomed.info/id/442613004 Quantitative measurement of acylcarnitine in plasma specimen (procedure)
    "metabolomics_aminoacids_mumc": Assay(
        filename = "a_assay_metabolomics_aminoacids_mumc.txt",
        measurement_type = OntologyAnnotation(
            term = "targeted metabolite profiling",
            term_source = ontologies["MSIO"],
            term_accession = "http://purl.obolibrary.org/obo/MSIO_0000100"),
        technology_type = OntologyAnnotation(
            term = "mass spectrometry", 
            term_source = ontologies["CHMO"], 
            term_accession = "http://purl.obolibrary.org/obo/CHMO_0000470"),
        technology_platform = "Micromass Quattro Premier XE Tandem Mass Spectrometer" # from # https://doi.org/10.1016/j.cca.2009.06.023
    ), 
#    "metabolomics_fattyacids_mumc": Assay(
#        filename = "a_assay_metabolomics_fattyacids_mumc.txt"), 
}
# NOTE: make sure every assay contains samples, materials and processes before it is added to the study; 
#       otherwise some assays may not be written to file 

### Define materials and processes referring to protocols for all assays

Valid data file types (used as the `label` of a `DataFile`) are:
'Raw Data File',
'Derived Data File',
'Image File',
'Acquisition Parameter Data File',
'Derived Spectral Data File',
'Protein Assignment File',
'Raw Spectral Data File',
'Peptide Assignment File',
'Array Data File',
'Derived Array Data File',
'Post Translational Modification Assignment File',
'Derived Array Data Matrix File',
'Free Induction Decay Data File',
'Metabolite Assignment File',
'Array Data Matrix File'
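
Several cells below use labels such as "Raw Data File R1", which do not appear in this list. A minimal sketch of a label check, assuming the list above is exhaustive; `ISA_DATA_FILE_LABELS` and `check_label` are hypothetical names and not part of isatools:

```python
# Hypothetical helper, not part of isatools: the set simply mirrors the list above.
ISA_DATA_FILE_LABELS = frozenset({
    "Raw Data File", "Derived Data File", "Image File",
    "Acquisition Parameter Data File", "Derived Spectral Data File",
    "Protein Assignment File", "Raw Spectral Data File",
    "Peptide Assignment File", "Array Data File", "Derived Array Data File",
    "Post Translational Modification Assignment File",
    "Derived Array Data Matrix File", "Free Induction Decay Data File",
    "Metabolite Assignment File", "Array Data Matrix File",
})

def check_label(label):
    """Return the label unchanged if it is a listed data file type, else raise ValueError."""
    if label not in ISA_DATA_FILE_LABELS:
        raise ValueError("unknown data file label: {0!r}".format(label))
    return label

print(check_label("Raw Data File"))
try:
    check_label("Raw Data File R1")
except ValueError as err:
    print(err)
```

Calling `check_label` on every `label` before constructing a `DataFile` would flag non-standard labels early, before anything is written to ISA-Tab.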

In [8]:
# genomics IMTM
for idx, sample in enumerate(cohort_study.samples):
    # DNA extraction
    dna = Material(
        name = "DNA_{0}".format(sample.name),
        type_ = "Extract Name")
    dna_extraction_process = Process(
        name = "DNA_extraction_{0}".format(sample.name),
        executes_protocol = protocols["dna_extraction"],
        inputs = [sample], 
        outputs = [dna])
    ##################################
    # genomics measurement
    # TODO derived data files 
    wgs_raw_file1 = DataFile(
        filename = "WGS_rawdata_{0}_R1.fastq.gz".format(dna.name), 
        label = "Raw Data File R1", 
        generated_from = [dna])
    wgs_raw_file2 = DataFile(
        filename = "WGS_rawdata_{0}_R2.fastq.gz".format(dna.name), 
        label = "Raw Data File R2", 
        generated_from = [dna])
    wgs_process = Process(
        name = "WGS_{0}".format(dna.name),
        executes_protocol = protocols["Whole Genome Sequencing"],
        inputs = [dna], 
        outputs = [wgs_raw_file1, wgs_raw_file2])
    plink(dna_extraction_process, wgs_process)
    assays["genomics_imtm"].samples.append(sample)
    assays["genomics_imtm"].other_material.append(dna)
    assays["genomics_imtm"].data_files.append(wgs_raw_file1)
    assays["genomics_imtm"].data_files.append(wgs_raw_file2)
    assays["genomics_imtm"].process_sequence.append(dna_extraction_process)
    assays["genomics_imtm"].process_sequence.append(wgs_process)

In [9]:
# methylation sequencing UU
for idx, sample in enumerate(cohort_study.samples):
    ################################
    # materials used in multiple assays
    # DNA extraction
    dna = Material(
        name = "DNA_{0}".format(sample.name),
        type_ = "Extract Name")
    dna_extraction_process = Process(
        name = "DNA_extraction_{0}".format(sample.name),
        executes_protocol = protocols["dna_extraction"],
        inputs = [sample], 
        outputs = [dna])
    ############################
    # methylation sequencing
    # TODO: is this the correct way of handling two raw data files? Both should be of type
    #       "Raw Data File", but there can only be one file of this type
    methylseq_raw_file1 = DataFile(
        filename = "EMseq_rawdata_{0}_R1.fastq".format(dna.name), 
        label = "Raw Data File R1", 
        generated_from = [dna])
    methylseq_raw_file2 = DataFile(
        filename = "EMseq_rawdata_{0}_R2.fastq".format(dna.name), 
        label = "Raw Data File R2", 
        generated_from = [dna])
    methylseq_process = Process(
        name = "EMseq_{0}".format(dna.name),
        executes_protocol = protocols["EM-seq"],
        inputs = [dna], 
        outputs = [methylseq_raw_file1, methylseq_raw_file2])
    plink(dna_extraction_process, methylseq_process)
    assays["methylseq_uu"].samples.append(sample)
    assays["methylseq_uu"].other_material.append(dna)
    assays["methylseq_uu"].data_files.append(methylseq_raw_file1)
    assays["methylseq_uu"].data_files.append(methylseq_raw_file2)
    assays["methylseq_uu"].process_sequence.append(dna_extraction_process)
    assays["methylseq_uu"].process_sequence.append(methylseq_process)
    

In [10]:
# RNA-seq FIMM
for idx, sample in enumerate(cohort_study.samples):
    # RNA extraction
    rna = Material(
        name = "RNA_{0}".format(sample.name),
        type_ = "Extract Name")
    rna_extraction_process = Process(
        name = "RNA_extraction_{0}".format(sample.name),
        executes_protocol = protocols["rna_extraction"],
        inputs = [sample], 
        outputs = [rna])
    ##################################
    # transcriptomics - RNAseq
    # TODO derived data files 
    rnaseq_raw_file1 = DataFile(
        filename = "RNAseq_rawdata_{0}_R1.fastq.gz".format(rna.name), 
        label = "Raw Data File R1", 
        generated_from = [rna])
    rnaseq_raw_file2 = DataFile(
        filename = "RNAseq_rawdata_{0}_R2.fastq.gz".format(rna.name), 
        label = "Raw Data File R2", 
        generated_from = [rna])
    rnaseq_libraryprep_process = Process(
        name = "mRNAlibprep_{0}".format(rna.name),
        executes_protocol = protocols["mRNA library preparation"],
        inputs = [rna], 
        outputs = [])
    rnaseq_process = Process(
        name = "mRNAseq_{0}".format(rna.name),
        executes_protocol = protocols["mRNA Sequencing"],
        inputs = [], 
        outputs = [rnaseq_raw_file1, rnaseq_raw_file2])
    plink(rna_extraction_process, rnaseq_libraryprep_process)
    plink(rnaseq_libraryprep_process, rnaseq_process)
    assays["rnaseq_fimm"].samples.append(sample)
    assays["rnaseq_fimm"].other_material.append(rna)
    assays["rnaseq_fimm"].data_files.append(rnaseq_raw_file1)
    assays["rnaseq_fimm"].data_files.append(rnaseq_raw_file2)
    assays["rnaseq_fimm"].process_sequence.append(rna_extraction_process)
    assays["rnaseq_fimm"].process_sequence.append(rnaseq_libraryprep_process)
    assays["rnaseq_fimm"].process_sequence.append(rnaseq_process)

In [11]:
# miRNA qRT-PCR SERMAS
for idx, sample in enumerate(cohort_study.samples):
    # RNA extraction
    rna = Material(
        name = "RNA_{0}".format(sample.name),
        type_ = "Extract Name")
    rna_extraction_process = Process(
        name = "RNA_extraction_{0}".format(sample.name),
        executes_protocol = protocols["rna_extraction"],
        inputs = [sample], 
        outputs = [rna])
    ##################################
    # microRNA qRT-PCR - SERMAS
    # 1 sample -> 1 RNA isolation -> 1 cDNA -> triplicate PCR
    cdna = Material(
        name = "cDNA_{0}".format(sample.name),
        type_ = "Labeled Extract Name")
    reversetranscription_process = Process(
        name = "ReverseTranscription_{0}".format(rna.name),
        executes_protocol = protocols["Reverse Transcription"],
        inputs = [rna],
        outputs = [cdna])
    # per-sample steps are linked and registered once, outside the replicate loop
    plink(rna_extraction_process, reversetranscription_process)
    assays["mirnaqrtpcr_sermas"].samples.append(sample)
    assays["mirnaqrtpcr_sermas"].other_material.append(rna)
    assays["mirnaqrtpcr_sermas"].other_material.append(cdna)
    assays["mirnaqrtpcr_sermas"].process_sequence.append(rna_extraction_process)
    assays["mirnaqrtpcr_sermas"].process_sequence.append(reversetranscription_process)

    for techrep in range(1, 4): # triplicate PCR
        miRNAqRTPCR_raw_file = DataFile(
            filename = "miRNAqPCR_rawdata_{0}_Replicate_{1}".format(sample.name, techrep), 
            label = "Raw Data File", 
            generated_from = [cdna])
        miRNAarray_process = Process(
            name = "miRNAarray_{0}_Replicate_{1}".format(sample.name, techrep), 
            executes_protocol = protocols["miRNA array"],
            parameter_values = [
                ParameterValue(category = protocol_params["technical replicate"],
                               value = techrep)],
            inputs = [cdna], 
            outputs = [miRNAqRTPCR_raw_file])
        miRNAqRTPCR_derived_file = DataFile(
            filename = "miRNAqPCR_normalizationdata_{0}_Replicate_{1}".format(sample.name, techrep), 
            label = "Derived Data File", 
            generated_from = [miRNAqRTPCR_raw_file])
        datanorm_process = Process(
            name = "datanormalization_{0}_Replicate_{1}".format(sample.name, techrep), 
            executes_protocol = protocols["Data Normalization"], 
            inputs = [miRNAqRTPCR_raw_file], 
            outputs = [miRNAqRTPCR_derived_file])

        # per-replicate steps
        plink(reversetranscription_process, miRNAarray_process)
        plink(miRNAarray_process, datanorm_process)

        assays["mirnaqrtpcr_sermas"].data_files.append(miRNAqRTPCR_raw_file)
        assays["mirnaqrtpcr_sermas"].data_files.append(miRNAqRTPCR_derived_file)

        assays["mirnaqrtpcr_sermas"].process_sequence.append(miRNAarray_process)
        assays["mirnaqrtpcr_sermas"].process_sequence.append(datanorm_process)
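
The provenance chain built above (1 sample -> 1 RNA extraction -> 1 cDNA -> 3 PCR runs) can be sketched with plain Python objects. `Step` and `link` are simplified stand-ins for illustration, not the isatools `Process` and `plink`. The sketch assumes that each step keeps a single forward reference, so the last link made from a branching step wins, while every replicate still points back to its parent. Whether isatools behaves the same way should be checked against its source.

```python
class Step:
    """Toy stand-in for a process node (hypothetical, not isatools.model.Process)."""
    def __init__(self, name):
        self.name = name
        self.prev_step = None
        self.next_step = None

def link(first, second):
    """Connect two steps in both directions."""
    first.next_step = second
    second.prev_step = first

sample_name = "sample_001"  # illustrative sample name
rna_extraction = Step("RNA_extraction_{0}".format(sample_name))
reverse_transcription = Step("ReverseTranscription_RNA_{0}".format(sample_name))
link(rna_extraction, reverse_transcription)  # per-sample link, made once

pcr_runs = []
for techrep in range(1, 4):  # triplicate PCR
    pcr = Step("miRNAarray_{0}_Replicate_{1}".format(sample_name, techrep))
    link(reverse_transcription, pcr)
    pcr_runs.append(pcr)

for pcr in pcr_runs:
    print(pcr.name, "<-", pcr.prev_step.name)
print("last forward link:", reverse_transcription.next_step.name)
```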

In [12]:
# metabolomics RUMC
for idx, sample in enumerate(cohort_study.samples):
    ##################################
    # metabolomics_acylcarnitines_rumc
    acylcarnitine_sample_extract = Material(
        name = "{0}^B-BCARP_NM".format(sample.name),
        type_ = "Extract Name")
    acylcarnitine_sampleprep_process = Process(
        name = "Sample_preparation_plasma_acylcarnitine_analysis_{0}".format(sample.name),
        executes_protocol = protocols["Extraction"],
        inputs = [sample], 
        outputs = [acylcarnitine_sample_extract],
        parameter_values = [
            ParameterValue(category = protocol_params["Post Extraction"],
                           value = "50 uL of sample was diluted with IS in MeOH, followed by 500 uL acetonitrile. The mixture was vortexed and subsequently centrifuged at 16100*g at room temperature for 5 minutes. The supernatant was transferred to a new vial and evaporated at 37° Celsius under a flow of nitrogen. 50 uL of 1 M derivatizing agent (1-butanol : acetyl chloride 19:1) was added. The sample was vortexed and kept at 60° Celsius for exactly 15 minutes. The sample was again evaporated at 37° Celsius under a flow of nitrogen. The sample was then resuspended in 500 uL acetonitrile, vortexed and injected."),
            ParameterValue(category = protocol_params["Derivatization"],
                           value = "Butylation")])
    acylcarnitine_chromatography_process = Process(
        name = "Chromatography_plasma_acylcarnitine_analysis_{0}".format(sample.name),
        executes_protocol = protocols["Chromatography"],
        inputs = [acylcarnitine_sample_extract], 
        outputs = [],
        parameter_values = [
            ParameterValue(category = protocol_params["Chromatography Instrument"],
                           value = "Waters I-class HPLC system"),
            ParameterValue(category = protocol_params["Column type"],
                           value = "flow injection analysis")])
    acylcarnitine_raw_data = DataFile(
        filename = "{0}^B-BCARP_NM.raw".format(sample.name), 
        label = "Raw Spectral Data File")
    acylcarnitine_MS_process = Process(
        name = "MS_plasma_acylcarnitine_analysis_{0}".format(sample.name),
        executes_protocol = protocols["Mass spectrometry"],
        inputs = [], 
        outputs = [acylcarnitine_raw_data],
        parameter_values = [
            ParameterValue(category = protocol_params["Scan polarity"],
                           value = "positive"),
            ParameterValue(category = protocol_params["Scan m/z range"],
                           value = "200-525"),
            ParameterValue(category = protocol_params["Instrument"],
                           value = "Waters Xevo TQ-S micro"),
            ParameterValue(category = protocol_params["Ion source"],
                           value = OntologyAnnotation(
                               term = "electrospray ionization",
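                               # TODO: accession MS_1000073 comes from the PSI-MS ontology, but term_source is CHMO; align the two
                               term_source = ontologies["CHMO"],
                               term_accession = "http://purl.obolibrary.org/obo/MS_1000073")), 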
            ParameterValue(category = protocol_params["Mass analyzer"],
                           value = OntologyAnnotation(
                               term = "triple quadrupole mass spectrometer",
                               term_source = ontologies["CHMO"],
                               term_accession = "http://purl.obolibrary.org/obo/CHMO_0002021")),
            ParameterValue(category = protocol_params["method reference"],
                           value = "https://doi.org/10.1023/A:1005587617745")])
    acylcarnitine_datatransform_process = Process(
        name = "Data_transformation_plasma_acylcarnitine_analysis_{0}".format(sample.name),
        inputs = [acylcarnitine_raw_data],
        outputs = [],
        executes_protocol = protocols["Data transformation_acylcarnitines"])
    acylcarnitine_MAF = DataFile(
        filename = "m_carnprofiel.tsv", 
        label = "Metabolite Assignment File")
    acylcarnitine_metaboliteident_process = Process(
        name = "Metabolite_identification_plasma_acylcarnitine_analysis_{0}".format(sample.name),
        outputs = [acylcarnitine_MAF],
        executes_protocol = protocols["Metabolite identification_acylcarnitines"])
    plink(acylcarnitine_sampleprep_process, acylcarnitine_chromatography_process)
    plink(acylcarnitine_chromatography_process, acylcarnitine_MS_process)
    #plink(acylcarnitine_sampleprep_process, acylcarnitine_MS_process)
    plink(acylcarnitine_MS_process, acylcarnitine_datatransform_process)
    plink(acylcarnitine_datatransform_process, acylcarnitine_metaboliteident_process)
    assays["metabolomics_acylcarnitines_rumc"].samples.append(sample)
    assays["metabolomics_acylcarnitines_rumc"].other_material.append(acylcarnitine_sample_extract)
    assays["metabolomics_acylcarnitines_rumc"].process_sequence.append(acylcarnitine_sampleprep_process)
    assays["metabolomics_acylcarnitines_rumc"].process_sequence.append(acylcarnitine_chromatography_process)
    assays["metabolomics_acylcarnitines_rumc"].process_sequence.append(acylcarnitine_MS_process)
    assays["metabolomics_acylcarnitines_rumc"].process_sequence.append(acylcarnitine_datatransform_process)
    assays["metabolomics_acylcarnitines_rumc"].process_sequence.append(acylcarnitine_metaboliteident_process)

In [34]:
# add assays to study
assay_filenames = []
for assay_name, assay in assays.items():
    print(assay_name)
    if assay_name in ("metabolomics_acylcarnitines_rumc", "mirnaqrtpcr_sermas", 
                      "rnaseq_fimm", "genomics_imtm", "methylseq_uu"):
        cohort_study.assays.append(assay)
        assay_filenames.append(assay.filename)

genomics_imtm
methylseq_uu
rnaseq_fimm
mirnaseq_fimm
mirnaqrtpcr_sermas
metabolomics_acylcarnitines_rumc
metabolomics_aminoacids_mumc
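
Instead of listing assay names by hand, the selection could be driven by whether an assay actually has samples and processes, which also addresses the earlier note that empty assays may not be written. A minimal sketch with stand-in objects; `SimpleNamespace` replaces the isatools `Assay` here, and `non_empty_assays` is a hypothetical helper:

```python
from types import SimpleNamespace

def non_empty_assays(assays):
    """Return the (name, assay) pairs whose assay has at least one sample and one process."""
    return [(name, assay) for name, assay in assays.items()
            if assay.samples and assay.process_sequence]

# Stand-ins for isatools Assay objects (illustrative values only)
demo_assays = {
    "rnaseq_fimm": SimpleNamespace(
        filename="a_assay_rnaseq_fimm.txt",
        samples=["sample_001"],
        process_sequence=["RNA_extraction_sample_001"]),
    "mirnaseq_fimm": SimpleNamespace(
        filename="a_assay_mirnaseq_fimm.txt",
        samples=[],
        process_sequence=[]),
}

selected = non_empty_assays(demo_assays)
assay_filenames = [assay.filename for _, assay in selected]
print(assay_filenames)
```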


In [14]:
#print(cohort_study.assays)

## Export to ISA-Tab and ISA-JSON

Write ISA-Tab files

In [15]:
# create output directories (if necessary)
import os
out_dir = "../ISA"
out_dir_tab = os.path.join(out_dir, "ISA-Tab")
out_dir_tabxlsx = os.path.join(out_dir, "ISA-Tab_asXLSX")
out_dir_json = os.path.join(out_dir, "ISA-JSON")
for p in (out_dir_tab, out_dir_tabxlsx, out_dir_json):
    if not os.path.isdir(p):
        os.makedirs(p)
# write to ISA-Tab
from isatools import isatab
isatab.dump(investigation, out_dir_tab)


In [16]:
# read ISA-Tab files and validate
#with open(os.path.join(out_dir_tab, "i_investigation.txt")) as my_file:
#    ISA = isatab.validate(my_file)
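
Independently of the isatools validator, a quick sanity check can confirm that every expected ISA-Tab file was written and is not empty. A minimal sketch using only the standard library; `missing_or_empty` is a hypothetical helper, and the demo writes a throwaway file to a temporary directory rather than reading the real output:

```python
import os
import tempfile

def missing_or_empty(directory, filenames):
    """Return the names of files that are absent from directory or have size zero."""
    problems = []
    for name in filenames:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(name)
    return problems

with tempfile.TemporaryDirectory() as demo_dir:
    # Only the investigation file is created in this demo
    with open(os.path.join(demo_dir, "i_investigation.txt"), "w") as handle:
        handle.write("ONTOLOGY SOURCE REFERENCE\n")
    demo_problems = missing_or_empty(
        demo_dir, ["i_investigation.txt", "a_assay_rnaseq_fimm.txt"])
print(demo_problems)
```

With the notebook's own variables this would be called as `missing_or_empty(out_dir_tab, [investigation_filename, study_filename] + assay_filenames)`.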

Write ISA-JSON file

In [17]:
# write to ISA-JSON
# see example: https://isa-tools.org/isa-api/content/examples/example-createSimpleISAJSON.html
import json
from isatools.isajson import ISAJSONEncoder
with open(os.path.join(out_dir_json, "isa.json"), "w") as out_file:
    json.dump(
        investigation, 
        out_file,
        cls = ISAJSONEncoder, 
        sort_keys = True, 
        indent = 4, 
        separators = (',', ': '))
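
`json.dump` cannot serialise arbitrary objects on its own; the `cls` argument supplies an encoder whose `default` method turns them into JSON-compatible values. A toy illustration of that mechanism, assuming nothing about how `ISAJSONEncoder` is implemented; `Sample` and `SampleEncoder` are hypothetical:

```python
import json

class Sample:
    """Toy object that json cannot serialise by default (hypothetical)."""
    def __init__(self, name):
        self.name = name

class SampleEncoder(json.JSONEncoder):
    """Convert Sample objects to dicts; defer everything else to the base class."""
    def default(self, obj):
        if isinstance(obj, Sample):
            return {"name": obj.name}
        return super().default(obj)

text = json.dumps(
    {"samples": [Sample("sample_001")]},
    cls=SampleEncoder,
    sort_keys=True,
    indent=4,
    separators=(",", ": "))
print(text)
```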

In [18]:
#from isatools.convert import json2isatab
#from isatools import isajson
#isajson.validate(open('isa.json'))
#with open("isa.json") as file_pointer:
#    json2isatab.convert(file_pointer, './ISA/')

Combine the tab-delimited ISA-Tab files into a single XLSX workbook (one sheet per file)

In [41]:
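import csv
import openpyxl

xlsx_workbook = openpyxl.Workbook()
# drop the empty default sheet that openpyxl creates with every new workbook
xlsx_workbook.remove(xlsx_workbook.active)
for idx, f in enumerate([investigation_filename, study_filename] + 
                        assay_filenames):
    # Excel sheet titles are limited to 31 characters, so the file name is truncated
    xlsx_sheet = xlsx_workbook.create_sheet(f[:30], idx)
    with open(os.path.join(out_dir_tab, f), "r", newline = "") as txt_file:
        reader = csv.reader(txt_file, delimiter = "\t")
        for row in reader:
            xlsx_sheet.append(row)
xlsx_workbook.save(os.path.join(out_dir_tabxlsx, "ISA_merged.xlsx"))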
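
Excel limits worksheet titles to 31 characters and disallows the characters `\ / ? * [ ] :`, which is presumably why the file names are truncated above. A minimal sketch of deriving safe, unique titles from file names; `sheet_title` is a hypothetical helper and is not part of openpyxl:

```python
import re

def sheet_title(filename, used, max_len=31):
    """Return a worksheet title derived from filename that is valid and not yet in used."""
    # replace characters that Excel does not allow in sheet titles, then truncate
    base = re.sub(r"[\\/?*\[\]:]", "_", filename)[:max_len]
    title, counter = base, 1
    while title in used:
        # on a clash, shorten the base so that a numeric suffix still fits
        suffix = "_{0}".format(counter)
        title = base[:max_len - len(suffix)] + suffix
        counter += 1
    used.add(title)
    return title

used_titles = set()
for name in ["i_investigation.txt",
             "a_assay_metabolomics_acylcarnitines_rumc.txt",
             "a_assay_metabolomics_acylcarnitines_rumc.txt"]:
    print(sheet_title(name, used_titles))
```

The returned title could be passed to `create_sheet` in place of `f[:30]`.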