# Chemical Intervention on Rat

This notebook is an exercise in using the ISA-API to build an experiment description from an experimental design.

### Objectives
- Identify concepts from the experimental design
- Represent the experimental design concepts using the ISA-API model  
- Produce ISA metadata for the experiment

### References
- [Documentation about the ISA-API](https://isatools.readthedocs.io), and more specifically:
    - [documentation about study-design driven creation of ISA content](https://isatools.readthedocs.io/en/latest/studydesigncreation.html)
    - [documentation of main ISA objects](https://isatools.readthedocs.io/en/latest/creation.html#)


# Part 1

Read the following description of an experiment and follow the steps below to identify the main elements of the experimental design.


Experiment Narrative:
---------------------

*Male Fischer F344 rats* purchased from Charles River were treated with **3 commonly used painkillers**, namely **acetylsalicylic acid, acetaminophen and ibuprofen**, at **2 distinct dose levels**, delivered *per os*.
*Equal numbers of animals (n=5) were allocated to each of the groups defined by a **compound, dose level and duration post exposure** (of 1 hour, 2 hours and 4 hours) combination.*
Following sacrifice performed by cervical dislocation and exsanguination preceded by anesthesia (ketamine and xylazine solution), *blood and kidney specimens were collected*.
*Total RNA was extracted from kidney samples and mRNA sequencing (transcription profiling) was performed using paired-end libraries on an Illumina sequencing platform using an Illumina HiSeq 2000 instrument*.
Blood samples were collected at sacrifice time and immediately placed in precooled 60 percent methanol ammonium bicarbonate buffer to quench cellular metabolism. Blood metabolites were separated into water-soluble and lipophilic fractions.
*Metabolite profiling was performed on the polar metabolite fraction only, using flow injection analysis (FIA) mass spectrometry on an Agilent 6550 iFunnel Q-TOF Mass Spectrometry platform*. Each fraction was injected twice and data were acquired in both ionization modes (positive mode and negative mode).
Raw data files were saved in native instrument format and later converted to HUPO-PSI standard format for mass spectrometry.


## Experimental description with structured metadata

In order to describe the experiment, we will rely on the models defined by the ISA-API, and thus we need the following import statements:

In [36]:
from isatools.create.models import *
from isatools.model import *

## Identification of Variables, their levels and definition of the Treatment Plans

In the context of the above experiment, **can you identify the independent variables and their associated levels?**

You can define a new variable by relying on the ```StudyFactor``` object in this way:

```python
chemical_agent = StudyFactor(name="chemical agent")

```
or if you want to add the ontology term for such study factor, you can use the [EBI Ontology Lookup Service](https://www.ebi.ac.uk/ols/) to find a relevant term from the ChEBI ontology, and build the factor in this way:
    
```python
chemical_agent = StudyFactor(name="agent", factor_type=OntologyAnnotation(term="chemical entity", term_source="ChEBI", term_accession="http://purl.obolibrary.org/obo/CHEBI_24431"))
```    

Define the relevant ```StudyFactor```'s below and see if you can find ontology terms that are relevant to annotate them (e.g. you can try and find terms from the Experimental Factor Ontology or EFO):


In [37]:
### ANSWER

chemical_agent = StudyFactor(name="agent", factor_type=OntologyAnnotation(term="chemical entity", term_source="ChEBI", term_accession="http://purl.obolibrary.org/obo/CHEBI_24431"))
dose = StudyFactor(name="dose")
duration = StudyFactor(name="duration")


Next, we can use a ```TreatmentFactory``` to include the ```StudyFactor```'s and indicate their different factor levels, i.e. the values that these variables will assume according to the experiment description above.

You should create the treatment factory providing the different study factors:

```python
treatment_factory = TreatmentFactory(factors=[ ... here list the variables...])

```
and then for each factor, you can add their levels in the following way:

```python
treatment_factory.add_factor_value( ... factor variable ..., { ... set of strings with the names of the factor values... })
```


In [38]:
### ANSWER

treatment_factory = TreatmentFactory(factors=[chemical_agent, dose, duration])

treatment_factory.add_factor_value(chemical_agent, {'acetyl salicylic acid', 'acetaminophen', 'ibuprofen'})
treatment_factory.add_factor_value(dose, {'high dose', 'low dose'})
treatment_factory.add_factor_value(duration, {'1 hr', '2 hr', '4 hr'})


### Computing the Number of Unique Treatment Groups/Study Groups: 


As per the description above, the experiment follows a factorial design where all combinations of factor values are considered. We can build our treatment plan, or ```TreatmentSequence```, by relying on a utility method that, given the factors and their levels, computes the full factorial design:



In [39]:
all_treatments = treatment_factory.compute_full_factorial_design()

How many treatment groups should have been identified? Check that the number you worked out matches the output of the command below:

In [40]:
print('Number of study groups (treatment groups): {}'.format(len(all_treatments)))

Number of study groups (treatment groups): 18
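
The count can also be checked by hand: a full factorial design has one group per combination of factor levels, so 3 agents × 2 doses × 3 durations gives 18 groups. The following is a minimal standard-library sketch of that enumeration. It does not use the ISA-API, and the `levels` dictionary is simply a restatement of the factor values entered above.

```python
from itertools import product

# Factor levels restated from the treatment factory above (plain Python, no ISA-API).
levels = {
    "agent": ["acetyl salicylic acid", "acetaminophen", "ibuprofen"],
    "dose": ["low dose", "high dose"],
    "duration": ["1 hr", "2 hr", "4 hr"],
}

# One study group per combination of one level from each factor.
study_groups = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(study_groups))  # 18
```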


We can now build a treatment plan, or ```TreatmentSequence```, by including all the treatments according to the factorial design:

In [41]:
treatment_sequence = TreatmentSequence(ranked_treatments=all_treatments)

Are study subjects exposed to a single intervention or to multiple interventions?
  


You can now visualise a summary of the treatment plan that you created with the following command:

In [42]:
report = make_summary_from_treatment_sequence(treatment_sequence)
report

{'full_factorial': True,
 'length_of_treatment_sequence': 1,
 'list_of_treatments': [[{'factor': 'agent', 'value': 'acetyl salicylic acid'},
   {'factor': 'dose', 'value': 'low dose'},
   {'factor': 'duration', 'value': '1 hr'}],
  [{'factor': 'agent', 'value': 'acetyl salicylic acid'},
   {'factor': 'dose', 'value': 'high dose'},
   {'factor': 'duration', 'value': '2 hr'}],
  [{'factor': 'agent', 'value': 'acetaminophen'},
   {'factor': 'dose', 'value': 'low dose'},
   {'factor': 'duration', 'value': '4 hr'}],
  [{'factor': 'agent', 'value': 'ibuprofen'},
   {'factor': 'dose', 'value': 'high dose'},
   {'factor': 'duration', 'value': '1 hr'}],
  [{'factor': 'agent', 'value': 'acetaminophen'},
   {'factor': 'dose', 'value': 'low dose'},
   {'factor': 'duration', 'value': '1 hr'}],
  [{'factor': 'agent', 'value': 'ibuprofen'},
   {'factor': 'dose', 'value': 'low dose'},
   {'factor': 'duration', 'value': '4 hr'}],
  [{'factor': 'agent', 'value': 'acetyl salicylic acid'},
   {'factor': '

Is the treatment plan report in agreement with the experimental design?

### Study group size

The following code builds a slider (relying on the ```ipywidgets``` library) for you to set the group size. Please select the appropriate number according to the experiment description:

In [43]:
from ipywidgets import (IntSlider)
group_size = IntSlider(value=1, min=0, max=100, step=1, description='Group size:', disabled=False, continuous_update=False, orientation='horizontal', readout=True, readout_format='d')
group_size


The group size value you chose, and that is going to be used in the next section, is:

In [44]:
group_size.value

1

## Sample collection and assay plans

We are now going to build the sample collection and assay plans based on the group size selected above. Given this, can you say whether the design is balanced or unbalanced?

In [45]:
plan = SampleAssayPlan(group_size=group_size.value)
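
As a hint for the question above, a design is usually called balanced when every study group holds the same number of subjects. The sketch below illustrates the check in plain Python. The `is_balanced` helper and the group labels are hypothetical and are not part of the ISA-API.

```python
def is_balanced(group_sizes):
    """Return True when every study group has the same number of subjects."""
    return len(set(group_sizes.values())) <= 1

# Hypothetical group labels: one shared group size applied to all 18 groups.
same_size = {"group {}".format(i): 5 for i in range(1, 19)}
# Same design with one animal missing from the first group.
uneven = {**same_size, "group 1": 4}

print(is_balanced(same_size), is_balanced(uneven))  # True False
```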

Let's now build a ```dictionary``` with the sample collection plan: it should contain key:value pairs with the specimen or sample type as the key and, as the value, the number of samples of that type collected over the course of the study. Here is the code snippet that you should complete:

```python
sample_collection_plan = { "sample type 1": 0, "sample type 2": 1 }
```


In [46]:
### ANSWER
sample_collection_plan = {"kidney":1, "blood":1} 


Next, the following code will take the ```sample_collection_plan``` dictionary that you built and add all of its details to the ```plan``` object:

In [47]:
for sample_type in sample_collection_plan:    
    plan.add_sample_type(sample_type)
    plan.add_sample_plan_record(sample_type,sample_collection_plan[sample_type])


### View the sample and assay plan information as a JSON document

This section shows how to serialize the key study design descriptors into a compact JSON document.
Why is this relevant? How would you use this feature? List 3 possible uses.

In [48]:
import json
from isatools.create.models import SampleAssayPlanEncoder
print(json.dumps(plan, cls=SampleAssayPlanEncoder, sort_keys=True, indent=4, separators=(',', ': ')))

{
    "assay_plan": [],
    "assay_types": [],
    "group_size": 1,
    "sample_plan": [
        {
            "sample_type": "kidney",
            "sampling_size": 1
        },
        {
            "sample_type": "blood",
            "sampling_size": 1
        }
    ],
    "sample_qc_plan": [],
    "sample_types": [
        "blood",
        "kidney"
    ]
}
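
One possible use of such a document is to store the plan alongside the data and reload or compare it later. The sketch below uses only the standard `json` module on a plain dictionary copied from the output above. It stands in for the encoder output and makes no claim about how the ISA-API itself reloads a plan.

```python
import json

# Plain-dict copy of the serialized plan printed above (not an ISA-API object).
plan_document = {
    "assay_plan": [],
    "assay_types": [],
    "group_size": 1,
    "sample_plan": [
        {"sample_type": "kidney", "sampling_size": 1},
        {"sample_type": "blood", "sampling_size": 1},
    ],
    "sample_qc_plan": [],
    "sample_types": ["blood", "kidney"],
}

# Serialize with sorted keys so that two equal plans always produce identical text.
serialized = json.dumps(plan_document, sort_keys=True, indent=4)

# Reload the text and confirm that nothing was lost in the round trip.
reloaded = json.loads(serialized)
print(reloaded == plan_document)  # True
```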


## Create an ISA experimental description based on the study design and the sampling plan:

In the following section, the task is to build ISA objects relying on the study design information we built above.


Let's first create an ```Investigation``` object to hold all the information about the experiment with this command, which also assigns an identifier for the investigation.

In [49]:
isa_investigation = Investigation(identifier='inv-dtp-exercise')

Now, let's create a study object using the sample and assay plan as well as the ```TreatmentSequence``` object we built before. For this, we will need an object of the ```IsaModelObjectFactory``` class provided in the ISA-API:

In [50]:
isa_object_factory = IsaModelObjectFactory(plan, treatment_sequence)
isa_study = isa_object_factory.create_study_from_plan()

Now, we can link the study to the investigation we created before:

In [51]:
isa_investigation.studies = [isa_study]


and set a name for the study file:

In [52]:
isa_study.filename = '... complete here a filename for the study file...'

In [53]:
## ANSWER

isa_study.filename = 's_study.txt'

Let's now check the study sample table:

In [54]:
from isatools.isatab import dump_tables_to_dataframes as dumpdf
from qgrid import show_grid

dataframes = dumpdf(isa_investigation)
sample_table = next(iter(dataframes.values()))
show_grid(sample_table)

In [55]:
print('Total rows generated: {}'.format(len(sample_table)))

Total rows generated: 36
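
This row count is consistent with the design: each of the 18 study groups contributes `group_size` subjects, and each subject yields one row per collected sample (here one kidney and one blood sample). A small arithmetic sketch follows, where `expected_sample_rows` is a hypothetical helper and not an ISA-API function:

```python
def expected_sample_rows(n_groups, group_size, samples_per_subject):
    """Rows expected in the study sample table: one row per collected sample."""
    return n_groups * group_size * samples_per_subject

# Slider left at 1, kidney and blood each collected once per subject.
print(expected_sample_rows(18, 1, 2))  # 36
# Group size taken from the narrative (n=5).
print(expected_sample_rows(18, 5, 2))  # 180
```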


### Study description and study design type


Can you set the study description (or abstract) relying on this code snippet?

```python
isa_study.description = "... here the text of the description..."
```


In [56]:
### ANSWER

isa_study.description = "*Male Fischer F344 rats* purchased from Charles River were treated with *3 commonly used painkillers*, namely *acetylsalicylic acid, acetaminophen and ibuprofen*, at 2 distinct dose levels, delivered *per os*. *Equal numbers of animals (n=5) were allocated to each of the groups defined by a compound, dose level and duration post exposure combination.* Following sacrifice performed by cervical dislocation and exsanguination preceded by anesthesia (ketamine and xylazine solution), *blood and kidney specimens were collected*. *Total RNA was extracted from kidney samples and mRNA sequencing (transcription profiling) was performed using paired-end libraries on an Illumina sequencing platform using an Illumina HiSeq 2000 instrument*. Blood samples were collected at sacrifice time and immediately placed in precooled 60 percent methanol ammonium bicarbonate buffer to quench cellular metabolism. Blood metabolites were separated into water-soluble and lipophilic fractions. *Metabolite profiling was performed on the polar metabolite fraction only, using flow injection analysis (FIA) mass spectrometry on an Agilent 6550 iFunnel Q-TOF Mass Spectrometry platform*. Each fraction was injected twice and data were acquired in both ionization modes (positive mode and negative mode). Raw data files were saved in native instrument format and later converted to HUPO-PSI standard format for mass spectrometry."

Next, we would like to specify the type of study design (and we can set multiple values if necessary). 

For this, we will build ontology annotations, as we did for the ```StudyFactor``` objects:

```python
descriptor_1 = OntologyAnnotation(
                  term="... here the label of the term...", 
                  term_source="... here the name of the ontology the term comes from...", 
                  term_accession="... here the URL of the term ...")
```

To determine some of the study design descriptors, consider the following questions and use the Ontology Lookup Service to find relevant terms:

- is the experiment following an 'intervention design' or an 'observation design'?
- is the design 'factorial' or a 'randomized block' design?
- is the design 'full' or 'fractional'?

In [57]:
### HERE DEFINE YOUR DESCRIPTORS

In [58]:
### ANSWER

descriptor_1 = OntologyAnnotation(
                term="intervention design",
                term_source= "OBI",
                term_accession="http://purl.obolibrary.org/obo/OBI_0000115"
                )

descriptor_2 = OntologyAnnotation(
                term="full factorial design",
                term_source= "",
                term_accession=""
                )



After you have defined the descriptors, you can append them to the ```isa_study.design_descriptors``` list as follows:

```python
isa_study.design_descriptors.append(descriptor_1)
```


In [59]:
### HERE APPEND ALL THE DESCRIPTORS YOU DEFINED ABOVE

In [60]:
### ANSWER 
isa_study.design_descriptors.append(descriptor_1)
isa_study.design_descriptors.append(descriptor_2)

## Assay and Data Acquisition Plans:

From the textual description of the experiment, identify the response or dependent variables.

In the ISA model, an Assay is defined by the type of measurement it performs and the technology used to obtain the measurements.

In the ISA model, there is a series of configuration files that define the vetted values for Measurement Type and Technology type.

An ISA configuration can be accessed from: https://github.com/ISA-tools/Configuration-Files/tree/master/isaconfig-default_v2015-07-02

Given those lists of configurations, let's define the assay types needed for our experiment.

The way to define an assay type is as follows:

```python
assay_type_1 = AssayType(measurement_type='...here a supported measurement type...', technology_type='...here a supported technology type...')
```
Define below the assay types that you can identify from the experiment narrative:

In [61]:
#### ANSWER

assay_type_1 = AssayType(measurement_type='transcription profiling', technology_type='nucleotide sequencing')
assay_type_2 = AssayType(measurement_type='metabolite profiling', technology_type='mass spectrometry')


transcription profiling  using  nucleotide sequencing
metabolite profiling  using  mass spectrometry


Let's now define a set for the assay types:

In [None]:
assay_types = set()

You can add the types you defined above to the set as follows:
    
```python
assay_types.add(assay_type_1)
```

Add all your assay types to the assay_types set below:

In [None]:
### ANSWER
assay_types.add(assay_type_1)
assay_types.add(assay_type_2)
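
A set is a convenient container here because adding the same value twice leaves a single entry. The sketch below illustrates this with a hypothetical named tuple, `SketchAssayType`, standing in for the ISA-API `AssayType`. It assumes only that two assay types count as equal when their measurement and technology types match.

```python
from collections import namedtuple

# Hypothetical stand-in for the ISA-API AssayType: hashable and compared by value.
SketchAssayType = namedtuple("SketchAssayType", ["measurement_type", "technology_type"])

sketch_types = set()
sketch_types.add(SketchAssayType("transcription profiling", "nucleotide sequencing"))
sketch_types.add(SketchAssayType("metabolite profiling", "mass spectrometry"))
# Adding an equal value again does not create a duplicate entry.
sketch_types.add(SketchAssayType("metabolite profiling", "mass spectrometry"))

print(len(sketch_types))  # 2
```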


Let's now visualise the assay types:

In [None]:
for x in assay_types:
    print(x.measurement_type.term, " using ", x.technology_type.term)


## Assay Specific Descriptors
Each data acquisition modality comes with its own set of parameters which may be used to capture the underlying workflow graph modifications. This section of the exercise aims to show how to deal with workflows specific to sequencing applications and mass spectrometry ones.

### Generate ISA model objects from the assay plan and render the assay table

In [62]:
# Starting with DNA sequencing assay workflow: 
# Setting the type of library and the number of runs per sample.

# This bit is meant to introduce you to ipywidgets, a library for IPython/Jupyter allowing user interaction (not implemented here)
ngs_technical_replicates = IntSlider(value=2, min=0, max=5, step=1, description='Technical repeats:', disabled=False, continuous_update=False, orientation='horizontal', readout=True, readout_format='d')
print(ngs_technical_replicates.value)
ngs_technical_replicates



2


### Check state of Sample Assay Plan after entering assay plan information:

### Dealing with Next Generation Sequencing Data Acquisition Plan

In [63]:
sequencing_instruments = set()
sequencing_instruments.add('Illumina HiSeq 2000') # check the experiment description to obtain value.

top_mods_seq = DNASeqAssayTopologyModifiers(technical_replicates=ngs_technical_replicates.value, instruments=sequencing_instruments, distinct_libraries=2)

print('Technical replicates: {}'.format(top_mods_seq.technical_replicates))

# Use the assay type defined earlier as assay_type_1 (transcription profiling by nucleotide sequencing).
assay_type_1.topology_modifiers = top_mods_seq

plan.add_assay_type(assay_type_1)
# Attach the sequencing assay to kidney samples only; blood is handled by the mass spectrometry assay.
for element in plan.sample_types:
    if element.value.term == 'kidney':
        print(element.value.term)
        plan.add_assay_plan_record(element.value.term, assay_type_1)
        assay_plan = next(iter(plan.assay_plan))
    elif element.value.term == 'blood':
        print("not for this assay type")
    else:
        print(element.value.term)
   

print('Added assay plan: {0} -> {1}/{2}'.format(assay_plan[0].value.term, assay_plan[1].measurement_type.term, assay_plan[1].technology_type.term))
if len(top_mods_seq.instruments) > 0:
    print('Instruments: {}'.format(list(top_mods_seq.instruments)))

Technical replicates: 2
kidney
not for this assay type
Added assay plan: kidney -> transcription profiling/nucleotide sequencing
Instruments: ['Illumina HiSeq 2000']


In [64]:
print(json.dumps(plan, cls=SampleAssayPlanEncoder, sort_keys=True, indent=4, separators=(',', ': ')))

{
    "assay_plan": [
        {
            "assay_type": {
                "measurement_type": "transcription profiling",
                "technology_type": "nucleotide sequencing",
                "topology_modifiers": {
                    "distinct_libraries": 2,
                    "instruments": [
                        "Illumina HiSeq 2000"
                    ],
                    "technical_replicates": 2
                }
            },
            "sample_type": "kidney"
        }
    ],
    "assay_types": [
        {
            "measurement_type": "transcription profiling",
            "technology_type": "nucleotide sequencing",
            "topology_modifiers": {
                "distinct_libraries": 2,
                "instruments": [
                    "Illumina HiSeq 2000"
                ],
                "technical_replicates": 2
            }
        }
    ],
    "group_size": 1,
    "sample_plan": [
        {
            "sample_type": "kidney",
            "sa

### Dealing with Mass Spectrometry Data Acquisition Plan

In [65]:
chromatography_instruments = set()
ms_instruments = set()
injection_modes = set()
acquisition_modes = set()

ms_instruments.add('agilent') # check the experiment description to obtain value.
injection_modes.add('FIA')  # check the experiment description to obtain value.
acquisition_modes.add('positive') # check the experiment description to obtain value.
acquisition_modes.add('negative') # check the experiment description to obtain value.
ms_tech_rep = 1 # check the experiment description to obtain value.
top_mods_ms = MSAssayTopologyModifiers(technical_replicates=ms_tech_rep, injection_modes=injection_modes, acquisition_modes=acquisition_modes, instruments=ms_instruments, chromatography_instruments=chromatography_instruments)
# Use the assay type defined earlier as assay_type_2 (metabolite profiling by mass spectrometry).
assay_type_2.topology_modifiers = top_mods_ms

if len(top_mods_ms.chromatography_instruments) > 0:
    print('Chromatography instruments: {}'.format(list(top_mods_ms.chromatography_instruments)))
else:
    print('no chromatography used or no information supplied')

if len(top_mods_ms.instruments) > 0:
    print('Data acquisition instruments: {}'.format(list(top_mods_ms.instruments)))
if len(top_mods_ms.injection_modes) > 0:
    print('Injection modes: {}'.format(list(top_mods_ms.injection_modes)))
if len(top_mods_ms.acquisition_modes) > 0:
    print('Acquisition modes: {}'.format(list(top_mods_ms.acquisition_modes)))


plan.add_assay_type(assay_type_2)
plan.add_assay_plan_record("blood", assay_type_2)

assay_plan = next(iter(plan.assay_plan))



no chromatography used or no information supplied
Data acquisition instruments: ['agilent']
Injection modes: ['FIA']
Acquisition modes: ['negative', 'positive']


In [66]:
print(json.dumps(plan, cls=SampleAssayPlanEncoder, sort_keys=True, indent=4, separators=(',', ': ')))

{
    "assay_plan": [
        {
            "assay_type": {
                "measurement_type": "transcription profiling",
                "technology_type": "nucleotide sequencing",
                "topology_modifiers": {
                    "distinct_libraries": 2,
                    "instruments": [
                        "Illumina HiSeq 2000"
                    ],
                    "technical_replicates": 2
                }
            },
            "sample_type": "kidney"
        },
        {
            "assay_type": {
                "measurement_type": "metabolite profiling",
                "technology_type": "mass spectrometry",
                "topology_modifiers": {
                    "acquisition_modes": [
                        "negative",
                        "positive"
                    ],
                    "chromatography_instruments": [],
                    "injection_modes": [
                        "FIA"
                    ],
                    "

In [67]:
isa_investigation.studies = [isa_object_factory.create_assays_from_plan()]
for assay in isa_investigation.studies[-1].assays:
    print('Assay generated: {0}, {1} samples, {2} processes, {3} data files'
          .format(assay.filename, len(assay.samples), len(assay.process_sequence), len(assay.data_files)))
dataframes = dumpdf(isa_investigation)

A protocol with name "metabolite extraction" has already been declared in the study
Assay generated: a_kidney_dnaseq_Illumina HiSeq 2000_assay.txt, 18 samples, 108 processes, 36 data files
Assay generated: a_blood_ms_FIA_negative_assay.txt, 18 samples, 36 processes, 18 data files
Assay generated: a_blood_ms_FIA_positive_assay.txt, 18 samples, 36 processes, 18 data files
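
Note that the mass spectrometry plan produced one assay file per combination of injection mode and acquisition mode. The naming pattern in the sketch below is inferred from the output above only. It is an illustration under that assumption, not the ISA-API's actual file-naming code, and the `sketch_` variable names are hypothetical.

```python
from itertools import product

# Modes restated from the mass spectrometry plan above.
sketch_injection_modes = ["FIA"]
sketch_acquisition_modes = ["negative", "positive"]

# Filename pattern inferred from the generated assay files (an assumption).
ms_assay_files = [
    "a_blood_ms_{0}_{1}_assay.txt".format(injection, acquisition)
    for injection, acquisition in product(sketch_injection_modes, sketch_acquisition_modes)
]

for name in ms_assay_files:
    print(name)
```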


In [68]:
show_grid(dataframes[next(iter(dataframes.keys()))])

In [69]:
show_grid(dataframes['a_blood_ms_FIA_positive_assay.txt'])

In [70]:
show_grid(dataframes['a_blood_ms_FIA_negative_assay.txt'])