#### Create ISA-API Investigation from Datascriptor Study Design configuration
## Cross-over study with two drug treatments on humans

In this notebook, we show how to use a study design configuration in JSON format, as produced by Datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to then serialise it in JSON and tabular (i.e. ISA-Tab) format.

The study design configuration consists of:
- a 4-arm study design. Each arm has 10 subjects
- Subjects are humans. 
    - There is an observational factor, named "status" with two values: "healthy" and "diseased"
- a crossover of two drug treatments: a proper treatment ("hypertena" 20 mg/day for 14 days) and a control treatment ("placebo" 20 mg/day for 14 days)
- three non-treatment phases: screen (7 days), washout (14 days) and follow-up (180 days)
- two sample types collected: blood and saliva
- two assay types: 
    - DNA methylation profiling using nucleic acid sequencing on saliva samples
    - clinical chemistry with a marker panel on blood samples
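
As a quick sanity check on the design listed above, the following sketch tallies the subjects and the per-subject timeline in plain Python. The dictionary layout and key names are illustrative assumptions, not the Datascriptor JSON schema, and the phase order (screen, first treatment, washout, second treatment, follow-up) is assumed from the usual layout of a two-treatment crossover.

```python
# Illustrative summary of the design; this is NOT the Datascriptor schema.
design = {
    "arms": 4,
    "subjects_per_arm": 10,
    # Assumed phase order for a two-treatment crossover (durations in days).
    "phases": [
        ("screen", 7),
        ("first treatment", 14),
        ("washout", 14),
        ("second treatment", 14),
        ("follow-up", 180),
    ],
}

total_subjects = design["arms"] * design["subjects_per_arm"]
days_per_subject = sum(days for _, days in design["phases"])

print(f"Total subjects: {total_subjects}")
print(f"Days from screening to end of follow-up: {days_per_subject}")
```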

## 1. Setup

Let's import all the required libraries

In [None]:
from time import time
import os
import json

In [None]:
## ISA-API related imports
from isatools.model import Investigation, Study

In [None]:
## ISA-API create mode related imports
from isatools.create.model import StudyDesign
from isatools.create.connectors import generate_study_design

# serializer from ISA Investigation to JSON
from isatools.isajson import ISAJSONEncoder

# ISA-Tab serialisation
from isatools import isatab

In [None]:
## Additional ISA-API modules (create-mode model and ISA-JSON)
from isatools.create import model
from isatools import isajson

## 2. Load the Study Design JSON configuration

First of all, we load the study design configuration with all the specs defined above.

In [None]:
with open(os.path.abspath(os.path.join(
    "isa-study-design-as-json", "datascriptor", "crossover-study-human.json"
)), "r") as config_file:
    study_design_config = json.load(config_file)
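
Before converting the configuration, it can help to look at what was actually loaded. The helper below is a minimal sketch that lists the top-level keys of a dictionary together with the type and size of each value. The function name `summarise_config` and the toy keys are hypothetical; they make no claim about the structure of the real Datascriptor file.

```python
import json

def summarise_config(config, max_items=10):
    """Return one line per top-level key: key name, value type and size."""
    lines = []
    for key, value in list(config.items())[:max_items]:
        size = len(value) if isinstance(value, (dict, list, str)) else 1
        lines.append(f"{key}: {type(value).__name__} ({size})")
    return lines

# Toy stand-in for the loaded configuration; the keys are made up.
toy_config = json.loads('{"name": "demo", "arms": [{"id": 0}, {"id": 1}]}')
for line in summarise_config(toy_config):
    print(line)
```

In the notebook the same helper could be called as `summarise_config(study_design_config)`.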

## 3. Generate the ISA Study Design from the JSON configuration
To perform the conversion we just need to call the function `generate_study_design()` (the name may be subject to change).

In [None]:
study_design = generate_study_design(study_design_config)
assert isinstance(study_design, StudyDesign)

## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation

The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object.

In [None]:
start = time()
study = study_design.generate_isa_study()
end = time()
print('The generation of the ISA study from the study design took {:.2f} s.'.format(end - start))
assert isinstance(study, Study)
investigation = Investigation(identifier='inv01', studies=[study])
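
The `start`/`end` timing pattern recurs in several cells of this notebook. One possible way to factor it out is a small context manager, sketched below with the standard library only. The name `timed` is a hypothetical helper, not part of ISA-API, and the workload is a stand-in for a call such as `study_design.generate_isa_study()`.

```python
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(label):
    """Print how long the enclosed block took, in seconds."""
    start = perf_counter()
    try:
        yield
    finally:
        print('{} took {:.2f} s.'.format(label, perf_counter() - start))

# Stand-in workload; in the notebook this would wrap the ISA-API call.
with timed('Summing a range'):
    total = sum(range(1000))
```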

## 5. Serialize and save the JSON representation of the generated ISA Investigation

In [None]:
start = time()
inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
end = time()
print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

In [None]:
directory = os.path.abspath(os.path.join('notebook-output/isa-study-from-design-config/', 'crossover-2-treatments-human'))
os.makedirs(directory, exist_ok=True)
with open(os.path.abspath(os.path.join(directory, 'isa-investigation-crossover-2-treatments-human.json')), 'w') as out_fp:
    json.dump(json.loads(inv_json), out_fp)
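
The `cls=` argument of `json.dumps` used above tells the standard `json` module which encoder class to use for objects it cannot serialise on its own. The sketch below illustrates that mechanism with a toy class and a toy encoder; `Point` and `PointEncoder` are hypothetical and say nothing about how `ISAJSONEncoder` is implemented internally.

```python
import json
import os
import tempfile

class Point:
    """Hypothetical class that json cannot serialise by default."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointEncoder(json.JSONEncoder):
    def default(self, o):
        # Called only for objects json does not know how to serialise.
        if isinstance(o, Point):
            return {"x": o.x, "y": o.y}
        return super().default(o)

text = json.dumps({"origin": Point(0, 0)}, cls=PointEncoder, sort_keys=True, indent=4)

# Write the text as-is, then read it back to check that it is valid JSON.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.json")
    with open(path, "w") as out_fp:
        out_fp.write(text)
    with open(path) as in_fp:
        reloaded = json.load(in_fp)

print(reloaded)
```

Writing the already formatted string keeps its indentation in the file, whereas re-serialising with `json.dump(json.loads(...), out_fp)` and no `indent` argument produces a compact single-line file. Either form is valid JSON.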

## 6. Dump the ISA Investigation to ISA-Tab

In [None]:
start = time()
isatab.dump(investigation, directory)
end = time()
print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))

To work with the tables inside the notebook, we can also dump them to pandas DataFrames, using the `dump_tables_to_dataframes` function instead of `dump`.

In [None]:
dataframes = isatab.dump_tables_to_dataframes(investigation)

In [None]:
len(dataframes)

## 7. Check the correctness of the ISA-Tab DataFrames 

We have 1 study file and 2 assay files (one for DNA methylation profiling and one for clinical chemistry). Let's check the names:

In [None]:
for key in dataframes.keys():
    display(key)

### 7.1 Count of subjects and samples

We have 10 subjects in each of the 4 arms, for a total of 40 subjects.

We collect:
- 5 blood samples per subject (50 samples * 4 arms = 200 total samples)
- 2 saliva samples per subject (20 samples * 4 arms = 80 total samples)

Across the 4 study arms a total of 280 samples are collected (70 samples per arm)
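
The arithmetic above can be reproduced in a few lines of plain Python. The variable names are chosen for this sketch only; the numbers are the ones stated in the text.

```python
subjects_per_arm = 10
arms = 4
samples_per_subject = {"blood": 5, "saliva": 2}

# Samples per arm and across all arms, for each sample type.
per_arm = {kind: n * subjects_per_arm for kind, n in samples_per_subject.items()}
totals = {kind: n * arms for kind, n in per_arm.items()}

print("Per arm:", per_arm, "sum:", sum(per_arm.values()))
print("All arms:", totals, "sum:", sum(totals.values()))
```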

In [None]:
study_frame = dataframes['s_study_01.txt']
count_arm0_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP0' in el)])
count_arm1_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP1' in el)])
count_arm2_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP2' in el)])
count_arm3_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP3' in el)])
print("There are {} samples in the GRP0 arm (i.e. group)".format(count_arm0_samples))
print("There are {} samples in the GRP1 arm (i.e. group)".format(count_arm1_samples))
print("There are {} samples in the GRP2 arm (i.e. group)".format(count_arm2_samples))
print("There are {} samples in the GRP3 arm (i.e. group)".format(count_arm3_samples))
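
The substring test `'GRP1' in el` works for the four arms of this design, but it would also match a name containing `GRP10` in a design with more than ten arms. The sketch below shows a stricter alternative that extracts the whole group token with a regular expression and counts the groups in one pass. The source names are hypothetical; the real naming scheme is whatever ISA-API generates.

```python
import re
from collections import Counter

# Hypothetical source names; the real naming scheme comes from ISA-API.
source_names = ["GRP0_SBJ01", "GRP0_SBJ02", "GRP1_SBJ01", "GRP10_SBJ01"]

group_pattern = re.compile(r"GRP\d+")

def group_of(name):
    """Return the full group token (e.g. 'GRP10'), or None if absent."""
    match = group_pattern.search(name)
    return match.group(0) if match else None

counts = Counter(group_of(name) for name in source_names)
for group, count in sorted(counts.items()):
    print("There are {} samples in the {} arm".format(count, group))
```

With a DataFrame, the same function could be applied to the `Source Name` column before counting.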

### 7.2 Study Table Overview

The study table provides an overview of the subjects (sources) and samples

In [None]:
study_frame

### 7.3 First Assay: DNA Methylation Profiling using nucleic acid sequencing

This assay takes saliva samples as input

In [None]:
dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt']

#### 7.3.1 Nucleic acid sequencing stats

This assay takes the saliva samples as input (80 in total, according to the counts in section 7.1). DNA extracts are prepared from the samples and then sequenced. The cell below counts the distinct values in each column of the assay table, from which the number of samples, extracts, sequencing processes and raw data files can be read off.

In [None]:
dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'].nunique(axis=0, dropna=True)
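
For readers unfamiliar with `nunique`, the call above reports, for each column, the number of distinct non-missing values. The pure-Python sketch below approximates that behaviour on a toy table. The column names and values are made up for illustration, and `None` stands in for the missing values that pandas would skip.

```python
# Toy table as a list of rows; column names are made up for illustration.
rows = [
    {"Sample Name": "S1", "Extract Name": "E1", "Raw Data File": "f1.raw"},
    {"Sample Name": "S1", "Extract Name": "E2", "Raw Data File": "f2.raw"},
    {"Sample Name": "S2", "Extract Name": "E3", "Raw Data File": None},
]

def unique_counts(rows):
    """Count distinct non-missing values per column."""
    seen = {}
    for row in rows:
        for column, value in row.items():
            values = seen.setdefault(column, set())
            if value is not None:
                values.add(value)
    return {column: len(values) for column, values in seen.items()}

print(unique_counts(rows))
```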

### 7.4 Second Assay: Clinical Chemistry Marker Panel

This assay takes blood samples as input

In [None]:
dataframes['a_AT11_clinical-chemistry_marker-panel.txt']

#### 7.4.1 Marker Panel Stats

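This assay takes the blood samples as input. For each sample three chemical marker assays are run, so the number of sample preparation processes and raw data files is three times the number of blood samples. The unique-value counts in the cell below give the actual figures.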

In [None]:
dataframes['a_AT11_clinical-chemistry_marker-panel.txt'].nunique(axis=0, dropna=True)