# <center>Workflow for on-line GC and HPLC analysis in flow chemistry</center>
# <center>2.1 Experimental notebook - Parsing</center>

---

This is the ``Experimental`` ``notebook`` ``2.1 "Parsing"``, where all the relevant data of the experiments are read in from different resources. This workflow is executed once for each individual experiment, and the parsed data are appended to the project's dataset.

---

---
## Section 0: Imports, Paths, and Logging
---

In this section all the necessary Python packages are imported, and the path to this notebook and the logger for this notebook are set up.

In [6]:
# Activate autoreload so that changes to imported modules are picked up #
%reload_ext autoreload
%autoreload 2

# Import standard libraries #
import os
import json
import logging
import logging.config
from pathlib import Path
import ipywidgets as widgets
from IPython.display import display

# Import librarian module for file directory handling #
from datamodel_b07_tc.tools import Librarian

# Import modified sdRDM objects #
from datamodel_b07_tc.modified.experiment import Experiment
from datamodel_b07_tc.modified.measurement import Measurement
from datamodel_b07_tc.modified.plantsetup import PlantSetup

# Import datamodel from sdRDM #
from sdRDM import DataModel

# Import tools for parsing and calibration of the raw data #
from datamodel_b07_tc.tools import Calibrator
from datamodel_b07_tc.tools import gc_parser
from datamodel_b07_tc.tools import gstatic_parser
from datamodel_b07_tc.tools import mfm_parser
# from datamodel_b07_tc.tools import DEXPI2sdRDM

# from sdRDM.generator import generate_python_api
# generate_python_api('specifications/datamodel_b07_tc.md', '', 'datamodel_b07_tc')

In [7]:
# Define paths for logging output #
root                = Path.cwd()
logging_config_path = root / "datamodel_b07_tc/tools/logging/config_exp_2_1.json"

# Read in the logger specs and configure the logger (set name to current notebook) #
with open(logging_config_path) as logging_config_json:
    logging.config.dictConfig(json.load(logging_config_json))
logger = logging.getLogger(__name__)

# Set the level of third-party loggers to avoid dumping too much information #
third_party_module_loggers = ['markdown_it', 'h5py', 'numexpr', 'git']
for logger_ in third_party_module_loggers: logging.getLogger(logger_).setLevel('WARNING')

# Initialize the librarian with root directory of this notebook #
librarian = Librarian(root_directory=root)

# Info for loggers #
# Some third party modules use the same logging module and structure as this notebook, which is unproblematic, 
# unless the level of their corresponding logging handlers is too low. In these cases the logging messages of 
# lower levels, such as 'DEBUG' and 'INFO', are propagated to the parent logger of this notebook.
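
The logger setup above can be tried out without the project's configuration file. The following is a minimal sketch, assuming a hypothetical in-memory configuration (`LOGGING_CONFIG`) and a hypothetical logger name (`notebook_2_1`); the real settings live in `config_exp_2_1.json` and may differ.

```python
import logging
import logging.config

# Hypothetical minimal configuration; the real one is read from config_exp_2_1.json.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s: %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
        }
    },
    "root": {"level": "DEBUG", "handlers": ["console"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("notebook_2_1")  # hypothetical logger name

# Raise the threshold of chatty third-party loggers.
for name in ["markdown_it", "h5py", "numexpr", "git"]:
    logging.getLogger(name).setLevel(logging.WARNING)

# The notebook logger inherits DEBUG from the root logger, so this is emitted.
logger.debug("parsing started")
# This record is below WARNING and is dropped before it reaches any handler.
logging.getLogger("h5py").debug("not shown")
```

Setting the level on the third-party loggers themselves, rather than on the handlers, filters their records at the source and leaves the notebook's own DEBUG and INFO messages untouched.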

---
## Section 1: Dataset and data model parsing
---
In this section the data model and the dataset, as well as all the output files necessary for the analysis notebook, are parsed.

In [29]:
# Check for all available subdirectories #
root_subdirectories = librarian.enumerate_subdirectories(directory=root)
print("\n")

# Search for subdirectory "datasets" and in it for all dataset json files #
idx_dataset        = [i for i in range(len(root_subdirectories)) if str(root_subdirectories[i]).split("/")[-1] == "datasets" ][0]
json_dataset_files = librarian.enumerate_files(directory=root_subdirectories[idx_dataset], filter='json')
print("\n")

# Choose the dataset by its index in the list printed above, e.g. 0, 1, ... #
json_dataset = json_dataset_files[0]
dataset, lib = DataModel.parse(json_dataset)

# If wanted, visualize the data model as a tree (otherwise leave this line commented out) #
#lib.Dataset.meta_tree()

# Find the data folder (take the first match, so that the index is an integer) #
idx_datafolder          = [i for i in range(len(root_subdirectories)) if str(root_subdirectories[i]).split("/")[-1] == "data" ][0]
data_subdirectories     = librarian.enumerate_subdirectories(directory=root_subdirectories[idx_datafolder])

# Find the raw data folder #
idx_rawdatafolder       = [i for i in range(len(data_subdirectories)) if str(data_subdirectories[i]).split("/")[-1] == "Rohdaten" ][0]
raw_data_subdirectories = librarian.enumerate_subdirectories(directory=data_subdirectories[idx_rawdatafolder])

Parent directory: 
 /Users/samir/Documents/PhD/SFB1333/datamodel_b07_tc 
Available subdirectories:
0: .../specifications
1: .../datasets
2: .../datamodel_b07_tc
3: .../.github
4: .../.git
5: .../data


Directory: 
 /Users/samir/Documents/PhD/SFB1333/datamodel_b07_tc/datasets 
Available files:
0: b07.json




---
## Section 2: Plant setup parsing
---

Instantiate 'experiment' object.

In [22]:
experiment = Experiment()

## Get the plant setup given in DEXPI format ##

#idx_rawdatafolder       = [i for i in range(len(data_subdirectories)) if str(data_subdirectories[i]).split("/")[-1] == "plant_setup" ][0]
#plant_setup_files = librarian.enumerate_files(directory=data_subdirectories[idx_rawdatafolder], filter='xml')
#plant_setup = PlantSetup.from_parser(parser=DEXPI2sdRDM, path=plant_setup_files[0])
#experiment.plant_setup = plant_setup

---
## Section 3: Potentiostatic data parsing
---
Select path to the potentiostatic data and print available subdirectories.

In [None]:
# Search for the electrochemical data folder #
idx_echem_direct  = [i for i in range(len(raw_data_subdirectories)) if str(raw_data_subdirectories[i]).split("/")[-1] == "01_EChem" ][0]
echem_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[idx_echem_direct])

# Search for the potentiostatic files #
idx_potentiostatic_files      = [i for i in range(len(echem_directories)) if str(echem_directories[i]).split("/")[-1] == "CAD14-Cu@AB" ][0]
potentiostatic_raw_data_files = librarian.enumerate_files(directory=echem_directories[idx_potentiostatic_files], filter='DTA')

# Read in the gstatic data #
idx_gstatic_dat   = [i for i in range(len(potentiostatic_raw_data_files)) if str(potentiostatic_raw_data_files[i]).split("/")[-1] == "GSTATIC.DTA" ][0]
potentiostatic_metadata_df, potentiostatic_measurement = Measurement.from_parser(parser=gstatic_parser, metadata_path=potentiostatic_raw_data_files[idx_gstatic_dat])
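
The file filters in this notebook are written in different cases (`'DTA'`, `'csv'`, `'CSV'`). Whether `Librarian.enumerate_files` matches extensions case-sensitively is not shown here. The sketch below illustrates a case-insensitive extension filter with plain `pathlib`; `list_files` is a hypothetical stand-in, and `notes.txt` and `run.dta` are made-up file names for the demonstration.

```python
import tempfile
from pathlib import Path


def list_files(directory, suffix):
    """List files in `directory` with the given suffix, ignoring case, sorted by path."""
    wanted = "." + suffix.lower().lstrip(".")
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == wanted
    )


# Demonstration with throwaway files; only the two .dta/.DTA files are kept.
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("GSTATIC.DTA", "notes.txt", "run.dta"):
        (Path(tmp) / file_name).touch()
    dta_names = {p.name for p in list_files(tmp, "DTA")}
```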

---
## Section 4: MFM data parsing
---
Provide name of the subdirectory containing the mass flow meter measurement data.

In [29]:
# Search for the mass flow meter data folder #
idx_mfm_direct  = [i for i in range(len(raw_data_subdirectories)) if str(raw_data_subdirectories[i]).split("/")[-1] == "03_MFM" ][0]
mfm_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[idx_mfm_direct])

# Search for the csv output files #
idx_mfm_files      = [i for i in range(len(mfm_directories)) if str(mfm_directories[i]).split("/")[-1] == "CAD14-Cu@AB" ][0]
mfm_raw_data_files = librarian.enumerate_files(directory=mfm_directories[idx_mfm_files], filter='csv')

# Manually select the wanted csv file by its index and read it in #
idx_mfm_file       = 0
mfm_experimental_data_df, mfm_measurement = Measurement.from_parser(parser=mfm_parser, experimental_data_path=mfm_raw_data_files[idx_mfm_file])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/03_MFM 
Available subdirectories:
0: .../CAD14-Cu@AB


---
## Section 5: GC data parsing
---

In [32]:
# Search for the gas chromatography data folder #
idx_gc_direct  = [i for i in range(len(raw_data_subdirectories)) if str(raw_data_subdirectories[i]).split("/")[-1] == "02_GC" ][0]
gc_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[idx_gc_direct])

# Search for the gc subdirectories #
idx_gc_sub_dir    = [i for i in range(len(gc_directories)) if str(gc_directories[i]).split("/")[-1] == "CAD14-Cu@AB" ][0]
gc_subdirectories = librarian.enumerate_subdirectories(directory=gc_directories[idx_gc_sub_dir])

# Select the subdirectory of the wanted experiment from the given directories #
idx_gc_sub_sub_dir   = 0
gc_subsubdirectories = librarian.enumerate_subdirectories(directory=gc_subdirectories[idx_gc_sub_sub_dir])

# Select the indices of the subdirectories that should be read in #
gc_raw_data_subdir_idx = [ 3, 4, 5 ]

# Gather all the gc raw data files (one list of CSV files per selected subdirectory) #
gc_raw_data_files_list = [librarian.enumerate_files(directory=gc_subsubdirectories[i], filter='CSV') for i in gc_raw_data_subdir_idx]

# Read out all the data from the provided gc files #
gc_experimental_data_df_list = []
gc_metadata_df_list = []
gc_measurements_list = []

for data_file in gc_raw_data_files_list:
    gc_metadata_df, gc_experimental_data_df, gc_measurement = Measurement.from_parser(
        parser=gc_parser,
        metadata_path=data_file[0],
        experimental_data_path=data_file[1]
    )
    gc_experimental_data_df_list.append(gc_experimental_data_df)
    gc_metadata_df_list.append(gc_metadata_df)
    gc_measurements_list.append(gc_measurement)

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC 
Available subdirectories:
0: .../CAD14-Cu@AB
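
The loop above takes `data_file[0]` as the metadata file and `data_file[1]` as the experimental data file, so it depends on each subdirectory containing exactly two CSV files in that order. A small guard, sketched below, makes this assumption explicit before parsing. `split_gc_files` is a hypothetical helper and the file names in the example are placeholders, not the actual GC export names.

```python
def split_gc_files(files):
    """Return (metadata_path, experimental_data_path) from a two-element file list.

    Assumes the metadata file is listed first, as in the parsing loop above.
    """
    if len(files) != 2:
        raise ValueError(
            f"expected 2 GC files per run, got {len(files)}: {list(files)}"
        )
    metadata_path, experimental_data_path = files
    return metadata_path, experimental_data_path


# Placeholder file names, for illustration only.
metadata_path, experimental_data_path = split_gc_files(
    ["run_metadata.CSV", "run_data.CSV"]
)
```

Inside the loop, the two returned paths could be passed as `metadata_path` and `experimental_data_path` to `Measurement.from_parser`, so that an incomplete run directory raises a clear error instead of an `IndexError`.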


In [None]:
## Combine all gathered data in the experiment object ##

experiment.measurements = [potentiostatic_measurement, mfm_measurement, *gc_measurements_list]

---
## Section 6: Calibration data parsing
---

Search for calibration files in the 'calibration' directory.

In [42]:
calibration_files = librarian.enumerate_files(directory=data_subdirectories[0])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/calibration 
Available files:
0: calibration.json


Initialize calibrator with an available calibration file selected by its index.

In [43]:
calibrator = Calibrator.from_json_file(path_to_json_file=calibration_files[0])

Run the calibration and collect the resulting SpeciesData objects, which carry the calibration parameters just computed. <br> Attach these objects to the experiment object.

In [44]:
species_data_list = calibrator.calibrate()
experiment.species_data = species_data_list
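
The internals of `Calibrator.calibrate()` are not shown in this notebook. As a rough illustration of what a GC calibration commonly involves, the sketch below fits a straight line through peak areas measured at known concentrations and inverts it for an unknown sample. The linear model, the function names, and all numbers are assumptions made for this example; the values are synthetic and not taken from `calibration.json`.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to get a concentration from a peak area."""
    return (area - intercept) / slope


# Synthetic calibration points (arbitrary units), constructed as area = 50 * c + 10.
concentrations = [1.0, 2.0, 4.0, 8.0]
peak_areas = [60.0, 110.0, 210.0, 410.0]

slope, intercept = fit_line(concentrations, peak_areas)
unknown = concentration_from_area(260.0, slope, intercept)
```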

---
## Section 7: Parsing auxiliary data
---

### Correction factors

Search for correction factors files in the 'correction factors' directory.

In [45]:
correction_factors_files = librarian.enumerate_files(directory=data_subdirectories[1])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/correction_factors 
Available files:
0: correction_factors.json


Load correction factors into the experiment object.

In [46]:
experiment.read_correction_factors(correction_factors_files[0])

### Faraday coefficients

Search for Faraday coefficients files in the 'faraday coefficients' directory.

In [47]:
faraday_coefficients_files = librarian.enumerate_files(directory=data_subdirectories[2])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/faraday_coefficients 
Available files:
0: faraday_coefficients.json


Load Faraday coefficients into the experiment object.

In [48]:
experiment.read_faraday_coefficients(faraday_coefficients_files[0])

### Electrode surface area

Set value for the surface area of the electrode.

In [49]:
electrode_surface_area = 1.0 # cm^2
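
How the surface area enters the later analysis is not shown in this notebook. A typical use, assumed here, is to normalize the measured current to a current density, $j = I / A$. The sketch below uses a hypothetical helper and an illustrative current value.

```python
def current_density(current_mA, area_cm2):
    """Return the current density in mA/cm^2 for a current in mA and an area in cm^2."""
    if area_cm2 <= 0:
        raise ValueError("electrode surface area must be positive")
    return current_mA / area_cm2


# Illustrative current; the area matches the value set above (1.0 cm^2).
j = current_density(current_mA=-200.0, area_cm2=1.0)
```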

---
## Section 8: Appending parsed data to dataset
---

Print current state of experiment object.

In [50]:
# print(experiment.json())

Append experiment object to the dataset.

In [51]:
dataset.experiments.append(experiment)

Replace the 'old' dataset by its extended version containing all the parsed data.

In [52]:
with open(json_dataset, "w") as f:
    f.write(dataset.json())
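
Opening the dataset file with mode `"w"` truncates it before the new content is written, so an error during serialization or writing could leave an empty or partial file. One way to guard against this, sketched below under the assumption that the dataset fits in memory as a single string, is to write to a temporary file in the same directory and then swap it in with `os.replace`. `write_text_atomically` is a hypothetical helper, and the demonstration uses a throwaway file rather than `b07.json`.

```python
import json
import os
import tempfile
from pathlib import Path


def write_text_atomically(path, text):
    """Write `text` to a temporary file next to `path`, then replace `path` with it."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # The target is only replaced once the new content is fully written.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


# Demonstration on a throwaway file with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "dataset.json"
    target.write_text(json.dumps({"experiments": []}))
    write_text_atomically(target, json.dumps({"experiments": [1]}))
    restored = json.loads(target.read_text())
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "dataset.json"]
```

In this notebook the call would read `write_text_atomically(json_dataset, dataset.json())`. Note that re-running the append cell before writing adds the same experiment to the dataset again.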