# <center>Workflow for on-line GC and HPLC analysis in flow chemistry</center>
# <center>2.1 Experimental notebook - Parsing</center>

---

This is the ``Experimental`` ``notebook`` ``2.1 "Parsing"``, where all the relevant data of the experiments are read in from different resources. This workflow is executed once for each individual experiment, and the resulting data can be appended to the project's dataset.

---

---
## Section 0: Imports, Paths, and Logging
---

In this section, all necessary Python packages are imported, and the path to this notebook and the logger for this notebook are set up.

Activate autoreload.

In [1]:
%reload_ext autoreload
%autoreload 2

Import the standard library Python packages needed to set up the ``logger``.

In [2]:
import os
import json
import logging
import logging.config
from pathlib import Path

Get the path to the directory in which this notebook is located.

In [3]:
root = Path(os.path.abspath(''))

Set path to the directory containing the configuration file for the logger.

In [4]:
logging_config_path = root / "datamodel_b07_tc/tools/logging/config_exp_2_1.json"

Set up logger by reading the .json-type configuration file.

In [5]:
with open(logging_config_path) as logging_config_json:
    logging_config = json.load(logging_config_json)
logging.config.dictConfig(logging_config)
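
The contents of ``config_exp_2_1.json`` are not shown in this notebook. As a rough illustration of the kind of dictionary that ``logging.config.dictConfig`` accepts, the following sketch builds a minimal configuration in Python. All formatter and handler names here are assumptions and the project's actual file may look quite different. Run it in a scratch session rather than in this notebook, since it replaces the handlers of the root logger.

```python
import logging
import logging.config

# Hypothetical minimal configuration; the real config_exp_2_1.json may differ.
example_logging_config = {
    "version": 1,  # required by the dictConfig schema
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s: %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
        }
    },
    "root": {"level": "DEBUG", "handlers": ["console"]},
}

logging.config.dictConfig(example_logging_config)
example_root_logger = logging.getLogger()
example_root_logger.info("Logger configured from a dictionary.")
```

A JSON file with the same keys can be loaded with ``json.load`` and passed to ``dictConfig`` in the same way as in the cell above.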

Create a child of the root logger and set its name to the name of the current notebook.

In [6]:
logger = logging.getLogger(__name__)

Set the level of several third-party module loggers to avoid dumping too much information into the log file.
<div class="alert alert-block alert-info"><b>Info:</b> Some third-party modules use the same logging module and structure as this notebook. This is unproblematic unless the level of their loggers is too low. In that case, messages of lower levels, such as 'DEBUG' and 'INFO', are propagated to the parent logger of this notebook.</div>

In [7]:
third_party_module_loggers = ['markdown_it', 'h5py', 'numexpr', 'git']
for logger_ in third_party_module_loggers:
    logging.getLogger(logger_).setLevel('WARNING')
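
The effect of raising a logger's level can be checked with ``Logger.isEnabledFor``. The sketch below uses a made-up logger name (``example_third_party``) in place of a real third-party module.

```python
import logging

# Hypothetical logger name standing in for a chatty third-party module.
noisy_logger = logging.getLogger("example_third_party")
noisy_logger.setLevel("WARNING")

# Records below the logger's own level are dropped at the source,
# so they never propagate to the handlers of parent loggers.
info_enabled = noisy_logger.isEnabledFor(logging.INFO)
warning_enabled = noisy_logger.isEnabledFor(logging.WARNING)
print(info_enabled, warning_enabled)  # False True
```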

Import and instantiate the ``Librarian`` module for efficient and clean file and directory handling.

In [8]:
from datamodel_b07_tc.tools import Librarian
librarian = Librarian(root_directory=root)

Import the modified sdRDM objects.

In [9]:
from datamodel_b07_tc.modified.experiment import Experiment
from datamodel_b07_tc.modified.measurement import Measurement

<div class="alert alert-block alert-info"><b>Info:</b> Python objects created by the sdRDM generator can be equipped with additional features, such as functions or classes, e.g. to parse data or perform internal calculations, which allows for a more modular approach of working with them.</div>

Import the data model containing all the objects of sdRDM's python API.

In [10]:
# from sdRDM.generator import generate_python_api
from sdRDM import DataModel

<div class="alert alert-block alert-info"><b>Info:</b> sdRDM objects that have already been imported are not overridden!</div>

Manually generate the sdRDM python objects.

In [11]:
# generate_python_api('specifications/datamodel_b07_tc.md', '', 'datamodel_b07_tc')

Import tools used for parsing and calibration of the raw data.

In [12]:
from datamodel_b07_tc.tools import Calibrator
from datamodel_b07_tc.tools import gc_parser
from datamodel_b07_tc.tools import gstatic_parser
from datamodel_b07_tc.tools import mfm_parser
# from DEXPI2sdRDM import DEXPI2sdRDM

Import additional third-party packages for interactive widgets and display.

In [13]:
import ipywidgets as widgets
from IPython.display import display

---
## Section 1: Dataset and data model parsing
---

In this section, the data model and the dataset, as well as all the output files necessary for the analysis notebook, are parsed.

Print available subdirectories of the 'root' directory.

In [14]:
root_subdirectories = librarian.enumerate_subdirectories(directory=root)

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc 
Available subdirectories:
0: .../.git
1: .../.github
2: .../.vscode
3: .../data
4: .../datamodel_b07_tc
5: .../datasets
6: .../logging
7: .../specifications
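
The implementation of ``Librarian.enumerate_subdirectories`` is not part of this notebook. Judging only from the printed output and the index-to-path dictionary it returns, a helper with similar behaviour could look roughly like the sketch below. The function name ``enumerate_subdirectories_sketch`` is hypothetical and the sketch is not the project's actual code.

```python
from pathlib import Path
import tempfile


def enumerate_subdirectories_sketch(directory):
    """Print the subdirectories of `directory` and return them keyed by index."""
    directory = Path(directory)
    subdirectories = sorted(p for p in directory.iterdir() if p.is_dir())
    print(f"Parent directory: \n {directory} \nAvailable subdirectories:")
    for index, path in enumerate(subdirectories):
        print(f"{index}: .../{path.name}")
    return dict(enumerate(subdirectories))


# Demonstrate on a throwaway directory tree so that nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data", "datasets", "specifications"):
        (Path(tmp) / name).mkdir()
    example_subdirectories = enumerate_subdirectories_sketch(tmp)
    example_names = [p.name for p in example_subdirectories.values()]
```

Returning a dictionary keyed by index is what makes selections such as ``root_subdirectories[5]`` possible in the cells that follow.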


List all available dataset json files in the 'datasets' directory.

In [15]:
json_dataset_files = librarian.enumerate_files(directory=root_subdirectories[5], filter='json')

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/datasets 
Available files:
0: b07.json


Choose dataset to be loaded by its index.

In [16]:
json_dataset = json_dataset_files[0]
dataset, lib = DataModel.parse(json_dataset)

Visualize the data model.

In [17]:
# lib.Dataset.meta_tree()

Print current status of the dataset.

In [18]:
# print(dataset.json())

Show directory tree of the ``root`` directory.

In [55]:
# librarian.visualize_directory_tree(directory=root, skip_directories=['.git'])

In [21]:
root_subdirectories

{0: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/.git'),
 1: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/.github'),
 2: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/.vscode'),
 3: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data'),
 4: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/datamodel_b07_tc'),
 5: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/datasets'),
 6: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/logging'),
 7: PosixPath('/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/specifications')}

In [22]:
data_subdirectories = librarian.enumerate_subdirectories(directory=root_subdirectories[3])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data 
Available subdirectories:
0: .../calibration
1: .../correction_factors
2: .../faraday_coefficients
3: .../Rohdaten


In [23]:
raw_data_subdirectories = librarian.enumerate_subdirectories(directory=data_subdirectories[3])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten 
Available subdirectories:
0: .../01_EChem
1: .../02_GC
2: .../03_MFM
3: .../04_HPLC
4: .../05_Pressure


---
## Section 2: Potentiostatic data parsing
---

Select path to the potentiostatic data and print available subdirectories.

In [24]:
echem_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[0])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/01_EChem 
Available subdirectories:
0: .../210728_ITO_TEST
1: .../CAD14-Cu@AB


Select subdirectory by its index and print raw data files available in there.

In [25]:
potentiostatic_raw_data_files = librarian.enumerate_files(directory=echem_directories[1], filter='DTA')

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/01_EChem/CAD14-Cu@AB 
Available files:
0: GSTATIC.DTA
1: POTDYN.DTA


Extract the metadata from the selected file using the ``gstatic_parser`` and load it into the data model.

In [26]:
potentiostatic_metadata_df, potentiostatic_measurement = Measurement.from_parser(parser=gstatic_parser, metadata_path=potentiostatic_raw_data_files[0])
# potentiostatic_metadata_df

---
## Section 3: MFM data parsing
---

Provide name of the subdirectory containing the mass flow meter measurement data.

In [27]:
mfm_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[2])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/03_MFM 
Available subdirectories:
0: .../CAD14-Cu@AB


In [28]:
mfm_raw_data_files = librarian.enumerate_files(directory=mfm_directories[0], filter='csv')

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/03_MFM/CAD14-Cu@AB 
Available files:
0: Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h.csv
1: Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h_truncated.csv


In [29]:
mfm_experimental_data_df, mfm_measurement = Measurement.from_parser(parser=mfm_parser, experimental_data_path=mfm_raw_data_files[0])
mfm_experimental_data_df

Unnamed: 0,Date time,Time,Signal,Volumetric flow rate
0,2023-02-06 09:58:48,0,3258,5.090180
1,2023-02-06 09:58:50,2,3267,5.104674
2,2023-02-06 09:58:52,4,3273,5.114520
3,2023-02-06 09:58:54,6,3278,5.122616
4,2023-02-06 09:58:56,8,3290,5.139893
...,...,...,...,...
2848,2023-02-06 11:33:44,5696,3210,5.015965
2849,2023-02-06 11:33:46,5698,3204,5.006263
2850,2023-02-06 11:33:48,5700,3202,5.003840
2851,2023-02-06 11:33:50,5702,3204,5.006141
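
In the table above, the ``Time`` column appears to be the number of seconds elapsed since the first record. Whether ``mfm_parser`` computes this value or reads it directly from the file is not shown here, so the following is only a consistency check on a few of the printed timestamps.

```python
from datetime import datetime

# Timestamps copied from the first three rows and the last row of the output above.
timestamps = [
    "2023-02-06 09:58:48",
    "2023-02-06 09:58:50",
    "2023-02-06 09:58:52",
    "2023-02-06 11:33:50",
]
parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]

# Seconds elapsed relative to the first record.
elapsed_seconds = [int((t - parsed[0]).total_seconds()) for t in parsed]
print(elapsed_seconds)  # [0, 2, 4, 5702]
```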


---
## Section 4: GC data parsing
---

In [30]:
gc_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[1])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC 
Available subdirectories:
0: .../CAD14-Cu@AB


In [31]:
gc_subdirectories = librarian.enumerate_subdirectories(directory=gc_directories[0])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB 
Available subdirectories:
0: .../JH-1H 2023-02-06 10-00-18


In [32]:
gc_subsubdirectories = librarian.enumerate_subdirectories(directory=gc_subdirectories[0])

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18 
Available subdirectories:
0: .../JH_GASPRODUKTE.M
1: .../JH_GASPRODUKTE_30MIN.M
2: .../NV-F0101.D
3: .../NV-F0102.D
4: .../NV-F0103.D
5: .../NV-F0104.D
6: .../NV-F0201.D


In [33]:
gc_raw_data_files_list = []
gc_raw_data_files_1 = librarian.enumerate_files(directory=gc_subsubdirectories[3], filter='CSV')
gc_raw_data_files_list.append(gc_raw_data_files_1)

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18/NV-F0102.D 
Available files:
0: report00.CSV
1: REPORT01.CSV


In [34]:
gc_raw_data_files_2 = librarian.enumerate_files(directory=gc_subsubdirectories[4], filter='CSV')
gc_raw_data_files_list.append(gc_raw_data_files_2)

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18/NV-F0103.D 
Available files:
0: report00.CSV
1: REPORT01.CSV


In [35]:
gc_raw_data_files_3 = librarian.enumerate_files(directory=gc_subsubdirectories[5], filter='CSV')
gc_raw_data_files_list.append(gc_raw_data_files_3)

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18/NV-F0104.D 
Available files:
0: report00.CSV
1: REPORT01.CSV


In [36]:
gc_experimental_data_df_list = []
gc_metadata_df_list = []
gc_measurements_list = []
for data_file in gc_raw_data_files_list:
    gc_metadata_df, gc_experimental_data_df, gc_measurement = Measurement.from_parser(
        parser=gc_parser,
        metadata_path=data_file[0],
        experimental_data_path=data_file[1]
    )
    gc_experimental_data_df_list.append(gc_experimental_data_df)
    gc_metadata_df_list.append(gc_metadata_df)
    gc_measurements_list.append(gc_measurement)

Show first set of GC experimental data.

In [37]:
gc_experimental_data_df_list[0]

Unnamed: 0,Peak number,Retention time,Signal,Peak type,Peak area,Peak height,Peak area percentage
0,1,1.729967,1,PBAN,69.171577,32.512886,0.098238
1,2,2.909973,1,BBA,65492.746094,3794.478271,93.013605
2,3,3.43423,2,BV,164.157028,43.253098,0.233138
3,4,3.657794,2,VB,141.173935,49.408844,0.200497
4,5,6.045472,2,BB,1624.07373,347.834717,2.30653
5,6,12.997822,1,BB,2876.952637,88.829025,4.085884
6,7,14.194683,2,BB,43.731697,14.139935,0.062108
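
The ``Peak area percentage`` column is consistent with each peak's share of the summed peak area of this report. The sketch below recomputes it from the ``Peak area`` values printed above. This is a plausibility check, not a statement about how the GC software or ``gc_parser`` derives the column.

```python
# Peak areas copied from the output above, in peak order.
peak_areas = [
    69.171577,
    65492.746094,
    164.157028,
    141.173935,
    1624.07373,
    2876.952637,
    43.731697,
]

total_area = sum(peak_areas)
area_percentages = [100 * area / total_area for area in peak_areas]

for peak_number, percentage in enumerate(area_percentages, start=1):
    print(f"Peak {peak_number}: {percentage:.6f} %")
```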


Show first set of GC metadata.

In [38]:
gc_metadata_df_list[0]

Unnamed: 0,parameter,value,description
0,Sample Name,,
1,Sample Info,,
2,Data File,D:\GC\Kurz\CAD14-Cu@AB\JH-1H 2023-02-06 10-00-18\,NV-F0102.D
3,Acq. Instrument,Instrument 1,
4,Analysis Method,D:\GC\Kurz\CAD14-Cu@AB\JH-1H 2023-02-06 10-00-18\,JH_GASPRODUKTE.M
5,Method Info,,
6,Results Created,06.02.2023 10:32:26,
7,Results Created by,MS,
8,Acq. Method,JH_GASPRODUKTE.M,
9,Injection Date,"06-Feb-23, 10:17:24",
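
The metadata stores dates as strings in two different formats. If they are needed as ``datetime`` objects later on, they could be converted as sketched below. The format strings are inferred from the two values shown above. ``%b`` depends on the active locale, so the sketch assumes English month abbreviations.

```python
from datetime import datetime

# Values copied from the 'Injection Date' and 'Results Created' rows above.
injection_date = datetime.strptime("06-Feb-23, 10:17:24", "%d-%b-%y, %H:%M:%S")
results_created = datetime.strptime("06.02.2023 10:32:26", "%d.%m.%Y %H:%M:%S")

# Time between injection and creation of the report.
delay = results_created - injection_date
print(injection_date.isoformat(), results_created.isoformat(), delay)
```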


In [39]:
# hplc_path = raw_data_path / '04_HPLC'
# pressure_path = raw_data_path / '05_Pressure'

Instantiate an experiment object which will hold all the information about one single experiment and append all the measurement objects.

In [40]:
experiment = Experiment()
experiment.measurements = [potentiostatic_measurement, mfm_measurement, *gc_measurements_list]

---
## Section 4: Calibration data parsing
---

Search for calibration files in the 'calibration' directory.

In [41]:
calibration_files = librarian.enumerate_files(directory=data_subdirectories[0])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/calibration 
Available files:
0: calibration.json


Initialize calibrator with an available calibration file selected by its index.

In [42]:
calibrator = Calibrator.from_json_file(path_to_json_file=calibration_files[0])

Run the calibration, which returns the calibration parameters just computed as a list of SpeciesData objects. <br> Assign the resulting SpeciesData objects to the experiment object.

In [43]:
species_data_list = calibrator.calibrate()
experiment.species_data = species_data_list

---
## Section 5: Parsing auxiliary data
---

### Correction factors

Search for correction factor files in the 'correction_factors' directory.

In [44]:
correction_factors_files = librarian.enumerate_files(directory=data_subdirectories[1])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/correction_factors 
Available files:
0: correction_factors.json


Load correction factors into the experiment object.

In [45]:
experiment.read_correction_factors(correction_factors_files[0])

### Faraday coefficients

Search for Faraday coefficient files in the 'faraday_coefficients' directory.

In [46]:
faraday_coefficients_files = librarian.enumerate_files(directory=data_subdirectories[2])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/faraday_coefficients 
Available files:
0: faraday_coefficients.json


Load the Faraday coefficients into the experiment object.

In [47]:
experiment.read_faraday_coefficients(faraday_coefficients_files[0])

### Electrode surface area

Set value for the surface area of the electrode.

In [48]:
electrode_surface_area = 1.0 # cm^2

---
## Section 6: Appending parsed data to dataset
---

Print current state of experiment object.

In [49]:
# print(experiment.json())

Append experiment object to the dataset.

In [50]:
dataset.experiments.append(experiment)

Replace the 'old' dataset with its extended version containing all the parsed data.

In [51]:
with open(json_dataset, "w") as f:
    f.write(dataset.json())
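
Opening the dataset file in ``"w"`` mode truncates it before the new content is written, so an error during serialization or writing could leave an empty or partial file. One possible safeguard, sketched here under the assumption that the dataset fits comfortably in memory, is to write to a temporary file first and then swap it into place with ``os.replace``. The helper name ``write_json_atomically`` is hypothetical and not part of the project.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path, text):
    """Write `text` to a temporary file, then swap it into place."""
    path = Path(path)
    json.loads(text)  # fail early if the text is not valid JSON
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(text)
        os.replace(tmp_name, path)  # the old file stays intact until this call
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # clean up the temporary file
        raise


# Demonstrate on a throwaway file instead of the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example_dataset.json"
    target.write_text('{"experiments": []}')
    write_json_atomically(target, '{"experiments": [{"name": "demo"}]}')
    reloaded = json.loads(target.read_text())
```

In this notebook the call would be along the lines of ``write_json_atomically(json_dataset, dataset.json())``, so that the serialization finishes before the existing file is touched.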

In [52]:
# button = widgets.Button(description="Append experiment", layout=widgets.Layout(width='30%', height='80px'))
# button.style.button_color = 'darkcyan'
# button.style.text_color = 'lightgrey'
# button.style.font_size = '30px'


# output = widgets.Output()

# display(button, output)

# def click_on_button(b):
#     with output:
#         print("Experiment successfully appended.")

# button.on_click(click_on_button)

In [53]:
%%html
<style>
.cell-output-ipywidget-background {
    background-color: transparent !important;
}
:root {
    --jp-widgets-color: var(--vscode-editor-foreground);
    --jp-widgets-font-size: var(--vscode-editor-font-size);
}  
</style>