# <center>Workflow for the CRC1333 project B07 - Technical Chemistry</center>
# <center>Experimental notebook</center>
# <center>2.1 Parsing</center>

---

This is the experimental notebook ``2.1 Parsing``, in which all relevant data of an experiment are read in from different resources. The workflow is executed once for each individual experiment, and the parsed data are then appended to the project's dataset.

---

In [1]:
from datamodel_b07_tc.modified.experiment import Experiment

In [2]:
# from sdRDM.generator import generate_python_api
from sdRDM import DataModel

In [3]:
# generate_python_api('specifications/datamodel_b07_tc.md', '', 'datamodel_b07_tc')

Import the project's parsers and calibrator, followed by the required standard library and third-party packages.

In [4]:
%reload_ext autoreload
%autoreload 2

from datamodel_b07_tc.tools import Calibrator
from datamodel_b07_tc.tools import GCParser
from datamodel_b07_tc.tools import GstaticParser
from datamodel_b07_tc.tools import MFMParser
# from DEXPI2sdRDM import DEXPI2sdRDM

In [5]:
import os
import json
import ipywidgets as widgets
import logging
import logging.config
from IPython.display import display
from pathlib import Path

---
## Section 0: Paths and Logging
---

Get the path to the directory in which this notebook is located and check that it is valid.

In [6]:
root = Path(os.path.abspath(''))
print("Path to this notebook's location:", root)
print('Is the path valid?', root.is_dir())

Path to this notebook's location: /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc
Is the path valid? True


Set the path to the logger configuration file.

In [7]:
config_path = root / "datamodel_b07_tc/tools/logging/config.json"
print(config_path)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/datamodel_b07_tc/tools/logging/config.json


Set up logger.

In [8]:
logging_config_path = root / "datamodel_b07_tc/tools/logging/config.json"
with open(logging_config_path) as logging_config_json:
    logging_config = json.load(logging_config_json)

In [9]:
logging.config.dictConfig(logging_config)

In [10]:
logger = logging.getLogger(__name__)
logger.debug("obacht")
logger.warning('uff')
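
The contents of `config.json` are not shown in this notebook. As a rough sketch, assuming the file follows the standard `logging.config.dictConfig` schema, a minimal configuration could look like the one below. The names `example_logging_config`, `simple`, `console`, and `example_logger` are hypothetical and not taken from the project.

```python
import logging
import logging.config

# Hypothetical minimal configuration; the project's real config.json may differ.
example_logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s: %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
        }
    },
    "root": {"level": "DEBUG", "handlers": ["console"]},
}

# Apply the configuration and emit one message at DEBUG level.
logging.config.dictConfig(example_logging_config)
example_logger = logging.getLogger("example")
example_logger.debug("Logger configured from a dictionary.")
```

Setting the root level to `DEBUG` is what allows `logger.debug(...)` calls like the one above to appear at all; with Python's default level of `WARNING` they would be dropped.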



---
## Section 1: Dataset and data model parsing
---

In this section, the data model, the dataset, and all output files needed for the analysis are parsed.

Set the path to the datasets.

In [11]:
path_to_datasets = root / 'datasets'

List all available datasets in the directory.


In [12]:
files = path_to_datasets.iterdir()
json_files = {index:file for index, file in enumerate(files) if file.suffix == '.json'}
for index, file in json_files.items():
    print(f'{index}: {file.name}')

0: b07.json
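
In the cell above, `enumerate` runs before the suffix filter, so the printed indices can have gaps when the directory holds other files. In addition, `iterdir()` yields entries in arbitrary order. A possible variant, sketched here with a hypothetical helper `index_files` and made-up file names, filters and sorts first so that the indices are contiguous and reproducible.

```python
import tempfile
from pathlib import Path


def index_files(directory: Path, suffix: str) -> dict[int, Path]:
    """Map contiguous indices to the files in `directory` with the given suffix."""
    matches = sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == suffix
    )
    return dict(enumerate(matches))


# Demonstration on a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    for name in ("notes.txt", "b07.json", "a01.json"):
        (tmp_path / name).touch()
    indexed = index_files(tmp_path, ".json")
    indexed_names = {i: p.name for i, p in indexed.items()}
    print(indexed_names)
```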


Choose dataset to be loaded by its index.

In [13]:
index_dataset = 0
dataset, lib = DataModel.parse(json_files[index_dataset])

Visualize the data model.

In [14]:
# lib.Dataset.meta_tree()

Print current status of the dataset.

In [15]:
# print(dataset.json())

Set path to the directory containing all relevant data.

In [16]:
path_data = root / 'data'

Set path for the measurement (raw) data.

In [17]:
# raw_data_path = Path('F:\Doktorand\\03_Messungen\Rohdaten')
path_raw_data = path_data / 'Rohdaten'

Instantiate an experiment object which will hold all the information about one single experiment.

In [18]:
experiment = Experiment()

---
## Section 2: Potentiostatic data parsing
---

Provide name of the directory containing the potentiostatic measurement data.

In [19]:
path_echem = path_raw_data / '01_EChem'

Search in that directory for further subdirectories and print them.

In [20]:
subdirectories_echem = {index:directory for index, directory in enumerate(path_echem.iterdir())}
for index, directory in subdirectories_echem.items():
    print(f"{index}: {directory.name}")

0: 210728_ITO_TEST
1: CAD14-Cu@AB


Choose one of the subdirectories found by its index.

In [21]:
subdirectory_index_echem = 1
selected_subdirectory_echem = subdirectories_echem[subdirectory_index_echem]
print(selected_subdirectory_echem)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/01_EChem/CAD14-Cu@AB


Provide suffix of the file that contains the data.

In [22]:
suffix_echem = 'DTA'

Initialize the ``GstaticParser`` and print available files.

In [23]:
gstaticparser = GstaticParser(selected_subdirectory_echem, suffix_echem)
files_dict_echem = gstaticparser.available_files
for index, gstatic_file in files_dict_echem.items():
    print(f"{index}: {gstatic_file.stem}")

0: GSTATIC
1: POTDYN


Choose the file to be parsed by its index.

In [24]:
file_index_echem = 0
file_echem = files_dict_echem[file_index_echem]
file_echem.name

'GSTATIC.DTA'

Extract the metadata from the file using the ``GstaticParser`` and load them into the data model.

In [25]:
gstatic_metadata_df, gstatic_metadata = gstaticparser.extract_metadata(file_index_echem)
potentiometric_measurement = lib.Measurement(measurement_type=lib.enums.MeasurementType.POTENTIOSTATIC.value, metadata=gstatic_metadata)
experiment.measurements = [potentiometric_measurement]
gstatic_metadata_df

Unnamed: 0,Parameter,Data_type,Value,Description
0,PSTAT,PSTAT,REF3000-19129,Potentiostat
1,IINIT,QUANT,-2.00000E+002,Initial I (mA/cm^2)
2,TINIT,QUANT,3.60000E+003,Initial Time (s)
3,IFINAL,QUANT,-2.00000E+002,Final I (mA/cm^2)
4,TFINAL,QUANT,0.00000E+000,Final Time (s)
5,SAMPLETIME,QUANT,1.00000E+000,Sample Period (s)
6,AREA,QUANT,1.00000E+000,Sample Area (cm^2)
7,DENSITY,QUANT,7.87000E+000,Density (g/cm^3)
8,EQUIV,QUANT,2.79200E+001,Equiv. Wt
9,IRCOMP,TOGGLE,T,IR Comp
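
The `Value` column above is displayed in the instrument's text notation (for example `-2.00000E+002`). If these values are needed as numbers later on, a conversion keyed on `Data_type` is one option. The following is only a sketch: the helper `convert_value` is hypothetical, and the assumption that `T` marks an enabled toggle is inferred from the table, not confirmed by the parser.

```python
def convert_value(data_type: str, raw: str):
    """Convert a raw metadata string according to its declared data type."""
    if data_type == "QUANT":
        return float(raw)
    if data_type == "TOGGLE":
        return raw == "T"  # assumption: "T" marks an enabled toggle
    return raw  # leave other types (e.g. PSTAT) as plain strings


# Rows copied from the table above: (Parameter, Data_type, Value).
rows = [
    ("PSTAT", "PSTAT", "REF3000-19129"),
    ("IINIT", "QUANT", "-2.00000E+002"),
    ("TINIT", "QUANT", "3.60000E+003"),
    ("IRCOMP", "TOGGLE", "T"),
]
converted = {name: convert_value(dtype, raw) for name, dtype, raw in rows}
print(converted)
```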


---
## Section 3: MFM data parsing
---

Provide name of the subdirectory containing the mass flow meter measurement data.

In [26]:
path_mfm = path_raw_data / '03_MFM'

Search directory for further subdirectories and print them.

In [27]:
subdirectories_mfm = {index:directory for index, directory in enumerate(path_mfm.iterdir())}
for index, directory in subdirectories_mfm.items():
    print(f"{index}: {directory.name}")

0: CAD14-Cu@AB


Choose one of the subdirectories found by its index.

In [28]:
subdirectory_index_mfm = 0
selected_subdirectory_mfm = subdirectories_mfm[subdirectory_index_mfm]
print(selected_subdirectory_mfm)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/03_MFM/CAD14-Cu@AB


Provide suffix of the file that contains the data.

In [29]:
suffix_mfm = 'csv'

Instantiate the ``MFMParser`` to parse MFM output files and show available files in the selected directory.

In [30]:
mfmparser = MFMParser(selected_subdirectory_mfm, suffix_mfm)
files_dict_mfm = mfmparser.available_files
for index, mfm_file in files_dict_mfm.items():
    print(f"{index}: {mfm_file.name}")

0: Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h.csv
1: Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h_truncated.csv


Choose the file to be parsed by its index.

In [31]:
file_index_mfm = 0
file_mfm = files_dict_mfm[file_index_mfm]
file_mfm.name

'Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h.csv'

Extract the experimental data from the file using the ``MFMParser`` and load them into the data model.

In [32]:
experimental_data_df_mfm, experimental_data_dict_mfm = mfmparser.extract_exp_data(file_index_mfm)
mfm = lib.Measurement(
            measurement_type=lib.enums.MeasurementType.MFM.value,
            experimental_data=[value for value in experimental_data_dict_mfm.values()],
        )
experiment.measurements.append(mfm)
experimental_data_df_mfm

Unnamed: 0,Datetime,Time,Signal,Flow_rate
0,2023-02-06 09:58:48,0,3258,5.090180
1,2023-02-06 09:58:50,2,3267,5.104674
2,2023-02-06 09:58:52,4,3273,5.114520
3,2023-02-06 09:58:54,6,3278,5.122616
4,2023-02-06 09:58:56,8,3290,5.139893
...,...,...,...,...
2848,2023-02-06 11:33:44,5696,3210,5.015965
2849,2023-02-06 11:33:46,5698,3204,5.006263
2850,2023-02-06 11:33:48,5700,3202,5.003840
2851,2023-02-06 11:33:50,5702,3204,5.006141
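
Judging from the values shown, the `Time` column appears to hold the seconds elapsed since the first record. This is an inference from the output, not documented behavior of `MFMParser`. The check below reproduces the first five `Time` values from the `Datetime` column using only the standard library.

```python
from datetime import datetime

# First five timestamps copied from the table above.
timestamps = [
    "2023-02-06 09:58:48",
    "2023-02-06 09:58:50",
    "2023-02-06 09:58:52",
    "2023-02-06 09:58:54",
    "2023-02-06 09:58:56",
]

parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]

# Seconds elapsed relative to the first record.
elapsed_seconds = [int((t - parsed[0]).total_seconds()) for t in parsed]
print(elapsed_seconds)
```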


---
## Section 4: GC data parsing
---

Provide name of the subdirectory containing the gas chromatography (GC) measurement data.

In [33]:
path_gc = path_raw_data / '02_GC'

Search directory for further subdirectories and print them.

In [34]:
subdirectories_gc = {index:directory for index, directory in enumerate(path_gc.iterdir())}
for index, directory in subdirectories_gc.items():
    print(f"{index}: {directory.name}")

0: CAD14-Cu@AB


Choose one of the subdirectories found by its index.

In [35]:
subdirectory_index_gc = 0
selected_subdirectory_gc = subdirectories_gc[subdirectory_index_gc]
print(selected_subdirectory_gc)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB


Search subdirectory for further subsubdirectories and print them.

In [36]:
subsubdirectories_gc = {index:directory for index, directory in enumerate(selected_subdirectory_gc.iterdir())}
for index, directory in subsubdirectories_gc.items():
    print(f"{index}: {directory.name}")

0: JH-1H 2023-02-06 10-00-18


Choose one of the subsubdirectories found by its index.

In [37]:
subsubdirectory_index_gc = 0
selected_subsubdirectory_gc = subsubdirectories_gc[subsubdirectory_index_gc]
print(selected_subsubdirectory_gc)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18


Print the available directories containing the individual measurement datasets.

In [38]:
exp_directories_gc = {index:directory for index, directory in enumerate(selected_subsubdirectory_gc.iterdir()) if directory.is_dir()}
for index, directory in exp_directories_gc.items():
    print(f"{index}: {directory.name}")

3: JH_GASPRODUKTE.M
4: JH_GASPRODUKTE_30MIN.M
6: NV-F0101.D
7: NV-F0102.D
8: NV-F0103.D
9: NV-F0104.D
10: NV-F0201.D
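
The listing mixes directories ending in `.M` and `.D`. Presumably the `.D` directories hold the individual measurements and the `.M` directories hold method definitions; this reading is an assumption based on the names alone. Under that assumption, the measurement directories could be separated by suffix, as in this sketch.

```python
from pathlib import PurePosixPath

# Directory names copied from the listing above.
listed_names = [
    "JH_GASPRODUKTE.M",
    "JH_GASPRODUKTE_30MIN.M",
    "NV-F0101.D",
    "NV-F0102.D",
    "NV-F0103.D",
    "NV-F0104.D",
    "NV-F0201.D",
]

# Keep only names whose suffix is ".D" (assumed to mark measurement data).
data_directory_names = [
    name for name in listed_names if PurePosixPath(name).suffix == ".D"
]
print(data_directory_names)
```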


Choose the directories of the individual GC measurements to be used for the calculation by their indices.

In [39]:
indices_exp_directories_gc = [7,8,9]
selected_subdirectories_gc = [exp_directories_gc[file] for file in indices_exp_directories_gc]
for sub in selected_subdirectories_gc:
    print(sub)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18/NV-F0102.D
/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18/NV-F0103.D
/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC/CAD14-Cu@AB/JH-1H 2023-02-06 10-00-18/NV-F0104.D


Provide the names of the files that contain the metadata and the experimental data, respectively.

In [40]:
filename_exp_gc = 'report01.CSV'
filename_meta_gc = 'report00.CSV'

Initialize GCParser.

In [41]:
gcparser = GCParser(selected_subdirectories_gc, filename_meta_gc, filename_exp_gc)

Show available metadata files contained in the selected directory.

In [42]:
metadata_dict_gc = gcparser.available_meta_files
for index, gc_file in metadata_dict_gc.items():
    print(f"{index}: {gc_file.name}")

0: report00.CSV
1: report00.CSV
2: report00.CSV


Select GC metadata files to be parsed by their indices.

In [43]:
indices_gc_meta = [0,1,2]

Show available experimental data files contained in the selected directory.

In [44]:
exp_data_dict_gc = gcparser.available_exp_files
for index, gc_file in exp_data_dict_gc.items():
    print(f"{index}: {gc_file.name}")

0: report01.CSV
1: report01.CSV
2: report01.CSV


Select GC experimental data files to be parsed by their indices.

In [45]:
indices_gc_exp = [0,1,2]

Extract the metadata and experimental data from them and load into the dataset.

In [46]:

list_df_meta = []
list_df_exp = []
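# Pair each selected metadata file with its experimental data file.
for index_meta, index_exp in zip(indices_gc_meta, indices_gc_exp):
    metadata_df_gc, metadata_gc = gcparser.extract_metadata(index_meta)
    exp_data_df_gc, exp_data_gc = gcparser.extract_exp_data(index_exp)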
    gc = lib.Measurement(
        measurement_type=lib.enums.MeasurementType.GC.value,
        metadata=[value for value in metadata_gc.values()],
        experimental_data=[value for value in exp_data_gc.values()]
    )
    experiment.measurements.append(gc)
    list_df_meta.append(metadata_df_gc)
    list_df_exp.append(exp_data_df_gc)

Print example content of first metadata file.

In [47]:
list_df_meta[0]

Unnamed: 0,parameter,value,description
0,Sample Name,,
1,Sample Info,,
2,Data File,D:\GC\Kurz\CAD14-Cu@AB\JH-1H 2023-02-06 10-00-18\,NV-F0102.D
3,Acq. Instrument,Instrument 1,
4,Analysis Method,D:\GC\Kurz\CAD14-Cu@AB\JH-1H 2023-02-06 10-00-18\,JH_GASPRODUKTE.M
5,Method Info,,
6,Results Created,06.02.2023 10:32:26,
7,Results Created by,MS,
8,Acq. Method,JH_GASPRODUKTE.M,
9,Injection Date,"06-Feb-23, 10:17:24",


Print example content of first experimental data file.

In [48]:
list_df_exp[0]

Unnamed: 0,Peak_number,Retention_time,Signal,Peak_type,Peak_area,Peak_height,Peak_area_percentage
0,1,1.729967,1,PBAN,69.171577,32.512886,0.098238
1,2,2.909973,1,BBA,65492.746094,3794.478271,93.013605
2,3,3.43423,2,BV,164.157028,43.253098,0.233138
3,4,3.657794,2,VB,141.173935,49.408844,0.200497
4,5,6.045472,2,BB,1624.07373,347.834717,2.30653
5,6,12.997822,1,BB,2876.952637,88.829025,4.085884
6,7,14.194683,2,BB,43.731697,14.139935,0.062108
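
The `Peak_area_percentage` column is consistent with each peak area divided by the sum of all peak areas in the report. The short check below recomputes the percentages from the `Peak_area` values shown above; it is a sanity check on the displayed numbers, not a description of how `GCParser` works internally.

```python
# Peak areas copied from the table above, in peak order 1 to 7.
peak_areas = [
    69.171577,
    65492.746094,
    164.157028,
    141.173935,
    1624.07373,
    2876.952637,
    43.731697,
]

total_area = sum(peak_areas)

# Share of each peak in the total area, in percent.
area_percentages = [100 * area / total_area for area in peak_areas]
for number, percentage in enumerate(area_percentages, start=1):
    print(f"Peak {number}: {percentage:.6f} %")
```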


In [49]:
# hplc_path = raw_data_path / '04_HPLC'
# pressure_path = raw_data_path / '05_Pressure'

---
## Section 4: Calibration data parsing
---

Set path to calibration data.

In [50]:
path_calibration_data = path_data / 'calibration'

Print the JSON files available in the ``calibration`` directory.

In [51]:
files_dict_calibration = Calibrator.available_json_files(path_to_calibration_data=path_calibration_data)
for count, path in files_dict_calibration.items():
    print(f'{count}:{path}')

0:/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/calibration/calibration.json


Select the calibration data file to be parsed by its index.

In [52]:
file_index_calibration = 0
file_calibration = files_dict_calibration[file_index_calibration]
file_calibration.name

'calibration.json'

Initialize the calibrator via its method ``from_json_file``.

In [53]:
calibrator = Calibrator.from_json_file(path_to_json_file=file_calibration)

Run the calibration, which returns the species data objects holding the calibration parameters just computed. <br> Assign the resulting SpeciesData objects to the experiment object.

In [54]:
species_data_list = calibrator.calibrate()
experiment.species_data = species_data_list
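
What `Calibrator.calibrate()` computes internally is not shown in this notebook. A common approach for GC calibration is a straight-line fit of peak area against known concentration, and the sketch below illustrates that idea only. The helper `fit_line` and all numbers are hypothetical, and the project's calibrator may use a different model.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up calibration points: concentration (arbitrary units) vs. peak area.
concentrations = [1.0, 2.0, 4.0, 8.0]
calibration_areas = [15.0, 25.0, 45.0, 85.0]

slope, intercept = fit_line(concentrations, calibration_areas)

# Invert the fitted line to estimate the concentration for a measured area.
measured_area = 65.0
estimated_concentration = (measured_area - intercept) / slope
print(slope, intercept, estimated_concentration)
```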

---
## Section 5: Parsing auxiliary data
---

### Correction factors

Set path for the ``correction`` ``factors``.

In [55]:
filename = 'correction_factors.json'
path_correction_factors = path_data / 'correction_factors' / filename

Load correction factors.

In [56]:
experiment.read_correction_factors(path_correction_factors)

### Faraday coefficients

Set path for the ``Faraday`` ``coefficients``.

In [57]:
filename = 'faraday_coefficients.json'
path_faraday_coefficients = path_data / 'faraday_coefficients' / filename

Load Faraday coefficients.

In [58]:
experiment.read_faraday_coefficients(path_faraday_coefficients)

### Electrode surface area

Set value for the surface area of the electrode.

In [59]:
electrode_surface_area = 1.0 # cm^2

---
## Section 6: Appending parsed data to dataset
---

Print current state of experiment object.

In [60]:
# print(experiment.json())

In [61]:
dataset.experiments.append(experiment)

In [66]:
with open(json_files[index_dataset], "w") as f:
    f.write(dataset.json())
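
The cell above opens the dataset file in `"w"` mode before `dataset.json()` is evaluated, so an exception during serialization would leave an empty file behind. One possible safeguard, sketched here with a hypothetical helper `write_text_atomically` and a placeholder payload standing in for `dataset.json()`, is to serialize first, write to a temporary file, and then swap it into place.

```python
import json
import os
import tempfile
from pathlib import Path


def write_text_atomically(target: Path, text: str) -> None:
    """Write `text` to a temporary file next to `target`, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            tmp_file.write(text)
        os.replace(tmp_name, target)  # replaces the old file in a single step
    except BaseException:
        os.unlink(tmp_name)  # clean up the temporary file on failure
        raise


# Demonstration in a throwaway directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "b07.json"
    target.write_text('{"experiments": []}', encoding="utf-8")
    serialized = json.dumps({"experiments": ["placeholder"]})  # stands in for dataset.json()
    write_text_atomically(target, serialized)
    written = json.loads(target.read_text(encoding="utf-8"))
    print(written)
```

Because the text is fully serialized before any file is touched, a failure at that stage leaves the existing dataset file unchanged.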

In [65]:
# button = widgets.Button(description="Append experiment", layout=widgets.Layout(width='30%', height='80px'))
# button.style.button_color = 'darkcyan'
# button.style.text_color = 'lightgrey'
# button.style.font_size = '30px'


# output = widgets.Output()

# display(button, output)

# def click_on_button(b):
#     with output:
#         print("Experiment successfully appended.")

# button.on_click(click_on_button)

In [64]:
%%html
<style>
.cell-output-ipywidget-background {
    background-color: transparent !important;
}
:root {
    --jp-widgets-color: var(--vscode-editor-foreground);
    --jp-widgets-font-size: var(--vscode-editor-font-size);
}  
</style>