# <center>Workflow for the CRC1333 project B07 - Technical Chemistry</center>
# <center>Experimental notebook</center>

---

This is the ``Experimental`` ``notebook``, where the actual analysis of the experiments takes place. It consists of three parts: ``Parsing``, ``analysis`` and ``DaRUS`` ``upload``. Within the scope of each project, multiple experiments are performed, so multiple analyses are required. The workflow is executed once for each individual experiment, and the results can be appended to the project's dataset.

---

In [35]:
from sdRDM.generator import generate_python_api
from sdRDM import DataModel

In [36]:
# generate_python_api('specifications/datamodel_b07_tc.md', '', 'datamodel_b07_tc')

Load the ``autoreload`` extension and import the parsers and helper functions from the ``datamodel_b07_tc`` package.

In [37]:
%load_ext autoreload
%autoreload 2

from datamodel_b07_tc.tools import GCParser
from datamodel_b07_tc.tools import GstaticParser
from datamodel_b07_tc.tools import MFMParser
from datamodel_b07_tc.tools import Calculator
from datamodel_b07_tc.tools import get_volumetric_flow_mean
from datamodel_b07_tc.tools import get_initial_time_and_current
from datamodel_b07_tc.tools import assign_peaks
# from DEXPI2sdRDM import DEXPI2sdRDM

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [38]:
import os
import ipywidgets as widgets
from IPython.display import display
from pathlib import Path

---
## Section 1: Parsing
---

In this section, the data model, the dataset and all output files required for the analysis are parsed.

Get the path to the directory in which this notebook is located and check that it exists.

In [39]:
root = Path(os.path.abspath(''))
print("Path to this notebook's location:", root)
print('Is the path correct?', root.is_dir())

Path to this notebook's location: /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc
Is the path correct? True


Set the path to the datasets.

In [40]:
path_to_datasets = root / 'datasets'

List all available datasets in the directory.


In [41]:
files = path_to_datasets.iterdir()
json_files = {index:file for index, file in enumerate(files) if file.suffix == '.json'}
for index, file in json_files.items():
    print(f'{index}: {file.name}')

0: b07.json
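Every selection step in this section (dataset, raw-data subdirectories, files) enumerates a directory and picks an entry by its index. ``Path.iterdir()`` yields entries in arbitrary order, so the indices are not guaranteed to be identical on every machine, and filtering after ``enumerate`` (as in the cell above) can leave gaps in the numbering. A minimal sketch of a helper that sorts the entries by name and numbers them consecutively; ``index_paths`` is a hypothetical name and not part of ``datamodel_b07_tc``.

```python
from pathlib import Path


def index_paths(directory, suffix=None, dirs_only=False):
    """Return {index: path} for the entries of `directory`, sorted by name.

    suffix    -- keep only files with this suffix, e.g. '.json'
    dirs_only -- keep only subdirectories
    """
    entries = sorted(Path(directory).iterdir(), key=lambda path: path.name)
    if dirs_only:
        entries = [path for path in entries if path.is_dir()]
    if suffix is not None:
        entries = [path for path in entries if path.is_file() and path.suffix == suffix]
    # number the remaining entries consecutively, starting at 0
    return dict(enumerate(entries))
```

In this notebook, ``index_paths(path_to_datasets, suffix='.json')`` would give a gap-free version of the mapping built above, and ``index_paths(path_echem, dirs_only=True)`` would list the potentiostatic subdirectories in a reproducible order.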


Choose the dataset to be loaded by its index.

In [42]:
index_dataset = 0
dataset, lib = DataModel.parse(json_files[index_dataset])

Visualize the data model.

In [43]:
# lib.Dataset.meta_tree()

Print current status of the dataset.

In [44]:
# print(dataset.json())

Set path to the directory containing the raw data.

In [45]:
# path_raw_data = Path(r'F:\Doktorand\03_Messungen\Rohdaten')
path_raw_data = root / 'data' / 'Rohdaten'

Instantiate an experiment object which holds all the information about one single experiment.

In [46]:
experiment = lib.Experiment()

### Potentiostatic data

Provide the name of the subdirectory containing the potentiostatic measurement data.

In [47]:
path_echem = path_raw_data / '01_EChem'

Search in that directory for further subdirectories and print them.

In [48]:
subdirectories_echem = {index:directory for index, directory in enumerate(path_echem.iterdir())}
for index, directory in subdirectories_echem.items():
    print(f"{index}: {directory.name}")

0: 210728_ITO_TEST
1: CAD14-Cu@AB


Choose subdirectory by its index.

In [49]:
subdirectory_index_echem = 1
selected_subdirectory_echem = subdirectories_echem[subdirectory_index_echem]
print(selected_subdirectory_echem)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/01_EChem/CAD14-Cu@AB


Provide the suffix of the files that contain the data.

In [50]:
suffix_echem = 'DTA'

Initialize the ``GstaticParser`` and print available files.

In [51]:
gstaticparser = GstaticParser(selected_subdirectory_echem, suffix_echem)
files_dict_echem = gstaticparser.available_files
for index, gstatic_file in files_dict_echem.items():
    print(f"{index}: {gstatic_file.stem}")

0: GSTATIC
1: POTDYN


Choose a specific file.

In [52]:
file_index_echem = 0
file_echem = files_dict_echem[file_index_echem]
file_echem.name

'GSTATIC.DTA'

Extract the metadata from it using the ``GstaticParser`` and load it into the data model.

In [53]:
gstatic_metadata_df, gstatic_metadata = gstaticparser.extract_metadata(file_index_echem)
potentiometric_measurement = lib.Measurement(measurement_type=lib.enums.MeasurementType.POTENTIOSTATIC, metadata=gstatic_metadata)
experiment.measurements = [potentiometric_measurement]
gstatic_metadata_df

| | Parameter | Data_type | Value | Description |
|---|---|---|---|---|
| 0 | PSTAT | PSTAT | REF3000-19129 | Potentiostat |
| 1 | IINIT | QUANT | -2.00000E+002 | Initial I (mA/cm^2) |
| 2 | TINIT | QUANT | 3.60000E+003 | Initial Time (s) |
| 3 | IFINAL | QUANT | -2.00000E+002 | Final I (mA/cm^2) |
| 4 | TFINAL | QUANT | 0.00000E+000 | Final Time (s) |
| 5 | SAMPLETIME | QUANT | 1.00000E+000 | Sample Period (s) |
| 6 | AREA | QUANT | 1.00000E+000 | Sample Area (cm^2) |
| 7 | DENSITY | QUANT | 7.87000E+000 | Density (g/cm^3) |
| 8 | EQUIV | QUANT | 2.79200E+001 | Equiv. Wt |
| 9 | IRCOMP | TOGGLE | T | IR Comp |
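The four columns of the table suggest that each header parameter of the ``.DTA`` file is stored as one tab-separated line (name, type, value, description). A minimal sketch of reading such a line, assuming exactly that layout; ``parse_header_line`` is hypothetical and not the ``GstaticParser`` implementation, and the conversion of ``TOGGLE`` values to booleans is an assumption.

```python
def parse_header_line(line):
    """Split one tab-separated header line into the four columns shown above.

    Assumes the layout NAME<TAB>TYPE<TAB>VALUE<TAB>DESCRIPTION; returns None for
    lines with fewer than three fields.
    """
    fields = line.rstrip('\r\n').split('\t')
    if len(fields) < 3:
        return None
    parameter, data_type, value = fields[0], fields[1], fields[2]
    description = fields[3] if len(fields) > 3 else ''
    if data_type == 'QUANT':
        value = float(value)  # e.g. '-2.00000E+002' -> -200.0
    elif data_type == 'TOGGLE':
        value = value == 'T'  # 'T'/'F' flags (assumption)
    return {'Parameter': parameter, 'Data_type': data_type,
            'Value': value, 'Description': description}
```

With ``AREA`` = 1 cm^2, the ``IINIT`` value of -200 mA/cm^2 in the table corresponds to an applied current of -200 mA.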


### MFM data

Provide the name of the subdirectory containing the mass flow meter (MFM) measurement data.

In [54]:
path_mfm = path_raw_data / '03_MFM'

Search that directory for further subdirectories and print them.

In [55]:
subdirectories_mfm = {index:directory for index, directory in enumerate(path_mfm.iterdir())}
for index, directory in subdirectories_mfm.items():
    print(f"{index}: {directory.name}")

0: CAD14-Cu@AB


Choose subdirectory by its index.

In [56]:
subdirectory_index_mfm = 0
selected_subdirectory_mfm = subdirectories_mfm[subdirectory_index_mfm]
print(selected_subdirectory_mfm)

/mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/03_MFM/CAD14-Cu@AB


Provide the suffix of the files that contain the data.

In [57]:
suffix_mfm = 'csv'

Instantiate the ``MFMParser`` to parse MFM output files and show available files in the selected directory.

In [58]:
mfmparser = MFMParser(selected_subdirectory_mfm, suffix_mfm)
files_dict_mfm = mfmparser.available_files
for index, mfm_file in files_dict_mfm.items():
    print(f"{index}: {mfm_file.name}")

0: Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h.csv
1: Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h_truncated.csv


Choose the file to be parsed.

In [59]:
file_index_mfm = 1
file_mfm = files_dict_mfm[file_index_mfm]
file_mfm.name

'Bench-2h-GSS_CAD14-Cu@AB_200_50c_24h_truncated.csv'

Extract the experimental data from it using the ``MFMParser`` and load it into the data model.

In [60]:
experimental_data_df_mfm, experimental_data_dict_mfm = mfmparser.extract_exp_data(file_index_mfm)
mfm = lib.Measurement(
    measurement_type=lib.enums.MeasurementType.MFM.value,
    experimental_data=list(experimental_data_dict_mfm.values()),
)
experiment.measurements.append(mfm)

ValidationError: 1 validation error for Measurement
experimental_data -> 3 -> quantity
  value is not a valid enumeration member; permitted: 'Time', 'Voltage', 'Current', 'Mass', 'Mass flow rate', 'Date time', 'Fraction', 'Signal', 'Peak number', 'Retention time', 'Peak type', 'Peak area', 'Peak height', 'Peak area percentage', 'Slope', 'Intercept', 'Coefficient of determination' (type=type_error.enum; enum_values=[<Quantity.TIME: 'Time'>, <Quantity.VOLTAGE: 'Voltage'>, <Quantity.CURRENT: 'Current'>, <Quantity.MASS: 'Mass'>, <Quantity.MASSFLOWRATE: 'Mass flow rate'>, <Quantity.DATETIME: 'Date time'>, <Quantity.FRACTION: 'Fraction'>, <Quantity.SIGNAL: 'Signal'>, <Quantity.PEAKNUMBER: 'Peak number'>, <Quantity.RETENTIONTIME: 'Retention time'>, <Quantity.PEAKTYPE: 'Peak type'>, <Quantity.PEAKAREA: 'Peak area'>, <Quantity.PEAKHEIGHT: 'Peak height'>, <Quantity.PEAKAREAPERCENTAGE: 'Peak area percentage'>, <Quantity.SLOPE: 'Slope'>, <Quantity.INTERCEPT: 'Intercept'>, <Quantity.COEFFDET: 'Coefficient of determination'>])

In [None]:
experimental_data_df_mfm

In [None]:
print(experimental_data_dict_mfm.items())

In [None]:
# truncated_experimental_data_df_mfm = experimental_data_df_mfm.truncate(after=10)
# truncated_experimental_data_df_mfm

### GC data

Provide the name of the subdirectory containing the gas chromatography (GC) measurement data.

In [None]:
path_gc = path_raw_data / '02_GC'

Search that directory for further subdirectories and print them.

In [None]:
subdirectories_gc = {index:directory for index, directory in enumerate(path_gc.iterdir())}
for index, directory in subdirectories_gc.items():
    print(f"{index}: {directory.name}")

Choose subdirectory by its index.

In [None]:
subdirectory_index_gc = 0
selected_subdirectory_gc = subdirectories_gc[subdirectory_index_gc]
print(selected_subdirectory_gc)

Provide the suffix of the files that contain the data.

In [None]:
suffix_gc = 'csv'

Instantiate the ``GCParser`` to parse GC output files and show the available files in the selected directory.

In [None]:
gcparser = GCParser(selected_subdirectory_gc, suffix_gc)
files_dict_gc = gcparser.available_files
for index, gc_file in files_dict_gc.items():
    print(f"{index}: {gc_file.name}")

Select GC files to be parsed.

Metadata

In [None]:
gc_metadata_file = files_dict_gc[2]
gc_metadata_file

Experimental data

In [None]:
gc_experimental_data_file = files_dict_gc[3]
gc_experimental_data_file

Extract the metadata and experimental data from these files and load them into the data model.

In [None]:
gc_metadata_df, gc_metadata = gcparser.extract_metadata(gc_metadata_file)
gc_experimental_data_df, gc_experimental_data = gcparser.extract_exp_data(gc_experimental_data_file)
gc = lib.Measurement(
    measurement_type=lib.enums.MeasurementType.GC.value,
    metadata=list(gc_metadata.values()),
    experimental_data=list(gc_experimental_data.values()),
)
experiment.measurements.append(gc)
gc_metadata_df

In [None]:
gc_experimental_data_df

In [None]:
# path_hplc = path_raw_data / '04_HPLC'
# path_pressure = path_raw_data / '05_Pressure'

Print the current state of the experiment object.

In [None]:
print(experiment.json())

---
## Section 2: Analysis
---

Assign peak areas to species.

The peak areas recorded by the GC have to be matched with the correct species. The individual ``Area`` is selected by its corresponding ``Peak_Number``. A single species may account for multiple peaks, i.e. multiple peak numbers can be assigned to the same species.
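One way to express this mapping is sketched below, assuming that the areas of all peaks assigned to one species are summed; whether ``assign_peaks`` does the same is not stated here, and ``areas_by_species`` is a hypothetical helper. The peak numbers and areas in the test values are made up.

```python
def areas_by_species(peak_areas, assignment):
    """Sum the GC peak areas assigned to each species.

    peak_areas -- {peak_number: area} for one GC run
    assignment -- {species: [peak_number, ...]}, like assign_peak_dict below
    """
    # Peaks missing from this run contribute zero area (assumption).
    return {
        species: sum(peak_areas.get(number, 0.0) for number in numbers)
        for species, numbers in assignment.items()
    }
```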


In [None]:
assign_peak_dict={
    'H2': [1],
    'CO2': [2],
    'CO': [6],
    'CH4': [3],
    # 'C2H4': [5],
    # 'C2H6': [4],
}
peak_area_dict = assign_peaks(dataset, assign_peak_dict)

for species, peak_area in peak_area_dict.items():
    print(f"{species}: {peak_area}")

Set calibration input values and import into the data model.

To determine the concentrations of the individual species, a calibration must be performed in advance that relates the measured values for ``Area`` to known concentrations.
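Slope, intercept and coefficient of determination are among the quantities the data model accepts, which suggests a linear calibration curve. A minimal sketch of such a fit with ordinary least squares, assuming concentration is modelled as a linear function of peak area (the ``Calculator`` may fit the inverse relation); ``linear_calibration`` is a hypothetical name.

```python
def linear_calibration(peak_areas, concentrations):
    """Ordinary least-squares fit of concentration = slope * area + intercept.

    Returns (slope, intercept, r_squared). Needs at least two distinct areas
    and two distinct concentrations.
    """
    n = len(peak_areas)
    if n != len(concentrations) or n < 2:
        raise ValueError("need at least two (area, concentration) pairs")
    mean_x = sum(peak_areas) / n
    mean_y = sum(concentrations) / n
    s_xx = sum((x - mean_x) ** 2 for x in peak_areas)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(peak_areas, concentrations))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # coefficient of determination R^2 = 1 - SS_res / SS_tot
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(peak_areas, concentrations))
    ss_tot = sum((y - mean_y) ** 2 for y in concentrations)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# H2 calibration points from calibration_input_dict below
slope, intercept, r_squared = linear_calibration([71, 153, 330], [5, 10, 20])
print(f"slope={slope:.5f}, intercept={intercept:.3f}, R^2={r_squared:.4f}")
```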

In [None]:
calibration_input_dict={
    'H2': [
        lib.enums.Species.HYDROGEN,
        {
            'peak_areas': [71,153,330],
            'concentrations': [5,10,20]
        },
    ],
    'CO':[
        lib.enums.Species.CARBONMONOXIDE,
        {
            'peak_areas': [797,1328,7223],
            'concentrations': [0.5,1,5]
        }
    ],
    'CO2': [
        lib.enums.Species.CARBONDIOXIDE,
        {
            'peak_areas': [0,38653],
            'concentrations': [0,50]
        }
    ],
    'CH4':[
        lib.enums.Species.METHANE,
        {
            'peak_areas': [5727,11991],
            'concentrations': [5,10]
        }
    ],
    # 'C2H4': [
    #     lib.enums.Species.ETHENE,
    #     {
    #         'peak_areas': [1122, 4864, 7297],
    #         'concentrations': [0.5, 2, 3]
    #     }
    # ],
    # 'C2H6': [
    #     lib.enums.Species.ETHANE,
    #     {
    #         'peak_areas': [0, 12168],
    #         'concentrations': [0, 5]
    #     }
    # ],
}

Calibrate using the ``calibrate`` method of the ``Calculator`` class.

In [None]:
calculator = Calculator()
calibration_df, calibration_dict = calculator.calibrate(calibration_input_dict)
calibration_df
# for species, value in calibration_dict.items():
#     print(f"{species}: {value}")
#     # print(lib.Calibration(value))

In [None]:
analysis = lib.Analysis()
analysis.calibrations = [calibration for calibration in calibration_dict.values()]
experiment.analysis = analysis

Print current state of the experiment object.

In [None]:
print(experiment.json())

Calculate the ``volumetric`` ``fractions`` in % from the peak areas using the determined calibration curves.
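Assuming the calibration curve has the linear form concentration = slope * area + intercept, the conversion is a single evaluation of that line. The sketch also flags areas outside the calibrated range, because values there are extrapolated; ``area_to_fraction`` is hypothetical.

```python
def area_to_fraction(area, slope, intercept, calibrated_areas):
    """Convert a peak area to a volumetric fraction (in %) with a linear calibration.

    Also reports whether the area lies inside the calibrated range, since values
    outside it are extrapolated.
    """
    fraction = slope * area + intercept
    in_range = min(calibrated_areas) <= area <= max(calibrated_areas)
    return fraction, in_range


# CO2 calibration from the notebook: area 0 -> 0 %, area 38653 -> 50 %
co2_slope, co2_intercept = 50 / 38653, 0.0
print(area_to_fraction(20000, co2_slope, co2_intercept, [0, 38653]))
print(area_to_fraction(45000, co2_slope, co2_intercept, [0, 38653]))
```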

In [None]:
volumetric_fractions_df = calculator.calculate_volumetric_fractions(peak_area_dict=peak_area_dict, calibration_df=calibration_df)
volumetric_fractions_df

Set the ``correction`` ``factors``.

In [None]:
correction_factors_dict = {
    # gas correction factors of the thermal mass flow meter
    'H2': 1.01,
    'CO': 1.00,
    'CO2': 0.74,
    'CH4': 0.76,
    # 'C2H4':,
    # 'C2H6':,
}

Calculate the ``conversion`` ``factor`` using the correction factors.
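For thermal mass flow meters, per-gas correction factors C_i are commonly combined for a gas mixture as 1/C_mix = sum_i (x_i / C_i), where x_i are the volumetric fractions. Whether ``calculate_conversion_factor`` uses exactly this rule is an assumption. The sketch renormalizes the fractions of the species that have a correction factor so that they sum to 1 (also an assumption); ``mixture_conversion_factor`` is hypothetical.

```python
def mixture_conversion_factor(fractions_percent, correction_factors):
    """Mixture correction factor C_mix with 1 / C_mix = sum_i x_i / C_i.

    fractions_percent  -- {species: volumetric fraction in %}
    correction_factors -- {species: gas correction factor of the flow meter}
    Only species present in both dicts are used; their fractions are renormalized.
    """
    species = [s for s in correction_factors if s in fractions_percent]
    total = sum(fractions_percent[s] for s in species)
    if total <= 0:
        raise ValueError("no positive fractions for species with correction factors")
    inverse = sum((fractions_percent[s] / total) / correction_factors[s] for s in species)
    return 1.0 / inverse
```

For a pure gas the result equals that gas's own correction factor; for a mixture it is dominated by the gases with the smallest factors.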

In [None]:
conversion_factor = calculator.calculate_conversion_factor(
    volumetric_fractions_df=volumetric_fractions_df, correction_factors_dict=correction_factors_dict
)
conversion_factor

Get ``volumetric`` ``flow`` ``mean`` in ml/min at the time of the GC measurement.

The volumetric flow at the time of the GC measurement is determined by matching the time of the GC measurement with the corresponding times of the mass flow measurements. To reduce errors caused by strong fluctuations, the flow is averaged over a certain number (``radius``) of measuring points before and after the time of the GC measurement. The radius should be chosen according to the strength of the fluctuations.
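A minimal sketch of this averaging, assuming the MFM timestamps and flow readings are available as two plain sequences; ``windowed_flow_mean`` is a hypothetical helper, whereas ``get_volumetric_flow_mean`` takes the experiment object. The window is clipped at the start and end of the recording, so fewer points may be averaged there.

```python
def windowed_flow_mean(times, flows, gc_time, radius):
    """Mean flow over the samples within `radius` points of the sample closest to gc_time.

    times, flows -- equally long sequences of MFM timestamps and flow readings
    gc_time      -- time of the GC measurement, in the same unit as `times`
    radius       -- number of samples taken before and after the closest sample
    """
    if not times or len(times) != len(flows):
        raise ValueError("times and flows must be non-empty and of equal length")
    closest = min(range(len(times)), key=lambda i: abs(times[i] - gc_time))
    # clip the window at the ends of the recording
    start = max(0, closest - radius)
    stop = min(len(flows), closest + radius + 1)
    window = flows[start:stop]
    return sum(window) / len(window)
```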

In [None]:
mean_radius = 10
volumetric_flow_mean = get_volumetric_flow_mean(experiment, mean_radius)
volumetric_flow_mean

Calculate the ``real`` ``volumetric`` ``flow`` in ml/min as a product of the ``volumetric`` ``flow`` ``mean`` and the ``conversion`` ``factor``.

In [None]:
real_volumetric_flow = volumetric_flow_mean*conversion_factor
real_volumetric_flow

In [None]:
# real_volumetric_flow = calculator.calculate_real_volumetric_flow(conversion_factor=conversion_factor, measured_volumetric_flow_mean=volumetric_flow_mean)
# real_volumetric_flow

Calculate the volumetric flow fractions, i.e. the volumetric flow of each species in ml/min.

In [None]:
volumetric_flow_fractions_df=calculator.calculate_volumetric_flow_fractions(
    real_volumetric_flow=real_volumetric_flow, volumetric_fractions_df=volumetric_fractions_df
)
volumetric_flow_fractions_df

Calculate material flow in mmol/min.
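Converting a volumetric flow into a material flow requires reference conditions. A minimal sketch using the ideal gas law n = pV/(RT), assuming the flows refer to 273.15 K and 101325 Pa; the reference state actually used by the ``Calculator`` and the mass flow meter is not stated here, so both are parameters. The per-species volumetric flow is taken as the real volumetric flow times the volumetric fraction (also an assumption). ``species_flows`` and ``material_flow`` are hypothetical names, and the example flows are made up.

```python
GAS_CONSTANT = 8.314462618  # J/(mol K)


def species_flows(real_volumetric_flow, fractions_percent):
    """Volumetric flow of each species in ml/min from the total flow and fractions in %."""
    return {s: real_volumetric_flow * x / 100.0 for s, x in fractions_percent.items()}


def material_flow(vol_flow_ml_per_min, temperature_k=273.15, pressure_pa=101325.0):
    """Ideal-gas material flow in mmol/min for a volumetric flow in ml/min."""
    vol_flow_m3_per_min = vol_flow_ml_per_min * 1e-6  # 1 ml = 1e-6 m^3
    mol_per_min = pressure_pa * vol_flow_m3_per_min / (GAS_CONSTANT * temperature_k)
    return mol_per_min * 1e3  # mol -> mmol


# hypothetical example: 30 ml/min total flow with 40 % H2 and 10 % CO
flows = species_flows(30.0, {'H2': 40.0, 'CO': 10.0})
print({s: round(material_flow(v), 4) for s, v in flows.items()})
```

At 273.15 K and 101325 Pa, about 22.414 ml/min correspond to 1 mmol/min.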

In [None]:
material_flow_df = calculator.calcualte_material_flow(volumetric_flow_fractions_df=volumetric_flow_fractions_df)
material_flow_df

Get initial current in mA and initial time in s.

In [None]:
initial_current, initial_time = get_initial_time_and_current(experiment)
print(f'Initial current in mA: {initial_current}')
print(f'Initial time in s: {initial_time}')

Calculate theoretical material flow in mmol/min.
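By Faraday's law, a current I produces a material flow I/(zF) of a product that requires z electrons per molecule. A minimal sketch of a theoretical material flow per species, assuming all charge goes into that one species and using the electron numbers of proton and CO2 reduction (H2: 2, CO: 2, CH4: 8, C2H4: 12, C2H6: 14); the sign of the cathodic current is dropped. Whether ``calculate_theoretical_material_flow`` expects a current or a current density is not shown here; the hypothetical ``theoretical_material_flow`` takes a current density in mA/cm^2 together with the electrode area.

```python
FARADAY_CONSTANT = 96485.33212  # C/mol

# electrons transferred per product molecule (proton and CO2 reduction)
ELECTRONS_PER_MOLECULE = {'H2': 2, 'CO': 2, 'CH4': 8, 'C2H4': 12, 'C2H6': 14}


def theoretical_material_flow(current_density_ma_cm2, area_cm2):
    """Theoretical material flow in mmol/min per species if all charge went into it."""
    current_a = abs(current_density_ma_cm2) * area_cm2 / 1000.0  # mA -> A
    return {
        species: current_a / (z * FARADAY_CONSTANT) * 60.0 * 1000.0  # mol/s -> mmol/min
        for species, z in ELECTRONS_PER_MOLECULE.items()
    }


# IINIT = -200 mA/cm^2 and AREA = 1 cm^2 from the GSTATIC metadata above
print(theoretical_material_flow(-200.0, 1.0))
```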

In [None]:
electrode_surface_area = 1.0 # cm^2
theoretical_material_flow_df=calculator.calculate_theoretical_material_flow(
    initial_current=initial_current, initial_time=initial_time, electrode_surface_area=electrode_surface_area
)
theoretical_material_flow_df

Calculate the Faraday efficiency as the ratio of the measured to the theoretical material flow.
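Written out for a species $i$, and assuming the theoretical material flow follows Faraday's law with $z_i$ electrons per molecule, the ratio computed below takes the familiar form ($\dot n_i$: measured material flow, $I$: current, $F$: Faraday constant):

$$
\mathrm{FE}_i = \frac{\dot n_i}{\dot n_{\mathrm{theo},i}} = \frac{z_i F \dot n_i}{|I|},
\qquad
\dot n_{\mathrm{theo},i} = \frac{|I|}{z_i F}
$$

Multiplying by 100 gives the efficiency in %.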

In [None]:
faraday_efficiency_df = material_flow_df['Material_flow'] / theoretical_material_flow_df['Theoretical_material_flow']
faraday_efficiency_df

---
## Section 3: DaRUS upload
---

In [None]:
dataset.experiments.append(experiment)

In [None]:
with open(json_files[index_dataset], "w") as f:
    f.write(dataset.json())

In [None]:
button = widgets.Button(description="Append experiment", layout=widgets.Layout(width='30%', height='80px'))
button.style.button_color = 'darkcyan'
button.style.text_color = 'lightgrey'
button.style.font_size = '30px'


output = widgets.Output()

display(button, output)

def click_on_button(b):
    with output:
        print("Experiment successfully appended.")

button.on_click(click_on_button)

In [None]:
# %%html
# <style>
# .cell-output-ipywidget-background {
#    background-color: transparent !important;
# }
# .jp-OutputArea-output {
#    background-color: transparent;
# }  
# </style>

In [None]:
%%html
<style>
.cell-output-ipywidget-background {
    background-color: transparent !important;
}
:root {
    --jp-widgets-color: var(--vscode-editor-foreground);
    --jp-widgets-font-size: var(--vscode-editor-font-size);
}  
</style>