# <center>Workflow for on-line GC and HPLC analysis in flow chemistry</center>
# <center>2.1 Experimental notebook - Parsing</center>

---

This is the ``Experimental`` ``notebook`` ``2.1 "Parsing"``, where all relevant data of an experiment are read in from different resources. This workflow is executed once per experiment, and the parsed data can be appended to the project's dataset.

---

---
## Section 0: Imports, Paths, and Logging
---

In this section, all necessary Python packages are imported, and the path to this notebook and its logger are set up.

In [67]:
# Activate autoreload to keep on track with changing modules #
%reload_ext autoreload
%autoreload 2

# Import standard libraries #
import os
import json
import logging
import logging.config
from pathlib import Path
import ipywidgets as widgets
from IPython.display import display, Markdown

# Import librarian module for file directory handling #
from datamodel_b07_tc.tools import Librarian

# Import modified sdRDM objects #
from datamodel_b07_tc.modified.experiment import Experiment
from datamodel_b07_tc.modified.measurement import Measurement
from datamodel_b07_tc.modified.plantsetup import PlantSetup

# Import datamodel from sdRDM #
from sdRDM import DataModel

# Import tools for parsing and calibration of the raw data #
from datamodel_b07_tc.tools import Calibrator
from datamodel_b07_tc.tools import gc_parser
from datamodel_b07_tc.tools import gstatic_parser
from datamodel_b07_tc.tools import mfm_parser
# from datamodel_b07_tc.tools import DEXPI2sdRDM

# from sdRDM.generator import generate_python_api
# generate_python_api('specifications/datamodel_b07_tc.md', '', 'datamodel_b07_tc')

In [3]:
# Define paths for logging output #
root                = Path.cwd()
logging_config_path = root / "datamodel_b07_tc/tools/logging/config_exp_2_1.json"

# Read in logger specs and configure logger (set name to current notebook) #
with open(logging_config_path) as logging_config_json:
    logging.config.dictConfig(json.load(logging_config_json))
logger = logging.getLogger(__name__)

# Set the level of third-party loggers to avoid dumping too much information #
third_party_module_loggers = ['markdown_it', 'h5py', 'numexpr', 'git']
for logger_ in third_party_module_loggers:
    logging.getLogger(logger_).setLevel('WARNING')

# Initialize the librarian with root directory of this notebook #
librarian = Librarian(root_directory=root)
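
The contents of `config_exp_2_1.json` are not shown in this notebook. As a minimal sketch, assuming the file follows the standard `logging.config` dictionary schema, an equivalent configuration might look like the following. The formatter, handler and logger names used here are hypothetical.

```python
import json
import logging
import logging.config

# Hypothetical configuration; the real config_exp_2_1.json may differ.
LOGGING_CONFIG_JSON = """
{
    "version": 1,
    "disable_existing_loggers": false,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s: %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "level": "INFO"
        }
    },
    "root": {"handlers": ["console"], "level": "INFO"}
}
"""


def configure_logging(config_text: str, quiet_loggers: list[str]) -> logging.Logger:
    """Configure logging from a JSON string and quieten noisy third-party loggers."""
    logging.config.dictConfig(json.loads(config_text))
    for name in quiet_loggers:
        logging.getLogger(name).setLevel(logging.WARNING)
    # "exp_2_1" is a placeholder logger name for this sketch
    return logging.getLogger("exp_2_1")


example_logger = configure_logging(LOGGING_CONFIG_JSON, ["h5py", "git"])
example_logger.info("Logger configured")
```

Setting `disable_existing_loggers` to `false` keeps loggers that were created before the configuration was applied active, which matters when modules are imported before this cell runs.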

---
## Section 1: Dataset and data model parsing
---
In this section, the data model and the dataset, as well as all the output files necessary for the analysis notebook, are parsed.

In [77]:

def search_files_in_subdirectory(root_directory: dict[int, Path], directory_keys: list[str], file_filter: str, verbose: bool = False) -> dict[int, Path]:
    """
    Function that descends through a chain of nested subdirectories, starting from the enumerated directories
    in 'root_directory'. In the last subdirectory, files with the suffix 'file_filter' are searched for and returned.

    Args:
        root_directory (dict[int, Path]): Enumerated directories in which the first key is searched for
        directory_keys (list[str]): Names of the nested subdirectories, ordered from outermost to innermost
        file_filter (str): Suffix of the files that should be found in the last given subdirectory
        verbose (bool, optional): Whether to print out the entries of the listed directory. Defaults to False.

    Raises:
        KeyError: If either the specified subdirectory or any matching file could not be found

    Returns:
        subdirectory_files (dict[int, Path]): Enumerated paths of all files found in the subdirectory
    """

    # First search for every nested subdirectory in the provided root directory #
    root = root_directory
    for j, directory_key in enumerate(directory_keys):
        try:
            idx_sub_directory = [i for i in range(len(root)) if root[i].parts[-1] == directory_key][0]
        except IndexError:
            parent = root[0].parent if len(root) > 0 else "unknown"
            raise KeyError("Defined key: '%s' cannot be found in the given root directory: %s" % (directory_key, parent))
        if j < len(directory_keys) - 1:
            root = librarian.enumerate_subdirectories(directory=root[idx_sub_directory])

    # Search for all files that match the given filter in the specified subdirectory #
    subdirectory_files = librarian.enumerate_files(directory=root[idx_sub_directory], filter=file_filter, verbose=verbose)
    if not subdirectory_files:
        # Report the directory that was actually searched (the last level reached, not the initial root) #
        raise KeyError("No files with filter: '%s' found in the given subdirectory: %s" % (file_filter, root[idx_sub_directory]))

    return subdirectory_files


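def dropdown_files(files: dict[int, Path], description: str = "Files") -> widgets.Dropdown:
    """Build a dropdown that shows the file names and returns the index of the selected file."""
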
    dropdown = widgets.Dropdown(
        options=[(path.parts[-1],idx) for idx,path in files.items()],
        description=description,
        layout=widgets.Layout(width='auto'),
        style={'description_width': 'auto'} 
        )
    
    return dropdown
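
For readers without access to the `Librarian` class, the search above can be approximated with the standard library alone. The following is a minimal sketch, not the project's implementation: `find_files` is a hypothetical helper, and it assumes that files are selected by suffix and enumerated as an index-to-path dictionary, as the dropdown helper expects.

```python
import tempfile
from pathlib import Path


def find_files(root: Path, directory_keys: list[str], suffix: str) -> dict[int, Path]:
    """Walk down the named subdirectories and enumerate files with the given suffix."""
    current = root
    for key in directory_keys:
        candidate = current / key
        if not candidate.is_dir():
            raise KeyError(f"Subdirectory '{key}' not found in: {current}")
        current = candidate

    # Compare suffixes case-insensitively so that 'csv' also matches '.CSV' files
    wanted = "." + suffix.lower()
    files = sorted(p for p in current.iterdir() if p.is_file() and p.suffix.lower() == wanted)
    if not files:
        raise KeyError(f"No '{suffix}' files found in: {current}")
    return dict(enumerate(files))


# Usage on a throwaway directory tree (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    mfm_dir = Path(tmp) / "data" / "Rohdaten" / "03_MFM"
    mfm_dir.mkdir(parents=True)
    for name in ("run_b.csv", "run_a.csv", "notes.txt"):
        (mfm_dir / name).touch()

    found = find_files(Path(tmp), ["data", "Rohdaten", "03_MFM"], "csv")
    found_names = [path.name for path in found.values()]

    try:
        find_files(Path(tmp), ["data", "missing"], "csv")
        missing_raises = False
    except KeyError:
        missing_raises = True
```

Sorting the matches keeps the indices stable between runs, which is useful when files are later selected by index.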

In [78]:
# Check for all available subdirectories #
root                          = Path.cwd()
root_subdirectories           = librarian.enumerate_subdirectories(directory=root)

# Search for subdirectory "datasets" and in it for all dataset json files #

json_dataset_files            = search_files_in_subdirectory(root_directory=root_subdirectories, directory_keys=["datasets"], file_filter="json", verbose=False)
json_dropdown                 = dropdown_files(json_dataset_files,description="Datasets")

dataset, lib = DataModel.parse( json_dataset_files[ json_dropdown.value ] )

## Search for raw data ##

# Potentiostatic data #
potentiostatic_raw_data_files = search_files_in_subdirectory(root_directory=root_subdirectories, directory_keys=["data","Rohdaten","01_EChem","CAD14-Cu@AB"], file_filter="DTA", verbose=False)
potentiostatic_dropdown       = dropdown_files(potentiostatic_raw_data_files,description="Potentiostatic raw data")

# Mass flow meter data #
mfm_raw_data_files            = search_files_in_subdirectory(root_directory=root_subdirectories, directory_keys=["data","Rohdaten","03_MFM","CAD14-Cu@AB"], file_filter="csv", verbose=False)
mfm_dropdown                  = dropdown_files(mfm_raw_data_files,description="Mass flow meter raw data")

# GC data #


display( Markdown("### Choose a dataset") )
display( json_dropdown )

display( Markdown("### Choose raw data files") )

display( widgets.HBox([ potentiostatic_dropdown, mfm_dropdown ]) )

# Read in selected raw data

#potentiostatic_metadata_df, potentiostatic_measurement = Measurement.from_parser( parser=gstatic_parser, metadata_path=potentiostatic_raw_data_files[ potentiostatic_dropdown.value ] )
#mfm_experimental_data_df, mfm_measurement = Measurement.from_parser( parser=mfm_parser, experimental_data_path=mfm_raw_data_files[ mfm_dropdown.value ] )






In [153]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import os

folders = root_subdirectories

folder_dropdown = widgets.Dropdown(description='Select folder:',
                                   options=[(path.parts[-1],path) for idx,path in folders.items()],
                                   layout=widgets.Layout(width='auto'),
                                   style={'description_width': 'auto'})

file_dropdown   = widgets.Dropdown(description='Select file:',layout=widgets.Layout(width='auto'),style={'description_width': 'auto'})
button_go_for   = widgets.Button(description='Move into directory',layout=widgets.Layout(width='auto'))
button_go_back  = widgets.Button(description='Move one directory back',layout=widgets.Layout(width='auto'))
file_type_text  = widgets.Text( description='File type:', placeholder='Enter type here (e.g.: csv, json, ...)',layout=widgets.Layout(width='auto'),style={'description_width': 'auto'})

# to do:
# field where u see current path

folder_list = []

# Function to navigate into the selected subfolder
def go_to_subfolder(_):
    subfolders              = librarian.enumerate_subdirectories(directory=folder_dropdown.value)
    folder_dropdown.options = [ (path.parts[-1],path) for idx,path in subfolders.items() ] if bool(subfolders) else [ ("No subdirectories",folder_dropdown.value) ]
    parent                  = folder_dropdown.value.parent
    print("parent: %s"%parent)

# Function to navigate back to the parent folder
def go_to_parentfolder(_):
    # If the current directory has no subdirectories, then folder_dropdown.value is the parent
    # If the current directory has subdirectories, then folder_dropdown.value is the first of these subdirectories

    print("parent back:",folder_dropdown.value.parent)
    # Check whether the selected directory has subdirectories
    subfolders              = librarian.enumerate_subdirectories(directory=folder_dropdown.value)
    # If the selected directory has subdirectories, move up two levels; otherwise move up one level
    parentfolder            = folder_dropdown.value.parent.parent if subfolders else folder_dropdown.value.parent
    parentfolders           = librarian.enumerate_subdirectories(directory=parentfolder)
    folder_dropdown.options = [(path.parts[-1],path) for idx,path in parentfolders.items()]

# Functions for the buttons #
button_go_for.on_click(go_to_subfolder)
button_go_back.on_click(go_to_parentfolder)

# Display the widgets
display(folder_dropdown, button_go_for, button_go_back, file_type_text, file_dropdown)
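
The callbacks above infer the current location from whichever entry is selected in the dropdown, which makes the "go back" logic hard to reason about. As a possible alternative, the current directory can be tracked explicitly. The sketch below uses only the standard library; `DirectoryNavigator` is a hypothetical helper that is not part of `datamodel_b07_tc`.

```python
import tempfile
from pathlib import Path


class DirectoryNavigator:
    """Keep track of the directory that is currently shown (hypothetical helper)."""

    def __init__(self, start: Path):
        self.current = start

    def options(self) -> list[tuple[str, Path]]:
        """Return (label, path) pairs for the subdirectories of the current directory."""
        return [(p.name, p) for p in sorted(self.current.iterdir()) if p.is_dir()]

    def enter(self, subdirectory: Path) -> None:
        """Move into one of the listed subdirectories."""
        if subdirectory.parent != self.current or not subdirectory.is_dir():
            raise ValueError(f"{subdirectory} is not a subdirectory of {self.current}")
        self.current = subdirectory

    def back(self) -> None:
        """Move one directory up."""
        self.current = self.current.parent


# Usage on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    tree_root = Path(tmp)
    (tree_root / "data" / "Rohdaten").mkdir(parents=True)

    navigator = DirectoryNavigator(tree_root)
    navigator.enter(tree_root / "data")
    labels_in_data = [label for label, _ in navigator.options()]

    navigator.enter(tree_root / "data" / "Rohdaten")
    labels_in_leaf = navigator.options()  # empty: no further subdirectories

    navigator.back()
    navigator.back()
    back_at_root = navigator.current == tree_root
```

In the widget callbacks, `options()` could feed `folder_dropdown.options`, and `current` could be shown in a label, which would also cover the open to-do item about displaying the current path.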

In [144]:
folder_dropdown.value

In [142]:
folder_dropdown.value.parent.parent

WindowsPath('c:/Users/darouich/OneDrive/Dokumente/datamodel_b07_tc/datamodel_b07_tc')

In [91]:
os.getcwd()

'c:\\Users\\darouich\\OneDrive\\Dokumente\\datamodel_b07_tc\\datamodel_b07_tc'

In [80]:
# Search for GC data folder #
# Compare via Path.name: splitting str(path) on "/" never matches on Windows, where paths use backslashes #
idx_gc_direct  = [i for i in range(len(raw_data_subdirectories)) if Path(raw_data_subdirectories[i]).name == "02_GC" ][0]
gc_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[idx_gc_direct])

# Search for GC subdirectories #
idx_gc_sub_dir    = [i for i in range(len(gc_directories)) if Path(gc_directories[i]).name == "CAD14-Cu@AB" ][0]
gc_subdirectories = librarian.enumerate_subdirectories(directory=gc_directories[idx_gc_sub_dir])

gc_subdirectories
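
Directory names are best compared through `Path.name` rather than by splitting the string form of a path, because that string uses backslashes on Windows. The short example below illustrates the difference with `PureWindowsPath`, which behaves the same on every operating system; the path itself is made up.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows path to a GC raw data directory
gc_directory = PureWindowsPath("c:/data/Rohdaten/02_GC")

# str() renders Windows paths with backslashes, so splitting on "/" returns the whole string
name_via_split = str(gc_directory).split("/")[-1]

# .name always returns the last path component
name_via_path = gc_directory.name

print(name_via_split)
print(name_via_path)
```

With the split-based comparison, the list of matching indices stays empty on Windows, and taking its first element raises an `IndexError`.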

In [None]:
# Search for GC data folder #
idx_gc_direct  = [i for i in range(len(raw_data_subdirectories)) if Path(raw_data_subdirectories[i]).name == "02_GC" ][0]
gc_directories = librarian.enumerate_subdirectories(directory=raw_data_subdirectories[idx_gc_direct])

# Search for GC subdirectories #
idx_gc_sub_dir    = [i for i in range(len(gc_directories)) if Path(gc_directories[i]).name == "CAD14-Cu@AB" ][0]
gc_subdirectories = librarian.enumerate_subdirectories(directory=gc_directories[idx_gc_sub_dir])

# Select subdirectory of wanted experiment from given directories #
idx_gc_sub_sub_dir   = 0
gc_subsubdirectories = librarian.enumerate_subdirectories(directory=gc_subdirectories[idx_gc_sub_sub_dir])

# Gather all the gc raw data files #
gc_raw_data_files_list = []

# Select the indices of the subdirectories that should be read in
gc_raw_data_subdir_idx = [ 3, 4, 5 ] 

gc_raw_data_files_list = [librarian.enumerate_files(directory=gc_subsubdirectories[i], filter='CSV') for i in gc_raw_data_subdir_idx]

# Read out all the data from the provided gc files #
gc_experimental_data_df_list = []
gc_metadata_df_list = []
gc_measurements_list = []

for data_file in gc_raw_data_files_list:
    gc_metadata_df, gc_experimental_data_df, gc_measurement = Measurement.from_parser(
        parser=gc_parser,
        metadata_path=data_file[0],
        experimental_data_path=data_file[1]
    )
    gc_experimental_data_df_list.append(gc_experimental_data_df)
    gc_metadata_df_list.append(gc_metadata_df)
    gc_measurements_list.append(gc_measurement)

---
## Section 2: Plant setup parsing
---

Instantiate 'experiment' object.

In [22]:
experiment = Experiment()

## Get plant setup given in DEXPI format ##

#idx_rawdatafolder       = [i for i in range(len(data_subdirectories)) if Path(data_subdirectories[i]).name == "plant_setup" ][0]
#plant_setup_files = librarian.enumerate_files(directory=data_subdirectories[idx_rawdatafolder], filter='xml')
#plant_setup = PlantSetup.from_parser(parser=DEXPI2sdRDM, path=plant_setup_files[0])
#experiment.plant_setup = plant_setup

---
## Section 3: Potentiostatic data parsing
---
Select path to the potentiostatic data and print available subdirectories.

---
## Section 4: MFM data parsing
---
Provide name of the subdirectory containing the mass flow meter measurement data.

---
## Section 5: GC data parsing
---

Parent directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/Rohdaten/02_GC 
Available subdirectories:
0: .../CAD14-Cu@AB


In [None]:
## Combine all gathered data in the experiment object ##

experiment.measurements = [potentiostatic_measurement, mfm_measurement, *gc_measurements_list]

---
## Section 6: Calibration data parsing
---

Search for calibration files in the 'calibration' directory.

In [42]:
calibration_files = librarian.enumerate_files(directory=data_subdirectories[0])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/calibration 
Available files:
0: calibration.json


Initialize calibrator with an available calibration file selected by its index.

In [43]:
calibrator = Calibrator.from_json_file(path_to_json_file=calibration_files[0])

Calibrate and return analysis object with calibration parameters just computed. <br> Append the resulting SpeciesData objects to the experiment object.

In [44]:
species_data_list = calibrator.calibrate()
experiment.species_data = species_data_list
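
How `Calibrator.calibrate()` computes its parameters is not shown in this notebook. Purely as an illustration, and assuming a linear relation between detector signal and concentration, a calibration line can be fitted by least squares as sketched below. The function `fit_line` and all data points are made up for this example.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Made-up calibration points: concentration versus peak area
concentrations = [1.0, 2.0, 3.0, 4.0]
peak_areas = [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_line(concentrations, peak_areas)

# Invert the calibration line to convert a measured peak area into a concentration
measured_area = 25.0
estimated_concentration = (measured_area - intercept) / slope
```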

---
## Section 7: Parsing auxiliary data
---

### Correction factors

Search for correction factor files in the 'correction_factors' directory.

In [45]:
correction_factors_files = librarian.enumerate_files(directory=data_subdirectories[1])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/correction_factors 
Available files:
0: correction_factors.json


Load correction factors into the experiment object.

In [46]:
experiment.read_correction_factors(correction_factors_files[0])
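
The structure of `correction_factors.json` is not shown here. As a sketch only, assuming a flat JSON object that maps species names to numeric factors, such a file could be read and validated as follows. The helper `read_factors` and the example values are hypothetical placeholders, not real correction factors.

```python
import json
import tempfile
from pathlib import Path


def read_factors(path: Path) -> dict[str, float]:
    """Read a flat JSON mapping of species name to numeric factor."""
    with open(path, encoding="utf-8") as file:
        raw = json.load(file)
    if not isinstance(raw, dict):
        raise ValueError(f"Expected a JSON object in {path}")
    factors = {}
    for species, value in raw.items():
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Factor for '{species}' is not a number: {value!r}")
        factors[species] = float(value)
    return factors


# Placeholder values for illustration only
with tempfile.TemporaryDirectory() as tmp:
    example_path = Path(tmp) / "correction_factors.json"
    example_path.write_text(json.dumps({"species_a": 1.0, "species_b": 0.5}), encoding="utf-8")
    example_factors = read_factors(example_path)
```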

### Faraday coefficients

Search for Faraday coefficient files in the 'faraday_coefficients' directory.

In [47]:
faraday_coefficients_files = librarian.enumerate_files(directory=data_subdirectories[2])

Directory: 
 /mnt/c/Users/rscho/Documents/GitHub/datamodel_b07_tc/data/faraday_coefficients 
Available files:
0: faraday_coefficients.json


Load Faraday coefficients into the experiment object.

In [48]:
experiment.read_faraday_coefficients(faraday_coefficients_files[0])

### Electrode surface area

Set value for the surface area of the electrode.

In [49]:
electrode_surface_area = 1.0 # cm^2

---
## Section 8: Appending parsed data to dataset
---

Print current state of experiment object.

In [50]:
# print(experiment.json())

Append experiment object to the dataset.

In [51]:
dataset.experiments.append(experiment)

Replace the 'old' dataset with its extended version containing all the parsed data.

In [52]:
# Write back to the dataset file selected in Section 1 #
with open(json_dataset_files[json_dropdown.value], "w") as f:
    f.write(dataset.json())
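
Opening the dataset file in mode `"w"` truncates it before the new content is written, so an error during serialization or writing could leave an empty or partial file. One way to reduce that risk is to write to a temporary file first and then replace the original. The helper `write_json_atomically` below is a hypothetical sketch using only the standard library, and the example data are placeholders.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, text: str) -> None:
    """Validate the JSON text, write it to a temporary file, then replace the target."""
    json.loads(text)  # raises ValueError before anything is written if the text is invalid
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as file:
            file.write(text)
        os.replace(tmp_name, path)  # replaces the target in a single step
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


# Usage with placeholder content in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "b07.json"
    target.write_text('{"experiments": []}', encoding="utf-8")

    new_text = json.dumps({"experiments": [{"name": "example"}]})
    write_json_atomically(target, new_text)

    restored = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in Path(tmp).iterdir() if p.suffix == ".tmp"]
```

In this notebook, the call would take the selected dataset path and `dataset.json()` as arguments, so the serialization finishes before the existing file is touched.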