In [1]:
import configparser
from pathlib import Path

import numpy as np
import pandas as pd

pd.set_option('display.max_colwidth', 255)


# Exploring How EddyPro Config Files Work

We would like to be able to modify EddyPro config files quickly so that we can:
* Run long stretches of data as multiple parallel runs, which means modifying the directory structure and the date range of each run.
* Incorporate uncertainty into data processing by varying metadata and input parameters, which means modifying parameter values and correction methods.
* Run EddyPro on a new dataset quickly, which means modifying the directory structure and the file naming convention.


## Run Long Stretches of Data in Parallel

Each parallel worker needs its own `.eddypro` file, and each of those files needs its own date range.

To see what we need to change in the INI file, let's make two parallel runs using the GUI, then open the INI files and compare them.

In [2]:
# read in ini files from 2 separate runs
wd = Path('eddypro')
ini_files = [wd/'ini/date1.eddypro', wd/'ini/date2.eddypro']
configs = {fn: configparser.ConfigParser() for fn in ini_files}
for fn in configs:
    configs[fn].read(fn)

# convert to "long" format dataframe
fns = []
sections = []
keys = []
values = []
for fn, config in configs.items():
    for section in config.sections():
        for k, v in config[section].items():
            fns.append(fn.name)
            sections.append(section)
            keys.append(k)
            values.append(v)
config_df = pd.DataFrame(dict(fn=fns, section=sections, key=keys, value=values))
config_df


,fn,section,key,value
0,date1.eddypro,FluxCorrection_SpectralAnalysis_General,add_sonic_lptf,1
1,date1.eddypro,FluxCorrection_SpectralAnalysis_General,ex_dir,
2,date1.eddypro,FluxCorrection_SpectralAnalysis_General,ex_file,/Users/alex/Documents/Work/UWyo/Research/Flux Pipeline Project/Platinum_EC/eddypro/test_runs/parallel_lostcreek_shared/output/eddypro_worker_1_fluxnet_2023-07-03T160205_adv.csv
3,date1.eddypro,FluxCorrection_SpectralAnalysis_General,horst_lens,2
4,date1.eddypro,FluxCorrection_SpectralAnalysis_General,sa_bin_spectra,
...,...,...,...,...
739,date2.eddypro,RawProcess_TimelagOptimization_Settings,to_pg_range,1.5
740,date2.eddypro,RawProcess_TimelagOptimization_Settings,to_start_date,2020-06-26
741,date2.eddypro,RawProcess_TimelagOptimization_Settings,to_start_time,00:00
742,date2.eddypro,RawProcess_TimelagOptimization_Settings,to_subset,0


In [3]:
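# keep only the rows whose (section, key, value) combination is not shared by both files
differing = ~config_df.duplicated(subset=['section', 'key', 'value'], keep=False)
config_df.loc[differing].sort_values(by=['section', 'key', 'fn'])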

,fn,section,key,value
80,date1.eddypro,Project,file_name,/Users/alex/Documents/Work/UWyo/Research/Flux Pipeline Project/Eddypro-ec-testing/eddypro/ini/date1.eddypro
452,date2.eddypro,Project,file_name,/Users/alex/Documents/Work/UWyo/Research/Flux Pipeline Project/Eddypro-ec-testing/eddypro/ini/date2.eddypro
95,date1.eddypro,Project,last_change_date,2023-07-06T18:41:46
467,date2.eddypro,Project,last_change_date,2023-07-06T18:42:17
106,date1.eddypro,Project,pr_end_time,12:30
478,date2.eddypro,Project,pr_end_time,23:30
107,date1.eddypro,Project,pr_start_date,2020-06-26
479,date2.eddypro,Project,pr_start_date,2020-06-27
108,date1.eddypro,Project,pr_start_time,07:30
480,date2.eddypro,Project,pr_start_time,12:30
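
The same comparison can be made directly on the parsed configs, without going through a DataFrame. The following is a minimal sketch, assuming two `ConfigParser` objects; `diff_configs` is a hypothetical helper name (not part of EddyPro or `configparser`), and the values in the toy example are made up.

```python
import configparser


def diff_configs(a, b):
    """Return {(section, key): (value_in_a, value_in_b)} for entries that differ.

    A key that is missing from one config is reported as None on that side.
    """
    diffs = {}
    for section in sorted(set(a.sections()) | set(b.sections())):
        # raw=True skips interpolation, so literal '%' characters are safe
        items_a = dict(a.items(section, raw=True)) if a.has_section(section) else {}
        items_b = dict(b.items(section, raw=True)) if b.has_section(section) else {}
        for key in sorted(set(items_a) | set(items_b)):
            va, vb = items_a.get(key), items_b.get(key)
            if va != vb:
                diffs[(section, key)] = (va, vb)
    return diffs


# toy example with made-up values
a = configparser.ConfigParser()
a.read_string("[Project]\npr_start_date=2020-06-26\npr_end_time=12:30\n")
b = configparser.ConfigParser()
b.read_string("[Project]\npr_start_date=2020-06-27\npr_end_time=12:30\n")
print(diff_configs(a, b))
# {('Project', 'pr_start_date'): ('2020-06-26', '2020-06-27')}
```

Unlike the `duplicated` approach, this also reports keys that exist in only one of the two files.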


So, what did we find?

When only the date window of a run changes, these are the settings to look at:
*  `FluxCorrection_SpectralAnalysis_General/ex_file` differs between configs. I believe this entry should be blank when first calling `eddypro_rp`. It is referred to as the "essentials" file, but I don't know what it does. `worker_2.eddypro` has this entry blank because it was never able to complete a run; it hangs at the first processing step.
*  `Project/file_name` should be the absolute path of the INI file.
*  `Project/last_change_date` can be left blank.
*  `Project/pr_start_date` and `Project/pr_end_date` give the start and end dates of the processing window in `yyyy-mm-dd` format.
*  `Project/pr_start_time` and `Project/pr_end_time` give the start and end times of the processing window in `HH:MM` format.
*  `Project/project_id` specifies an ID to include in the output file names.
*  `RawProcess_General/dec_date` specifies the magnetic declination date. Not relevant here.
*  `RawProcess_TiltCorrection_Settings` does not matter here.

So all we need to focus on are the `ex_file`, `file_name`, and `project_id` entries, plus the four `pr_start_date`, `pr_start_time`, `pr_end_date`, and `pr_end_time` entries.
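
Before generating many files, it can help to apply such changes to a copy of the template rather than to the template itself, so that one worker's settings cannot leak into another's. The following is a minimal sketch; `with_overrides` is a hypothetical helper, the values are made up, and the copy goes through `ConfigParser.read_dict`, which assumes the values contain no `%` interpolation syntax.

```python
import configparser


def with_overrides(template, overrides):
    """Return a new ConfigParser: a copy of `template` with overrides applied.

    `overrides` maps (section, option) -> new string value.
    """
    config = configparser.ConfigParser()
    config.read_dict(template)  # copy every section and option of the template
    for (section, option), value in overrides.items():
        config.set(section, option, value)
    return config


# toy template with made-up values
template = configparser.ConfigParser()
template.read_string(
    "[Project]\nproject_id=template\npr_start_date=2020-06-26\n"
    "[FluxCorrection_SpectralAnalysis_General]\nex_file=/some/old/file.csv\n"
)

worker = with_overrides(template, {
    ('Project', 'project_id'): 'worker_0',
    ('Project', 'pr_start_date'): '2020-06-21',
    ('FluxCorrection_SpectralAnalysis_General', 'ex_file'): '',
})
print(worker['Project']['project_id'], template['Project']['project_id'])
```

The template keeps its original values, so it can be reused for every worker.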

Let's try it.

In [94]:
wd_new = Path('/Users/alex/Documents/Work/UWyo/Research/Flux Pipeline Project/Platinum_EC/eddypro/test_runs/parallel_lostcreek_python_parallel')
wd_new.mkdir(exist_ok=True)

# one processing window per day; both ends are inclusive,
# so each window ends 30 minutes before the next one starts
starts = pd.date_range('2020-06-21 00:00', '2020-07-21 23:30', freq='1D')
ends = starts + pd.Timedelta('1D') - pd.Timedelta('30min')
start_dates = starts.strftime(date_format='%Y-%m-%d')
start_times = starts.strftime(date_format='%H:%M')
end_dates = ends.strftime(date_format='%Y-%m-%d')
end_times = ends.strftime(date_format='%H:%M')

# one project ID and one .eddypro file per window
project_ids = [f'worker_{i}' for i in range(len(starts))]
file_names = [wd_new / f'{project_id}.eddypro' for project_id in project_ids]

# template config: the first of the files we read in earlier
base_config = next(iter(configs.values()))

# the template is copied from another project, so we need to change the output path
out_path = str(wd_new / 'output')

for i, fn in enumerate(file_names):
    # the template is updated in place; every per-worker field is overwritten on each pass
    # empty ex_file
    base_config.set(section='FluxCorrection_SpectralAnalysis_General', option='ex_file', value='')
    # point to the absolute path of this worker's file
    base_config.set(section='Project', option='file_name', value=str(fn.absolute()))
    base_config.set(section='Project', option='last_change_date', value='')
    base_config.set(section='Project', option='pr_start_date', value=start_dates[i])
    base_config.set(section='Project', option='pr_end_date', value=end_dates[i])
    base_config.set(section='Project', option='pr_start_time', value=start_times[i])
    base_config.set(section='Project', option='pr_end_time', value=end_times[i])
    base_config.set(section='Project', option='project_id', value=project_ids[i])
    base_config.set(section='RawProcess_General', option='dec_date', value=start_dates[i])
    base_config.set(section='Project', option='out_path', value=out_path)

    with open(fn, 'w') as configfile:
        configfile.write(';EDDYPRO_PROCESSING\n')  # header line
        base_config.write(fp=configfile, space_around_delimiters=False)


Surprisingly, this works! However, we still have to solve the issue of EddyPro sometimes "hanging" while reading input files.
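
One possible way to keep a hung run from blocking the other workers is to launch each run with a timeout. The following is only a sketch under assumptions: it assumes each run is started as an external command, `run_with_timeout` is a hypothetical helper, and the commands below are stand-ins rather than real EddyPro invocations (the exact `eddypro_rp` command line is not covered here).

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return its exit code, or None if it exceeded `timeout_s` seconds."""
    try:
        # subprocess.run kills the child process when the timeout expires
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# stand-in commands (not EddyPro): one finishes quickly, one "hangs"
quick = [sys.executable, '-c', 'pass']
stuck = [sys.executable, '-c', 'import time; time.sleep(30)']

print(run_with_timeout(quick, timeout_s=60))
print(run_with_timeout(stuck, timeout_s=1))
```

A `None` result marks a worker whose date window could be retried or inspected by hand.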