Enhancement/yaml diagnostics #227

mnlevy1981 · 2018-01-29T20:09:13Z

This pull request sets up infrastructure (scripts and more YAML / JSON files) to allow MARBL to define what diagnostics are available without building / running any Fortran code. The script MARBL_tools/MARBL_generate_diagnostics_file.py creates a text file of the format

diagnostic_short_name : frequency_operator[, frequency2_operator2, ..., frequencyN_operatorN]

and each GCM will need to convert it to whatever format it uses for setting up diagnostic output. The corresponding POP changes are in place, but I still need to update how the POP-owned MARBL diagnostics (tracer states, etc) are included in the tavg file before bringing this branch onto master.
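For reference, a GCM-side reader for this format could look roughly like the sketch below. It is not code from this PR: the function name `parse_diagnostics_line` is hypothetical, the use of '#' as the comment character in the output file is an assumption, and the frequency names in the usage note are placeholders.

```python
def parse_diagnostics_line(line):
    """Split 'NAME : freq_op[, freq_op, ...]' into (name, [(freq, op), ...]).

    Hypothetical helper; assumes '#' starts a comment in the file.
    Returns None for blank or comment-only lines.
    """
    # Drop anything after a comment character, then surrounding whitespace
    line = line.split('#', 1)[0].strip()
    if not line:
        return None
    name, sep, rhs = line.partition(':')
    if not sep:
        raise ValueError("expected 'name : frequency_operator', got: " + line)
    pairs = []
    for entry in rhs.split(','):
        # Split on the last underscore: the valid operators contain no
        # underscore, so this is safe even if a frequency name has one
        frequency, _, operator = entry.strip().rpartition('_')
        pairs.append((frequency, operator))
    return name.strip(), pairs
```

For example, `parse_diagnostics_line("DIAG_A : medium_average, high_instantaneous")` yields the name `DIAG_A` with two (frequency, operator) pairs, which the GCM can then map onto its own output streams.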

I've put a few diagnostics into default_diagnostics.yaml, but still have lots more to add. I've also converted YAML -> JSON (but still need to write a script to read the JSON file), which required skipping the consistency check in yaml_to_json.py. I want to update yaml_to_json.py to take a list of YAML files and convert them all to JSON (so the user will specify a directory for the output but the script will come up with file names on its own), but haven't done that yet.

Still need to finish adding diagnostics to default_diagnostics.yaml and also clearly define the schema being used (it will be less complicated than the schema in default_settings.yaml), but now I can also rough out a script to generate something similar to a tavg contents file.

Descriptions in both yaml_to_json.py and MARBL_generate_settings_file.py were out of date

This is just the shell that will contain a new class in MARBL_tools. It still needs hooks to the default_diagnostics.json file, plus a lot of other stuff... Tested with:

    # (1) import MARBL tools
    import sys
    sys.path.append($MARBLROOT)
    import MARBL_tools
    # (2) create a settings object
    settings_obj = MARBL_tools.MARBL_settings_class('../autogenerated_src/default_settings.json')
    # (3) create a diagnostics object
    diags_obj = MARBL_tools.MARBL_diagnostics_class('../autogenerated_src/default_diagnostics.json', settings_obj, input_diagnostics_file=[test file])

The format for the input_diagnostics_file will be

    diagnostic_name : frequency

and '#' will be treated as a comment character. Comma-separated frequencies => output the same variable at multiple frequencies.

Also continued work on MARBL_diagnostics_file_class.py, which now reads frequencies from JSON and creates a dictionary (key is diagnostic name, value is recommended frequency). Still need to add support for reading a text file to change frequency from default in cases where user wants non-standard output.

Unlike generate_settings_file(), which uses --settings_file_in to provide a text file with individual settings file overrides, generate_diagnostics_file() will not allow values from the JSON file to be overridden via text file input. It will be up to the GCM to provide a way for the user to change the diagnostics from the default (CESM will allow users to put marbl_diagnostics into SourceMods/src.pop). Also added some comments to the top of the diagnostic file output.

Want generate_diagnostics_file() to take MARBL_diagnostics_class object as an argument so that it can be called cleanly from the GCM.

no need to have these two fields in default_diagnostics.yaml / json; I think STF_O2 will be removed from MARBL entirely, and FG_ALT_CO2 will be added back in once a flag for "provide _ALT_CO2 fields" exists.

Surface tracer fluxes are available in the driver if the GCM wants to save them; no need to use the marbl diagnostic framework as well.

Allowable operator values are none, average, instantaneous, minimum, and maximum; MARBL_utils:diagnostics_dictionary_is_consistent now actually includes some checks (such as "are frequency and operator valid?"), and the MARBL_generate_diagnostics_file.py script now returns text formatted as DIAGNOSTIC : frequency_operator (where before, the _operator was not included).

Script on GCM side will determine whether or not to include them in final lists of diagnostics.

Unless ciso_on = .true. in MARBL settings, the diagnostics associated with carbon isotopes should not be included in the diagnostic file. I added a key to the diagnostic dictionary ("module"), and diagnostics where module = ciso are only included if ciso_on = .true. I added a few ciso surface diagnostics to test this out but have lots more to add in the next commit.

Setting "operator = none" is confusing, so if frequency = never then operator just has to be any valid operator ("instantaneous", "average", "minimum", or "maximum").
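A minimal sketch of the resulting check, for illustration only. The operator list is the one given above; the function name `check_frequency_operator` is hypothetical, and the set of valid frequencies is passed in as an argument because only "never" is named here.

```python
# Valid operators after 'none' was removed from the list
VALID_OPERATORS = ("instantaneous", "average", "minimum", "maximum")


def check_frequency_operator(frequency, operator, valid_frequencies):
    """Return a list of error strings (empty if the pair is consistent).

    Hypothetical helper: valid_frequencies is supplied by the caller.
    """
    errors = []
    if frequency not in valid_frequencies:
        errors.append("'%s' is not a valid frequency" % frequency)
    # Even when frequency is 'never', the operator must be a valid one
    if operator not in VALID_OPERATORS:
        errors.append("'%s' is not a valid operator" % operator)
    return errors
```

With this shape, a diagnostic that is never written still carries a real operator, so switching its frequency later does not also require choosing an operator.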

default_settings.json now contains _tracer_list instead of _tracer_cnt; it is a dictionary that is used in conjunction with the rest of settings_dict to determine which tracers are being requested. get_tracer_count now returns the length of the list of all requested tracers (and there is a get_tracer_list routine to provide the tracers themselves as well).

Diagnostics need tracer long-name and units

The YAML and JSON file now have a '_tracers' key, and the consistency check in MARBL_utils.py knows what to do with it. Still does not produce any tracer-specific diagnostics, though (that needs to go into MARBL_diagnostics_file_class.py)

Diagnostics are going to need to know tracer long name and units in order to get the tracer-specific metadata right.

This is kludgy -- ideally, MARBL_diagnostics_file_class::_process_diagnostic_frequency would be able to determine if a tracer is listed in tracer_restore_vars entirely based on logic outlined in the JSON file... but instead __init__ has a block of code under the comment "# Special treatment for tracers, PFTs, etc" that, in part, passes information about whether the tracer was found in tracer_restore_vars or not.

Carbon tracers are [auto_name]13C and [auto_name]14C, original entry used [auto_name]C13 and [auto_name]C14 which are not valid tracer names.

Based on discussions at Jan 23rd MARBL meeting, a lot of re-writing code that processes default_diagnostics.json. Some highlights:

1. instead of a 'module' key, an optional 'dependencies' key is used to specify diagnostics that are not always defined
2. Better use of templating for per-tracer diagnostics (still need to expand to PFTs)

The process is now:

1. default_diagnostics.json is read into self._diagnostics
2. self._diagnostics is "processed" into self._resolved_diagnostics:
   i. diagnostics that are not available are removed
   ii. all templates are filled
   iii. if 'frequency' is a dict, proper value is determined
   iv. frequency / operator are converted to lists even if just single value
3. self.diagnostics_dict is determined from self._resolved_diagnostics -- I don't know if we really want this to be part of the diagnostics class, or if this should really be part of the generate_diagnostics script. If the latter, I will rename self._resolved_diagnostics to self.diagnostics or something similar.
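The "processed into self._resolved_diagnostics" step might be sketched as follows. This is a simplified illustration, not the code in MARBL_diagnostics_file_class.py: the function name, the `((tracer_short_name))` placeholder, and the layout of a dict-valued 'frequency' (keyed by tracer name with a 'default' fallback) are all assumptions made for the example.

```python
PLACEHOLDER = '((tracer_short_name))'  # hypothetical template token


def resolve_diagnostics(diagnostics, settings, tracer_names):
    """Sketch of the four processing sub-steps on plain dictionaries."""
    resolved = {}
    for name, entry in diagnostics.items():
        # (i) drop diagnostics whose dependencies are not satisfied
        deps = entry.get('dependencies', {})
        if any(settings.get(key) != value for key, value in deps.items()):
            continue
        # (ii) templated names expand to one diagnostic per tracer
        fills = tracer_names if PLACEHOLDER in name else [None]
        for fill in fills:
            diag_name = name if fill is None else name.replace(PLACEHOLDER, fill)
            frequency = entry['frequency']
            # (iii) assumed layout: dict keyed by tracer, with a 'default'
            if isinstance(frequency, dict):
                frequency = frequency.get(fill, frequency['default'])
            operator = entry['operator']
            # (iv) always store lists, even for a single value
            if not isinstance(frequency, list):
                frequency = [frequency]
            if not isinstance(operator, list):
                operator = [operator]
            resolved[diag_name] = {'frequency': frequency, 'operator': operator}
    return resolved
```

Storing frequency and operator as lists in every case means the script that writes the diagnostics file can loop over them without special-casing single values.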

No need to store the frequency_operator string in diagnostics class, it can be constructed on the fly by MARBL_generate_diagnostics_file.py

Needed to rework the logic in _expand_template_value() because the location of the autotroph metadata is fundamentally different from corresponding tracer metadata: settings_dict['autotrophs(auto_ind)%sname'] vs settings_dict['_tracer_dict'][tracer_name]['short_name']. So we need to be able to loop over either auto_ind or tracer_name depending on what template we are expanding. Also noted where in the code per-autotroph dependencies will be checked (i.e. some diagnostics only apply to autotrophs that are calcifiers or silicifiers)
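To illustrate the difference in metadata location, here is a rough sketch of the two lookups. The key patterns are the ones quoted above; the function names, the 1-based autotroph index, and the sample values in the usage note are assumptions.

```python
def autotroph_short_names(settings_dict, autotroph_cnt):
    """Autotroph metadata lives in flat keys indexed by auto_ind.

    Assumes Fortran-style 1-based indices in the key names.
    """
    return [settings_dict['autotrophs({})%sname'.format(auto_ind)]
            for auto_ind in range(1, autotroph_cnt + 1)]


def tracer_short_names(settings_dict):
    """Tracer metadata lives in a nested dictionary keyed by tracer_name."""
    return [metadata['short_name']
            for metadata in settings_dict['_tracer_dict'].values()]
```

Given a `settings_dict` with `'autotrophs(1)%sname': 'sp'` and `'_tracer_dict': {'PO4': {'short_name': 'PO4'}}`, the first routine loops over an index while the second loops over dictionary keys, which is why template expansion needs to handle both cases.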

To support some autotroph metadata, I reworked how some of the dependency logic was determined. "template_fill_dict" is turning out to be extremely useful, as it can now play a role in determining the diagnostic name (and values of some diagnostic metadata) as well as in the dependency check and determining the proper frequency of diagnostic output.

Also started to add support for per-zooplankton diags

With this commit, I believe all diagnostics are now defined in the YAML and properly processed by MARBL_generate_diagnostics_file.py

default_settings.yaml had been using //lname// to denote something to be replaced by either the autotroph long name or the zooplankton long name; to be consistent with default_diagnostics.yaml it should use ((autotroph_lname)) and ((zooplankton_lname))

Regardless of whether users prefer ".true. / .false.", "True / False", or "T / F" when setting a logical parameter via an input file, MARBL needs to be consistent internally. There are lots of checks for [variable] = ".true." in MARBL's Python so that's the format other values get converted to.
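A minimal sketch of that conversion, assuming the comparison is case-insensitive (an assumption, as is the function name `normalize_logical`); anything that is not one of the listed spellings is returned unchanged.

```python
def normalize_logical(value):
    """Map '.true.', 'True', 'T' (and the false equivalents) onto the
    Fortran-style strings '.true.' / '.false.'.

    Hypothetical helper; non-logical strings pass through untouched.
    """
    lowered = value.strip().lower()
    if lowered in ('.true.', 'true', 't'):
        return '.true.'
    if lowered in ('.false.', 'false', 'f'):
        return '.false.'
    return value
```

After this step, a check such as `settings['ciso_on'] == '.true.'` behaves the same no matter which spelling the user chose in the input file.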

Clarified / added comments, and also cleaned up how template_fill_dict is set up

Instead of sorting tracers by module and PFT properties, now use dependencies dictionary (much like the diagnostics YAML file). Also made the tracers dictionary a separate object in MARBL_settings rather than an entry in settings_dict[].

* renamed _array_size to _array_shape in settings YAML
* cleaned up docstring for _get_array_info()
* replaced "isinstance(__,unicode)" with "isinstance(__,type(u''))" (latter also works with python3)
* renamed _get_dim_size -> _get_value [it's a generic routine even if it's currently only used to get shape of arrays]
* replaced an enumerate with a zip()
* Cleaned up error message when yaml_to_json can't import PyYAML
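As a quick illustration of the isinstance change: `type(u'')` evaluates to `unicode` under Python 2 and to `str` under Python 3, so the same check runs on both without referencing a name that Python 3 lacks. The helper name `is_text` below is hypothetical.

```python
import json


def is_text(value):
    """True for text strings: unicode on Python 2, str on Python 3."""
    return isinstance(value, type(u''))


# Strings decoded by the json module are text on both Python versions
decoded = json.loads('{"frequency": "never"}')
frequency_is_text = is_text(decoded["frequency"])
```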

1. Only define autotroph _Qp diagnostic if running with variable P:C
2. Provide some documentation in default_diagnostics.yaml

Introduced MARBL_share.py to contain routines used by the two file_class.py files -- also cleaned up some of the code in each of those two files to allow as much as possible to be moved to MARBL_share.py. Subroutines in MARBL_share.py are accessed via MARBL_tools.

mnlevy1981 added 30 commits December 7, 2017 11:17

Update python file docstrings

4abe9e2

Descriptions in both yaml_to_json.py and MARBL_generate_settings_file.py were out of date

Better generate_diagnostics_file arguments

44d6dbe

Want generate_diagnostics_file() to take MARBL_diagnostics_class object as an argument so that it can be called cleanly from the GCM.

Need generate_diagnostics_file in __init__.py

6465fb6

Remove STF_O2 and FG_ALT_CO2 from diagnostic YAML

ade3c13

no need to have these two fields in default_diagnostics.yaml / json; I think STF_O2 will be removed from MARBL entirely, and FG_ALT_CO2 will be added back in once a flag for "provide _ALT_CO2 fields" exists.

Remove STF_O2 from marbl diagnostics

cb5e297

Surface tracer fluxes are available in the driver if the GCM wants to save them; no need to use the marbl diagnostic framework as well.

Add ALT_CO2 diags back to dictionary

797c760

Script on GCM side will determine whether or not to include them in final lists of diagnostics.

More diagnostics in the yaml / json

f117ced

Remove 'none' from valid operators list

1961b95

Setting "operator = none" is confusing, so if frequency = never then operator just has to be any valid operator ("instantaneous", "average", "minimum", or "maximum").

Add more diagnostics to YAML and JSON dicts

b08a1ad

Add more diagnostics to YAML / JSON

451b809

Add more tracer metadata to default_settings

f189085

Diagnostics need tracer long-name and units

Rudimentary support for per-tracer diagnostics

4daf559

The YAML and JSON file now have a '_tracers' key, and the consistency check in MARBL_utils.py knows what to do with it. Still does not produce any tracer-specific diagnostics, though (that needs to go into MARBL_diagnostics_file_class.py)

Rather than a list, store a dict of tracers

0ead5cb

Diagnostics are going to need to know tracer long name and units in order to get the tracer-specific metadata right.

Fix tracer names for ciso autotroph

2c1f2f9

Carbon tracers are [auto_name]13C and [auto_name]14C, original entry used [auto_name]C13 and [auto_name]C14 which are not valid tracer names.

diagnostics_dict is now full dictionary

16e4b4f

No need to store the frequency_operator string in diagnostics class, it can be constructed on the fly by MARBL_generate_diagnostics_file.py

Add more diagnostics to YAML file

355bc9a

Also started to add support for per-zooplankton diags

Add per-autotroph ciso-only diagnostics

06d992e

With this commit, I believe all diagnostics are now defined in the YAML and properly processed by MARBL_generate_diagnostics_file.py

mnlevy1981 added 10 commits January 31, 2018 12:45

generate_diagnostics now has --append option

389e1bc

Python code clean-up

e8f2690

Clarified / added comments, and also cleaned up how template_fill_dict is set up

Change format of tracer definition in YAML file

e5b9086

Instead of sorting tracers by module and PFT properties, now use dependencies dictionary (much like the diagnostics YAML file). Also made the tracers dictionary a separate object in MARBL_settings rather than an entry in settings_dict[].

More diagnostic clean-up

09fdc45

1. Only define autotroph _Qp diagnostic if running with variable P:C
2. Provide some documentation in default_diagnostics.yaml

Clean up a few more comments in python code

fdcc4f6

Clean up white space in YAML files

a80eab2

Merge branch 'master' into enhancement/YAML-diagnostics

6edc61f

mnlevy1981 merged commit 6edc61f into marbl-ecosys:master Feb 13, 2018

mnlevy1981 deleted the enhancement/YAML-diagnostics branch February 13, 2018 16:53
