Enhancement/yaml diagnostics #227

Merged
merged 40 commits into from
Feb 13, 2018

Conversation

mnlevy1981
Collaborator

This pull request sets up infrastructure (scripts and more YAML / JSON files) to allow MARBL to define what diagnostics are available without building / running any Fortran code. The script MARBL_tools/MARBL_generate_diagnostics_file.py creates a text file of the format

diagnostic_short_name : frequency_operator[, frequency2_operator2, ..., frequencyN_operatorN]

and each GCM will need to convert it to whatever format it uses for setting up diagnostic output. The corresponding POP changes are in place, but I still need to update how the POP-owned MARBL diagnostics (tracer states, etc) are included in the tavg file before bringing this branch onto master.
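As a rough illustration of what the GCM-side conversion might start from, here is a minimal sketch that reads lines in the format above into a dictionary. The function name, the example diagnostic names, and the handling of '#' comments are assumptions for illustration; this is not the parser POP uses.

```python
def parse_diagnostics_file(lines):
    """Map each diagnostic name to a list of (frequency, operator) pairs."""
    diagnostics = {}
    for raw_line in lines:
        # Assumption: '#' starts a comment that runs to the end of the line
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, requests = line.partition(":")
        pairs = []
        for request in requests.split(","):
            # Split on the last underscore: "medium_average" -> ("medium", "average")
            frequency, _, operator = request.strip().rpartition("_")
            pairs.append((frequency, operator))
        diagnostics[name.strip()] = pairs
    return diagnostics


# Hypothetical file contents (DIAG_A and DIAG_B are placeholder names)
example_lines = [
    "# comment at the top of the file",
    "DIAG_A : medium_average",
    "DIAG_B : medium_average, high_instantaneous",
]
parsed = parse_diagnostics_file(example_lines)
```

Each GCM would then translate the resulting pairs into its own output configuration.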

I've put a few diagnostics into default_diagnostics.yaml, but still have lots
more to add. I've also converted YAML -> JSON (but still need to write a script
to read the JSON file), which required skipping the consistency check in
yaml_to_json.py.

I want to update yaml_to_json.py to take a list of YAML files and convert them
all to JSON (so user will specify a directory for the output but the script
will come up with file names on its own), but haven't done that yet.
Still need to finish adding diagnostics to default_diagnostics.yaml and also
clearly define the schema being used (it will be less complicated than the
schema in default_settings.yaml), but now I can also rough out a script to
generate something similar to a tavg contents file.
Descriptions in both yaml_to_json.py and MARBL_generate_settings_file.py were
out of date
This is just the shell that will contain a new class in MARBL_tools. It still
needs hooks to the default_diagnostics.json file, plus a lot of other stuff...

Tested with:

# (1) import MARBL tools ($MARBLROOT is a placeholder for the MARBL root directory)
import sys
sys.path.append($MARBLROOT)
import MARBL_tools

# (2) create a settings object
settings_obj = MARBL_tools.MARBL_settings_class('../autogenerated_src/default_settings.json')

# (3) create a diagnostics object ([test file] is a placeholder for the input file)
diags_obj = MARBL_tools.MARBL_diagnostics_class(
    '../autogenerated_src/default_diagnostics.json',
    settings_obj, input_diagnostics_file=[test file])

format for the input_diagnostics_file will be

diagnostic_name : frequency

and '#' will be treated as comments. Comma-separated frequencies => output
the same variable at multiple frequencies.
Also continued work on MARBL_diagnostics_file_class.py, which now reads
frequencies from JSON and creates a dictionary (key is diagnostic name, value
is recommended frequency). Still need to add support for reading a text file to
change frequency from default in cases where user wants non-standard output.
Unlike generate_settings_file(), which uses --settings_file_in to provide a
text file with individual settings file overrides, generate_diagnostics_file()
will not allow values from the JSON file to be overridden via text file input.
It will be up to the GCM to provide a way for the user to change the
diagnostics from the default (CESM will allow users to put marbl_diagnostics
into SourceMods/src.pop)

Also added some comments to the top of the diagnostic file output.
Want generate_diagnostics_file() to take MARBL_diagnostics_class object as an
argument so that it can be called cleanly from the GCM.
No need to have these two fields in default_diagnostics.yaml / json; I
think STF_O2 will be removed from MARBL entirely, and FG_ALT_CO2 will be
added back in once a flag for "provide _ALT_CO2 fields" exists.
Surface tracer fluxes are available in the driver if the GCM wants to
save them; no need to use the marbl diagnostic framework as well.
Allowable operator values are none, average, instantaneous, minimum, and
maximum; MARBL_utils:diagnostics_dictionary_is_consistent now actually
includes some checks (such as "are frequency and operator valid?"), and
the MARBL_generate_diagnostics_file.py script now returns text formatted
as

DIAGNOSTIC : frequency_operator

(where before, the _operator was not included)
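A check along the lines of "are frequency and operator valid?" could look roughly like the sketch below. The operator list is the one given in this commit message; the frequency list and the function name are assumptions made only for illustration and are not taken from MARBL_utils.

```python
# Operators as listed in the commit message above
VALID_OPERATORS = ("none", "average", "instantaneous", "minimum", "maximum")
# Assumption: the valid frequencies are not listed in the text; placeholders only
VALID_FREQUENCIES = ("never", "low", "medium", "high")


def check_frequency_and_operator(name, frequency, operator):
    """Return a list of error messages (empty if the pair is valid)."""
    errors = []
    if frequency not in VALID_FREQUENCIES:
        errors.append("%s: '%s' is not a valid frequency" % (name, frequency))
    if operator not in VALID_OPERATORS:
        errors.append("%s: '%s' is not a valid operator" % (name, operator))
    return errors


def format_request(frequency, operator):
    """Build the frequency_operator string written to the diagnostics file."""
    return "%s_%s" % (frequency, operator)
```

Returning a list of messages rather than raising on the first problem lets a consistency check report every bad entry in one pass.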
Script on GCM side will determine whether or not to include them in final lists
of diagnostics.
Unless ciso_on = .true. in MARBL settings, the diagnostics associated
with carbon isotopes should not be included in the diagnostic file. I
added a key to the diagnostic dictionary ("module"), and diagnostics
where module = ciso are only included if ciso_on = .true.

I added a few ciso surface diagnostics to test this out but have lots
more to add in the next commit.
Setting "operator = none" is confusing, so if frequency = never then
operator just has to be any valid operator ("instantaneous", "average",
"minimum", or "maximum").
default_settings.json now contains _tracer_list instead of _tracer_cnt; it is a
dictionary that is used in conjunction with the rest of settings_dict to
determine which tracers are being requested. get_tracer_count now returns the
length of the list of all requested tracers (and there is a get_tracer_list
routine to provide the tracers themselves as well).
Diagnostics need tracer long-name and units
The YAML and JSON files now have a '_tracers' key, and the consistency
check in MARBL_utils.py knows what to do with it. Still does not produce
any tracer-specific diagnostics, though (that needs to go into
MARBL_diagnostics_file_class.py)
Diagnostics are going to need to know tracer long name and units in
order to get the tracer-specific metadata right.
This is kludgy -- ideally,
MARBL_diagnostics_file_class::_process_diagnostic_frequency would be
able to determine if a tracer is listed in tracer_restore_vars entirely
based on logic outlined in the JSON file... but instead __init__ has a
block of code under the comment

# Special treatment for tracers, PFTs, etc

that, in part, passes information about whether the tracer was found in
tracer_restore_vars or not.
Carbon tracers are [auto_name]13C and [auto_name]14C; the original entry
used [auto_name]C13 and [auto_name]C14, which are not valid tracer names.
Based on discussions at the Jan 23rd MARBL meeting, rewrote much of the code
that processes default_diagnostics.json. Some highlights:

1. instead of a 'module' key, an optional 'dependencies' key is used to specify
diagnostics that are not always defined
2. Better use of templating for per-tracer diagnostics (still need to expand to
PFTs)

The process is now:

1. default_diagnostics.json is read into self._diagnostics
2. self._diagnostics is "processed" into self._resolved_diagnostics:
   i.   diagnostics that are not available are removed
   ii.  all templates are filled
   iii. if 'frequency' is a dict, proper value is determined
   iv.  frequency / operator are converted to lists even if just single value
3. self.diagnostics_dict is determined from self._resolved_diagnostics
-- I don't know if we really want this to be part of the diagnostics class, or
if this should really be part of the generate_diagnostics script. If the
latter, I will rename self._resolved_diagnostics to self.diagnostics or
something similar.
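Step 2 of the process above can be sketched as follows. This is only an illustration under stated assumptions: the entry layout, the `((key))` fill helper, and especially the rule for choosing a value when 'frequency' is a dict (use 'default' unless a settings variable named by another key is ".true.") are hypothetical and may differ from what MARBL_diagnostics_file_class.py does.

```python
import re


def fill_template(text, fill):
    """Replace each ((key)) in text with fill[key]."""
    return re.sub(r"\(\((\w+)\)\)", lambda match: fill[match.group(1)], text)


def as_list(value):
    """Wrap a single value in a list; leave lists unchanged."""
    return value if isinstance(value, list) else [value]


def resolve_diagnostics(diagnostics, settings, fills):
    resolved = {}
    for name, entry in diagnostics.items():
        # (i) skip diagnostics whose dependencies are not met
        dependencies = entry.get("dependencies", {})
        if any(settings.get(key) != val for key, val in dependencies.items()):
            continue
        # (ii) templated names are expanded once per fill dictionary
        for fill in (fills if "((" in name else [{}]):
            frequency = entry["frequency"]
            # (iii) hypothetical rule for picking a value from a dict
            if isinstance(frequency, dict):
                chosen = frequency["default"]
                for key, val in frequency.items():
                    if key != "default" and settings.get(key) == ".true.":
                        chosen = val
                frequency = chosen
            # (iv) frequency and operator always end up as lists
            resolved[fill_template(name, fill)] = {
                "frequency": as_list(frequency),
                "operator": as_list(entry["operator"]),
            }
    return resolved


# Placeholder diagnostics and settings, for illustration only
diagnostics = {
    "DIAG_A": {"frequency": "medium", "operator": "average"},
    "DIAG_CISO": {"dependencies": {"ciso_on": ".true."},
                  "frequency": "medium", "operator": "average"},
    "((tracer_short_name))_DIAG": {
        "frequency": {"default": "never", "some_flag": "low"},
        "operator": ["average"]},
}
settings = {"ciso_on": ".false.", "some_flag": ".true."}
fills = [{"tracer_short_name": "PO4"}, {"tracer_short_name": "NO3"}]
resolved = resolve_diagnostics(diagnostics, settings, fills)
```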
No need to store the frequency_operator string in diagnostics class, it can be
constructed on the fly by MARBL_generate_diagnostics_file.py
Needed to rework the logic in _expand_template_value() because the location of
the autotroph metadata is fundamentally different from corresponding tracer
metadata: settings_dict['autotrophs(auto_ind)%sname'] vs
settings_dict['_tracer_dict'][tracer_name]['short_name']. So we need to be able
to loop over either auto_ind or tracer_name depending on what template we are
expanding.
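To make the difference between the two metadata locations concrete, here is a small sketch that builds one fill dictionary per autotroph or per tracer. The two settings_dict key patterns come from the text above; the function name, the `autotroph_cnt` key, the 1-based index, and the template names `((autotroph_sname))` and `((tracer_short_name))` are assumptions for illustration.

```python
def build_fill_dicts(settings_dict, template):
    """Return one fill dictionary per autotroph or per tracer."""
    if "((autotroph_" in template:
        # Autotroph metadata lives in flat keys indexed by auto_ind
        count = int(settings_dict["autotroph_cnt"])  # hypothetical key
        return [
            {"autotroph_sname":
                 settings_dict["autotrophs({})%sname".format(auto_ind)]}
            for auto_ind in range(1, count + 1)  # assumes 1-based indexing
        ]
    if "((tracer_" in template:
        # Tracer metadata lives in a nested dictionary keyed by tracer name
        return [
            {"tracer_short_name": metadata["short_name"]}
            for metadata in settings_dict["_tracer_dict"].values()
        ]
    # Not a template: a single empty fill dictionary
    return [{}]


# Placeholder settings, for illustration only
settings_dict = {
    "autotroph_cnt": 2,
    "autotrophs(1)%sname": "sp",
    "autotrophs(2)%sname": "diat",
    "_tracer_dict": {"PO4": {"short_name": "PO4"}},
}
autotroph_fills = build_fill_dicts(settings_dict, "DIAG_((autotroph_sname))")
tracer_fills = build_fill_dicts(settings_dict, "((tracer_short_name))_DIAG")
```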

Also noted where in the code per-autotroph dependencies will be checked (i.e.
some diagnostics only apply to autotrophs that are calcifiers or silicifiers)
To support some autotroph metadata, I reworked how some of the
dependency logic was determined. "template_fill_dict" is
turning out to be extremely useful, as it can now play a role in
determining the diagnostic name (and values of some diagnostic metadata)
as well as in the dependency check and determining the proper frequency
of diagnostic output.
Also started to add support for per-zooplankton diags
With this commit, I believe all diagnostics are now defined in the YAML
and properly processed by MARBL_generate_diagnostics_file.py
default_settings.yaml had been using //lname// to denote something to be
replaced by either the autotroph long name or the zooplankton long name;
to be consistent with default_diagnostics.yaml it should use
((autotroph_lname)) and ((zooplankton_lname))
Regardless of whether users prefer ".true. / .false.", "True / False",
or "T / F" when setting a logical parameter via an input file, MARBL
needs to be consistent internally. There are lots of checks for
[variable] = ".true." in MARBL's Python so that's the format other
values get converted to.
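A conversion of this kind might look like the sketch below. The function name and the case-insensitive matching are assumptions; the accepted spellings are the ones listed above.

```python
_TRUE_STRINGS = {".true.", "true", "t"}
_FALSE_STRINGS = {".false.", "false", "f"}


def normalize_logical(value):
    """Convert any accepted spelling of a logical to '.true.' or '.false.'."""
    # Assumption: matching ignores case and surrounding whitespace
    text = str(value).strip().lower()
    if text in _TRUE_STRINGS:
        return ".true."
    if text in _FALSE_STRINGS:
        return ".false."
    raise ValueError("'%s' is not a valid logical value" % value)
```

With this in place, checks of the form [variable] == ".true." behave the same no matter which spelling the user chose.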
Clarified / added comments, and also cleaned up how template_fill_dict is set
up
Instead of sorting tracers by module and PFT properties, now use
dependencies dictionary (much like the diagnostics YAML file)

Also made the tracers dictionary a separate object in MARBL_settings
rather than an entry in settings_dict[].
* renamed _array_size to _array_shape in settings YAML
* cleaned up docstring for _get_array_info()
* replaced "isinstance(__,unicode)" with "isinstance(__,type(u''))"
  (latter also works with python3)
* renamed _get_dim_size -> _get_value [it's a generic routine even if
  it's currently only used to get shape of arrays]
* replaced an enumerate with a zip()
* Cleaned up error message when yaml_to_json can't import PyYAML
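The isinstance() change in the list above works because type(u'') evaluates to unicode on Python 2 and to str on Python 3, so the name unicode (which does not exist on Python 3) is never referenced. A minimal sketch, with a hypothetical helper name:

```python
# unicode on Python 2, str on Python 3
text_type = type(u'')


def is_unicode_string(value):
    """True if value is a unicode string under either Python version."""
    return isinstance(value, text_type)
```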
1. Only define autotroph _Qp diagnostic if running with variable P:C
2. Provide some documentation in default_diagnostics.yaml
Introduced MARBL_share.py to contain routines used by the two
file_class.py files -- also cleaned up some of the code in each of those
two files to allow as much as possible to be moved to MARBL_share.py.

Subroutines in MARBL_share.py are accessed via MARBL_tools.
@mnlevy1981 mnlevy1981 merged commit 6edc61f into marbl-ecosys:master Feb 13, 2018
@mnlevy1981 mnlevy1981 deleted the enhancement/YAML-diagnostics branch February 13, 2018 16:53