Add basic support for variable mappings #1124

Merged
merged 50 commits into from Jun 9, 2021
50 commits
154ab4a
Add basic support for variable mappings
zklaus May 12, 2021
363e75c
Move get_variable_mappings to _config
zklaus May 14, 2021
889173e
Add handling of mip and short_name to get_variable_mappings
zklaus May 14, 2021
204bd13
Move to new directory layout with importlib_resources
zklaus May 16, 2021
1b5fbd1
Introduce deep_update functionality
zklaus May 16, 2021
4cbb17b
Fix dataset handling
zklaus May 16, 2021
90ba75f
Use lowercase for project in filename
zklaus May 16, 2021
0cf7ad7
Allow for empty var_mapping to support existing fixes
zklaus May 16, 2021
3d5cbd8
Return empty dict instead of None to signal "no mappings"
zklaus May 17, 2021
97b0243
Change conditional import to work around mypy bug python/mypy#1153
zklaus May 17, 2021
3df52f5
Add importlib_resources to doc requirements
zklaus May 17, 2021
160e359
Improve code quality
zklaus May 17, 2021
69d7301
Add user config directory
zklaus May 17, 2021
c68aaf2
Move project variable mappings handling out of Recipe class
zklaus May 17, 2021
037422e
Add rudimentary docstring
zklaus May 17, 2021
b842a0b
Use variable details instead of variable mappings for better terminology
zklaus May 17, 2021
66a4480
Address renaming and logging suggestions
zklaus Jun 1, 2021
449daa6
Pass extra_facets through recipe to allow for easy customization
zklaus Jun 1, 2021
801c97b
Pre-commit changes
zklaus Jun 1, 2021
0624c7d
Pass extra_facets also to fx variables
zklaus Jun 1, 2021
3ff7376
Pre-commit changes
zklaus Jun 1, 2021
19bd7fb
Rename for consistency
zklaus Jun 3, 2021
857b726
Pre-commit changes
zklaus Jun 3, 2021
8037ec6
Add extra_facets_dir option to config_user.yml
zklaus Jun 3, 2021
b26362c
Add validator for new config option to experimental interface
zklaus Jun 3, 2021
2016c8b
Add mapping_key to get_cube_from_list for fixes
zklaus Jun 3, 2021
f8db25e
Simplify generation of tuple validator
zklaus Jun 4, 2021
5cf38c3
Pass entire variable dict to fix and add_fx_variables instead of only…
zklaus Jun 4, 2021
1bb84e6
Don't check for exact argument match if the preprocessor takes *args …
zklaus Jun 4, 2021
7f60151
Fix recipe tests to check against new, more comprehensive dicts
zklaus Jun 4, 2021
384d249
Remove extra_facets_dir from example config-user.yml file
zklaus Jun 4, 2021
6dae6ad
Add basic documentation
zklaus Jun 4, 2021
6a66072
Complete documentation
zklaus Jun 4, 2021
783298f
Fix test failing because of coverage upload
bouweandela Jun 7, 2021
6dd52f8
Remove dubious caching
zklaus Jun 8, 2021
0a73a18
Add docstrings
zklaus Jun 8, 2021
ccf622e
Minor improvements
zklaus Jun 8, 2021
686c723
Add basic test for _deep_update
zklaus Jun 8, 2021
51d7f76
Add basic tests for _load_extra_facets
zklaus Jun 8, 2021
d8a42e3
Simplify handling of fx vars
zklaus Jun 8, 2021
75adca1
Fix mypy issues
zklaus Jun 9, 2021
004cafd
Remove mapping_key
zklaus Jun 9, 2021
f888f8d
Fix fx preprocessor test
zklaus Jun 9, 2021
79915cd
Improve formatting
zklaus Jun 9, 2021
731794e
Moving extra facet documentation to better places
zklaus Jun 9, 2021
3d362d0
Handle extra_facets as dictionary instead of kwargs where possible
zklaus Jun 9, 2021
99d5cfa
Add empty defaults to extra_facets to keep tests working
zklaus Jun 9, 2021
78e6e9e
Use better default for extra_facets in method signatures
zklaus Jun 9, 2021
9e24515
Update documentation with backlinks to main description
zklaus Jun 9, 2021
85918f2
Merge branch 'main' into variable-mappings
zklaus Jun 9, 2021
1 change: 1 addition & 0 deletions .circleci/config.yml
@@ -47,6 +47,7 @@ jobs:
      - coverage-reporter/send_report:
          coverage-reports: 'test-reports/coverage.xml'
          project-token: $CODACY_PROJECT_TOKEN
          skip: true  # skip if project-token is not defined (i.e. on a fork)

  install:
    # Test installation
18 changes: 18 additions & 0 deletions doc/develop/fixing_data.rst
@@ -353,3 +353,21 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio
For monthly data (V220), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``

For more info: http://www.gloh2o.org/

.. _extra-facets-fixes:

Use of extra facets in fixes
============================
Extra facets are a mechanism to provide additional information for certain kinds
of data. The general approach is described in :ref:`extra_facets`. Here, we
describe how they can be used in fixes to mold data into the form required by
the applicable standard. For example, if the input data is part of an
observational product that delivers surface temperature with a variable name of
`t2m` inside a file named `2m_temperature_1950_monthly.nc`, but the same
variable is called `tas` in the applicable standard, a fix can be created that
reads the original variable from the correct file, and provides a renamed
variable to the rest of the processing chain.

Normally, the applicable standard for variables is CMIP6.

For more details, refer to existing uses of this feature as examples.
70 changes: 70 additions & 0 deletions doc/quickstart/configure.rst
@@ -320,3 +320,73 @@ following documentation section:

These four items here are named people, references and projects listed in the
``config-references.yml`` file.

.. _extra_facets:

Extra Facets
============

Sometimes it is useful to provide extra information for loading data. This is
particularly true for native model data, or for observational or other data
that generally follow the established standards but are not part of the large
supported projects such as CMIP, CORDEX, and obs4MIPs.

To support this, we provide the extra facets facilities. Facets are the
key-value pairs described in :ref:`Datasets`. Extra facets allow for the
addition of more details per project, dataset, mip table, and variable name.

More precisely, one can provide this information in an extra yaml file, named
`{project}-something.yml`, where `{project}` corresponds to the project as used
by ESMValTool in :ref:`Datasets` and "something" is arbitrary.

Format of the extra facets files
--------------------------------
The extra facets are given in a yaml file, whose file name identifies the
project. Inside the file there is a hierarchy of nested dictionaries with the
following levels. At the top there is the `dataset` facet, followed by the `mip`
table, and finally the `short_name`. The leaf dictionary placed here gives the
extra facets that will be made available to the data finder and the fix
infrastructure. The following example illustrates the concept.

.. _extra-facets-example-1:

.. code-block:: yaml
   :caption: Extra facet example file `native6-era5.yml`

   ERA5:
     Amon:
       tas: {source_var_name: "t2m", cds_var_name: "2m_temperature"}
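
As a rough illustration of how such a hierarchy is queried, the sketch below
walks the three levels in order and falls back to an empty dictionary when any
level is missing. The dictionary literal stands in for the parsed yaml file, and
the names `EXTRA_FACETS` and `lookup_extra_facets` are hypothetical, chosen for
this example only.

```python
# Hypothetical in-memory equivalent of the parsed `native6-era5.yml` file.
EXTRA_FACETS = {
    "ERA5": {
        "Amon": {
            "tas": {"source_var_name": "t2m", "cds_var_name": "2m_temperature"},
        },
    },
}


def lookup_extra_facets(facets, dataset, mip, short_name):
    """Walk dataset -> mip -> short_name, falling back to an empty dict."""
    return facets.get(dataset, {}).get(mip, {}).get(short_name, {})


# A variable that is listed returns its leaf dictionary.
print(lookup_extra_facets(EXTRA_FACETS, "ERA5", "Amon", "tas"))
# A variable that is not listed yields an empty dictionary, not an error.
print(lookup_extra_facets(EXTRA_FACETS, "ERA5", "Amon", "pr"))
```

Returning an empty dictionary for unknown entries means that callers can treat
"no extra facets" and "some extra facets" uniformly.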


Location of the extra facets files
----------------------------------
Extra facets files can be placed in several different places. When we use them
to support a particular use-case within the ESMValTool project, they will be
provided in the sub-folder `extra_facets` inside the package
`esmvalcore._config`. When they are supplied by the user, they can be placed
either in `~/.esmvaltool/extra_facets` or in any other directory of the user's
choosing. In the latter case, this directory must be added to the
`config-user.yml` file under the `extra_facets_dir` setting, which can take a
single directory or a list of directories.

The order in which the directories are searched is:

1. The internal directory `esmvalcore._config/extra_facets`
2. The default user directory `~/.esmvaltool/extra_facets`
3. The custom user directories in the order in which they are given in
   `config-user.yml`.

The extra facets files within each of these directories are processed in
lexicographical order according to their file name.

In all cases it is allowed to supersede information from earlier files in later
files. This makes it possible for the user to effectively override even internal
default facets, for example to deal with local particularities in the data
handling.
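
The overriding behaviour described above can be sketched as a recursive
dictionary merge in which later values win. The example below is a minimal,
self-contained illustration: the two dictionaries stand in for the parsed
contents of an internal file and a user file (both hypothetical), and
`deep_update` is an illustrative helper name rather than a public API.

```python
import collections.abc


def deep_update(dictionary, update):
    """Recursively merge `update` into `dictionary`; later values win."""
    for key, value in update.items():
        if isinstance(value, collections.abc.Mapping):
            dictionary[key] = deep_update(dictionary.get(key, {}), value)
        else:
            dictionary[key] = value
    return dictionary


# Hypothetical contents of two files, listed in the order they are read.
internal_defaults = {
    "ERA5": {"Amon": {"tas": {"source_var_name": "t2m",
                              "cds_var_name": "2m_temperature"}}},
}
user_override = {
    "ERA5": {"Amon": {"tas": {"source_var_name": "T2M"}}},
}

merged = {}
for piece in (internal_defaults, user_override):
    deep_update(merged, piece)

# Only the overridden key changes; the other internal default is kept.
print(merged["ERA5"]["Amon"]["tas"])
```

Because the merge descends into nested dictionaries, a user file only needs to
list the facets it wants to change, not the complete entry.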

Use of extra facets
-------------------
For extra facets to be useful, the information that they provide must be
applied. There are fundamentally two places where this comes into play. One is
:ref:`the datafinder<extra-facets-data-finder>`, the other is
:ref:`fixes<extra-facets-fixes>`.
32 changes: 32 additions & 0 deletions doc/quickstart/find_data.rst
@@ -303,3 +303,35 @@ flexible concatenation between two cubes, depending on the particular setup:
Note that the concatenation of two cubes is the base operation of an iterative process that reduces multiple cubes
from multiple data segments via cube concatenation, i.e., if there is no time-overlapping data, the
cube concatenation is performed in one step.

.. _extra-facets-data-finder:

Use of extra facets in the datafinder
=====================================
Extra facets are a mechanism to provide additional information for certain kinds
of data. The general approach is described in :ref:`extra_facets`. Here, we
describe how they can be used to locate data files within the datafinder
framework. This is useful to build paths for directory structures and file names
that follow a different system than the established DRS for, e.g., CMIP.
A common application is the location of variables in multi-variable files as
often found in climate models' native output formats.

Another use case is files whose names use a different name for a variable
than the netCDF4 variable name inside the file.

To apply the extra facets for this purpose, simply use the corresponding tag in
the applicable DRS inside the `config-developer.yml` file. For example, given
the extra facets in :ref:`extra-facets-example-1`, one might write the
following.

.. _extra-facets-example-2:

.. code-block:: yaml
   :caption: Example drs use in `config-developer.yml`

   native6:
     input_file:
       default: '{name_in_filename}*.nc'

The same replacement mechanism can be employed wherever tags can be
used, particularly in `input_dir` and `input_file`.
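
To make the idea of tag replacement concrete, the following sketch substitutes
`{tag}` placeholders in a file name pattern with facet values. It is a
simplified illustration and not the replacement code used by ESMValCore: the
helper `fill_tags` is hypothetical, and the facet values are taken from
:ref:`extra-facets-example-1`, merged into the regular facets of a variable.

```python
import re


def fill_tags(template, facets):
    """Replace each `{tag}` in `template` with the matching facet value."""
    def _substitute(match):
        tag = match.group(1)
        if tag not in facets:
            raise KeyError(f"No facet available for tag '{tag}'")
        return str(facets[tag])
    return re.sub(r"\{(\w+)\}", _substitute, template)


# Regular facets of a variable, extended with the extra facets.
facets = {"short_name": "tas", "mip": "Amon"}
facets.update({"source_var_name": "t2m", "cds_var_name": "2m_temperature"})

# The resulting glob pattern can then be used to search for input files.
print(fill_tags("{cds_var_name}*.nc", facets))
```

In this sketch a tag without a matching facet raises an error, which surfaces a
misspelled tag early instead of silently producing a pattern that matches
nothing.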
1 change: 1 addition & 0 deletions doc/requirements.txt
@@ -1,6 +1,7 @@
autodocsumm
dask[array]
fiona
importlib_resources
jinja2
netCDF4
numpy
2 changes: 2 additions & 0 deletions esmvalcore/_config/__init__.py
@@ -3,6 +3,7 @@
    get_activity,
    get_institutes,
    get_project_config,
    get_extra_facets,
    load_config_developer,
    read_config_developer_file,
    read_config_user_file,
@@ -14,6 +15,7 @@
    'read_config_user_file',
    'read_config_developer_file',
    'load_config_developer',
    'get_extra_facets',
    'get_project_config',
    'get_institutes',
    'get_activity',
50 changes: 50 additions & 0 deletions esmvalcore/_config/_config.py
@@ -1,8 +1,11 @@
"""Functions dealing with config-user.yml / config-developer.yml."""
import collections.abc
import datetime
import logging
import os
import sys
import warnings
from functools import lru_cache
from pathlib import Path

import yaml
@@ -13,6 +16,46 @@

CFG = {}

if sys.version_info[:2] >= (3, 9):
    # pylint: disable=no-name-in-module
    from importlib.resources import files as importlib_files
else:
    from importlib_resources import files as importlib_files


def _deep_update(dictionary, update):
    for key, value in update.items():
        if isinstance(value, collections.abc.Mapping):
            dictionary[key] = _deep_update(dictionary.get(key, {}), value)
        else:
            dictionary[key] = value
    return dictionary


@lru_cache
def _load_extra_facets(project, extra_facets_dir):
    config = {}
    config_paths = [
        importlib_files("esmvalcore._config") / "extra_facets",
        Path.home() / ".esmvaltool" / "extra_facets",
    ]
    config_paths.extend([Path(p) for p in extra_facets_dir])
    for config_path in config_paths:
        config_file_paths = config_path.glob(f"{project.lower()}-*.yml")
        for config_file_path in sorted(config_file_paths):
            logger.debug("Loading extra facets from %s", config_file_path)
            with config_file_path.open() as config_file:
                config_piece = yaml.safe_load(config_file)
            if config_piece:
                _deep_update(config, config_piece)
    return config


def get_extra_facets(project, dataset, mip, short_name, extra_facets_dir):
    """Read configuration files with additional variable information."""
    project_details = _load_extra_facets(project, extra_facets_dir)
    return project_details.get(dataset, {}).get(mip, {}).get(short_name, {})


def read_config_user_file(config_file, folder_name, options=None):
    """Read config user file and store settings in a dictionary."""
@@ -61,6 +104,7 @@ def read_config_user_file(config_file, folder_name, options=None):
        'output_file_type': 'png',
        'output_dir': 'esmvaltool_output',
        'auxiliary_data_dir': 'auxiliary_data',
        'extra_facets_dir': tuple(),
        'save_intermediary_cubes': False,
        'remove_preproc_dir': True,
        'max_parallel_tasks': None,
@@ -83,6 +127,12 @@ def read_config_user_file(config_file, folder_name, options=None):
    cfg['output_dir'] = _normalize_path(cfg['output_dir'])
    cfg['auxiliary_data_dir'] = _normalize_path(cfg['auxiliary_data_dir'])

    if isinstance(cfg['extra_facets_dir'], str):
        cfg['extra_facets_dir'] = (_normalize_path(cfg['extra_facets_dir']), )
    else:
        cfg['extra_facets_dir'] = tuple(
            _normalize_path(p) for p in cfg['extra_facets_dir'])

    cfg['config_developer_file'] = _normalize_path(
        cfg['config_developer_file'])
