Rethinking the configuration file #795

Open
stefsmeets opened this issue Sep 24, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@stefsmeets
Contributor

stefsmeets commented Sep 24, 2020

Hi everyone,

Over the last week I have been working on implementing an importable config item (#785). As part of this PR, I have been discussing with @bouweandela and @Peter9192 if this would be a good time to rethink how the config file is structured. I would like to start a discussion on how to change the configuration.

There are several issues on the topic. I think with #785 we have made some strides towards tackling nearly all of them. In the next post I will detail how we propose to set it up.

#794: Example config file could use some improvements
#793: "Default" is not a great name for a DRS in the configuration
#494: Support per dataset DRS
#129: Specify directory structure per path instead of per project in config-user.yml
#100: Config-user.yml rootpath.
#93: Too many unnecessary settings in config-user.yml
#64: Make user configuration file more user friendly
#2280: ignore .esmvaltool/config-user.yml when a custom --config-file is specified

In particular, I also want to draw your attention to #93, which proposes removing some settings from the default user config file. Which settings are really necessary, and which should be deprecated? Are there new keywords that should be added?

Suggestions from @bouweandela to be moved to the diagnostic script settings in the recipe (#93):
write_plots: true
write_netcdf: true
output_file_type: png # [ps]/pdf/png/eps/epsi
profile_diagnostic: false -> only interesting for Python diagnostic developers
save_intermediary_cubes: false -> for debugging purposes, probably only interesting for diagnostic developers
exit_on_warning: false -> specific to NCL diagnostics, probably only interesting for diagnostic developers
Using these options in the configuration file instead of the recipe would need to be deprecated in the upcoming release (2.1, planned for October 2020, feature freeze October 5th) and removed two releases later (2.3, planned June 2021), see also the release schedule.

Also:
config_developer_file: null -> No longer needed with the PR, could be tweaked to specify the default data reference syntax

@stefsmeets stefsmeets added the enhancement New feature or request label Sep 24, 2020
@stefsmeets
Contributor Author

stefsmeets commented Sep 24, 2020

We have been working with the following scenarios:

  1. As a user, the default settings should work out-of-the-box
  2. As a user, I want to specify a directory with some data
  3. As an advanced user, I want to tweak the directory structure to my liking
  4. As a cluster administrator, I want to define custom data reference syntax
  5. As a cluster user, I want to quickly initialize configs for different machines
  6. As a cluster user, I want to add an additional data path with some extra data I downloaded
  7. As a developer, the rootpath/drs/cfg-developer structure is a bit of a mess and confusing to work with

So in the PR I have done away with config-developer.yml and split it up into separate DRS specifications per machine (DKRZ, Jasmin, etc.).

In the console you can list them:

~$ esmvaltool config list
# Available site-specific configs
# Use `esmvaltool config get [config_name]` to copy them
- badc
- bsc
- cp4cds
- dkrz
- ethz
- jasmin
- rcast
- smhi
- user [default]

# Available configs in /home/stef/.esmvaltool
- config-dkrz.yml
- config-jasmin.yml
- config-user.yml [default]

To get the default config (config-user.yml which embeds drs_user.yml):

~$ esmvaltool config get --overwrite
2020-09-24 08:16:00,658 UTC [23612] INFO    Overwriting file /home/stef/.esmvaltool/config-user.yml.
2020-09-24 08:16:00,658 UTC [23612] INFO    Writing config file to /home/stef/.esmvaltool/config-user.yml.

To get a specific config:

~$ esmvaltool config get ethz
2020-09-24 08:16:48,565 UTC [23622] INFO    Writing config file to /home/stef/.esmvaltool/config-ethz.yml.

In the config itself, I have removed the drs/rootpath mappings.

Instead, there is a new keyword called default_inputpath. This is the default rootpath to use for all projects.

# Default data location (can be a list)
default_inputpath: ~/default_inputpath

The config-developer.yml is now called data_reference_syntax.yml and contains the default values. If you want to customize the data reference syntax, you can do that as follows:

data_reference_syntax:
  CMIP6:
    rootpath:
      - ~/cmip6_inputpath1
      - ~/cmip6_inputpath2
  CMIP5:
    rootpath: ~/cmip5_inputpath

The rootpath here will update the default rootpath.
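
A minimal sketch of that lookup order, not the code from the PR. It assumes that "update" means a project-specific rootpath takes precedence over default_inputpath, and that both may be a single string or a list. The function name `resolve_rootpaths` and the plain-dict config are hypothetical and only stand in for the loaded YAML.

```python
import os


def resolve_rootpaths(config, project):
    """Return the root paths to search for `project`.

    Assumption: a project-specific `rootpath` wins over `default_inputpath`.
    """
    drs = config.get("data_reference_syntax", {})
    fallback = config.get("default_inputpath", [])
    paths = drs.get(project, {}).get("rootpath", fallback)
    if isinstance(paths, str):
        # A single path is allowed; normalize it to a list.
        paths = [paths]
    return [os.path.expanduser(p) for p in paths]


# Plain-dict stand-in for the YAML shown above.
config = {
    "default_inputpath": "~/default_inputpath",
    "data_reference_syntax": {
        "CMIP6": {"rootpath": ["~/cmip6_inputpath1", "~/cmip6_inputpath2"]},
        "CMIP5": {"rootpath": "~/cmip5_inputpath"},
    },
}

print(resolve_rootpaths(config, "CMIP6"))
print(resolve_rootpaths(config, "OBS"))  # no entry, falls back to the default
```

With this reading, a project that has no entry of its own still resolves to default_inputpath, which matches the "out-of-the-box" scenario listed earlier.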

You can also add custom entries. If you prefix them with one of the existing ones (CMIP5, CMIP6, etc.), it will pull the defaults from there.

For ETHZ, for example, the entries are defined in a file called drs-ethz.yml. The idea is that having all the settings in one place will make them easier to maintain. It looks like this:

data_reference_syntax:
  CMIP6_ETHZ:
    rootpath: /net/atmos/data/cmip6
    input_dir: '{exp}/{mip}/{short_name}/{dataset}/{ensemble}/{grid}/'
  CMIP5_ETHZ:
    rootpath: /net/atmos/data/cmip5
    input_dir: '{exp}/{mip}/{short_name}/{dataset}/{ensemble}/'
  CMIP3_ETHZ:
    rootpath: /net/atmos/data/cmip3
  OBS_ETHZ:
    rootpath: /net/exo/landclim/PROJECTS/C3S/datadir/obsdir/

Internally, CMIP6_ETHZ maps to the CMIP6 defaults, and creates an additional DRS entry called CMIP6_ETHZ to look for data. This makes it easy to add your own data locations without messing up the site specific ones. The default DRS is listed in data_reference_syntax.yml.
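
The prefix mapping could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the function name `expand_custom_entries`, the longest-prefix rule, and the placeholder default values are all hypothetical, and the real defaults in data_reference_syntax.yml are not reproduced here.

```python
# Placeholder defaults for illustration only; the real defaults live in
# data_reference_syntax.yml and are not reproduced here.
DEFAULT_DRS = {
    "CMIP6": {"rootpath": "~/default_inputpath", "input_dir": "{dataset}/{exp}"},
    "CMIP3": {"rootpath": "~/default_inputpath", "input_dir": "{dataset}"},
}


def expand_custom_entries(custom, defaults):
    """Add custom DRS entries, inheriting from a default with a matching prefix."""
    merged = {name: dict(settings) for name, settings in defaults.items()}
    for name, overrides in custom.items():
        # Assumed rule: the longest matching prefix wins,
        # e.g. CMIP6_ETHZ inherits from CMIP6.
        base = max((d for d in defaults if name.startswith(d)), key=len, default=None)
        entry = dict(defaults[base]) if base is not None else {}
        entry.update(overrides)
        merged[name] = entry
    return merged


custom = {
    "CMIP6_ETHZ": {
        "rootpath": "/net/atmos/data/cmip6",
        "input_dir": "{exp}/{mip}/{short_name}/{dataset}/{ensemble}/{grid}/",
    },
    "CMIP3_ETHZ": {"rootpath": "/net/atmos/data/cmip3"},
}
drs = expand_custom_entries(custom, DEFAULT_DRS)

# Hypothetical facet values, used only to show how the template is filled in.
facets = {
    "exp": "historical", "mip": "Amon", "short_name": "tas",
    "dataset": "SOME-MODEL", "ensemble": "r1i1p1f1", "grid": "gn",
}
print(drs["CMIP6_ETHZ"]["input_dir"].format(**facets))
```

Here CMIP3_ETHZ only overrides rootpath and keeps the input_dir of its CMIP3 parent, while the original CMIP6 and CMIP3 entries stay available next to the site-specific ones.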

Curious to hear what you think!

@Peter9192
Contributor

Peter9192 commented Sep 24, 2020

To get a specific config:

~$ esmvaltool config get ethz
2020-09-24 08:16:48,565 UTC [23622] INFO    Writing config file to /home/stef/.esmvaltool/config-ethz.yml.

so to be clear, this combines the 'standard' settings from config-user.yml with the 'site-specific' drs-ethz.yml files (both hidden from the user) into a single config-ethz.yml which is then copied to the user's home directory, where it can be further adapted and tailored to the user's needs. Right?

@stefsmeets
Contributor Author

stefsmeets commented Sep 24, 2020

so to be clear, this combines the 'standard' settings from config-user.yml with the 'site-specific' drs-ethz.yml files (both hidden from the user) into a single config-ethz.yml which is then copied to the user's home directory, where it can be further adapted and tailored to the user's needs. Right?

Yeah, that's correct.
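
As a rough illustration of that combining step (a sketch under assumptions, not the actual `esmvaltool config get` code): the function name `build_site_config` and the plain dicts standing in for the two YAML files are hypothetical.

```python
import copy


def build_site_config(user_config, site_drs, site):
    """Combine the standard settings with one site's DRS into a single config.

    Returns the target file name and the combined settings; writing the
    YAML file to the user's config directory is left out of this sketch.
    """
    combined = copy.deepcopy(user_config)
    combined.setdefault("data_reference_syntax", {}).update(
        copy.deepcopy(site_drs.get("data_reference_syntax", {}))
    )
    return f"config-{site}.yml", combined


# Stand-ins for the packaged config-user.yml and drs-ethz.yml.
user_config = {"default_inputpath": "~/default_inputpath"}
ethz_drs = {
    "data_reference_syntax": {
        "CMIP3_ETHZ": {"rootpath": "/net/atmos/data/cmip3"},
    }
}

filename, config = build_site_config(user_config, ethz_drs, "ethz")
print(filename)
print(config)
```

Deep copies keep the packaged defaults untouched, so the user can edit the resulting file freely.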

@Peter9192
Contributor

Awesome! 😎

@bouweandela
Member

I updated the description of the issue to be more specific about how to move options from config-user.yml to the recipe.

@stefsmeets
Contributor Author

I updated the description of the issue to be more specific about how to move options from config-user.yml to the recipe.

With the way we load the config in the PR, we can quite easily add deprecation validators to let people know that these settings will be deprecated.
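
A deprecation validator of this kind might look roughly like the sketch below. It is an assumption about the shape of such a check, not the PR's code: the names `DEPRECATED_SETTINGS` and `check_deprecated` are hypothetical, while the setting names and the 2.1/2.3 versions are taken from the issue description.

```python
import warnings

# Settings proposed for the move to the recipe, with the release in which
# they would be deprecated and the release in which they would be removed.
DEPRECATED_SETTINGS = {
    "write_plots": ("2.1", "2.3"),
    "write_netcdf": ("2.1", "2.3"),
    "output_file_type": ("2.1", "2.3"),
}


def check_deprecated(config):
    """Warn once for every deprecated setting found in `config`."""
    for key in config:
        if key in DEPRECATED_SETTINGS:
            since, removal = DEPRECATED_SETTINGS[key]
            warnings.warn(
                f"'{key}' in the configuration file is deprecated since "
                f"{since} and scheduled for removal in {removal}; "
                "set it in the recipe instead.",
                DeprecationWarning,
                stacklevel=2,
            )
    return config


check_deprecated({"write_plots": True, "default_inputpath": "~/data"})
```

Because the check only warns, existing configuration files keep working until the settings are actually removed.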

@stefsmeets
Contributor Author

@ESMValGroup/esmvaltool-coreteam

@bsolino
Contributor

bsolino commented Apr 23, 2021

This is awesome, I really like this idea!

However, if both settings are combined into the same file I'm not sure how this will work when new datasets are added, especially in the context of a cluster.

Wouldn't it be a better solution for the configuration file in the user folder to reference the drs file they prefer to use? Cluster users are not expected to want to change the DRS's anyway, and that allows the user to have access to new datasets without changing their configuration.

To reference those files I think it would be better not to use direct paths, so that the config is not tied to a specific ESMValTool repository. My suggestion would be to reference the cluster by name, or to allow the option "user", which would then look for a data_reference_syntax.yml file in the configuration folder.

E.g. (in config-user.yml)

drs: ethz

or

drs: user
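
One way such a `drs` key could be resolved is sketched below. This is only an illustration of the suggestion: the function `resolve_drs_file`, the `site_dir` argument for the packaged site files, and the example directories are hypothetical, and the drs-<site>.yml naming follows the drs-ethz.yml example earlier in the thread.

```python
from pathlib import Path


def resolve_drs_file(drs, config_dir, site_dir):
    """Map the `drs` setting to the DRS file that should be loaded.

    "user" points at the user's own file in the configuration folder;
    any other value is treated as the name of a packaged site file.
    """
    if drs == "user":
        return Path(config_dir) / "data_reference_syntax.yml"
    return Path(site_dir) / f"drs-{drs}.yml"


# Hypothetical locations, for illustration only.
print(resolve_drs_file("ethz", "/home/stef/.esmvaltool", "/path/to/site-configs"))
print(resolve_drs_file("user", "/home/stef/.esmvaltool", "/path/to/site-configs"))
```

Since the user's config stores only the site name, updates to the packaged DRS file would reach cluster users without any change on their side.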

@zklaus
Contributor

zklaus commented May 27, 2021

I like the idea of rethinking the config system. Some technical pointers:


5 participants