Support per dataset DRS #494

bouweandela · 2020-02-14T15:29:05Z

Now that we have support for reading in the ERA5 dataset in its native format through the native6 project (#447), it would be good to add support for a per-dataset DRS, so that other datasets (which will typically come with their own DRS) can also be supported using the native6 project.


valeriupredoi · 2020-02-14T16:06:11Z

In config-developer.yml, or are you thinking of something fancier so we don't overcrowd the file? 🍺

bouweandela · 2020-02-24T16:12:01Z

Yes, just in config-developer.yml

bouweandela · 2020-05-18T14:21:00Z

Related to #641

bsolino · 2021-04-23T18:01:48Z

I have made this suggestion of how the config_developer file could look in #970 (comment), but I think it could be relevant to this discussion:

native6:
  default:
    input_dir:
      default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
      BSC: '{project}/{dataset}_{version}/{area}_{frequency}_{grid}'
      RCAST: '...'
    input_file:
      default: '*.nc'
      BSC: '*.nc'
      RCAST: '*.nc'
  ICON:
    input_dir:
      default: '{model_version}_{model_component}_{experiment}_{grid}_{id}'
    input_file:
      default: '{model_version}_{model_component}_{experiment}_{grid}_{id}_{var_type}*.nc'
  EMAC:
    ...
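
To make the lookup implied by this layout concrete, here is a minimal Python sketch. It is not ESMValCore code: `CONFIG`, `resolve_input_dir` and the fallback order (dataset entry first, then `default`; site DRS first, then `default`) are assumptions made purely for illustration.

```python
# Hypothetical in-memory version of the nested native6 entry above.
CONFIG = {
    "native6": {
        "default": {
            "input_dir": {
                "default": "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}",
                "BSC": "{project}/{dataset}_{version}/{area}_{frequency}_{grid}",
            },
        },
        "ICON": {
            "input_dir": {
                "default": "{model_version}_{model_component}_{experiment}_{grid}_{id}",
            },
        },
    },
}


def resolve_input_dir(config, project, dataset, drs):
    """Return the input_dir template for a (dataset, site DRS) pair.

    Assumed fallback order: use the dataset's own entry if it exists,
    otherwise the project's 'default' entry; within that entry use the
    site DRS if it exists, otherwise 'default'.
    """
    project_cfg = config[project]
    dataset_cfg = project_cfg.get(dataset, project_cfg["default"])
    templates = dataset_cfg["input_dir"]
    return templates.get(drs, templates["default"])
```

With this fallback order, an ICON dataset on the BSC machine would get the ICON template (no BSC-specific one is defined for it), while ERA5 on the same machine would get the generic BSC template.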

zklaus · 2021-05-18T09:30:52Z

@Peter9192 has withdrawn his initial attempt in #970. @bsolino could you pick this up? Perhaps @senesis could help?

senesis · 2021-05-19T15:29:35Z

There is an ambiguity in the comments above about what "a dataset" means, as this can be:

* either the initial meaning, such as the ERA5 dataset or a given model (i.e. the value under the 'dataset' key of an entry of the datasets section of recipes)
* or an entry of the datasets section of a recipe

My feeling is that one should be able to redefine both the DRS label and the rootpath in each datasets section entry (i.e. change the default values that are derived from the config-user file).

This would avoid changing the structure of the project entry in the config-developer file: we would just have to list all possible DRS without grouping them by model. Using the model name in DRS labels could be helpful.

senesis · 2021-05-21T09:12:45Z

@Peter9192 has withdrawn his initial attempt in #970. @bsolino could you pick this up? Perhaps @senesis could help?

I may help. My proposal would be :
In the config-dev file, e.g.

native6:
  cmor_strict: false
  input_dir:
    default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
    BSC: '{project}/{dataset}_{version}/{area}_{frequency}_{grid}'
    ICON_DRS: '{model_version}_{model_component}_{experiment}_{grid}_{id}'
    IPSL_A: '{account}/{model}/{status}/{exp}/{simulation}/{igcm_dir}/Analyse/{freq}'
    IPSL_O: '{account}/{model}/{status}/{exp}/{simulation}/{igcm_dir}/Output/{freq}'
  input_file:
    default: '*.nc'
    BSC: '*.nc'
    ICON_DRS: '{model_version}_{model_component}_{experiment}_{grid}_{id}_{var_type}*.nc'
    IPSL_A: '{simulation}_*_{file_var_name}.nc'
    IPSL_O: '{simulation}_*_{varname_in_filename}.nc'
  output_file: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'

and in the config-user file, e.g. :

drs:
  native6: ICON_DRS

and in any recipe, e.g. :

datasets:
  - {dataset: ICON,  project: native6, exp: historical,  ...}
  - {dataset: IPSL,  project: native6, exp: historical,  ..., DRS: IPSL_A}
  - {dataset: IPSL,  project: native6, exp: hist_alt  ,  ..., DRS: IPSL_O}
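
The selection rule implied by these three snippets could be sketched in Python as follows. This is only an illustration of the proposal, not existing ESMValCore behaviour: the dictionaries stand in for the parsed files, and `select_input_dir` is a hypothetical helper. The assumed precedence is that a `DRS` key in the recipe entry wins over the project-wide label from config-user, which in turn wins over `default`.

```python
# Hypothetical stand-ins for the parsed config-developer and config-user files.
CONFIG_DEVELOPER = {
    "native6": {
        "input_dir": {
            "default": "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}",
            "ICON_DRS": "{model_version}_{model_component}_{experiment}_{grid}_{id}",
            "IPSL_A": "{account}/{model}/{status}/{exp}/{simulation}/{igcm_dir}/Analyse/{freq}",
            "IPSL_O": "{account}/{model}/{status}/{exp}/{simulation}/{igcm_dir}/Output/{freq}",
        },
    },
}
CONFIG_USER = {"drs": {"native6": "ICON_DRS"}}


def select_input_dir(entry, config_user, config_developer):
    """Pick the input_dir template for one recipe dataset entry.

    Assumed precedence: the entry's own 'DRS' key, then the project-wide
    label from config-user, then 'default'.
    """
    project = entry["project"]
    label = entry.get("DRS", config_user["drs"].get(project, "default"))
    return config_developer[project]["input_dir"][label]
```

For the recipe above, the ICON entry would fall back to `ICON_DRS` from config-user, while the two IPSL entries would use the templates named by their own `DRS` keys.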

bsolino · 2021-05-25T07:36:12Z

@senesis Your suggestion has the same issue that I was trying to address in my proposal: it mixes, at the same level, datasets (e.g. ICON, IPSL) and directory structures on machines (e.g. BSC, DKRZ). Each machine could store each project in a different way, and I believe the purpose of the DRS is to make the data easily accessible to users of those machines by letting them set a DRS for each project: just by setting the DRS in config-user.yml, a user can locate all the project data stored on the machine they are using.

Note that I'm having a bit of a terminology issue here, because I am still unsure about the difference between the concepts of a "dataset" and a "project".

Furthermore, your suggestion seems to be coupling recipes with the directory structures in the machines. That would mean that it could be necessary to modify each recipe every time you want to execute it on a different machine.

The way I see it, we need two keys to identify the directory structure: one that refers to the machine and one that refers to the dataset. So, the configuration file should be set up in a way that makes it possible to combine those keys.

bsolino · 2021-05-25T07:52:43Z

There is an ambiguity in the comments above, about what "a dataset" means, as this can be :

* either the initial meaning such as the ERA5 dataset, or a given model (so ; the value under the 'dataset' key of an entry of the dataset section of recipes)

* or an entry of the datasets section of a recipe

I understand that this refers to the {dataset} key in the default route, is that correct?

I think that key exists because the original intent of native6 was to set up the data in a way that the directory structure of each dataset would be identical, and so datasets could be differentiated by the folder in which they were located. If we move forward with this idea of allowing different directory structures for each dataset, perhaps that key could be unnecessary.

I believe the original idea was from @bouweandela. Bouwe, do you have any specific opinion on this?

Peter9192 · 2021-05-25T14:04:40Z

and in the config-user file, e.g. :
drs:
  native6: ICON_DRS

So how would that work if I want to use both ERA5 and ICON data through the native6 project?

senesis · 2021-05-26T12:52:14Z

So how would that work if I want to use both ERA5 and ICON data through the native6 project?

That way

config-user.yml

drs:
  native6: ICON_DRS

recipe :

datasets:
  - {dataset: ICON,  project: native6, exp: historical,  ...}
  - {dataset: ERA5,  project: native6, exp: historical,  ..., DRS: ERA5_DRS}

That feature is useful as soon as a given model may come with two DRS (for instance, one experiment is in a shared CMIP6 data store and another is stored in a private location). My proposal is that the rootpath can also be specified in datasets entries.

bouweandela · 2021-05-27T10:16:55Z

That feature is useful as soon as a given model may come with two DRS (for instance, one experiment is in a shared CMIP6 data store and another is stored in a private location)

That is a different issue, let's discuss it here: #129

There is an ambiguity in the comments above, about what "a dataset" means, as this can be :

either the initial meaning such as the ERA5 dataset, or a given model (so ; the value under the 'dataset' key of an entry of the dataset section of recipes)

or an entry of the datasets section of a recipe

I believe the original idea was from @bouweandela. Bouwe, do you have any specific opinion on this?

With dataset I meant the first option mentioned here.

My feeling is that one should be able to redefine both the DRS label and the rootpath in each datasets section entry (i.e. change the default values that are derived from the config-user file).

As pointed out by @bsolino, the idea behind using a DRS is that recipes are machine-independent, so specifying this in the recipe defeats the purpose of using a DRS.

bsolino · 2021-06-01T15:25:45Z

@bouweandela: I meant to ask something else, but I think I didn't make my question very clear. I was wondering how to deal with the ambiguity of the {dataset} key, given that we are going to support different DRS for the datasets. It seems to me that we will have to rethink the original idea of distinguishing the datasets by their folder.

The main options I can see right now (there may be more that I'm missing) are:

* Keep the default as is, only use it if the dataset doesn't have an explicit DRS. New datasets could then be added by the current method. However, "dataset" would take a double meaning, both a key in the config file and a part of the folder structure. I'm not sure that having that double meaning is a good option.
* Remove "{dataset}" from the default path. Unfortunately, I'm not sure if it would be possible to do this without rendering the idea of the default path useless.

In summary, I'm not sure how to move forward and avoid the ambiguity without clashing with what has already been done.

senesis · 2021-06-02T07:06:31Z

In my opinion, we are complicating things needlessly here by trying to put multiple datasets (e.g. ERA5 and ICON) in one single project (native6).
This is a very different situation from the basic design of the config files, i.e. for projects CMIP5 and CMIP6, in which all datasets share the same DRS on a given machine, this DRS being configured at the config-user level.
Here, we want to handle data sources (e.g. one model or one observation dataset) which each have their own set of attributes/facets, and each have their own DRS.
This means that these data sources look very much like projects; and as a matter of fact, defining a project for each one allows us to handle it without any code change. The only drawback is that it populates both the config-developer file and the _fixes directory (rather than populating only the _fixes/native6 directory). I think that this is a very small price to pay, compared with complicating the config file structure and the code.
Put differently: as soon as a dataset needs a specific DRS (or, better said, as soon as the need for a specific DRS is not linked to a change of machine), such a dataset is rather what I call a 'data source', which is best handled in ESMValTool as a project.
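
As a rough illustration of why one project per data source needs no new lookup logic, consider the Python sketch below. It is not ESMValCore code: the `ICON` project entry, the rootpaths under `/data/...`, and the helper `input_dir` are all made up for the example. The only point is that the existing project-to-DRS-to-template chain is enough once each data source has its own project entry.

```python
import posixpath

# Hypothetical per-project entries, as they might appear in config-developer.
PROJECTS = {
    "native6": {
        "input_dir": {
            "default": "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}",
        },
    },
    "ICON": {
        "input_dir": {
            "default": "{model_version}_{model_component}_{experiment}_{grid}_{id}",
        },
    },
}

# Hypothetical user settings; the rootpaths are invented for this example.
USER = {
    "rootpath": {"native6": "/data/obs", "ICON": "/data/icon"},
    "drs": {"native6": "default", "ICON": "default"},
}


def input_dir(entry):
    """Expand the directory template of the entry's project with its facets."""
    project = entry["project"]
    template = PROJECTS[project]["input_dir"][USER["drs"][project]]
    # str.format ignores unused keyword arguments, so extra facets are harmless.
    return posixpath.join(USER["rootpath"][project], template.format(**entry))
```

A recipe entry then only names its project and facets; nothing machine-specific has to appear in the recipe itself.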

zklaus · 2021-06-07T15:33:57Z

I am not sure if we still want this; we seem to be able to achieve most goals by using projects instead, and @Peter9192's PR on the topic (#970) is now abandoned. I'll move this to 2.4.0, but we may well just close it in the end.

senesis · 2021-06-07T16:18:12Z

We seem to be able to achieve most goals by using projects instead ... we may well just close it in the end

I agree, provided #129 is implemented, as @bouweandela suggested.

bsolino · 2021-06-09T11:31:12Z

The only drawback is that it populates both the config-developer file and the _fixes directory (rather than populating only the _fixes/native6 directory). I think that this is a very small price to pay, compared with complicating the config file structure and the code.

That's an excellent point. I agree that it would be too much for too little gain. And there are other avenues to solve the issues.

For starters, I'm not even sure it is a drawback in the config-developer file, as the total number of lines added would be quite similar, the difference being mostly organizational. That could be addressed by other means, starting with something as simple as comments in the configuration file.

There could also be issues with populating config-user, but I think there might be a better way to approach that. I think I had better open a new issue for it.

EDIT: Opened in #1165

bouweandela · 2021-06-17T07:58:49Z

Indeed, let's just create a new project per model that uses a custom DRS, until we have too many projects (e.g. > 10) and we need to come up with something smarter.

zklaus · 2021-06-17T08:38:27Z

Ok, it seems we are agreed. Closing this for now; anybody, please feel free to reopen if you want to discuss this some more.

bouweandela added enhancement New feature or request EMAC labels Feb 14, 2020

mattiarighi added this to To do in Medium priority issues via automation May 25, 2020

stefsmeets mentioned this issue Sep 24, 2020

Rethinking the configuration file #795

Open

bouweandela mentioned this issue Jan 18, 2021

How do we treat reanalysis with members? #945

Open

Peter9192 mentioned this issue Jan 29, 2021

Support per dataset DRS in native6 project #970

Closed

10 tasks

This was referenced May 17, 2021

Add era5 variable mappings #1125

Draft

Coding sprint on native model data #1119

Closed

zklaus added this to the v2.3.0 milestone May 20, 2021

This was referenced May 22, 2021

Handle IPSLCM native formats #1143

Closed

List info_keys (for dataset's alias) is too short for model development use case #1144

Open

zklaus modified the milestones: v2.3.0, v2.4.0 Jun 7, 2021

bsolino mentioned this issue Jun 9, 2021

Simplify DRS and rootpath in user configuration by using defaults or lists #1165

Open

bouweandela mentioned this issue Jun 17, 2021

Project's attribute 'output_file' can be a dict #1152

Closed

10 tasks

zklaus closed this as completed Jun 17, 2021

Medium priority issues automation moved this from To do to Done Jun 17, 2021

bouweandela mentioned this issue Jun 29, 2021

Replace recipe_era5.yml with recipe_daily_era5.yml ESMValGroup/ESMValTool#2182

Merged

11 tasks

bouweandela mentioned this issue Jul 26, 2021

Update Jasmin example config in config-user.yml to include native6 project #1246

Open
