Support per dataset DRS #494
Comments
Yes, just in config-developer.yml
Related to #641
I have made this suggestion of how the config_developer file could look in #970 (comment), but I think it could be relevant to this discussion:
@Peter9192 has withdrawn his initial attempt in #970. @bsolino could you pick this up? Perhaps @senesis could help?
There is an ambiguity in the comments above about what "a dataset" means, as this can be:
My feeling is that one should be able to redefine both the DRS label and the rootpath in each datasets section entry (i.e. change the default values that are derived from the config-user file). This would avoid changing the structure of the project entry in the config-dev file: we would just have to list all possible DRS without grouping by model. Using the model name in DRS labels could be helpful.
I may help. My proposal would be:
and in the config-user file, e.g.:
and in any recipe, e.g.:
@senesis Your suggestion has the same issue that I was trying to address in my proposal: it mixes, at the same level, datasets (e.g. ICON, IPSL) and directory structures on machines (e.g. BSC, DKRZ). Each machine could store each project in a different way, and I believe the purpose of the DRS is to make the data easily accessible to users of those machines by letting them set a DRS for each project: just by setting the DRS in config-user.yml, a user can locate all the project data stored on the machine they are using. Note that I'm having a bit of a terminology issue here, because I am still unsure about the difference between the concepts of a "dataset" and a "project". Furthermore, your suggestion seems to couple recipes to the directory structures on the machines. That would mean that it could be necessary to modify each recipe every time you want to run it on a different machine. The way I see it, we need two keys to identify the directory structure: one that refers to the machine and one that refers to the dataset. So, the configuration file should be set up in a way that makes it possible to combine those keys.
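The two-key idea in the comment above could be sketched roughly as follows. This is only an illustration, not ESMValTool code: the table `DRS_TEMPLATES`, the function `resolve_template`, and every directory template in it are hypothetical. Only the machine names (BSC, DKRZ) and dataset names (ICON, ERA5) come from the thread.

```python
# Hypothetical sketch: the names and directory templates are made up for
# illustration and do not reflect any real site layout or ESMValTool API.

# Directory templates keyed by (machine, dataset).
# A dataset of None marks the machine-wide default.
DRS_TEMPLATES = {
    ("BSC", None): "{project}/{dataset}/{short_name}",
    ("DKRZ", None): "{project}/{dataset}/{version}/{short_name}",
    ("DKRZ", "ICON"): "{project}/{dataset}/{experiment}/{short_name}",
}


def resolve_template(machine, dataset):
    """Return the dataset-specific template, else the machine default."""
    specific = DRS_TEMPLATES.get((machine, dataset))
    if specific is not None:
        return specific
    return DRS_TEMPLATES[(machine, None)]


# The recipe only names the dataset; the machine key comes from user config.
print(resolve_template("DKRZ", "ICON"))
print(resolve_template("DKRZ", "ERA5"))
```

With a lookup like this, a recipe would stay machine-independent: only the machine key, set once per user, would change between sites.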
I understand that this refers to the [...]. I think that key exists because the original intent of native6 was to set up the data so that the directory structure of each dataset would be identical, and datasets could be differentiated by the folder in which they were located. If we move forward with this idea of allowing different directory structures for each dataset, perhaps that key could become unnecessary. I believe the original idea was from @bouweandela. Bouwe, do you have any specific opinion on this?
So how would that work if I want to use both ERA5 and ICON data through the native6 project?
That way config-user.yml
recipe:
That feature is useful as soon as a given model may come with two DRS (for instance, one experiment is in a shared CMIP6 data store, another one is stored in a private location). My proposal is that the rootpath can also be specified in datasets entries.
That is a different issue, let's discuss it here: #129
With
As pointed out by @bsolino, the idea behind using a DRS is that recipes are machine-independent, so specifying this in the recipe defeats the purpose of using a DRS.
@bouweandela: I meant to ask something else, but I think I didn't make my question very clear. I was wondering how to deal with the ambiguity of the {dataset} key, given that we are going to support different DRS for the datasets. It seems to me that we will have to rethink the original idea of distinguishing the datasets by their folder. The main options I can see right now (there may be more that I'm missing) are:
In summary, I'm not sure how to move forward and avoid the ambiguity without clashing with what has already been done.
In my opinion, we are needlessly complicating things here by trying to put multiple datasets (e.g. ERA5 and ICON) in one and the same project (native6).
I am not sure if we still want this; we seem to be able to achieve most goals by using projects instead, and @Peter9192's PR on the topic (#970) is now abandoned. I'll move this to 2.4.0, but we may well just close it in the end.
I agree if implementing #129, as @bouweandela suggested.
That's an excellent point. I agree that it would be too much for too little gain. And there are other avenues to solve the issues. For starters, I'm not even sure it is a drawback in the config-developer file, as the total number of lines added would be quite similar, the difference being mostly organizational. That could be achieved by other means, starting with something as simple as comments in the configuration file. There could also be issues with populating config-user, but I think there might be a better way to approach it. I think I'd better open a new issue for it. EDIT: Opened in #1165
Indeed, let's just create a new project per model that uses a custom DRS, until we have too many projects (e.g. > 10) and we need to come up with something smarter.
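A minimal sketch of the "one project per model with a custom DRS" approach agreed on above, under stated assumptions. The dictionaries `PROJECTS` and `USER_CONFIG`, the function `input_dir`, and all paths and templates are hypothetical stand-ins for the developer and user configuration files. They are not the actual file contents or ESMValTool's implementation.

```python
# Hypothetical sketch: all names, paths and templates are illustrative only.

# Developer-side settings: each project carries its own directory templates,
# keyed by a DRS label.
PROJECTS = {
    "native6": {"input_dir": {"default": "Tier{tier}/{dataset}/{short_name}"}},
    "ICON": {"input_dir": {"default": "{experiment}/{short_name}"}},
}

# User-side settings: machine-specific root path and DRS label per project.
USER_CONFIG = {
    "rootpath": {"native6": "/data/native6", "ICON": "/data/icon"},
    "drs": {"native6": "default", "ICON": "default"},
}


def input_dir(variable):
    """Build the input directory for a variable from its project's template."""
    project = variable["project"]
    drs = USER_CONFIG["drs"].get(project, "default")
    template = PROJECTS[project]["input_dir"][drs]
    # str.format ignores keyword arguments the template does not use.
    return USER_CONFIG["rootpath"][project] + "/" + template.format(**variable)


era5 = {"project": "native6", "dataset": "ERA5", "tier": 3, "short_name": "tas"}
icon = {"project": "ICON", "dataset": "ICON", "experiment": "amip", "short_name": "tas"}
print(input_dir(era5))
print(input_dir(icon))
```

Because the directory structure is attached to the project, a recipe would only need to name the project and dataset, and nothing machine-specific would leak into it.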
Ok, seems we are agreed. Closing this for now, please, anybody, feel free to reopen if you want to discuss this some more.
Now that we have support for reading in the ERA5 dataset in its native format through the native6 project (#447), it would be good to add support for a per dataset DRS, so other datasets (which will typically come with their own DRS) can also be supported using the native6 project.