What about double-counting variables #224
Currently, prioritization is based on the realms (see the output of the taskloader).
For the TM5-IFS case, we would mostly like TM5 to take precedence for the AER* tables and IFS for the A* tables. I am sure there are a few exceptions, but this would be a first-order suggestion. IFS precedence over TM5 holds especially for the meteorological variables (these are mainly in Amon), since anyone using the data can always regrid to a lower resolution.
We decided to produce a file listing the double-counting variables, with rules for which component should produce each variable in which case. The format is: variable name, table, and the components as a list in order of preference. So for example a line Actually, the table column could also be a list, since more than one table (but not all) can have the same preference. Or what do others think? Attached is a list for TM5.
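The attached list is not reproduced here, so the following is only a minimal Python sketch of how such a preference file might be parsed and applied to a given model configuration. The semicolon-separated layout, the function names `parse_preferences` and `choose_component`, and the example rules are hypothetical; the sketch also follows the suggestion that the table column may hold several tables.

```python
"""Sketch of a double-counting preference lookup.

Assumed (hypothetical) line format, one rule per line:
    <variable>; <table>[, <table>...]; <component>[, <component>...]
where the components are listed in order of preference.
"""


def parse_preferences(text):
    """Return {(variable, table): [components in preferred order]}."""
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        variable, tables, components = (field.strip() for field in line.split(";"))
        ordered = [c.strip() for c in components.split(",")]
        # One rule may cover several tables that share the same preference.
        for table in tables.split(","):
            prefs[(variable, table.strip())] = ordered
    return prefs


def choose_component(prefs, variable, table, model_configuration):
    """Return the first preferred component that is active in the configuration.

    Returns None if there is no rule or no preferred component is active.
    """
    for component in prefs.get((variable, table), []):
        if component in model_configuration:
            return component
    return None


# Illustrative rules only, not the list attached to the issue.
EXAMPLE = """
# variable; tables; components in order of preference
zg500; AERday; ifs, tm5
ta; AERmon; tm5, ifs
"""

prefs = parse_preferences(EXAMPLE)
print(choose_component(prefs, "ta", "AERmon", {"ifs", "tm5"}))  # tm5
print(choose_component(prefs, "ta", "AERmon", {"ifs"}))         # ifs
```

Returning `None` when no preferred component is active leaves it to the caller to decide whether to skip the variable or fall back to another rule.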
Note also that the user will have to specify the 'model configuration' (i.e. the list of components) that produced the data, even when cmorizing variables for only one component at a time.
Hi @tommibergman and @treerink, after some thought I came to the following conclusion: it may be more appropriate to write a separate script that splits the input data request into json variable list files per EC-Earth component. This makes it more traceable and transparent which variables are produced by which component, and the files can even be archived or put under version control together with the model configuration files. This script will of course make use of the preferences file proposed above.
@goord would the same idea be possible, but with these component json files merged again into one json file per MIP experiment for a given EC-Earth model configuration? This makes the archiving more compact and the cmorisation more straightforward, because otherwise one has to specify several jsons and pick the right ones when cmorising. Or does this break your idea?
@treerink we can also make a single json file with an extra level denoting the components, e.g.
If one specifies such a json, it is clear to both the task loader and the user which variables will be omitted when processing a single component.
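The example json from this comment is not preserved. As a hedged illustration of the idea only, the sketch below builds a single request with an extra top level for the components and lists what a single-component run would skip. The input shape, the `"unassigned"` key, the function names, and the example entries are assumptions, not the format adopted in #253.

```python
import json


def build_component_request(entries, model_configuration):
    """Build a data request with an extra top level denoting the components.

    entries: iterable of (table, variable, candidates), where candidates lists
    the components able to produce the variable, in order of preference.
    Variables that no active component can produce go under "unassigned".
    """
    nested = {}
    for table, variable, candidates in entries:
        component = next(
            (c for c in candidates if c in model_configuration), "unassigned"
        )
        nested.setdefault(component, {}).setdefault(table, []).append(variable)
    return nested


def omitted_when_processing(nested, component):
    """Return the sorted (table, variable) pairs a run for `component` skips."""
    return sorted(
        (table, variable)
        for other, tables in nested.items()
        if other != component
        for table, variables in tables.items()
        for variable in variables
    )


# Illustrative entries only.
entries = [
    ("Amon", "tas", ["ifs"]),
    ("AERday", "zg500", ["ifs", "tm5"]),
    ("AERmon", "od550aer", ["tm5"]),
]
nested = build_component_request(entries, {"ifs", "tm5"})
print(json.dumps(nested, indent=2, sort_keys=True))
print(omitted_when_processing(nested, "ifs"))  # [('AERmon', 'od550aer')]
```

Keeping all components in one file matches the wish for compact archiving, while the omitted list makes explicit what a single-component run leaves out.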
Yes, sounds like a plan. So let's try this for #253.
This plan of having one json file per job sounds good! But I'd like to comment a bit on what a job is. In common parlance, "model configuration" does not refer merely to a collection of components, but to what are separate models from the point of view of CMIP6, e.g. These two things are the only two that we should consider when organizing the json files.
eg
The Do you agree?
@zklaus actually I was not aiming to set up any new directory infrastructure for these data request json files; the idea is just to add them to the existing control output subdirectories so that they form a set with the control output files for each experiment.
@treerink fair enough, that should work!
Well, each experiment has its own data request. In some cases (the Core MIP cases) this is a joined data request, because we want to be efficient and run the experiment only once for all the MIPs run by a certain model configuration (EC-Earth3-AOGCM, EC-Earth3-Veg, etc.). But As a cmorizer you don't need anything with In fact, it would also be useful to generate a metadata template file for each experiment and add those to the control output subdirectories as well; several MIP- and experiment-dependent variables could be set by
By the way, an example of how to create an xlsx data request file in the current situation (as long the
Please have a look at the newly created issue 615 on the EC-Earth dev portal. The problem is not only the varlists for the different MIPs that are run with the same model configuration, but also the activity_id given by the MIP. Most experiments belong uniquely to one MIP, so there this is not a problem, but what do we do with the "historical" experiment? How do we make sure that the variables are saved correctly for each MIP?
OK, concerning joined Core MIP experiments: your point is that at cmorisation time you don't actually want to provide a joined cmorised set of variables, but instead want to split out, for each MIP, the list of variables requested by that MIP's experiment and then provide the correct
Are you sure about that? Do you think the same variable is in the drq for, say, SIMIP and CMIP? It would be nice if only variables that are exclusively in SIMIP are processed when running ece2cmor with
That could be a reason not to produce json files but to stick with the xls files produced by drq, right?
I was also pondering these issues, but I have come to the conclusion that the
There could be a few more of these hints; I didn't find anything supporting the reading that the files should carry the MIP that requested the variable. In the case that multiple MIPs are actually relevant, [1, Table 1, This seems to be applicable only in the case of jointly owned experiments; the complete list of these is:
@treerink wrt the
Is this correct? In other words, you don't mean that I don't have to do the
OK, there are two discussions going on here:
@goord you are right that we somewhat derailed the original discussion, which was about the same variable being available from different EC-Earth components. But regarding your second point, I think the situation is clear enough: The
And in that case the same reference details on page 17 for the Directory structure template:
Yes, correct, you need to run
OK, in that case it seems a good idea to go with
So all in all this makes the cmorizer's job much easier. Are there any disadvantages that I am overlooking?
Using one data request file that includes everything is indeed a pragmatic option, but it will cause a lot of error messages, because you are asking to cmorise many variables that are not in your data set (and this will differ per experiment). So you lose a bit of control: if a variable that should have been produced is, for whatever reason, missing from your data set, its error message is hard to distinguish from the rest. On the other hand, yes, it is quite a shortcut.
As described in #253, we aim for json data request files based on the xlsx file created by
Note that the preference file may contain a key such as "omit". For instance, the chemical tracers
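The exact layout of the preference file is not shown in this thread. Assuming, hypothetically, a json file whose `"omit"` key lists `table.variable` entries that no component should cmorise, filtering them out of a data request could look like the sketch below; `hypothetical_tracer` is a placeholder, not a real CMOR variable, and `filter_request` is an illustrative name.

```python
import json

# Hypothetical preference file: "omit" lists "table.variable" entries that no
# component should cmorise; other keys give components in order of preference.
PREFERENCES = json.loads("""
{
    "omit": ["AERmon.hypothetical_tracer"],
    "AERday.zg500": ["ifs", "tm5"]
}
""")


def filter_request(request, preferences):
    """Remove omitted variables from a {table: [variables]} data request.

    Tables left without any variables are dropped entirely.
    """
    omitted = set(preferences.get("omit", []))
    filtered = {}
    for table, variables in request.items():
        kept = [v for v in variables if f"{table}.{v}" not in omitted]
        if kept:
            filtered[table] = kept
    return filtered


request = {"AERmon": ["hypothetical_tracer", "od550aer"], "AERday": ["zg500"]}
print(filter_request(request, PREFERENCES))
# {'AERmon': ['od550aer'], 'AERday': ['zg500']}
```

Applying the omit list before any component is chosen keeps omitted variables out of every per-component request at once.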
Hi @treerink, is there a way to cmorise zg500 assuming it comes from IFS, or does it fall into the "desired" list we're creating in this issue? I think it is a good candidate for the double-counting variables. Thanks!
Yes, the aerosol realm is mainly from TM5, but there are exceptions. I also agree that this one belongs on the desired list.
@aearamos So if TM5 is not active in the model configuration used (for instance EC-EARTH3-AOGCM), you want
Yes, I'd want zg500 daily from IFS.
Hi @treerink, can you provide a varlist for the model that we can use for testing? Or how can we generate the varlists now? Thanks!
Hi Arthur, there is a script, drq2vars, that does that.
I'm just checking that. I get Do I have to add some more flags?
Hmm, not sure; that seems like a bug in the branch. You could use the data request Excel file with ece2cmor, but then you have to pass it with the --drq option.
Could this be because of the version of CMOR? I'm using CMOR/3.3.3 now.
No, I think drq2vars is broken in the branch.
Yes
And indeed the cmorisation itself is also broken in master; we aim to have it all fixed by next Wednesday. Once it works again, an example of calling it is:
So, regarding this variable (zg500) from table AERday: with the new varlist files, the modeling_realm would be "aerosol", yet the variable will be cmorised as an IFS variable? Will ece2cmor be able to cmorise it even though the realm doesn't match one of the realms expected for IFS? In this case I'd only be using IFS.
Hi @aearamos, we can add it to ifspar.json. After speaking to @tommibergman, it looks like we will let TM5 generate the model-level meteorological variables (u, v, t, zg, w) in the AER* tables and IFS the rest (such as zg500, which is on pressure levels).
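The rule agreed here can be written as a small decision function. This is a sketch only: the CMOR short names `ua`, `va`, `ta`, `zg`, `wa` are assumed to correspond to the u, v, t, zg, w fields mentioned above, and `meteo_component` is a hypothetical name. It covers only meteorological variables that both IFS and TM5 can provide, not aerosol variables that only TM5 produces.

```python
# Assumed CMOR short names for the model-level meteorological fields
# (u, v, t, zg, w) that TM5 should provide in the AER* tables.
TM5_MODEL_LEVEL_METEO = {"ua", "va", "ta", "zg", "wa"}


def meteo_component(variable, table, model_configuration):
    """Pick the component for a meteorological variable both IFS and TM5 can produce.

    TM5 wins for model-level fields in AER* tables when it is part of the
    configuration; IFS provides the rest (e.g. zg500 on pressure levels).
    """
    if ("tm5" in model_configuration
            and table.startswith("AER")
            and variable in TM5_MODEL_LEVEL_METEO):
        return "tm5"
    return "ifs"


print(meteo_component("ta", "AERmon", {"ifs", "tm5"}))     # tm5
print(meteo_component("zg500", "AERday", {"ifs", "tm5"}))  # ifs
print(meteo_component("ta", "AERmon", {"ifs"}))            # ifs
```

The last call mirrors the EC-Earth3-AOGCM case discussed above: without TM5 in the configuration, IFS supplies the variable.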
Yes, zg500 will be cmorized regardless of the realms; they no longer play a role in the new task-loading strategy.
Closing this issue.
Some variables may be producible by more than one component (especially in the TM5-IFS or LPJG-IFS combinations). We should come up with a mechanism that gives precedence to certain models for certain variables.