Create an 'EC-Earth CMIP6 data request' json for each MIP experiment #253

Closed
treerink opened this issue Sep 20, 2018 · 34 comments
Labels: release 1.0 (release which is ready for starting CMIP6 runs)

@treerink
Collaborator

By 'EC-Earth CMIP6 data request' I mean the subset of the CMIP6 requested variables for a given MIP experiment that EC-Earth3 can actually produce.

If this 'EC-Earth CMIP6 data request' is written to a json file, it can easily be used as the data request file at cmorization time, it can easily be diffed, and it can be copied into the namelist subdir of each MIP experiment and thus archived in the EC-Earth svn repository. The latter wouldn't be a good idea with the *.xlsx data request files.
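
A minimal sketch of why a json file diffs well, assuming a component → table → variable-list layout like the one visible in the diffs further down this thread. The function name `dump_varlist` and the sample variables are illustrative only, not part of ece2cmor3.

```python
import json


def dump_varlist(varlist):
    """Serialize a data request so that repeated runs give identical text."""
    # Sort the variable names per table and the keys on output, so the file
    # only changes when the request itself changes.
    normalized = {
        component: {table: sorted(names) for table, names in tables.items()}
        for component, tables in varlist.items()
    }
    return json.dumps(normalized, indent=4, sort_keys=True) + "\n"


# Illustrative request; the variables are examples, not a real data request.
request = {"nemo": {"Omon": ["tos", "sos"]}, "ifs": {"Amon": ["tas", "pr"]}}
text = dump_varlist(request)
print(text)
```

With one variable per line and a stable ordering, a plain `svn diff` shows exactly which variables were added or dropped.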

@treerink
Collaborator Author

I think it is easiest to create this file with checkvars.py, because all model components are considered there.

treerink added a commit that referenced this issue Oct 10, 2018
…s (but nemo only) in order to use this as a data request file by ece2cmor when cmorizing the result of test-all. It is kind of similar thing as asked for in #253 but then for this specific case.
@goord goord removed their assignment Nov 5, 2018
@zklaus
Contributor

zklaus commented Jan 28, 2019

Hi @treerink, what is the situation here? At SMHI we are in the process of settling on on-the-fly generated xlsx files for the data request. Basically we want to use

drq -m _all_ -e piControl --xls 

where we change the experiment, of course, but keep -m _all_ for all runs.

That means we need one data request file per experiment, regardless of the involved mips, multiplied by the configurations.
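
As a rough sketch of that per-experiment loop, the command line above could be assembled like this. Only the `drq` options already shown are used; the helper name `drq_command` and the second experiment in the list are illustrative assumptions, and nothing is executed here.

```python
def drq_command(experiment):
    """Return the drq argument list for one experiment, keeping -m _all_ fixed."""
    return ["drq", "-m", "_all_", "-e", experiment, "--xls"]


# The experiment list is illustrative; only piControl appears in this thread.
for experiment in ["piControl", "historical"]:
    print(" ".join(drq_command(experiment)))
```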

I guess we should make a decision one way or the other (perhaps in the TWG?) and then document this so that everyone can approach this in the same way.
What do you think?

@treerink
Collaborator Author

treerink commented Feb 5, 2019

@zklaus the original idea was to produce a json variant of the data request that only includes the variables which are requested for a certain experiment AND which can be produced by the EC-Earth3 model configuration used, and to archive it in the control output subdirectories for each experiment. That would be the most convenient. The difficulty, which has kept us from implementing this quickly, is again the "preference" issue (also referred to here as the double counting issue).

The whole bunch of original xlsx CMIP6 data request files is of course produced by genecec at the moment I produce the control output files, so I have those and in principle could share them easily. However, xlsx files are not nice to archive under svn because they won't give an svn diff (they are difficult to diff anyway, though it is possible to a certain extent) and because of their size. The latter matters because there are quite a lot of experiments.

@aearamos

aearamos commented Feb 7, 2019

I've been thinking about this issue and we also discussed the xlsx files here at BSC. It would be nice to have the xlsx tables and/or the .json files that should be used by ece2cmor3 to cmorize each one of the MIPs in the ctrl folder. We could use this file as a reference for that MIP, assuming it was generated by the Data Request and has the correct variables. Right now our idea was to have the ppt/xml files in runtime/ctrl and the tables somewhere else, but I'm not sure this is the best approach. If we had a reliable table inside each folder, for DCPP, piControl, OMIP, etc., we can just point ece2cmor.py to that file.

@treerink
Collaborator Author

treerink commented Feb 7, 2019

See also the discussion in #224. Providing json data request files, as this issue asks, depends on resolving the double-counting variables with a preference file.

@treerink treerink added the release 1.0 release which is ready for starting CMIP6 runs label Feb 7, 2019
@treerink
Collaborator Author

We just discussed the general design: whether and how we will create the json data request file, and where it will be archived.

We noted that for a joint data request, like the one for the Core MIP experiments run by the AOGCM version (the joint request of these 10 MIPs), the activity_id is CMIP, which means we can upload this joint CMIP data together for each EC-Earth model configuration. The same applies to MIPs that only request data, like CORDEX: if they request data within e.g. ScenarioMIP, the activity_id is ScenarioMIP. In a third case, in which experiments are shared across MIPs, I understand the MIPs can be listed in a certain order in the activity_id, separated by a single space.

An additional script will be created (called for each experiment by genecec) which reads the general (joint) .xlsx data request file (as created by drq while running genecec) and uses the taskloader to omit the variable-table combinations that are in the ignored list for EC-Earth3. The tasks will then be matched against a preference file to account for the double-counting variables #224. This new script will thus need two arguments: 1. The .xlsx data request file 2. The EC-Earth3 model configuration (e.g. EC-Earth3-AOGCM). The name of the generated json data request file will be labeled by the EC-Earth3 model configuration, and in the few cases where a MIP is run by more than one EC-Earth3 model configuration, there will be more than one json data request file in the control output directory. Note however that for the Core MIP there is already a separation per EC-Earth3 model configuration, so only one json data request file will end up in these directories.

The control output files themselves won't be made preference (i.e. EC-Earth3 model configuration) specific, in order to keep the design clear, at the cost of a tiny bit of additional (useless) output.
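
A minimal sketch of the two-argument interface described above. The option names `--drq` and `--ececonf` are borrowed from the drq2varlist invocation shown later in this thread; the output naming scheme in `output_name` is purely hypothetical and only illustrates labeling the json file by configuration.

```python
import argparse


def parse_args(argv):
    """Parse the two inputs the new script needs."""
    parser = argparse.ArgumentParser(description="data request to json (sketch)")
    parser.add_argument("--drq", required=True,
                        help="the .xlsx data request file")
    parser.add_argument("--ececonf", required=True,
                        help="the EC-Earth3 model configuration")
    return parser.parse_args(argv)


def output_name(configuration):
    # Hypothetical naming scheme: one json file per model configuration.
    return f"datarequest-{configuration}.json"


args = parse_args(["--drq", "request.xlsx", "--ececonf", "EC-Earth3-Veg"])
print(output_name(args.ececonf))
```

Labeling the file by configuration is what allows several json files to sit side by side in one control output directory when a MIP is run by more than one configuration.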

@zklaus
Contributor

zklaus commented Feb 15, 2019

> We noted that for a joint data request like for the Core MIP experiments run by the AOGCM version (the joined request of these 10 MIPs) the activity_id is CMIP and that this means we can jointly upload this joined CMIP data for each EC-Earth model configuration. The same applies for only data requesting MIPs like CORDEX if they request data within e.g. ScenarioMIP, then the activity_id is ScenarioMIP. In a third case, in which experiments are shared across MIPs, I understand the MIPs can be listed in a certain order in the activity_id, separated by a single space.

This sounds good. Indeed, the activity_id only depends on the experiment_id.

> There will be created an additional script (which will be called for each experiment by genecec) which reads the general (joined) .xlsx data request file (as created by drq during running genecec) and uses the taskloader to omit the variable - table combination which are in the ignored list for EC-Earth3 and the tasks will be matched against a preference file in order to account for the double counting variables #224.

Sounds good.

> This new script will thus need two arguments: 1. The .xlsx data request file 2. The EC-Earth3 model configuration (e.g. EC-Earth3-AOGCM).

With respect to the configurations, note that this is CMIP6 controlled vocabulary, namely the source_id. Hence we should stick to the exact spelling of the official list, which is

  • EC-Earth3
  • EC-Earth3-AerChem
  • EC-Earth3-CC
  • EC-Earth3-GrIS
  • EC-Earth3-HR
  • EC-Earth3-LR
  • EC-Earth3-Veg
  • EC-Earth3-Veg-LR

Note the capitalization, the presence of the 3, the absence of an explicit -AOGCM version (which is the version without a suffix) and the spelling of GrIS.
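
A small sketch of how a script could enforce this spelling, using only the eight names listed above. The function name `check_source_id` and the error wording are illustrative assumptions.

```python
OFFICIAL_SOURCE_IDS = (
    "EC-Earth3", "EC-Earth3-AerChem", "EC-Earth3-CC", "EC-Earth3-GrIS",
    "EC-Earth3-HR", "EC-Earth3-LR", "EC-Earth3-Veg", "EC-Earth3-Veg-LR",
)


def check_source_id(name):
    """Return name if it is an official source_id, else raise ValueError."""
    if name in OFFICIAL_SOURCE_IDS:
        return name
    # Suggest the official spelling when only the capitalization differs.
    by_lower = {sid.lower(): sid for sid in OFFICIAL_SOURCE_IDS}
    hint = by_lower.get(name.lower())
    suffix = f"; did you mean {hint!r}?" if hint else ""
    raise ValueError(f"{name!r} is not a CMIP6 source_id{suffix}")


for candidate in ["EC-Earth3-Veg", "EC-EARTH3-GRIS", "EC-Earth3-AOGCM"]:
    try:
        print("ok:", check_source_id(candidate))
    except ValueError as err:
        print("rejected:", err)
```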

> The name of generated json data request file will be labeled by the EC-Earth3 model configuration, and in a few cases where a MIP is run by more than one EC-Earth3 model configuration, there will be more than one json data request file in the control output directory. Note however that for the Core MIP there is already a separation per EC-Earth3 model configuration, so only one json data request file will end up in these directories. The control output files themselves won't be made preference (i.e. EC-Earth3 model configuration) specific, in order to keep the design clear, on costs of a very limited tiny bit of additional (useless) output.

I'm not sure I understand how treating the CMIP experiments differently from the others simplifies things, but I guess you are in the better position to judge that.

@treerink
Collaborator Author

treerink commented Feb 20, 2019

Subtasks for this issue

  • Add prefs file to resources
  • Add script drq2varlist to repo
  • Adapt genecec to generate varlists
  • Integrate prefs file in drq2varlist
  • Adapt ece2cmor script(s) to use component-wise varlists

@treerink
Collaborator Author

When running:

./drq2varlist.py --drq cmip6-data-request/cmip6-data-request-m\=CMIP.DCPP.LS3MIP.PAMIP.RFMIP.ScenarioMIP.VolMIP.CORDEX.DynVar.SIMIP.VIACSAB-e\=piControl-t\=1-p\=1/cmvme_cm.co.dc.dy.ls.pa.rf.sc.si.vi.vo_piControl_1_1.xlsx --ececonf nemo,ifs

I get the following additions when changing from e46cc12 to the latest version 3ae9a71:

<             "zg500",

<         ],
<         "AERmon": [
<             "ua"
<         ],
<         "AERmonZ": [
<             "ta"

@treerink
Collaborator Author

Hi Gijs,

I also get quite a few differences in the output of genecec, i.e. differences in the output control files and in the volume estimates, between running genecec in the master (where the result still matches my previous benchmark run) and in the latest version 3ae9a71 in the task-load-prefs branch. I guess this is due to dd45426?

@goord
Collaborator

goord commented Feb 20, 2019

Hi @treerink yes I changed the task loader, so this is expected to impact the genecec script. I do expect that it generates more 'double counted' variables, because the realm check was there to prevent such variables. I inserted a new warning whenever a duplicate variable is encountered:

Multiple models found for variable %s, table %s...choosing first but preference needed

so searching for this message may pinpoint where the script is behaving differently...
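
A sketch of the "choose first, but warn" behaviour described above. This is not the actual task loader code: the function name, the logger name and the tuple layout are assumptions, and only the warning text is taken from this comment.

```python
import logging
from collections import defaultdict

log = logging.getLogger("taskloader_sketch")  # hypothetical logger name


def choose_models(candidates):
    """Map (variable, table) to one model, warning when several qualify.

    candidates: iterable of (variable, table, model) in loading order.
    """
    models = defaultdict(list)
    for variable, table, model in candidates:
        models[(variable, table)].append(model)
    chosen = {}
    for (variable, table), found in models.items():
        if len(found) > 1:
            log.warning(
                "Multiple models found for variable %s, table %s..."
                "choosing first but preference needed", variable, table)
        chosen[(variable, table)] = found[0]
    return chosen


chosen = choose_models([
    ("mrsos", "Lmon", "ifs"),
    ("mrsos", "Lmon", "lpjg"),
    ("tas", "Amon", "ifs"),
])
print(chosen)
```

Which model comes "first" here simply follows the input order, so the result is only stable once an explicit preference replaces that choice.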

treerink added a commit that referenced this issue Mar 14, 2019
treerink added a commit that referenced this issue Mar 15, 2019
treerink added a commit that referenced this issue Mar 21, 2019
treerink added a commit that referenced this issue Mar 22, 2019
@ufladrich

Hi @treerink ,
I'm afraid I'm still confused about the usage of drq2varlist. I have applied it to the xls data request that I was using to cmorise before and then I used --vars instead of --drq when running ece2cmor. However, I get a number of errors like

ERROR:ece2cmor3.taskloader: Found duplicate target mrsos in table 3hr for models lpjg and ifs

and then

CRITICAL:ece2cmor3.taskloader: Duplicate requested variables were found, dismissing all cmorization tasks

No output is produced. (As a side note, the IFS job still goes on doing all the time-consuming grib filtering.)
When I manually remove all the duplicated targets and duplicated output names from the varlist json file, I at least get a non-empty task list.
What am I doing wrong or misunderstanding?

@goord
Collaborator

goord commented Mar 22, 2019

Hi Uwe, you aren't doing anything wrong. This is a signal that our "preference" script is incomplete, since it doesn't make a choice between ifs and lpjg for e.g. mrsos.

I will make the preferences complete and add a check for ifs variables before entering the grib filtering.

@goord
Collaborator

goord commented Mar 22, 2019

Hi @ufladrich or @tommibergman can you post the list of duplicate variables that were reported?

@tommibergman
Collaborator

I got these:

mrsos
mrro
mrsol
mrso
mrros
evspsblsoi
mrsos

Some of them are doubly mentioned through different tables, but maybe that doesn't matter.
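
To collect such a list without repeats, the taskloader error lines quoted in this thread can be parsed and grouped per variable. This is only a sketch: the regular expression is written against the three sample lines below, and the helper name `duplicate_targets` is made up.

```python
import re
from collections import defaultdict

# Pattern written against the error lines quoted in this thread.
PATTERN = re.compile(
    r"Found duplicate target (\S+) in table (\S+) for models (\S+) and (\S+)"
)


def duplicate_targets(log_lines):
    """Map each duplicated target to the sorted tables it was reported in."""
    tables = defaultdict(set)
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            target, table = match.group(1), match.group(2)
            tables[target].add(table)
    return {target: sorted(found) for target, found in tables.items()}


sample = [
    "ERROR:ece2cmor3.taskloader: Found duplicate target mrsos in table 3hr for models lpjg and ifs",
    "ERROR:ece2cmor3.taskloader: Found duplicate target tsl in table Lmon for models lpjg and ifs",
    "ERROR:ece2cmor3.taskloader: Found duplicate target tsl in table 6hrPlevPt for models lpjg and ifs",
]
print(duplicate_targets(sample))
```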

@goord
Collaborator

goord commented Mar 22, 2019

Ok I committed a fix in which the above variables will be removed from the lpjguess variable list.

treerink added a commit that referenced this issue Mar 22, 2019
…previous varlist.json), and the adding the production of the new varlist.json by adding a call to drq2varlist #253.
treerink added a commit that referenced this issue Mar 22, 2019
…alled now drqlist.json while the ones created by drq2varlist will get the name varlist.json #253.
treerink added a commit that referenced this issue Mar 22, 2019
@ufladrich

I have yet to understand what "preference" means in the context of this issue. @goord when you say above that the "preference script is incomplete", do you mean drq2varlist? And if that is the case, does it mean that the preference logic is built into drq2varlist? What I mean is, how does drq2varlist know that the above variables should be taken from IFS, not LPJG?

@ufladrich

There are two more duplicated targets:

ERROR:ece2cmor3.taskloader: Found duplicate target tsl in table Lmon for models lpjg and ifs
ERROR:ece2cmor3.taskloader: Found duplicate target tsl in table 6hrPlevPt for models lpjg and ifs 

and some duplicated output names:

ERROR:ece2cmor3.taskloader: Found duplicate output name for targets ua, ua7h in table 6hrPlevPt for model ifs
ERROR:ece2cmor3.taskloader: Found duplicate output name for targets va, va7h in table 6hrPlevPt for model ifs
ERROR:ece2cmor3.taskloader: Found duplicate output name for targets ta, ta7h in table 6hrPlevPt for model ifs
ERROR:ece2cmor3.taskloader: Found duplicate output name for targets zg7h, zg27 in table 6hrPlevPt for model ifs

I'm not sure what to think about the latter; according to the CMIP6 CMOR tables, the duplication is okay.

@treerink
Collaborator Author

It means that resources/prefs.py does not yet cover all duplicate variables. The infrastructure is there, but we still have to make sure all duplicate variables are covered in the prefs.py file, and for that we usually need feedback from the scientists.

@goord
Collaborator

goord commented Mar 22, 2019

Hi @ufladrich the preference script is here. It is just a python function that determines which variables to keep for which configurations and which to dismiss.

Yes the preference logic is called from drq2varlist. This script gathers all variables that any EC-Earth component could produce, and then runs all of them through the preference function that determines whether to keep it or not. This procedure is supposed to yield a unique set of variables for all data requests and all EC-Earth configurations.

Whenever you call ece2cmor with the --drq option, it does a drq2varlist first and then a cmorization with the component-wise variable set. It performs a check on the latter to ensure there are no duplicates, because that may give rise to files being overwritten.

BTW whenever calling ece2cmor with --drq option or drq2varlist, it is best to give also a target EC-Earth configuration (use --help to get a list of those), because that can be used to determine the preference and hence reduces the chance of ending up with duplicates.
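
The procedure described above can be sketched as follows. This is not the content of prefs.py: the set `PREFER_IFS` and the function names are illustrative assumptions, and the sketch only shows the idea of filtering candidates through a preference function and then checking that the result is unique.

```python
from collections import Counter

# Illustrative only: variables for which this sketch lets ifs win over lpjg.
PREFER_IFS = {"mrsos", "mrso", "tsl"}


def keep(variable, table, component):
    """Preference function: dismiss lpjg where ifs is preferred."""
    return not (component == "lpjg" and variable in PREFER_IFS)


def unique_tasks(candidates):
    """Filter (variable, table, component) tuples and verify uniqueness."""
    kept = [task for task in candidates if keep(*task)]
    counts = Counter((variable, table) for variable, table, _ in kept)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        # An incomplete preference leaves more than one producer in place.
        raise ValueError(f"preference incomplete for {duplicates}")
    return kept


tasks = unique_tasks([
    ("mrsos", "Lmon", "ifs"),
    ("mrsos", "Lmon", "lpjg"),
    ("lai", "Lmon", "lpjg"),
])
print(tasks)
```

A variable missing from the preference set, such as `mrro` in this sketch, would trigger the error, which mirrors the duplicate reports earlier in the thread.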

@goord
Collaborator

goord commented Mar 22, 2019

The duplication of ua, ua7h etc. is a problem because it will cause variables to be overwritten, since the output file names for these variables are identical (see issue #334 ). I believe they have different priorities, and we should decide which ones to keep.
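
A sketch of how such clashes can be detected before any file is written: group the targets by table and output name, and report the groups with more than one member. The `out_name` values in the sample are assumed for illustration, and `clashing_outputs` is a made-up helper, not ece2cmor3 code.

```python
from collections import defaultdict


def clashing_outputs(tasks):
    """Group targets that would be written under the same output name.

    tasks: iterable of (target, table, out_name) tuples.
    """
    groups = defaultdict(list)
    for target, table, out_name in tasks:
        groups[(table, out_name)].append(target)
    return {key: targets for key, targets in groups.items() if len(targets) > 1}


# The out_name values below are assumptions made for this example.
tasks = [
    ("ua", "6hrPlevPt", "ua"),
    ("ua7h", "6hrPlevPt", "ua"),
    ("ta", "6hrPlevPt", "ta"),
]
print(clashing_outputs(tasks))
```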

@treerink
Collaborator Author

@goord the changes in e0e8dc5, i.e. the extension of prefs.py, do change the json data request files. Part of that is as expected, but I am also somewhat surprised by the rather long lists of changes.

@goord
Collaborator

goord commented Mar 22, 2019

Hi @treerink, the biggest change is the removal of variables for components that are not in the EC-Earth configuration. I figured that e.g. AOGCM experiments should not be bothered with duplicates from e.g. land-surface or tm5, right? This will give a lot of removed variables, I guess; I would expect entire blocks of component variables to be removed for certain configurations.
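
A sketch of that pruning step, under the assumption that the AOGCM configuration consists of ifs and nemo only. The mapping `CONFIG_COMPONENTS`, the function name and the sample variables are illustrative, not the actual implementation.

```python
# Assumption: the AOGCM configuration consists of ifs and nemo only.
CONFIG_COMPONENTS = {"EC-EARTH-AOGCM": {"ifs", "nemo"}}


def prune_components(varlist, configuration):
    """Empty the request of every component absent from the configuration."""
    active = CONFIG_COMPONENTS[configuration]
    return {
        component: (tables if component in active else {})
        for component, tables in varlist.items()
    }


# Illustrative request with one table per component.
varlist = {
    "ifs": {"Amon": ["tas"]},
    "lpjg": {"Lmon": ["lai"]},
    "tm5": {"AERmon": ["od550aer"]},
}
pruned = prune_components(varlist, "EC-EARTH-AOGCM")
print(pruned)
```

Keeping the component keys with empty values, rather than dropping them, means a removed component shows up in a diff as one whole block disappearing.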

@treerink
Collaborator Author

Hi @goord,

OK, that indeed seems to be the case. I show just one example below: can you check this diff _latest_ _previous_ and confirm?

71a72
>             "evspsblsoi",
116c117,199
<     "lpjg": {},
---
>     "lpjg": {
>         "Amon": [
>             "fco2antt",
>             "fco2nat"
>         ],
>         "Emon": [
>             "cSoil",
>             "mrsol",
>             "treeFracNdlDcd",
>             "treeFracBdlEvg",
>             "treeFracBdlDcd",
>             "grassFracC3",
>             "grassFracC4",
>             "pastureFracC3",
>             "pastureFracC4",
>             "nep",
>             "fLuc",
>             "cWood",
>             "nwdFracLut",
>             "fracLut",
>             "vegFrac",
>             "treeFracNdlEvg",
>             "cropFracC3",
>             "cropFracC4"
>         ],
>         "Eyr": [
>             "treeFrac",
>             "grassFrac",
>             "shrubFrac",
>             "cropFrac",
>             "vegFrac",
>             "baresoilFrac",
>             "fracOutLut",
>             "fracInLut",
>             "fracLut"
>         ],
>         "Lmon": [
>             "mrsos",
>             "mrso",
>             "mrros",
>             "mrro",
>             "prveg",
>             "evspsblveg",
>             "evspsblsoi",
>             "tran",
>             "tsl",
>             "treeFrac",
>             "grassFrac",
>             "shrubFrac",
>             "cropFrac",
>             "pastureFrac",
>             "baresoilFrac",
>             "residualFrac",
>             "cVeg",
>             "cLitter",
>             "cProduct",
>             "lai",
>             "gpp",
>             "ra",
>             "npp",
>             "rh",
>             "fFire",
>             "fGrazing",
>             "fHarvest",
>             "nbp",
>             "fVegLitter",
>             "fLitterSoil",
>             "cLeaf",
>             "cRoot",
>             "cCwd",
>             "cLitterAbove",
>             "cLitterBelow",
>             "cSoilFast",
>             "cSoilMedium",
>             "cSoilSlow",
>             "landCoverFrac",
>             "rGrowth",
>             "rMaint"
>         ],
>         "day": [
>             "mrso"
>         ]
>     },
282c365,378
<     "tm5": {}
---
>     "tm5": {
>         "AERmon": [
>             "abs550aer",
>             "od550aer"
>         ],
>         "Amon": [
>             "o3",
>             "o3Clim",
>             "ch4",
>             "ch4Clim",
>             "ch4global",
>             "ch4globalClim"
>         ]
>     }

@goord
Collaborator

goord commented Mar 22, 2019

So this is for the AOGCM configuration I assume? Yes evspsblsoi was removed from the ifs parameters (Andrea pointed out it cannot be produced by ifs) and the other ones are not in the AOGCM configuration, so I expect them to be gone.
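
For checks like this one, a structural comparison of two varlist json files can be easier to read than a line diff. A minimal sketch, assuming the component → table → variable-list layout seen in the diff above; `varlist_diff` is a hypothetical helper and the sample is a small excerpt of that diff.

```python
def varlist_diff(old, new):
    """Return (added, removed) sets of (component, table, variable)."""
    def flatten(varlist):
        return {
            (component, table, variable)
            for component, tables in varlist.items()
            for table, names in tables.items()
            for variable in names
        }
    old_set, new_set = flatten(old), flatten(new)
    return new_set - old_set, old_set - new_set


# Small excerpt of the diff above: "previous" still lists lpjg and tm5 output.
previous = {"lpjg": {"day": ["mrso"]}, "tm5": {"AERmon": ["abs550aer", "od550aer"]}}
latest = {"lpjg": {}, "tm5": {}}
added, removed = varlist_diff(previous, latest)
print(sorted(removed))
```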

@goord
Collaborator

goord commented Mar 22, 2019

So @ufladrich and @tommibergman if you run drq2varlist or ece2cmor with the --drq option and you don't want to be bothered with duplicates from other submodels than your targeted EC-Earth configuration, you have to provide your configuration, e.g.

drq2varlist --drq <something.xlsx> --ececonf EC-EARTH-AOGCM

to remove all variables not in ifs or nemo.

@goord
Collaborator

goord commented Mar 22, 2019

@treerink I removed tsl from lpjguess in the prefs.py and fixed a bug concerning EC-EARTH-CC so you may want to regenerate the json files...

@ufladrich

I had --ececonf EC-EARTH-Veg in my earlier tests.

@treerink
Collaborator Author

treerink commented Mar 25, 2019

Done, the current latest version of the control output files in the r6705-control-output-files branch do contain these changes.

@treerink
Collaborator Author

I think we can (nearly) close this issue.

The only sub-issue I am not sure has been solved by now is the one about the "duplication of ua, ua7h etc.", which is a problem because variables will be overwritten.

@treerink
Collaborator Author

A separate issue is created in #422 for the last sub issue mentioned above.

Closing this issue.
