Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weโ€™ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python cmorizers for CDR1 and CDR2 ESACCI H2O (TCWV=prw) data. #2152

Merged
merged 10 commits into from Jun 24, 2021

Conversation

katjaweigel
Copy link
Contributor

@katjaweigel katjaweigel commented May 4, 2021

Description

Cmorizer for the new ESACCI H2O (TCWV=prw) CDR-1 and CDR-2 data sets.
It has been tested for 0.5 and 0.05deg data.
So far, it only works without ancillary data, they will be added later.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the ๐Ÿ›  Technical or ๐Ÿงช Scientific review.

New or updated recipe/diagnostic

New or updated data reformatting script


To help with the number of pull requests:

@katjaweigel
Copy link
Contributor Author

Documentation update is not ready, yet. I'm not sure how to include it into the documentation, because the data is not public now but should be come public at some point.

@katjaweigel
Copy link
Contributor Author

@remi-kazeroni I don't really know why github added you here automatically, either because it is a cmorizer or I left your name in somewhere in the code :)

@remi-kazeroni
Copy link
Contributor

@remi-kazeroni I don't really know why github added you here automatically, either because it is a cmorizer or I left your name in somewhere in the code :)

This is because I'm codeowner for everything in esmvaltool/cmorizers ๐Ÿ˜Ž We did that so I can have a look at all cmorizer PR (not necesseraly do the review), merge it at the end and add the new data to our data pool on Mistral.

Copy link
Contributor

@axel-lauer axel-lauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested the new CMORizers and have a few comments:

  • Both CMORizers complain about "The reference file .../esmvaltool/references/.bibtex does not exist.". I guess this is because "reference" in the config files is set to ""?
  • I would strongly prefer that the dataset name used is "ESACCI-WATERVAPOUR" or maybe "ESACCI-WV". I find "ESACCI-TCWV" and "ESACCI-TCWV-CDR1" confusing and not consistent with the naming used for the other ESACCI datasets.
  • For the same reasons, I would prefer a single CMORizer for ESACCI-WATERVAPOUR instead of separate scripts for each of the 2 (eventually 4) CDRs.
  • I only tested with monthly data. For those, the output time_bounds seem to be wrong, e.g. giving 15-Jan-2014 00:00:00 --> 16-Jan-2014 00:00:00 instead of 1-Jan-2014 00:00:00 --> 1-Feb-2014 00:00:00. This needs to be fixed.
  • The documentation on how to download and prepare the data (e.g. putting all files into the same folder, i.e. no yearly folders) is still missing.

@katjaweigel
Copy link
Contributor Author

katjaweigel commented Jun 7, 2021

Time bounds and naming should be fixed now. I also added some documentation on how to download and prepare the data. I'm not sure about the reference, I put in the Water Vapour Climate Change Initiative (WV_cci) - CCI+ Phase 1 Climate Research Data Package (CRDP) document (https://climate.esa.int/en/projects/water-vapour/key-documents/), but it has no DOI, only an ESA specific identification number?
The cmorizer is only checked for monthly data so far, daily would probably not work.

@katjaweigel
Copy link
Contributor Author

Further changes would be necessary for daily data. Should I do them first or should we merge the version for monthly data first? Public data are not there, yet and expected for August or September, but maybe there are other reasons to merge it soon?

Copy link
Contributor

@axel-lauer axel-lauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying all the changes. The output fields look good but the time bounds still seem to be a bit odd. For example, the time bounds of the first time step in the year 2003 file looks like:

30-Dec-2002 12:00:00 -> 30-Jan-2003 12:00:00
but should probably rather be something like
1-Jan-2003 00:00:00 -> 31-Jan-2003 00:00:00

Could you please take another look at the time bounds?

@katjaweigel
Copy link
Contributor Author

Thanks for applying all the changes. The output fields look good but the time bounds still seem to be a bit odd. For example, the time bounds of the first time step in the year 2003 file looks like:

30-Dec-2002 12:00:00 -> 30-Jan-2003 12:00:00
but should probably rather be something like
1-Jan-2003 00:00:00 -> 31-Jan-2003 00:00:00

Could you please take another look at the time bounds?

@axel-lauer Thanks for spotting this:
The issue is, that def _fix_bounds(cube, dim_coord) from ESMValTool/esmvaltool/cmorizers/obs/utilities.py uses the iris method "cube.coord(dim_coord).guess_bounds()" which cannot handle irregularly gridded data well (month with 28-31 days).
There is a better way to set the time bounds in the ESMValCore:
def _get_time_bounds(time, freq) in ESMValCore/esmvalcore/cmor/check.py which would probably not cause that issue.

Therefore I'm wondering:

  1. Can I use a Core def in the tool or should I copy it?
  2. The def _fix_bounds(cube, dim_coord) in ESMValTool/esmvaltool/cmorizers/obs/utilities.py which I was using is also used for other cmorizers and possible also for time. Should I change the fix of the time bounds there or only for this specific cmorizer?
  3. Or should I fix it for this cmorizer now and start an issue about ESMValTool/esmvaltool/cmorizers/obs/utilities.py/ try to find out which cmorizers could be using this to fix time bounds?

@katjaweigel
Copy link
Contributor Author

I did now fix the bound for this cmorizer, using _get_time_bounds from esmvalcore.cmor.check _get_time_bounds (without changing esmvaltool/cmorizers/obs/utilities.py, which is proabably used by other cmorizers). There are many python cmorizers using fix_bounds, should I open an issue about it?

I don't know what the commit check wants to tell me?

Copy link
Contributor

@axel-lauer axel-lauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output including the time bounds looks good now. Not sure about the import error reported by CircleCI. @ESMValGroup/tech-reviewers any opinion on this?

@Peter9192
Copy link
Contributor

The output including the time bounds looks good now. Not sure about the import error reported by CircleCI.

maybe merge main into this branch?

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good technically, cheers @katjaweigel -> @remi-kazeroni pls test when you gots time, mate ๐Ÿบ

raw_info = {'name': vals['raw'], 'file': vals['file']}
inpfile = os.path.join(in_dir, cfg['filename'])
logger.info("CMORizing var %s from file type %s", var, inpfile)
# years = range(vals['start_year'], vals['end_year'] + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okease remove commented out code if not needed anymore

@valeriupredoi
Copy link
Contributor

also please make sure all boxes are ticked and the ones that are irrelevant pls remove or strike them like this otherwise @zklaus will be knocking on your door ๐Ÿ˜

@katjaweigel
Copy link
Contributor Author

also please make sure all boxes are ticked and the ones that are irrelevant pls remove or strike them like this otherwise @zklaus will be knocking on your door grin

I wasn't sure if I have to tick them, if I think they are ok, or if the reviewers have to do that after there check?

@valeriupredoi
Copy link
Contributor

also please make sure all boxes are ticked and the ones that are irrelevant pls remove or strike them like this otherwise @zklaus will be knocking on your door grin

I wasn't sure if I have to tick them, if I think they are ok, or if the reviewers have to do that after there check?

afraid it is your responsability to tick/delete boxes, but if you are unsure you can always ask the technical reviewer ๐Ÿ‘

@katjaweigel
Copy link
Contributor Author

@valeriupredoi ok, thanks! I did that for PRs before the "Technical review" and "Scientific review" symbols were added, they seemed to indicate for me, that it should be done by the reviewers.

Copy link
Contributor

@remi-kazeroni remi-kazeroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @katjaweigel! The cmorizer runs fine for me with the monthly data available and the recipe_check_obs.yml as well. Please let me know when the data can be put into the Mistral and Jasmin Obs directories.

@katjaweigel
Copy link
Contributor Author

@remi-kazeroni the public release of the data is planned for September, therefore I guess it is better not to put them into the Obs folders, yet?

@axel-lauer
Copy link
Contributor

For this reason, we put the data into tier 3 until they become publicly available. Then we would move the data to tier 2 just like all other ESACCI datasets. So in short, I think it is fine to put the data into the OBS folder.

@remi-kazeroni
Copy link
Contributor

For this reason, we put the data into tier 3 until they become publicly available. Then we would move the data to tier 2 just like all other ESACCI datasets. So in short, I think it is fine to put the data into the OBS folder.

Fine, so the data are now in Tier3 OBS folder. We will need to update the cmorizer once the data become publicly available.

@remi-kazeroni remi-kazeroni merged commit 3ef8143 into main Jun 24, 2021
@remi-kazeroni remi-kazeroni deleted the CMUG_ESACCI_H2O_cmorizer branch June 24, 2021 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add cmorizers for the upcoming ESACCI H2O data set.
6 participants