Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cmorizer for latest ESACCI-SST data #1895

Merged
merged 30 commits into from
Dec 14, 2020
Merged

Cmorizer for latest ESACCI-SST data #1895

merged 30 commits into from
Dec 14, 2020

Conversation

valeriupredoi
Copy link
Contributor

@valeriupredoi valeriupredoi commented Oct 30, 2020

Hold yer horses 馃惔 - this is WIP 馃毀

OK I am ready to ask around now: this is work towards #1885 that I am doing together with @cjroberts at URead but I need help from a few people: @cjroberts @axel-lauer @mattiarighi @hb326 because I don't know the science and nomenclatures of ESACCI-SST shtuffs, here is a list of things I would be very happy if you could help me with, tasked by names of specialists 馃嵑

  • Charles: this script cmorizes the data that you guys have produced, it would be helpful if you ran it yourself and examined the output: currently the cmorization is done for SST and SST Error, outputting two CMOR variables: ts and tsStderr (we can run it together on a Zoom call if you prefer!)
  • Axel/Mattia/Birgit: you guys are specialists wrt sciency stuff: I see that to get ts from the older SST raw data with the older NCL cmorizer a bunch of calculations had to be done - do we still need those? Currently I am just taking the SST and SST error data and not transforming it in any way, just correcting to CMOR standards for metadata;
  • Charles: can you please read the docstring header of the cmorizer and its configuration file and be sure all is correct there (note some bogus dates that we'll replace when all is ready and some insert here placeholders)

Cheers muchly guys 馃嵑

Standard stuff, to be looked at later

Before you start, please read our contribution guidelines.

Please discuss your idea with the development team before getting started, to avoid disappointment later. The way to do this is to open a new issue on GitHub. If you are planning to modify an existing functionality, please discuss it with the original author(s) by tagging them in the issue.


Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • Give this pull request a descriptive title that can be used as a one line summary in a changelog
  • Make sure your code is composed of functions of no more than 50 lines and uses meaningful names for variables
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Preferably Codacy code quality checks pass, however a few remaining hard to solve Codacy issues are still acceptable. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • (Only if really necessary) Add any additional dependencies needed for the diagnostic script to setup.py, esmvaltool/install/R/r_requirements.txt or esmvaltool/install/Julia/Project.toml (depending on the language of your script) and also to package/meta.yaml for conda dependencies (includes Python and others, but not R/Julia)
  • If new dependencies are introduced, check that the license is compatible with Apache2.0

New recipe/diagnostic

  • Add documentation for the recipe to the doc/sphinx/source/recipes folder and add a new entry to index.rst
  • Add provenance information

Modified recipe/diagnostic

  • Update documentation for the recipe to the doc/sphinx/source/recipes folder
  • Update provenance information if needed
  • Assign the author(s) of the affected recipe(s) as reviewer(s)

New data reformatting script

  • Test the CMORized data using recipes/example/recipe_check_obs.yml, to make sure the CMOR checks pass without errors
  • Add the new dataset to the table in the documentation
  • Tag @mattiarighi in this pull request, so that the new dataset can be added to the OBS data pool at DKRZ and synchronized with CEDA-Jasmin

Modified data reformatting script

  • Test the CMORized data using recipes/example/recipe_check_obs.yml, to make sure the CMOR checks pass without errors
  • Tag @mattiarighi in this pull request, so that the updated dataset can be added to the OBS data pool at DKRZ and synchronized with CEDA-Jasmin

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #issue_number

@valeriupredoi valeriupredoi changed the title Esacci sst cmorizer Cmorizer for latest ESACCI-SST data Nov 3, 2020
@cjroberts
Copy link

Hi V,

Sorry, just seen this yesterday as it got lost in my other email.

I notice the filename in the YAML file is for the regridded files at monthly resolution. The raw data is at daily resolution and has a longer filename. Also, other time resolutions are possible for the regridding. Not sure if this is a problem or not, but just thought I would point it out.

I will check with my colleagues what the version of the data should be, but I think it is 2.1 for the CCI files and 2.0 for the C3S ones from 2017 onwards.

Thank you very much for your help with this.

@cjroberts
Copy link

Sorry, I see the filename I was looking at in the YAML file is the output file.

The versions for the data are definitely 2.1 for the CCI files 2.0 for the C3S files, but now I am not sure what the fv3.1 means in the YAML file.

I will try downloading and running the script.

@valeriupredoi
Copy link
Contributor Author

OK cool, we'll have to go through the metadata too - about the monthly data question - I downloaded monthly means, I don't think we could produce daily data since that will be huge and at least for storing purposes we won't be able to store it on Jasmin or DKRZ, but could write code in the cmorizer to produce it? @cjroberts we should plan another call so we can go over the code and run it as well? 馃嵑

@cjroberts
Copy link

One of the options available on http://surftemp.net/regridding/index.html when regridding is to use input data with a post-hoc correction for biases associated with desert dust applied. Firstly, would the people who are experts in the observations like this correction applied? Secondly, this input data currently has a problem with its metadata that is reflected in the output of the regridding. That is that there are different, but identical, latitude and longitude coordinates (lat, lat_1, lat_2, ...) for each variable in the file. This does not seem to affect the CMORization, however.

V, regarding CMORizing the output of the regridder on the fly. There are 4 possible predefined time resolutions for the output plus an N-daily option, which regrids every N days within a year. Would it be feasible to write CMORizers within ESMValTool that covered all cases or I am wondering if it would be better to write my own flexible CMORizer using cf-python (which is what is used by the regridding tool)?

@cjroberts
Copy link

Thanks to @valeriupredoi I have run the CMORizer on my own machine. I have spoken with my supervisor, Chris Merchant, regarding the output.

We think the desert dust correction should be included. We have a fix in progress for the issue that we found. We could either bump up the version or we need some way of providing an acknowledgement of this in the metadata. The source and reference also seem to be wrong in the output. Could you please point me to where this can be changed in the code?

Another issue is that the uncertainty in the input files is an uncertainty not a standard error. Is it possible to change the name of the variable etc. or is a standard error the closest you have?

We do not think that having an on the fly converter for all possible resolutions is necessary until we get feedback asking for this.

Would it be possible to run ESMValTool to do an example comparison against a model? Chris has a workshop coming up where it would be good to demonstrate the product.

We would also like there to be a news item once the work is done if possible.

@cjroberts
Copy link

Sorry, one final point I forgot is to ask whether you are expecting an anomaly or an absolute value of SST?

@valeriupredoi
Copy link
Contributor Author

@cjroberts good stuff, dude! I will answer the technical stuffs:

  • for an obs-vs model comparison recipe I recommend the validation recipewhere you can input one model as control model and the ESACCI-SST as experiment model (give it no OBS's since you are already using the OBS as experiment model); I can help you edit it so you have it run;
  • for references, the reference file will have to change; you can do it yourself here in this PR, same as with the other changes you need to make to the cmor_config/ESACCI-SST.yml file that we've already started changing;

For the science and data characteritics questions I am afraid I don't know the answers - I am merely an engineer and don't use this data - @axel-lauer @mattiarighi and @hb326 are people in the know, can you guys have a look through pls and advise Charles? 馃嵑

@valeriupredoi
Copy link
Contributor Author

forgot this one:

Another issue is that the uncertainty in the input files is an uncertainty not a standard error. Is it possible to change the name of the variable etc. or is a standard error the closest you have?

we had tsStderr produced so far - this is a CMOR standard variable of the surface temp standard error; since esmvaltool works with CMOR standard variables only, we need to either find a CMOR standard variable in the table (full link to see the whole source): https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json
or we can create a derived custom variable a la these here that use CMOR standard variables as input - from atechnical point of view both are functional options, but from a scientific point of view I dunno - referring back to the guys who know what is needed in terms of the data 猬嗭笍

@axel-lauer
Copy link
Contributor

@cjroberts thanks for your great work. Just a quick answer to some of your questions.

  1. We are aware that the uncertainty information is not a standard error, but this variable is the closest we have at the moment. Using such uncertainty information with the ESMValTool is still at a quite early stage meaning only very few diagnostics can use it and it is currently the diagnostic author's responsibility to use it properly as uncertainty information is highly dependent on the dataset. I would therefore be tempted to leave it tsStderr or tosStderr for the moment before a more general framework to use uncertainty information has been setup up. It might be good to include a comment (e.g. a global attribute) in the cmorized file on this, though.
  2. When reading SST (CMOR definition), the tool is expecting absolute SSTs.

@valeriupredoi
Copy link
Contributor Author

cheers much @axel-lauer 馃嵑 In the words of a lot of people over the Pond - we have us some science 馃榿

@axel-lauer
Copy link
Contributor

Thanks for adding the download instructions. I tested this and it works nicely! I would recommend to explicitely state the download settings instead of referring to "default settings" as these might change, i.e.:

Time Resolution: monthly
Longitude Resolution: 0.5
Latitude Resolution: 0.5
Start Date: 1982-01-01
End Date: 2019-12-31
Exclude data above sea ice threshold: True (Threshold: 100 %)
Include post-hoc SST bias adjustments: True
Output Absolute or Anomaly SST: absolute
Generate Sea Ice Fraction: True
Error Correlation in Time (Days): 7
Error Correlation In Space (Degrees): 3.0

Besides this, I think there is only one thing left to decide on before merging: what to do about existing recipes using the ESACCI-SST dataset. I could think of two options:

  1. We update these recipes to use the new ESACCI-SST dataset produced by this CMORizer.
  2. We do not update the existing recipes. That would imply that we have to keep the old ESACCI-SST CMORizer as well in order to be able to also produce the old version of the dataset used by these recipes.

I think I would have a slight preference for option 1 but I wanted to put this up for discussion before merging.

@cjroberts
Copy link

Thanks @axel-lauer I will update the download instructions. I will also add a comment about the uncertainty and check the version number is correct. Please could you tell me if there is a way to add two references for the dataset? There is an additional one for the desert dust bias correction.

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Dec 9, 2020

@axel-lauer cheers for testing and reviewing mate! about the change of recipes - I would go with let's do it if this data is more up-to-date compared to the previous one (and I think it is @cjroberts ?) - and remove the older ncl cmorizer in the process. Could there be a case that the recipe developer intended to use only the older version of the data? I had a look at esmvaltool/recipes/examples/recipe_check_obs.yml and it looks to me that the ESACCI-SST entry is identical to what we have no so in that case I don't think we need to change anything in recipes, the only two that use ESACCI-SST (apart from the utility checker one) are:

  • esmvaltool/recipes/recipe_perfmetrics_CMIP5_4cds.yml
  • esmvaltool/recipes/recipe_perfmetrics_CMIP5.yml

and they look fine to me 馃憤

@axel-lauer
Copy link
Contributor

OK, I'd say then let's go with option 1. To do so, I think we then need to change
version: L4-GHRSST-SSTdepth-OSTIA-GLOB to version: 2.1 in the following recipes:

  • recipe_perfmetrics_CMIP5_4cds.yml
  • recipe_perfmetrics_CMIP5.yml
  • examples/recipe_check_obs.yml

In addition, we should probably delete the file cmorize_obs_esacci_sst.ncl.obsolete to avoid any confusion.

@axel-lauer
Copy link
Contributor

Please could you tell me if there is a way to add two references for the dataset? There is an additional one for the desert dust bias correction.

@cjroberts I am not sure how this works but I think simply adding another reference to esacci-sst.bibtex will not work. I also guess that specifying a list for "reference" in cmor_config/ESACCI-SST.yml does not work, so some changes might be required to extract_doi_value in utilities.py. @SarahAlidoost could you please advice or maybe extend extract_doi_value to support multiple entries in a bibtex file?

@SarahAlidoost
Copy link
Contributor

Please could you tell me if there is a way to add two references for the dataset? There is an additional one for the desert dust bias correction.

@cjroberts I am not sure how this works but I think simply adding another reference to esacci-sst.bibtex will not work. I also guess that specifying a list for "reference" in cmor_config/ESACCI-SST.yml does not work, so some changes might be required to extract_doi_value in utilities.py. @SarahAlidoost could you please advice or maybe extend extract_doi_value to support multiple entries in a bibtex file?

@axel-lauer I think that multiple references for a single dataset has not been supported. I am not sure if there is a technical reason for that. So, I made an issue regarding your suggestion on extending extract_doi_value function, please see #1951.

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Dec 9, 2020

@cjroberts I have added you as author/developer (hope that's fine with you, no worries, no major responsibilities, just to keep an eye on this cmorizer in the future) - could pls add your orcid in CITATION.cff and esmvaltool/config-references.yml - I am sure you have an orcid no 馃嵑

@axel-lauer
Copy link
Contributor

I just merged PR #1951, so multiple references can now be specified in esmvaltool/cmorizers/obs/cmor_config/ESACCI-SST.yml, e.g.
reference = ["esacci-sst", "other-to-be-defined-ref"]
Please let me know once the references are complete so we can merge this PR.

@cjroberts
Copy link

@axel-lauer I have added the second reference. I haven't tested the code again as I presume having two references wouldn't work in this branch. I have also added a comment regarding the uncertainty not being a standard error and updated the version number to 2.2. There is no official version number for the data including the bias correction. At some point, however, there will be a version 3 of the data, which is the one we will be keen to promote.

Copy link
Contributor

@axel-lauer axel-lauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this again and it works nicely with the two references. Thanks @cjroberts ! From my point of view, this is ready to be merged. @valeriupredoi would you like to look briefly into the 4 remaining Codacy issues? If not, fine with me, just let me know how you decide and we can get this merged.

@valeriupredoi
Copy link
Contributor Author

super job guys, cheers @axel-lauer for the careful review and @cjroberts for the great work! I have killed off most of the Codacy issues and will merge right after the tests have finished running 馃嵑 馃巺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants