Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update xesmf to versions >= 0.4.0 #2728

Merged
merged 7 commits into from May 23, 2023
Merged

Update xesmf to versions >= 0.4.0 #2728

merged 7 commits into from May 23, 2023

Conversation

zklaus
Copy link
Contributor

@zklaus zklaus commented Jul 20, 2022

Description

This PR updates xesmf to versions newer than 0.4.0. To do that, it updates the UERRA cmorizer to use the newer API, which means providing a filename for the weights explicitly. It also changes the name of xesmf on Pypi to pangeo-xesmf.
With this, we should be able to unpin xesmf and avoid some trouble with recent ABI incompatibilities.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the πŸ›  Technical or πŸ§ͺ Scientific review.

New or updated data reformatting script


To help with the number of pull requests:

@zklaus
Copy link
Contributor Author

zklaus commented Jul 20, 2022

@remi-kazeroni, could you possibly give the changed cmorizer a spin in an updated environment?

@zklaus zklaus marked this pull request as ready for review July 20, 2022 13:04
@remi-kazeroni
Copy link
Contributor

@remi-kazeroni, could you possibly give the changed cmorizer a spin in an updated environment?

I have just created a new environment based on this branch in which I have:

# Name                    Version                   Build  Channel
pangeo-xesmf              0.6.3                    pypi_0    pypi
xesmf                     0.6.3              pyhd8ed1ab_1    conda-forge

but the cmorizer command for CDS-UERRA (esmvaltool data format CDS-UERRA) does not work for me, see log attached:
main_log_debug.txt.
I think there is an issue with this line which only returns the longitude (1440) and not the latitude (720).

I have also tried the cmorizer with a fresh environment created from the main branch (i.e. with xesmf=0.3.0 in it) and the cmorizer seems to run fine.

@valeriupredoi
Copy link
Contributor

that's no bueno, the env now contains two essentially identical packages, but with different names - although the conda forge one should be the only one we want πŸ€¦β€β™‚οΈ

@zklaus
Copy link
Contributor Author

zklaus commented Jul 20, 2022

You are right, @valeriupredoi, that is quite unfortunate. I have just opened an issue on the xesmf feedstock for it: conda-forge/xesmf-feedstock#27. The other thing that @remi-kazeroni brought up is simply a bug. Sorry about that, will fix soon.

@valeriupredoi
Copy link
Contributor

good man, Klaus! I built the env locally too and just noticed the pangeo-xesmf unpacks and installs on top of the already-installed conda xesmf, inside the from-conda xesmf install location - that's a mess - I'll go πŸ‘ your issue there 🍺

@sloosvel
Copy link
Contributor

Do you expect to be able to fix this by friday?

@zklaus
Copy link
Contributor Author

zklaus commented Jul 20, 2022

Do you expect to be able to fix this by friday?

Possibly. I don't think it's very difficult, but the right people need to respond in time. If it's not fixed by tomorrow evening, it will not happen. In that case, I propose to go ahead with pinning to <=0.3.0.

@sloosvel
Copy link
Contributor

What about the bug that @remi-kazeroni reported?

@zklaus
Copy link
Contributor Author

zklaus commented Jul 20, 2022

Working on it. It's not difficult, but downloading the rawdata locally takes forever. I am trying to test on levante now.

@remi-kazeroni
Copy link
Contributor

Working on it. It's not difficult, but downloading the rawdata locally takes forever. I am trying to test on levante now.

Sorry, I realize that I have never updated the raw data for CDS-UERRA when working on the new cmorizer interface. Because of a minor permission issue, I can't change that right now. I have downloaded all the raw data in /work/bd0854/b309192/runs/obs_PR/RAWOBS/Tier3/CDS-UERRA and cmoized everything in /work/bd0854/b309192/runs/obs_PR/OBS/Tier3/CDS-UERRA. Feel free to use these data and sorry for the inconvenience!

@zklaus
Copy link
Contributor Author

zklaus commented Jul 20, 2022

No worries. Thanks for the heads up, that should help!

@zklaus
Copy link
Contributor Author

zklaus commented Jul 20, 2022

Ok, the bug should be fixed and I tested it on the data that @remi-kazeroni mentioned. @remi-kazeroni, could you please have a second look and confirm that the data is as it should? I'll try to sort out the xesmf mess then tomorrow before we can merge this.

@zklaus
Copy link
Contributor Author

zklaus commented Jul 21, 2022

To make a decision here, we need a brief insight into how Pypi/pip/setuptools manage metadata vs how conda/mamba/conda-forge does it.

Setuptools keeps metadata about installed packages (from pypi, via pip, or otherwise) in dist-info directories inside the normal site-packages directory, for example lib/python3.8/site-packages/scitools_iris-3.2.1.dist-info/. The directory name already contains the distname, which may differ from any contained Python packages, and version information. This information is also found in various files in that directory.

Conda keeps its metadata in two places, namely inside the environment directory in the conda-meta sub-directory as a collection of json files and in the package in the info subdirectory, for example pkgs/iris-3.2.1-pyhd8ed1ab_0/info/. This directory name already contains name and version info, this name can be different from both setuptools' distname and any contained Python packages.

To achieve some interoperability, conda-forge includes dist-info directories in the conda packages.

The problem with xesmf is that the upstream project releases source distributions and wheels on Pypi with a distname of pangeo-xesmf, but also puts out source releases and wheels on their github page with a distname of xesmf. Reading between the lines, I think this is because they didn't manage to get control of the still present xesmf project on Pypi.

Conda-forge builds its packages from the github releases and consequently puts xesmf-0.6.3.dist-info directories into site-packages. This way, setuptools finds the info and is happy, if we put xesmf>=0.4.0 in setup.py. But that means that ESMValTool is no longer Pip installable because that dependency cannot be satisfied from Pypi, where it is only available as pangeo-xesmf.

Since we install everything from conda-forge, I think this is the best way forward now, though we should definitely press upstream to resolve this situation.

@sloosvel
Copy link
Contributor

Can I ask why the regridding is using xesfm instead of the esmvalcore regridder? I am assuming this was written a long time ago. Doesn't the updated module use xesfm or something like that anyway? Maybe it would be the worth it to update the cmorizer to use our routines and forget about this dependency nightmares.

@remi-kazeroni
Copy link
Contributor

Ok, the bug should be fixed and I tested it on the data that @remi-kazeroni mentioned. @remi-kazeroni, could you please have a second look and confirm that the data is as it should? I'll try to sort out the xesmf mess then tomorrow before we can merge this.

Thanks for your efforts @zklaus. Note that I have moved the CDS-UERRA data to the shared RAWOBS and OBS on Levante to ease the work here. The regridding part seems to work fine indeed but the CMORization returns an error:

Traceback (most recent call last):
  File "/work/bd0854/b309192/soft/mambaforge/envs/test_xesmf/lib/python3.10/site-packages/iris/__init__.py", line 343, in load_cube
    cube = cubes.merge_cube()
  File "/work/bd0854/b309192/soft/mambaforge/envs/test_xesmf/lib/python3.10/site-packages/iris/cube.py", line 386, in merge_cube
    raise ValueError("can't merge an empty CubeList")
ValueError: can't merge an empty CubeList

I did 2 tests on Levante:

  1. with this branch and an environment based on it: /work/bd0854/b309192/data/downloaded/esmvaltool_output/data_formatting_20220721_080604/
  2. with the main branch and an environment based on it: /work/bd0854/b309192/data/downloaded/esmvaltool_output/data_formatting_20220721_084427/

In both cases regridded data are produced in work but with different structure:

netcdf reanalysis-uerra-europe-soil-levels_196205 {
dimensions:
        time = 124 ;
        lat = 720 ;
        lon = 1440 ;
variables:
        int64 time(time) ;
                time:units = "seconds since 1970-01-01" ;
                time:calendar = "proleptic_gregorian" ;
        double step ;
                step:_FillValue = NaN ;
                step:units = "hours" ;
        double soilLayer ;
                soilLayer:_FillValue = NaN ;
        double valid_time(time) ;
                valid_time:_FillValue = NaN ;
                valid_time:units = "seconds since 1970-01-01" ;
                valid_time:calendar = "proleptic_gregorian" ;
        double lat(lat) ;
                lat:_FillValue = NaN ;
                lat:standard_name = "latitude" ;
                lat:units = "degrees_north" ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
                lon:standard_name = "longitude" ;
                lon:units = "degrees_east" ;
        float __xarray_dataarray_variable__(time, lat, lon) ;
                __xarray_dataarray_variable__:_FillValue = NaNf ;
                __xarray_dataarray_variable__:coordinates = "step valid_time soilLayer" ;
}
netcdf reanalysis-uerra-europe-soil-levels_196205 {
dimensions:
        time = 124 ;
        lon = 1440 ;
        lat = 720 ;
variables:
        int64 time(time) ;
                time:long_name = "initial time of forecast" ;
                time:standard_name = "forecast_reference_time" ;
                time:units = "seconds since 1970-01-01" ;
                time:calendar = "proleptic_gregorian" ;
        double step ;
                step:_FillValue = NaN ;
                step:long_name = "time since forecast_reference_time" ;
                step:standard_name = "forecast_period" ;
                step:units = "hours" ;
        double soilLayer ;
                soilLayer:_FillValue = NaN ;
                soilLayer:long_name = "original GRIB coordinate for key: level(soilLayer)" ;
                soilLayer:units = "1" ;
        double valid_time(time) ;
                valid_time:_FillValue = NaN ;
                valid_time:standard_name = "time" ;
                valid_time:long_name = "time" ;
                valid_time:units = "seconds since 1970-01-01" ;
                valid_time:calendar = "proleptic_gregorian" ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
        double lat(lat) ;
                lat:_FillValue = NaN ;
        double vsw(time, lat, lon) ;
                vsw:_FillValue = NaN ;
                vsw:coordinates = "valid_time soilLayer step" ;
}

I'm not sure I understand why the CMORization only works for regridded data in case 2 (main branch). This CMORizer is not very convenient to debug because the start and end options to select a time range are not enabled so everything is regridded before the CMORization starts...

@valeriupredoi
Copy link
Contributor

cheers @remi-kazeroni - and thanks lots to @zklaus and @sloosvel - sorry, was away the second part of last week. I suspect that the xesmf regridding is done since the regridding is done before the CMORization and esmvalcore/iris regridding can't be used because of wonky non-CF coordinates, so iris be complaining. This is, however, not an optimal approach, so I'd reckon we could get rid of xesmf for good, by regridding the data after CMORization - do you think this flow would affect the final data output?

@zklaus
Copy link
Contributor Author

zklaus commented Jul 27, 2022

I'd be in favor of getting rid of xesmf here. I wonder if we should regrid in the cmorizer at all. People might want to have the original data anyways, and when they want it regridded, they might have opinions on how the regridding should be done, so why not leave it to the preprocessor in the recipe?

@bouweandela
Copy link
Member

Great idea, that would be an even simpler solution

@zklaus
Copy link
Contributor Author

zklaus commented Jul 28, 2022

Note that this applies potentially to several cmorizers that are regridding right now, even though they might not use xesmf for this. Would that be good practice in general? What would be an appropriate forum to make that kind of decision?

@remi-kazeroni
Copy link
Contributor

I like the idea of not regridding in cmorizers and only have that done by preprocessors in recipes. But this would impact other cmorizers for which the data are then used in public recipes. Perhaps, it would be useful to open a separate issue on the topic and discuss with the development team.

Regarding the CDS-UERRA cmorizer, I am not in a position to say if it is fine to remove the regridding. I would also prefer to remove the regridding from the script or at least replace it with functions from the Core. Since the dataset is not used in any of the public recipes, I don't except much impact from this removal.
@bascrezee, since you developed the original cmorizer in #1275, would you be able to tell if it is fine to remove the regridding part from the cmorizer?

@bascrezee
Copy link
Contributor

Hi esmvaltool team :) Yes taking the regridding out of the cmorizer for CDS-UERRA is fine with me.

@remi-kazeroni
Copy link
Contributor

Hi esmvaltool team :) Yes taking the regridding out of the cmorizer for CDS-UERRA is fine with me.

Thanks for your input @bascrezee!

@zklaus, would you have the time to take care of removing the regridder from this CMORizer? If not, I could take a look.

@zklaus
Copy link
Contributor Author

zklaus commented Jul 29, 2022

Sorry, @remi-kazeroni, today is my last working day before vacation and I'll be back only in September. I can have a look then, but not earlier. πŸ–οΈ

@remi-kazeroni
Copy link
Contributor

Sorry, @remi-kazeroni, today is my last working day before vacation and I'll be back only in September. I can have a look then, but not earlier. πŸ–οΈ

Enjoy your vacation @zklaus 😎 I'll try to have a look next month then.

@valeriupredoi
Copy link
Contributor

good hols (hopefully on a πŸ–οΈ ) @zklaus 🍸 @remi-kazeroni you can count on me too, am going nowhere nice πŸ‡¬πŸ‡§

@bouweandela
Copy link
Member

Ping @zklaus @remi-kazeroni @valeriupredoi It looks like this was forgotten after the summer holidays. Is this something we would like to include in the upcoming v2.8 release?

@remi-kazeroni
Copy link
Contributor

Ping @zklaus @remi-kazeroni @valeriupredoi It looks like this was forgotten after the summer holidays. Is this something we would like to include in the upcoming v2.8 release?

Indeed, I completely forgot about that PR. I remember trying to get rid of the regridder step of the UERRA CMORizer but did not succeed in doing so because of the structure of the raw data...

Do we still have an issue with that version of the xesmf package that should be addressed before v2.8?

@bouweandela
Copy link
Member

It's getting a bit old so it might be inconvenient for users who want to use ESMValTool and xesmf in the same environment, but otherwise, there is no issue that I'm aware of.

@valeriupredoi
Copy link
Contributor

just realized, while opening conda-forge/esmvaltool-suite-feedstock#18 that we are still using xesmf=0.3.0 which is absolutely decrepit - we should have looked at this before heading over to release v2.8.0 - do we have time to accept it maybe?

@remi-kazeroni
Copy link
Contributor

I really lost track of this and don't have time to work on it myself for this release. It would be great if someone could take over the review and push it to the finish line, if possible. If I remember correctly, the xesmf package is only used for the CDS-UERRA cmorizer. But this dataset is not used in any public recipe (except check_obs of course). Thus, if we're left with the choice between removing the pin on xesmf which breaks the CMORizer and keeping the pin which puts the ESMValTool package at risk, I would go for the first option.

@zklaus
Copy link
Contributor Author

zklaus commented Mar 16, 2023

Oh my. Pangeo has initiated recovery of the pypi project, see pypi/support#2425, but probably removing the regridding is still a good idea and it seems that the recovery is likely to take a few more weeks. Pangeo opened the issue back in November, but PyPI got to it only mid-February and says the process might take six weeks.

setup.py Outdated
@@ -64,7 +64,7 @@
'seawater',
'shapely<2.0.0', # github.com/ESMValGroup/ESMValTool/issues/2965
'xarray',
'xesmf==0.3.0',
'xesmf>=0.4.0',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zklaus maybe leave this to free so the pip installation can go ahead without the version not found error?

@zklaus
Copy link
Contributor Author

zklaus commented Apr 5, 2023

Upstream has recovered the PyPI project and published the latest 0.7.1 there. Once this has made it to Conda-forge (see conda-forge/xesmf-feedstock#30), we can pin >=0.7.1 and fix the dependencies.

@bouweandela
Copy link
Member

Great to see progress on this! It looks like the linked pull request on conda-forge was merged.

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this now looks very good to me, apart from the small observation about the if/else block, but all the deps and versions look clean now, cheers @zklaus 🍺 Incidentally, xesmf is only needed for this particular cmorizer, is there any way we can replace it with say, esmpy or anything else? It's a proper overkill to have such a fussy dependency for one script only 😁

@bouweandela bouweandela merged commit 113cdb6 into main May 23, 2023
6 checks passed
@bouweandela bouweandela deleted the update-xesmf branch May 23, 2023 11:46
@valeriupredoi
Copy link
Contributor

great! cheers @zklaus and @bouweandela and @remi-kazeroni 🍺 x3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Old xesmf version dependency creates problems
6 participants