
Cache regridding weights if possible #2344

Merged: 14 commits into main from cache_regridding_weights on Apr 16, 2024

Conversation

@schlunma (Contributor) commented Feb 23, 2024

Description

This implements caching of regridding weights, which can reduce regridding time dramatically when many variables of the same dataset are analyzed.

I also modernized the existing regridding tests so that they use pytest instead of unittest and actually test the regridding instead of relying on mocks.

Closes #2341

Link to documentation: https://esmvaltool--2344.org.readthedocs.build/projects/ESMValCore/en/2344/recipe/preprocessor.html#horizontal-regridding



@schlunma added the preprocessor (Related to the preprocessor) label on Feb 23, 2024
@schlunma added this to the v2.11.0 milestone on Feb 23, 2024
@schlunma self-assigned this on Feb 23, 2024

codecov bot commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.29%. Comparing base (6cf32c7) to head (3cf37b8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2344      +/-   ##
==========================================
+ Coverage   94.28%   94.29%   +0.01%     
==========================================
  Files         246      246              
  Lines       13511    13540      +29     
==========================================
+ Hits        12739    12768      +29     
  Misses        772      772              


@valeriupredoi (Contributor) left a comment

Looking good, bud 🍺 A couple of questions from me please - also, do you have a feel for how big those caches can get? I.e., would memory clogging get severe enough to forgo caching and just go for CPU time instead?

Review comments on doc/recipe/preprocessor.rst and esmvalcore/preprocessor/_regrid.py (resolved)
@schlunma (Contributor, Author) commented Apr 5, 2024

Thanks for reviewing V! Memory usage should be minimal: as mentioned here, the weights should only be around 10 MiB for very high resolution grids (1000x1000), and much smaller for "normal" resolutions.

@valeriupredoi (Contributor) left a comment

Thanks for answering, Manu! Glad @bouweandela asked the same q about memory 😁

@valeriupredoi (Contributor) commented Apr 5, 2024

@bouweandela maybe you could have a look too, given that I took it you are not 100% convinced about the need for caching (reading from the issue) - and please merge if all's good by ye too 🍺

@bouweandela (Member) commented Apr 11, 2024

Yes, a 10 percent reduction in runtime in the best case doesn't seem like a huge gain, but it's nice to have of course. I am concerned about the size of the cache though; in the issue you talked about 1 GB (#2341 (comment)) and I saw that there is some discussion planned at the workshop about re-using weights:

High resolution model data: often the weights to interpolate grids can be reused, not only for different variables but even across different experiments, time periods, and simulations of the same model. Could ESMValTool support the reading and use of precalculated weights because their calculation is very time consuming? (@katjaweigel?)

I imagine they would be rather large arrays too if they are so expensive to compute.

To avoid this turning into a memory leak, would it be an option to:

  1. add a method to the regrid preprocessor function to clear the cache, similar to how it's done if you're using lru_cache, so Python users can clear the cache when they are done with it?
  2. make weights caching optional by adding a cache_weights or similar argument to the preprocessor function, so it can be enabled from the recipe if it is relevant?
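
For reference, this is the standard-library pattern that suggestion 1 alludes to; plain functools behaviour demonstrated on a stand-in function, not ESMValCore code:

from functools import lru_cache

@lru_cache(maxsize=None)
def compute_weights(src_grid_id, tgt_grid_id):
    # Stand-in for an expensive weight computation.
    return f"weights for {src_grid_id} -> {tgt_grid_id}"

compute_weights("ERA5", "1x1")  # computed and cached
compute_weights("ERA5", "1x1")  # served from the cache
compute_weights.cache_clear()   # drop all cached entries when done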

@schlunma (Contributor, Author) commented:

Yes, a 10 percent reduction in runtime in the best case doesn't seem like a huge gain, but it's nice to have of course. I am concerned about the size of the cache though; in the issue you talked about 1 GB (#2341 (comment))

The 10% reduction in run time was only for this specific example; it's definitely not an upper bound. Also, the 1 GiB is a really extreme example (e.g., a 0.1°x0.1° grid would lead to a weights array of ~25 MiB).

To avoid this turning into a memory leak, would it be an option to:

  1. add a method to the regrid preprocessor function to clear the cache, similar to how it's done if you're using lru_cache, so Python users can clear the cache when they are done with it?

  2. make weights caching optional by adding a cache_weights or similar argument to the preprocessor function, so it can be enabled from the recipe if it is relevant?

That sounds very reasonable, I will do that! 👍

@valeriupredoi (Contributor) commented Apr 11, 2024

From me own experience it's always better to read from file/memory if possible than it is to compute (yes, OK, call me Dr Obvious 🤣) - and given Manu's reassuring info on the upper limits for memory intake, I think it's a go - but I do like Bouwe's suggestions too. My only concern, which I just thought of, is that maybe this would be better sat in iris? EDIT: then again, we have it, we use it - rather than wait a couple of centuries for iris 😁

@valeriupredoi (Contributor) commented:

Also (sorry, I sat down and did MO stuff today, so now my brain is free at last) - Manu, maybe it'd be worth plopping a worst-case-scenario test into the tests, i.e. build a super-high-res netCDF file on the fly and do the weights caching dance on it - we'd be able to monitor the memory using the test performance tool we have in GitHub Actions and, of course, decorate it with e.g. @highmem so we don't run it usually 🍺
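
Purely as an illustration, a rough sketch of what such a test could look like; the @highmem marker is hypothetical, the grid sizes are arbitrary, the cube is built in memory rather than written to a netCDF file, and (as discussed below) the test was not actually added:

import numpy as np
import pytest
import iris.coords
import iris.cube
from iris.analysis import Linear


def _global_cube(n_lat, n_lon):
    # Build a global lat/lon cube of the requested resolution.
    lat = iris.coords.DimCoord(
        np.linspace(-90.0, 90.0, n_lat),
        standard_name="latitude",
        units="degrees",
    )
    lon = iris.coords.DimCoord(
        np.linspace(0.0, 360.0, n_lon, endpoint=False),
        standard_name="longitude",
        units="degrees",
    )
    data = np.zeros((n_lat, n_lon), dtype=np.float32)
    return iris.cube.Cube(data, dim_coords_and_dims=[(lat, 0), (lon, 1)])


@pytest.mark.highmem  # hypothetical marker, excluded from the default test run
def test_weight_caching_memory_worst_case():
    src = _global_cube(1800, 3600)  # ~0.1 degree source grid
    tgt = _global_cube(180, 360)    # 1 degree target grid
    regridder = Linear().regridder(src, tgt)  # weights computed here
    result = regridder(src)                   # weights reused here
    assert result.shape == (180, 360)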

@schlunma (Contributor, Author) commented:

My only concern, which I just thought of, is that maybe this would be better sat in iris?

Actually, this entire functionality is already in iris in the form of Regridder classes that compute the weights and store them. With this PR, we are using those Regridder classes instead of the regridding schemes, which do not offer weights caching.
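
Roughly, this is the iris pattern being referred to; a minimal sketch using the standard iris regridding API, with throwaway demo cubes:

import numpy as np
import iris.coords
import iris.cube
from iris.analysis import Linear


def _cube(n_lat, n_lon):
    # Minimal global lat/lon cube purely for demonstration.
    lat = iris.coords.DimCoord(np.linspace(-90, 90, n_lat),
                               standard_name="latitude", units="degrees")
    lon = iris.coords.DimCoord(np.linspace(0, 359, n_lon),
                               standard_name="longitude", units="degrees")
    return iris.cube.Cube(np.zeros((n_lat, n_lon), np.float32),
                          dim_coords_and_dims=[(lat, 0), (lon, 1)])


tas_cube, pr_cube, target_cube = _cube(360, 720), _cube(360, 720), _cube(180, 360)

# Scheme-based call: the weights are recomputed on every invocation.
regridded_tas = tas_cube.regrid(target_cube, Linear())

# Regridder-based call: the weights are computed once when the Regridder is
# created and then reused for every cube on the same source grid.
regridder = Linear().regridder(tas_cube, target_cube)
regridded_tas = regridder(tas_cube)
regridded_pr = regridder(pr_cube)  # no weight recomputation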

Also (sorry, I sat down and did MO stuff today, so now my brain is free at last) - Manu, maybe it'd be worth plopping a worst-case-scenario test into the tests, i.e. build a super-high-res netCDF file on the fly and do the weights caching dance on it - we'd be able to monitor the memory using the test performance tool we have in GitHub Actions and, of course, decorate it with e.g. @highmem so we don't run it usually 🍺

I am honestly not sure if that's worth it, especially if we implement the option to turn off weights caching 😬 What do we want to achieve with such a test? Kill the CI machines? 😄 🤖

@schlunma (Contributor, Author) commented Apr 12, 2024

Added an option to enable/disable caching of weights (default: disabled): b6eaade

Added a function to clear the cache, similar to lru_cache's cache_clear(): regrid.cache_clear(), see 52a536c
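
A minimal usage sketch of the two additions; it assumes the new keyword is called cache_weights (as suggested above), so refer to the linked documentation for the exact name and defaults:

import iris
from esmvalcore.preprocessor import regrid

# "tas.nc" is a placeholder for any CF-compliant file on a lat/lon grid.
cube = iris.load_cube("tas.nc")

# Opt in to weight caching (disabled by default, so existing recipes
# behave exactly as before).
regridded = regrid(cube, target_grid="1x1", scheme="linear", cache_weights=True)

# Drop the cached weights once all variables have been processed,
# analogous to functools.lru_cache's cache_clear().
regrid.cache_clear()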

@bouweandela (Member) commented:

Thanks for making the changes @schlunma! @valeriupredoi, could you please do a final review of the new code, since you reviewed the original PR, and then we can merge?

@valeriupredoi (Contributor) commented:

On it, bud 🍺

@valeriupredoi (Contributor) commented:

What do we want to achieve with such a test? Kill the CI machines?

Not a smart thing to do, especially since today Skynet's closer than ever 😆 OK, leave the poor machines alone then.

@valeriupredoi added the enhancement (New feature or request) label on Apr 15, 2024
@valeriupredoi (Contributor) left a comment

New stuff looks good! Cheers gents @schlunma and @bouweandela 🍺

@bouweandela merged commit bbd307d into main on Apr 16, 2024
6 checks passed
@bouweandela deleted the cache_regridding_weights branch on April 16, 2024 at 08:48
@chrisbillowsMO added the dask (related to improvements using Dask) label on May 13, 2024
Labels
dask (related to improvements using Dask), enhancement (New feature or request), preprocessor (Related to the preprocessor)
Development
Merging this pull request closes issue #2341: Use iris' regridder caching for faster regridding?
4 participants