
Cache regridding weights if possible #2344

Merged: 14 commits into main from cache_regridding_weights on Apr 16, 2024

Conversation

@schlunma (Contributor) commented Feb 23, 2024

Description

This implements caching of regridding weights, which can reduce regridding time dramatically when many variables of the same dataset are analyzed.

I also modernized the existing regridding tests so that they use pytest instead of unittest and actually test the regridding instead of relying on mocks.

Closes #2341

Link to documentation: https://esmvaltool--2344.org.readthedocs.build/projects/ESMValCore/en/2344/recipe/preprocessor.html#horizontal-regridding



@schlunma added the preprocessor (Related to the preprocessor) label on Feb 23, 2024
@schlunma added this to the v2.11.0 milestone on Feb 23, 2024
@schlunma self-assigned this on Feb 23, 2024

codecov bot commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.29%. Comparing base (6cf32c7) to head (3cf37b8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2344      +/-   ##
==========================================
+ Coverage   94.28%   94.29%   +0.01%     
==========================================
  Files         246      246              
  Lines       13511    13540      +29     
==========================================
+ Hits        12739    12768      +29     
  Misses        772      772              


@valeriupredoi (Contributor) left a comment

Looking good, bud 🍺 A couple of questions from me please - also, do you have a feel for how big those caches can get? I.e., would memory clogging get severe enough to forgo caching and just go for CPU time instead?

Review comments on doc/recipe/preprocessor.rst and esmvalcore/preprocessor/_regrid.py (resolved)
@schlunma (Contributor, Author) commented Apr 5, 2024

Thanks for reviewing V! Memory usage should be minimal: as mentioned here, the weights should only be around 10 MiB for very high resolution grids (1000x1000), and much smaller for "normal" resolutions.

@valeriupredoi (Contributor) left a comment

Thanks for answering, Manu! Glad @bouweandela asked the same q about memory 😁

@valeriupredoi (Contributor) commented Apr 5, 2024

@bouweandela maybe you could have a look too, given that I took it you are not 100% convinced about the need for caching (reading from the issue) - and please merge if all's good by ye too 🍺

@bouweandela (Member) commented Apr 11, 2024

Yes, a 10 percent reduction in runtime in the best case doesn't seem like a huge gain, but it's nice to have of course. I am concerned about the size of the cache though; in the issue you talked about 1 GB (#2341 (comment)) and I saw that there is some discussion planned at the workshop about re-using weights:

High resolution model data: often the weights to interpolate grids can be reused, not only for different variables but even across different experiments, time periods, and simulations of the same model. Could ESMValTool support the reading and use of precalculated weights because their calculation is very time consuming? (@katjaweigel?)

I imagine they would be rather large arrays too if they are so expensive to compute.

To avoid this turning into a memory leak, would it be an option to:

  1. add a method to the regrid preprocessor function to clear the cache, similar to how it's done if you're using lru_cache, so Python users can clear the cache when they are done with it?
  2. make weights caching optional by adding a cache_weights or similar argument to the preprocessor function, so it can be enabled from the recipe if it is relevant?
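
For reference, this is the standard-library pattern that suggestion 1 alludes to; plain functools behaviour demonstrated on a stand-in function, not ESMValCore code:

from functools import lru_cache

@lru_cache(maxsize=None)
def compute_weights(src_grid_id, tgt_grid_id):
    # Stand-in for an expensive weight computation.
    return f"weights for {src_grid_id} -> {tgt_grid_id}"

compute_weights("ERA5", "1x1")  # computed and cached
compute_weights("ERA5", "1x1")  # served from the cache
compute_weights.cache_clear()   # drop all cached entries when done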

@schlunma (Contributor, Author) commented:

Yes, a 10 percent reduction in runtime in the best case doesn't seem like a huge gain, but it's nice to have of course. I am concerned about the size of the cache though; in the issue you talked about 1 GB (#2341 (comment))

The 10% reduction in run time was only for this specific example; it's definitely not an upper bound. Also, the 1 GiB is a really extreme example (e.g., a 0.1°x0.1° grid would lead to a weights array of ~25 MiB).

To avoid this turning into a memory leak, would it be an option to:

  1. add a method to the regrid preprocessor function to clear the cache, similar to how it's done if you're using lru_cache, so Python users can clear the cache when they are done with it?

  2. make weights caching optional by adding a cache_weights or similar argument to the preprocessor function, so it can be enabled from the recipe if it is relevant?

That sounds very reasonable, I will do that! 👍

@valeriupredoi (Contributor) commented Apr 11, 2024

From me own experience it's always better to read from file/memory if possible than it is to compute (yes, OK, call me Dr Obvious 🤣) - and given Manu's reassuring info on the upper limits for memory intake, I think it's a go - but I do like Bouwe's suggestions too. My only concern, which I just thought of, is that maybe this would be better sat in iris? EDIT: then again, we have it, we use it - rather than wait a couple of centuries for iris 😁

@valeriupredoi (Contributor) commented:

Also (sorry, I sat down and did MO stuff today, so now my brain is free at last) - Manu, maybe it'd be worth plopping a worst-case-scenario test into the tests, i.e. build a super-high-res netCDF file on the fly and do the weights caching dance on it - we'd be able to monitor the memory using the test performance tool we have in GitHub Actions and, of course, decorate it with e.g. @highmem so we don't run it usually 🍺
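
Purely as an illustration, a rough sketch of what such a test could look like; the @highmem marker is hypothetical, the grid sizes are arbitrary, the cube is built in memory rather than written to a netCDF file, and (as discussed below) the test was not actually added:

import numpy as np
import pytest
import iris.coords
import iris.cube
from iris.analysis import Linear


def _global_cube(n_lat, n_lon):
    # Build a global lat/lon cube of the requested resolution.
    lat = iris.coords.DimCoord(
        np.linspace(-90.0, 90.0, n_lat),
        standard_name="latitude",
        units="degrees",
    )
    lon = iris.coords.DimCoord(
        np.linspace(0.0, 360.0, n_lon, endpoint=False),
        standard_name="longitude",
        units="degrees",
    )
    data = np.zeros((n_lat, n_lon), dtype=np.float32)
    return iris.cube.Cube(data, dim_coords_and_dims=[(lat, 0), (lon, 1)])


@pytest.mark.highmem  # hypothetical marker, excluded from the default test run
def test_weight_caching_memory_worst_case():
    src = _global_cube(1800, 3600)  # ~0.1 degree source grid
    tgt = _global_cube(180, 360)    # 1 degree target grid
    regridder = Linear().regridder(src, tgt)  # weights computed here
    result = regridder(src)                   # weights reused here
    assert result.shape == (180, 360)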

@schlunma (Contributor, Author) commented:

My only concern, which I just thought of, is that maybe this would be better sat in iris?

Actually, this entire functionality is already in iris in the form of Regridder classes that compute the weights and store them. With this PR, we are using those Regridder classes instead of the regridding schemes, which do not offer weights caching.
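
Roughly, this is the iris pattern being referred to; a minimal sketch using the standard iris regridding API, with throwaway demo cubes:

import numpy as np
import iris.coords
import iris.cube
from iris.analysis import Linear


def _cube(n_lat, n_lon):
    # Minimal global lat/lon cube purely for demonstration.
    lat = iris.coords.DimCoord(np.linspace(-90, 90, n_lat),
                               standard_name="latitude", units="degrees")
    lon = iris.coords.DimCoord(np.linspace(0, 359, n_lon),
                               standard_name="longitude", units="degrees")
    return iris.cube.Cube(np.zeros((n_lat, n_lon), np.float32),
                          dim_coords_and_dims=[(lat, 0), (lon, 1)])


tas_cube, pr_cube, target_cube = _cube(360, 720), _cube(360, 720), _cube(180, 360)

# Scheme-based call: the weights are recomputed on every invocation.
regridded_tas = tas_cube.regrid(target_cube, Linear())

# Regridder-based call: the weights are computed once when the Regridder is
# created and then reused for every cube on the same source grid.
regridder = Linear().regridder(tas_cube, target_cube)
regridded_tas = regridder(tas_cube)
regridded_pr = regridder(pr_cube)  # no weight recomputation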

Also (sorry, I sat down and did MO stuff today, so now my brain is free at last) - Manu, maybe it'd be worth plopping a worst-case-scenario test into the tests, i.e. build a super-high-res netCDF file on the fly and do the weights caching dance on it - we'd be able to monitor the memory using the test performance tool we have in GitHub Actions and, of course, decorate it with e.g. @highmem so we don't run it usually 🍺

I am honestly not sure if that's worth it, especially if we implement the option to turn off weights caching 😬 What do we want to achieve with such a test? Kill the CI machines? 😄 🤖

@schlunma (Contributor, Author) commented Apr 12, 2024

Added an option to enable/disable caching of weights (default: disabled): b6eaade

Added a function to clear the cache, similar to lru_cache's cache_clear(): regrid.cache_clear(), see 52a536c
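
A minimal usage sketch of the two additions; it assumes the new keyword is called cache_weights (as suggested above), so refer to the linked documentation for the exact name and defaults:

import iris
from esmvalcore.preprocessor import regrid

# "tas.nc" is a placeholder for any CF-compliant file on a lat/lon grid.
cube = iris.load_cube("tas.nc")

# Opt in to weight caching (disabled by default, so existing recipes
# behave exactly as before).
regridded = regrid(cube, target_grid="1x1", scheme="linear", cache_weights=True)

# Drop the cached weights once all variables have been processed,
# analogous to functools.lru_cache's cache_clear().
regrid.cache_clear()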

@bouweandela (Member) commented:

Thanks for making the changes @schlunma! @valeriupredoi, could you please do a final review of the new code, since you reviewed the original PR, and then we can merge?

@valeriupredoi (Contributor) commented:

On it, bud 🍺

@valeriupredoi (Contributor) commented:

What do we want to achieve with such a test? Kill the CI machines?

Not a smart thing to do, especially since today Skynet's closer than ever 😆 OK, leave the poor machines alone then.

@valeriupredoi added the enhancement (New feature or request) label on Apr 15, 2024
@valeriupredoi (Contributor) left a comment

New stuff looks good! Cheers gents @schlunma and @bouweandela 🍺

@bouweandela merged commit bbd307d into main on Apr 16, 2024
6 checks passed
@bouweandela deleted the cache_regridding_weights branch on April 16, 2024 at 08:48
@chrisbillowsMO added the dask (related to improvements using Dask) label on May 13, 2024
Labels
dask (related to improvements using Dask), enhancement (New feature or request), preprocessor (Related to the preprocessor)
Development
Merging this pull request closes issue #2341: Use iris' regridder caching for faster regridding?
4 participants