Use iris' regridder caching for faster regridding? #2341

Closed
schlunma opened this issue Feb 23, 2024 · 10 comments · Fixed by #2344

@schlunma
Contributor

Some regridding schemes in iris allow reusing the regridding weights (see here for an overview of them), which might significantly speed up our regrid preprocessor when multiple variables of the same dataset are regridded. This is described here and basically boils down to using the following pattern:

import iris.analysis

regridder = iris.analysis.Nearest().regridder(source_cube, target_cube)
for cube in list_of_cubes_on_source_grid:
    result = regridder(cube)  # reuses the weights computed above

instead of

for cube in list_of_cubes_on_source_grid:
    result = cube.regrid(target_cube, iris.analysis.Nearest())  # recomputes the weights every time

This should be fairly easy to implement with a dict cache that uses the source and target latitude and longitude coordinates as keys (we cannot use functools.lru_cache, since the hash of an iris coordinate is simply its id, so cached results would never be reused across different cube objects); see the sketch below.
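For illustration, a minimal sketch of such a dict cache, assuming a hypothetical helper _coord_key and a module-level _REGRIDDER_CACHE (this is not the actual implementation in #2344):

import iris.analysis

_REGRIDDER_CACHE = {}

def _coord_key(cube):
    """Build a hashable key from a cube's horizontal coordinates."""
    # Coordinate hashes are just object ids, so key on name, shape and
    # point values instead.
    return tuple(
        (coord.name(), coord.shape, coord.points.tobytes())
        for coord in (cube.coord('latitude'), cube.coord('longitude'))
    )

def cached_regrid(cube, target_cube, scheme):
    """Regrid cube onto target_cube, reusing a cached regridder if possible."""
    # Note: keying on the scheme's class name ignores scheme parameters;
    # good enough for a sketch, not for production.
    key = (_coord_key(cube), _coord_key(target_cube), type(scheme).__name__)
    if key not in _REGRIDDER_CACHE:
        _REGRIDDER_CACHE[key] = scheme.regridder(cube, target_cube)
    return _REGRIDDER_CACHE[key](cube)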

One drawback here is that each regrid call incurs an additional comparison of the coordinates, which makes it slightly slower when just a single variable from many datasets is analyzed.

@ESMValGroup/esmvaltool-coreteam is this something we want? I can open a draft PR so that the discussion is easier.

@schlunma schlunma added the enhancement New feature or request label Feb 23, 2024
@schlunma schlunma added this to the v2.11.0 milestone Feb 23, 2024
@schlunma schlunma self-assigned this Feb 23, 2024
@bouweandela
Member

I'd recommend using a profiler, e.g. py-spy, first to profile a run of a typical recipe that uses regrid and then see where most of the time is spent. If it turns out that it is indeed regridding, then we could consider this.
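For reference, a typical py-spy invocation that records a speedscope-format profile might look like this (the recipe name is a placeholder; --subprocesses may be needed since ESMValTool runs tasks in separate processes):

py-spy record --format speedscope --subprocesses -o esmvaltool-profile.json -- esmvaltool run recipe_example.yml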

@schlunma
Contributor Author

schlunma commented Feb 23, 2024

That's a very good suggestion.

Here is a run with ~20 variables of 1 year of ICON data (regridding only, nothing else).

No caching (current main): esmvaltool-no_caching.json
With caching: esmvaltool-caching.json
(these files can be opened with https://www.speedscope.app/)

The function that calculates the weights is _regrid_unstructured_to_rectilinear__prepare. Without caching, this step takes up 11% of the runtime (50 s); with caching, this drops to 0.53% (2 s).

For that specific use case (multiple variables from one model) this can save quite some time with minimal effort, so I think it's worth including.

@bouweandela
Member

Maybe you could also check whether it still saves time when running with #2316, as it looks like that's where we're headed. How much memory do the weights take up?

@schlunma
Contributor Author

The weights are 2D arrays whose size scales with the number of source/target points. For example, for a 1000x1000 grid (which is much larger than commonly used for CMIP6 models), this amounts to ~8 MiB assuming float64. Of course, for very high-resolution grids you might end up with ~1 GiB, but I don't think that matters on machines where you want to evaluate that kind of data.
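For concreteness, the arithmetic behind that estimate (assuming one float64 weight per grid point):

# 1000 x 1000 grid points, one 8-byte float64 weight per point
n_points = 1000 * 1000
size_bytes = n_points * 8
print(size_bytes / 2**20)  # ~7.6 MiB, i.e. the "~8 MiB" quoted above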

Also, the improvement shown here is only a lower bound for that use case, as this is data with a very low resolution. For higher resolutions, calculating the regridding weights will take even more time. For other use cases (like one variable, many models), this should not change the runtime at all.

Why would you think that #2316 changes something here?

@bouweandela
Member

Because there, all preprocessor functions are run on Dask workers (with limited memory), so the cache will likely not be available on the worker you're running on. Of course, there are smarter ways to manage the cache that would also work in a Dask cluster, but then you would need to copy it to the workers again.

@schlunma
Contributor Author

Sorry, I can't get #2316 to run. With the same setup that I used for the other tests (4 workers with 7 GiB of memory each), the tool fails with a wild Dask error (I guess because it runs out of memory), and after increasing the resources to 2 workers with 15 GiB each, it has been running for an hour but is still at 0% (the other tests finished in ~3 min).

It would also be nice to discuss #2316 at some point. It looks like a drastic change, and it's not 100% clear to me why "we're headed" that way and what the benefits are.

On the other hand, I really think that we should implement this. Under the described circumstances, not reusing regridding weights is really bad practice (it is the most computationally intensive part of regridding). In all other cases, this won't change performance at all. I opened a PR about this (#2344), and it really doesn't require a lot of changes. We basically rely on the regridder caching of iris (or other regridding libraries).

@bouweandela
Member

Interesting, could you share the recipe (and required data) so I can try to find out why it does not work with #2316? So far, I've tested it with examples/recipe_easy_ipcc.yml and gotten very good performance (it ran in half an hour instead of the original 3-4 hours). Of course, we won't adopt that approach if it turns out that it does not work after all.

@bouweandela
Member

The result of investigating the issue with #2316 was that the FileLock that prevents downloading the grid multiple times is blocking the computation (fairly easy to solve), and that esmpy crashes if used from multiple threads (I reported this to the esmpy developers and hope they will be willing to do something about it), so there is some more work to do before #2316 is usable.

@schlunma
Contributor Author

Thanks for investigating! I still think it's worth including this PR. What do you think?

@valeriupredoi
Contributor

> and esmpy crashes if used from multiple threads

surely having to do with HDF5's lack of thread safety!

Well, I reviewed and approved Manu's PR, which I think is a good idea - caching is always a nice alternative to computing on the fly, as long as it's not a bulky, memory-hogging cache 🍺
