Skip apply_gufunc compensation when post-rechunk block fits array.chunk-size #12360
thodson-usgs wants to merge 1 commit into
The compensation block introduced in dask#11683 unconditionally shrinks loop dimensions after rechunking core dims to -1, to preserve per-block memory. But it never checks whether the original block was already small, so it over-splits already-small loop dims, producing huge task graphs with no memory benefit.

Concrete repro (from pydata/xarray#9907 follow-up):

    src = da.zeros((200, 400), chunks=(2, 400))  # 100 input chunks
    da.apply_gufunc(..., allow_rechunk=True)     # core axis = 0

Before: latitude → 1 chunk (good), longitude split to 100 chunks of 4 (bad: 20,701 tasks in graph, loop dim was already one chunk)
After: latitude → 1 chunk, longitude stays one chunk (404 tasks)

Downstream impact on xarray.interp(method="linear"|"nearest") with dask-chunked input: ~100x speedup on a 200x400 -> 50x100 interp with 100 chunks (1554 ms -> 14 ms; task graph 21,731 -> 413).

Fix: guard the compensation with a chunk-size budget check. Only shrink loop dims when the post-rechunk block would actually exceed ``array.chunk-size`` (default 128 MiB). For the xarray#9907 scenario the blocks still get split (verified by an explicit test with a reduced limit).

Split the existing ``test_gufunc_chunksizes_adjustment`` into two tests covering both branches: below limit (no compensation), above limit (compensation fires).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
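To make the budget check concrete, here is a minimal sketch of the arithmetic for the repro above. It is not the code in this PR: ``needs_compensation`` and ``DEFAULT_CHUNK_SIZE_LIMIT`` are hypothetical names, the limit is hard-coded to the 128 MiB default rather than read from dask's config, and the array is assumed to be float64 (the ``da.zeros`` default).

```python
from math import prod

# Assumption: hard-coded stand-in for the ``array.chunk-size`` default (128 MiB).
DEFAULT_CHUNK_SIZE_LIMIT = 128 * 2**20


def needs_compensation(block_shape, itemsize, limit=DEFAULT_CHUNK_SIZE_LIMIT):
    """Hypothetical helper: True when one block would exceed the byte budget."""
    return prod(block_shape) * itemsize > limit


# Repro numbers: a float64 (200, 400) array chunked as (2, 400).
# Rechunking core axis 0 to -1 merges the (2, 400) blocks into one (200, 400) block.
post_rechunk_block = (200, 400)
itemsize = 8  # bytes per float64 element

block_bytes = prod(post_rechunk_block) * itemsize
print(block_bytes)  # 640000

# Far below 128 MiB, so the loop dim should be left alone.
print(needs_compensation(post_rechunk_block, itemsize))  # False

# With an artificially small budget the compensation branch would still fire.
print(needs_compensation(post_rechunk_block, itemsize, limit=1000))  # True
```

The last call mirrors the idea behind the "above limit" test: forcing the budget down is a cheap way to exercise the memory-protection branch without allocating a large array.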
Routes ``xarray.interp(method="linear"|"nearest"|"slinear")`` on a dask-chunked core dim through a per-chunk dispatch instead of ``apply_ufunc(..., allow_rechunk=True)``. For each target point, look up the source chunk that contains its coord value and run the interpolator over that chunk plus a size-1 halo. Per-task memory scales with ``source_chunk + halo`` rather than the full interp axis.

Fall-back path preserves the existing behavior for cubic, multi-dim interpn, non-monotonic source coord, empty target, and numpy input.

Verified against the existing apply_ufunc path on 200x400 -> 50x100 for several source-chunk layouts (bit-identical), on a 3D time-chunked input (time chunking preserved), and on the memory-constrained 6000x5000 case where the new path beats ``apply_ufunc`` by ~10x.

The per-chunk path materializes 1D source coords (searchsorted-based routing); data stays lazy. ``test_dataset_interp_datetime_dask`` bumped its ``raise_if_dask_computes`` budget to account for this.

Related: :issue:`9907` (already closed; same root cause) and :issue:`10130` (open; partial overlap: single-chunk-source cases still use the existing path, better addressed by the dask-side guard in dask/dask#12360).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
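The routing step described above can be sketched roughly as follows. This is an illustration only, not the xarray implementation: ``route_to_chunks`` is a hypothetical helper, it uses the stdlib ``bisect`` module as a stand-in for a searchsorted call, and it assumes a strictly increasing 1D source coord (the non-monotonic case takes the fall-back path).

```python
from bisect import bisect_right


def route_to_chunks(source_coord, chunk_sizes, targets, halo=1):
    """Hypothetical helper: group targets by the source chunk that contains them.

    Returns {chunk_index: (start, stop, [targets])}, where start:stop is the
    slice of source_coord covering that chunk widened by `halo` points per side.
    """
    # Index of the first element of each chunk, e.g. sizes (3, 3) -> [0, 3].
    starts = [0]
    for size in chunk_sizes[:-1]:
        starts.append(starts[-1] + size)

    n = len(source_coord)
    routes = {}
    for t in targets:
        # Position of the last source point <= t, clamped into the valid range.
        pos = min(max(bisect_right(source_coord, t) - 1, 0), n - 1)
        # Chunk that owns that position.
        chunk = bisect_right(starts, pos) - 1
        lo = max(starts[chunk] - halo, 0)
        hi = min(starts[chunk] + chunk_sizes[chunk] + halo, n)
        routes.setdefault(chunk, (lo, hi, []))[2].append(t)
    return routes


coord = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
routes = route_to_chunks(coord, chunk_sizes=(3, 3), targets=[0.5, 2.5, 4.2])
print(routes)  # {0: (0, 4, [0.5, 2.5]), 1: (2, 6, [4.2])}
```

The target 2.5 lands in chunk 0 but needs the source point at 3.0, which belongs to chunk 1. The size-1 halo widens the slice to ``0:4`` so that neighbor is available, which is why per-task memory scales with ``source_chunk + halo``.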
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

21 files ± 0
21 suites ± 0
5h 25m 50s ⏱️ + 9m 34s

For more details on these failures, see this check.

Results for commit e9e8dba. ± Comparison against base commit c6a85f3.

This pull request removes 1 and adds 2 tests. Note that renamed tests count towards both.
Closing this. After benchmarking, the regime this fix targets (post-rechunk blocks well below

[This is Claude Code on behalf of Tim Hodson]
``pre-commit run --all-files``

Follow-up to #11683. The loop-dim compensation block there unconditionally shrinks loop chunks after rechunking a core dim to ``-1``, even when the original block was already small. Result: task graphs explode without any memory benefit.

Fix: guard the compensation with an ``array.chunk-size`` budget. Only shrink loop dims when the post-rechunk block would actually exceed the limit. The memory-protection branch from #11683 still fires when it should: new test ``test_gufunc_chunksizes_adjustment_above_limit`` forces the budget down and pins that behavior.

Related: pydata/xarray#9907 (the original report motivating #11683; already closed), pydata/xarray#10130 (open memory-OOM report where the compensation over-fire contributes to graph blowup).

Companion xarray PR (pydata/xarray#11312) addresses the architectural side for ``interp(method="linear"|"nearest")``; this one helps every ``apply_gufunc(allow_rechunk=True)`` caller.

cc @phofl @crusaderky