Skip apply_gufunc compensation when post-rechunk block fits array.chunk-size by thodson-usgs · Pull Request #12360 · dask/dask

thodson-usgs · 2026-04-23T19:12:19Z

Tests added
Passes pre-commit run --all-files

Follow-up to #11683. The loop-dim compensation block there unconditionally shrinks loop chunks after rechunking a core dim to -1, even when the original block was already small. Result: task graphs explode without any memory benefit.

import dask.array as da

src = da.zeros((200, 400), chunks=(2, 400))           # 100 input chunks, ~6 KB each
out = da.apply_gufunc(
    lambda x, c: x[:1], "(i),(i)->(j)",
    src, da.arange(200.0),
    axes=[(0,), (0,), (0,)], output_sizes={"j": 50},
    allow_rechunk=True, output_dtypes=float,
)
len(out.__dask_graph__())
# before: 20,701       after: 404

Fix: guard the compensation with an array.chunk-size budget. Only shrink loop dims when the post-rechunk block would actually exceed the limit. The memory-protection branch from #11683 still fires when it should — new test test_gufunc_chunksizes_adjustment_above_limit forces the budget down and pins that behavior.
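The budget check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in the PR: the helper names `post_rechunk_block_bytes` and `should_compensate` are hypothetical, and the 128 MiB limit is hard-coded rather than read from dask's `array.chunk-size` config.

```python
from math import prod

# Assumption: mirrors the 128 MiB array.chunk-size default; hard-coded here
# so the sketch does not need dask installed.
DEFAULT_LIMIT_BYTES = 128 * 2**20


def post_rechunk_block_bytes(chunks, core_axes, shape, itemsize):
    """Bytes of the largest block once every core axis is a single chunk."""
    block = [
        shape[ax] if ax in core_axes else max(chunks[ax])
        for ax in range(len(shape))
    ]
    return prod(block) * itemsize


def should_compensate(chunks, core_axes, shape, itemsize, limit=DEFAULT_LIMIT_BYTES):
    """Hypothetical guard: shrink loop dims only if the block exceeds the budget."""
    return post_rechunk_block_bytes(chunks, core_axes, shape, itemsize) > limit


# Repro from the description: shape (200, 400), chunks (2, 400), float64, core axis 0.
chunks = ((2,) * 100, (400,))
below_limit = should_compensate(chunks, {0}, (200, 400), 8)
above_limit = should_compensate(chunks, {0}, (200, 400), 8, limit=100_000)
```

With the default budget the 640,000-byte post-rechunk block is left alone; lowering the limit, as the new above-limit test does, makes the guard fire.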

Related: pydata/xarray#9907 (the original report motivating #11683; already closed), pydata/xarray#10130 (open memory-OOM report where the compensation over-fire contributes to graph blowup).

Companion xarray PR (pydata/xarray#11312) addresses the architectural side for interp(method="linear"|"nearest"); this one helps every apply_gufunc(allow_rechunk=True) caller.

cc @phofl @crusaderky

The compensation block introduced in dask#11683 unconditionally shrinks loop dimensions after rechunking core dims to -1, to preserve per-block memory. But it never checks whether the original block was already small, so it over-splits already-small loop dims, producing huge task graphs with no memory benefit.

Concrete repro (from pydata/xarray#9907 follow-up):

    src = da.zeros((200, 400), chunks=(2, 400))  # 100 input chunks
    da.apply_gufunc(..., allow_rechunk=True)     # core axis = 0

Before: latitude → 1 chunk (good), longitude split to 100 chunks of 4 (bad: 20,701 tasks in graph, loop dim was already one chunk)
After: latitude → 1 chunk, longitude stays one chunk (404 tasks)

Downstream impact on xarray.interp(method="linear"|"nearest") with dask-chunked input: ~100x speedup on a 200x400 -> 50x100 interp with 100 chunks (1554 ms -> 14 ms; task graph 21,731 -> 413).

Fix: guard the compensation with a chunk-size budget check. Only shrink loop dims when the post-rechunk block would actually exceed ``array.chunk-size`` (default 128 MiB). For the xarray#9907 scenario the blocks still get split (verified by an explicit test with a reduced limit).

Split the existing ``test_gufunc_chunksizes_adjustment`` into two tests covering both branches: below limit (no compensation) and above limit (compensation fires).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
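The "100 chunks of 4" figure in the commit message can be reproduced with simple arithmetic. The sketch below assumes the compensation divides the loop chunk by the same factor the core chunk grew; that rule is inferred from the numbers quoted here, not taken from the dask source, and `compensated_loop_chunk` is a hypothetical helper.

```python
def compensated_loop_chunk(core_chunk, core_len, loop_chunk):
    """Assumed rule: shrink the loop chunk by the core chunk's growth factor."""
    growth = core_len // core_chunk        # 200 // 2 = 100
    return max(loop_chunk // growth, 1)    # 400 // 100 = 4


# Core axis of length 200 chunked by 2; loop axis of length 400 in one chunk.
new_loop_chunk = compensated_loop_chunk(core_chunk=2, core_len=200, loop_chunk=400)
n_loop_chunks = 400 // new_loop_chunk
```

Under that assumption the single 400-wide loop chunk becomes 100 chunks of width 4, which is the over-splitting the guard is meant to avoid when the block is already small.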

Routes ``xarray.interp(method="linear"|"nearest"|"slinear")`` on a dask-chunked core dim through a per-chunk dispatch instead of ``apply_ufunc(..., allow_rechunk=True)``. For each target point, look up the source chunk that contains its coord value and run the interpolator over that chunk plus a size-1 halo. Per-task memory scales with ``source_chunk + halo`` rather than the full interp axis.

Fall-back path preserves the existing behavior for cubic, multi-dim interpn, non-monotonic source coord, empty target, and numpy input.

Verified against the existing apply_ufunc path on 200x400 -> 50x100 for several source-chunk layouts (bit-identical), on a 3D time-chunked input (time chunking preserved), and on the memory-constrained 6000x5000 case where the new path beats ``apply_ufunc`` by ~10x.

The per-chunk path materializes 1D source coords (searchsorted-based routing); data stays lazy. ``test_dataset_interp_datetime_dask`` bumped its ``raise_if_dask_computes`` budget to account for this.

Related: :issue:`9907` (already closed; same root cause) and :issue:`10130` (open; partial overlap: single-chunk-source cases still use the existing path, better addressed by the dask-side guard in dask/dask#12360).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
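The routing step described in this commit message (find the source chunk holding each target value, then widen it by a size-1 halo) might look roughly like the following. It is a stdlib-only sketch, not xarray's implementation: `route_targets` and `chunk_window` are hypothetical names, and `bisect` stands in for the searchsorted call mentioned above.

```python
from bisect import bisect_right
from itertools import accumulate


def route_targets(source_coord, chunk_sizes, targets):
    """Return, for each target value, the index of the source chunk containing it.

    Assumes source_coord is monotonically increasing and
    sum(chunk_sizes) == len(source_coord).
    """
    # First element offset of every chunk, e.g. (3, 3, 2) -> [0, 3, 6]
    starts = [0, *accumulate(chunk_sizes)][:-1]
    routed = []
    for t in targets:
        # Index of the source point at or just left of t, clipped into range.
        i = min(max(bisect_right(source_coord, t) - 1, 0), len(source_coord) - 1)
        routed.append(bisect_right(starts, i) - 1)
    return routed


def chunk_window(chunk_index, chunk_sizes, n, halo=1):
    """Half-open [lo, hi) slice of the source axis: one chunk plus its halo."""
    edges = [0, *accumulate(chunk_sizes)]
    lo = max(edges[chunk_index] - halo, 0)
    hi = min(edges[chunk_index + 1] + halo, n)
    return lo, hi


source_coord = [0, 1, 2, 3, 4, 5, 6, 7]
chunk_sizes = (3, 3, 2)
routed = route_targets(source_coord, chunk_sizes, [0.5, 2.5, 6.9])
```

The target 2.5 routes to chunk 0 even though its right-hand neighbor (coordinate 3) lives in chunk 1; the halo in `chunk_window(0, chunk_sizes, 8)` is what brings that neighbor into the task.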

github-actions · 2026-04-23T20:18:05Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

21 files ±0, 21 suites ±0, duration 5h 25m 50s (+9m 34s)
18 294 tests (+1): 16 935 passed ✅ (±0), 1 272 skipped 💤 (±0), 87 failed ❌ (+1)
317 539 runs (+19): 273 745 passed ✅ (+16), 43 707 skipped 💤 (+2), 87 failed ❌ (+1)

For more details on these failures, see this check.

Results for commit e9e8dba. ± Comparison against base commit c6a85f3.

This pull request removes 1 and adds 2 tests. Note that renamed tests count towards both.

Removed:
dask.array.tests.test_gufunc ‑ test_gufunc_chunksizes_adjustment

Added:
dask.array.tests.test_gufunc ‑ test_gufunc_chunksizes_adjustment_above_limit
dask.array.tests.test_gufunc ‑ test_gufunc_chunksizes_adjustment_below_limit

thodson-usgs · 2026-04-24T01:40:27Z

Closing this. After benchmarking, the regime this fix targets (post-rechunk blocks well below array.chunk-size) is narrow: at normal chunk sizes there's no measurable difference, and the specific cases that motivated it (pydata/xarray#9907, pydata/xarray#10130) are being addressed from the xarray side in pydata/xarray#11312. Happy to revisit if someone hits the same symptom through a different entry point.

[This is Claude Code on behalf of Tim Hodson]

thodson-usgs mentioned this pull request Apr 23, 2026

Per-chunk path for interp(linear|nearest) on dask-chunked core dims pydata/xarray#11312

Closed

5 tasks

thodson-usgs closed this Apr 24, 2026

thodson-usgs wants to merge 1 commit into dask:main from thodson-usgs:fix/apply-gufunc-overcompensation
