
arena ToT: einsum sub-World fence guard + (outer-Contraction, inner-Hadamard) view-cell case #550

Merged
evaleev merged 2 commits into master from evaleev/fix/arena-tot-fence-and-view-hadamard-outer-contract
May 21, 2026


@evaleev evaleev commented May 21, 2026

Summary

Stacked on top of #551 (lazy_deleter fast path).

  • einsum: add a single-fence RAII FenceSubWorldsOnExit guard so any tasks scheduled by non-deferred ~DistArray calls (e.g. AB during exception unwind) are drained before sub-Worlds are torn down.
  • cont_engine: add the (outer Contraction, inner Hadamard) view-cell case for ArenaTensor-backed ToT inner tiles, via the existing arena fast path.

Important

The PR base is set to evaleev/feature/lazy-deleter-skip-sync-in-do-cleanup (#551). Once #551 merges, this PR will auto-retarget master.

Why

Triaged from an abort while running a CSV-CCk traced expression over ArenaTensor ToT operands in MPQC. Cascade:

  1. The expression C(i_3,i_4;a<...>) = A(i_3;a<...>) * B(i_4;a<...>) -- the typical sub-product inside einsum's generalized-contraction loop after ToT operands are reduced per Hadamard tile -- hit the view-cell branch of init_inner_tile_op with (outer Contraction, inner Hadamard). That combo had no handler and threw "nested non-contraction product on view inner tiles is not yet supported".
  2. The exception propagated out of einsum's per-h iteration. The temporary worlds vector destructed during stack unwind, tearing down sub-Worlds while lazy_sync_children tasks scheduled by ~DistArray's lazy_deleter were still in the global ThreadPool's queue.
  3. A later fence in the enclosing scope picked up those stranded tasks; ~WorldObject then asserted World::exists(&world) and aborted, masking the real TA exception.

#551 eliminates the deferred-cleanup half of (2) at the source (no lazy_sync task ever scheduled in that path). This PR adds the missing view-cell case so (1) does not throw in the first place, and keeps a thin RAII guard so the non-deferred path during exception unwind also leaves the sub-Worlds clean.

What

einsum RAII guard

FenceSubWorldsOnExit is declared right after the worlds vector. LIFO destruction then gives the required order: AB/C destruct first (releasing pimpls; the non-deferred path goes through lazy_sync and enqueues tasks on sub-Worlds), the guard runs next (draining those tasks via a single fence per sub-World), and worlds destructs last (with empty taskqs). A single fence per sub-World is sufficient because #551 makes the deferred-cleanup ~DistArray path skip lazy_sync, so there is no post-do_cleanup task to drain; only tasks from non-deferred destructors remain.

All participating ranks of a sub-World reach this RAII guard in lockstep at function exit, so their lazy_sync handshakes match up.
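The destruction ordering above can be illustrated with a toy sketch. Everything in it is hypothetical and stands in for the real types: `ToyWorld` for a sub-World with a task queue, `ToyArray` for a DistArray whose destructor enqueues cleanup work, and `FenceOnExit` for the guard. It uses no MADNESS or TiledArray API and only demonstrates how declaration order determines what is drained before the worlds are destroyed.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sub-World: owns a task queue and logs its teardown.
struct ToyWorld {
  std::deque<std::function<void()>> taskq;
  std::vector<std::string>* log;
  explicit ToyWorld(std::vector<std::string>* l) : log(l) {}
  // Stand-in for a fence: run every pending task.
  void fence() {
    while (!taskq.empty()) {
      auto task = std::move(taskq.front());
      taskq.pop_front();
      task();
    }
  }
  ~ToyWorld() {
    log->push_back(taskq.empty() ? "world destroyed clean"
                                 : "world destroyed with stranded tasks");
  }
};

// Hypothetical stand-in for an array whose destructor enqueues cleanup on its world.
struct ToyArray {
  ToyWorld* world;
  explicit ToyArray(ToyWorld* w) : world(w) {}
  ToyArray(const ToyArray&) = delete;
  ToyArray& operator=(const ToyArray&) = delete;
  ~ToyArray() {
    auto* log = world->log;
    world->taskq.push_back([log] { log->push_back("cleanup task ran"); });
  }
};

// Guard: fences every world on scope exit, including during exception unwind.
struct FenceOnExit {
  std::vector<std::unique_ptr<ToyWorld>>& worlds;
  explicit FenceOnExit(std::vector<std::unique_ptr<ToyWorld>>& w) : worlds(w) {}
  ~FenceOnExit() {
    for (auto& w : worlds) w->fence();
  }
};

// Returns the event log for one toy run.
std::vector<std::string> run_toy_einsum(bool use_guard, bool throw_midway) {
  std::vector<std::string> log;
  try {
    std::vector<std::unique_ptr<ToyWorld>> worlds;   // destroyed last
    worlds.push_back(std::make_unique<ToyWorld>(&log));
    std::optional<FenceOnExit> guard;                // destroyed before worlds
    if (use_guard) guard.emplace(worlds);
    ToyArray ab(worlds.front().get());               // destroyed first
    if (throw_midway) throw std::runtime_error("unsupported inner product");
  } catch (const std::runtime_error& e) {
    log.push_back(std::string("caught: ") + e.what());
  }
  return log;
}
```

With the guard, `run_toy_einsum(true, true)` logs the cleanup task running, then a clean world teardown, and only then the caught exception. Without it, `run_toy_einsum(false, true)` logs a world destroyed with stranded tasks, which is the toy analogue of the abort described in the Why section.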

cont_engine view-cell case

init_inner_tile_op now handles (outer Contraction, inner Hadamard) for view inner cells. It mirrors init_inner_tile_op_owning_: arena_plan_ uses ArenaInnerShapeKind::left_range, and a per-cell op built by make_fused_hadamard_lambda / make_fused_hadamard_scaled_lambda accumulates r += l * rr (optionally scaled) into pre-shaped view cells. A non-identity inner result permutation is rejected explicitly, because the owning fallback that materializes a permuted return cell cannot run for views.
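As a rough illustration of the per-cell accumulation, the sketch below applies `r += (l * rr) * factor` elementwise into a pre-shaped, non-owning cell. `CellView`, `make_toy_fused_hadamard`, and `demo_accumulate` are hypothetical names invented for this example; they are not the TiledArray helpers named above, and the extent check is an assumption about what a view-cell op would need to verify.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical non-owning view over a pre-shaped cell
// (stand-in for an arena-backed inner tile).
struct CellView {
  double* data;
  std::size_t size;
};

// Builds a per-cell op that accumulates r += (l * rr) * factor in place.
// Nothing is returned by value, so the op works on cells that cannot own storage.
inline auto make_toy_fused_hadamard(double factor = 1.0) {
  return [factor](CellView r, CellView l, CellView rr) {
    // Assumption: result and operand cells must already have equal extents.
    if (r.size != l.size || r.size != rr.size)
      throw std::invalid_argument("cell extents must match");
    for (std::size_t i = 0; i < r.size; ++i)
      r.data[i] += l.data[i] * rr.data[i] * factor;
  };
}

// Accumulates two contributions into one result cell, as an outer
// contraction would when summing over its contracted index.
inline std::vector<double> demo_accumulate() {
  std::vector<double> r(3, 0.0);  // pre-shaped, zero-initialized result cell
  std::vector<double> l1{1.0, 2.0, 3.0}, rr1{4.0, 5.0, 6.0};
  std::vector<double> l2{1.0, 1.0, 1.0}, rr2{2.0, 2.0, 2.0};

  auto op = make_toy_fused_hadamard();            // unscaled: r += l * rr
  auto scaled_op = make_toy_fused_hadamard(0.5);  // scaled:   r += (l * rr) * 0.5

  CellView rv{r.data(), r.size()};
  op(rv, CellView{l1.data(), l1.size()}, CellView{rr1.data(), rr1.size()});
  scaled_op(rv, CellView{l2.data(), l2.size()}, CellView{rr2.data(), rr2.size()});
  return r;
}
```

Here `demo_accumulate()` returns `{5, 11, 19}`: the first call contributes `{4, 10, 18}` and the scaled call adds `{1, 1, 1}`. Because the result cell is shaped up front, the op only ever writes through the view.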

Test plan

  • Local downstream MPQC repro (bimal.json CSV-CCk trace evaluator): previously aborted in ~WorldObject during stack unwind; now runs end-to-end with the real TA expression evaluating cleanly, exit 0.
  • CI green (single-rank and multi-rank).

@evaleev evaleev force-pushed the evaleev/fix/arena-tot-fence-and-view-hadamard-outer-contract branch from 74c87c5 to 7a2d341 Compare May 21, 2026 02:09
@evaleev evaleev force-pushed the evaleev/fix/arena-tot-fence-and-view-hadamard-outer-contract branch from 7a2d341 to 23505b3 Compare May 21, 2026 02:48
@evaleev evaleev changed the title arena ToT: fix exception-unwind world tear-down and add (outer-Contraction, inner-Hadamard) view-cell case arena ToT: fence sub-Worlds during destruction + (outer-Contraction, inner-Hadamard) view-cell case May 21, 2026
@evaleev evaleev force-pushed the evaleev/fix/arena-tot-fence-and-view-hadamard-outer-contract branch from 23505b3 to 31f9447 Compare May 21, 2026 03:10
@evaleev evaleev changed the title arena ToT: fence sub-Worlds during destruction + (outer-Contraction, inner-Hadamard) view-cell case arena ToT: bypass lazy_sync in deferred lazy_deleter + (outer-Contraction, inner-Hadamard) view-cell case May 21, 2026
evaleev added 2 commits May 21, 2026 12:23
Add an inline RAII guard `FenceSubWorldsOnExit` to the generalized-
contraction path of einsum, declared right after the `worlds` vector so
it destructs *before* `worlds` (LIFO) and *after* AB/C. On normal exit
this is a final harmless drain; on exception unwind it drains any
`lazy_sync_children` tasks that ~DistArray scheduled via lazy_deleter
on sub-World taskqs before those sub-Worlds are torn down. Without
this, those tasks survive into the global ThreadPool past ~World, then
trip ~WorldObject's `World::exists(&world)` assertion when an enclosing
scope's fence runs them, masking the real exception with a cryptic
abort.

One fence per sub-World suffices because lazy_deleter now bypasses
lazy_sync when invoked from `do_cleanup` (gated by
`world.gop.is_in_do_cleanup()`): the deferred-cleanup path performs
direct deletes rather than scheduling cross-rank tasks. The remaining
tasks this fence has to drain come only from non-deferred ~DistArray
calls (e.g. AB during exception unwind), and all participating ranks
of a sub-World reach this RAII guard in lockstep so their lazy_sync
handshakes match up.

Add the (outer Contraction, inner Hadamard) case to
init_inner_tile_op's view-cell branch. Mirrors the owning-tile path in
init_inner_tile_op_owning_: arena_plan_ uses the `left_range` plan to
shape each result cell from a non-empty left inner cell, and the
per-cell op accumulates `r += l * rr` -- or `r += (l * rr) * factor_`
when scaled -- via fused_hadamard_inplace into the pre-shaped view
cell. No value-returning per-cell op is needed, so this works for view
cells (e.g. ArenaTensor); non-identity inner result permutation is
rejected (the owning fallback that materializes a permuted return cell
cannot run for views).

Previously this case threw "nested non-contraction product on view
inner tiles is not yet supported", aborting expressions such as
`C(i_3,i_4;a<...>) = A(i_3;a<...>) * B(i_4;a<...>)` over ArenaTensor
inner cells -- the typical sub-product inside einsum's generalized
contraction loop for ToTxToT with Hadamard outer-Hadamard inner shapes.
@evaleev evaleev force-pushed the evaleev/fix/arena-tot-fence-and-view-hadamard-outer-contract branch from 31f9447 to 31800a9 Compare May 21, 2026 03:24
@evaleev evaleev changed the title arena ToT: bypass lazy_sync in deferred lazy_deleter + (outer-Contraction, inner-Hadamard) view-cell case arena ToT: einsum sub-World fence guard + (outer-Contraction, inner-Hadamard) view-cell case May 21, 2026
@evaleev evaleev changed the base branch from master to evaleev/feature/lazy-deleter-skip-sync-in-do-cleanup May 21, 2026 03:25
Base automatically changed from evaleev/feature/lazy-deleter-skip-sync-in-do-cleanup to master May 21, 2026 04:16
@evaleev evaleev merged commit d327b82 into master May 21, 2026
9 checks passed
@evaleev evaleev deleted the evaleev/fix/arena-tot-fence-and-view-hadamard-outer-contract branch May 21, 2026 04:18