einsum: ToT×ToT→T denest via phantom-unit dot (avoid C0 blowup) #558

Merged
evaleev merged 1 commit into master from evaleev/feature/denest-fused-contraction on May 26, 2026

@evaleev commented May 26, 2026

Problem

A denested ToT × ToT contraction (inner indices fully contracted, plain-T result) is evaluated by einsum<DeNest::True> as expand-then-reduce: it forms the full uncontracted product C0 (external × contracted-outer × inner) before reducing it. When a contracted outer index is large (e.g. a DF/RI index), C0 is enormous.

Concretely, in a CSV-CC PNO-CCSD term for C8H18, I(i₃,i₁,i₂,Κ;a₄) * I(i₂,i₁,i₄,Κ;a₄) → I(i₄,i₃,i₂,i₁), C0 reaches ~20 GB (256 s, 26.6 GB process peak) to produce a 412 KB result.
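To make the cost difference concrete, here is a minimal standalone sketch that contrasts expand-then-reduce with a fused dot on a toy contraction C(e1,e2) = Σ_K Σ_a A(e1,K;a) · B(e2,K;a). It does not use TiledArray: `Cell`, `ToT`, `Result`, `make_tot`, `expand_then_reduce`, and `fused_dot` are hypothetical names for illustration only, and the index structure is simpler than the C8H18 term above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins (not TiledArray types): an outer grid indexed by
// (external e, contracted K) whose cells are flat inner vectors.
using Cell = std::vector<double>;

struct ToT {
  std::size_t n_ext;        // extent of the external outer index
  std::size_t n_k;          // extent of the contracted outer index K
  std::vector<Cell> cells;  // row-major: cells[e * n_k + k]
  const Cell& at(std::size_t e, std::size_t k) const {
    return cells[e * n_k + k];
  }
};

struct Result {
  std::vector<double> c;           // plain result C(e1, e2), row-major
  std::size_t intermediate_elems;  // scalars stored in C0 (0 if never built)
};

// Builds test data; the inner extent is k + 1, so it depends on K.
ToT make_tot(std::size_t n_ext, std::size_t n_k, double scale) {
  ToT t{n_ext, n_k, {}};
  for (std::size_t e = 0; e < n_ext; ++e)
    for (std::size_t k = 0; k < n_k; ++k) {
      Cell cell(k + 1);
      for (std::size_t i = 0; i < cell.size(); ++i)
        cell[i] = scale * static_cast<double>(e + 1) + static_cast<double>(i);
      t.cells.push_back(std::move(cell));
    }
  return t;
}

// Expand-then-reduce: materialize C0(e1, e2, K; a), then sum it down.
Result expand_then_reduce(const ToT& a, const ToT& b) {
  assert(a.n_k == b.n_k);
  std::vector<Cell> c0;
  std::size_t elems = 0;
  for (std::size_t e1 = 0; e1 < a.n_ext; ++e1)
    for (std::size_t e2 = 0; e2 < b.n_ext; ++e2)
      for (std::size_t k = 0; k < a.n_k; ++k) {
        const Cell& x = a.at(e1, k);
        const Cell& y = b.at(e2, k);
        assert(x.size() == y.size());
        Cell prod(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) prod[i] = x[i] * y[i];
        elems += prod.size();
        c0.push_back(std::move(prod));
      }
  Result r{std::vector<double>(a.n_ext * b.n_ext, 0.0), elems};
  std::size_t idx = 0;
  for (std::size_t e1 = 0; e1 < a.n_ext; ++e1)
    for (std::size_t e2 = 0; e2 < b.n_ext; ++e2)
      for (std::size_t k = 0; k < a.n_k; ++k) {
        const Cell& prod = c0[idx++];
        for (double v : prod) r.c[e1 * b.n_ext + e2] += v;
      }
  return r;
}

// Fused: each (e1, e2, K) contributes one flat dot; C0 is never stored.
Result fused_dot(const ToT& a, const ToT& b) {
  assert(a.n_k == b.n_k);
  Result r{std::vector<double>(a.n_ext * b.n_ext, 0.0), 0};
  for (std::size_t e1 = 0; e1 < a.n_ext; ++e1)
    for (std::size_t e2 = 0; e2 < b.n_ext; ++e2)
      for (std::size_t k = 0; k < a.n_k; ++k) {
        const Cell& x = a.at(e1, k);
        const Cell& y = b.at(e2, k);
        assert(x.size() == y.size());
        double dot = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) dot += x[i] * y[i];
        r.c[e1 * b.n_ext + e2] += dot;
      }
  return r;
}
```

In this toy model the intermediate grows as external × contracted-outer × inner while the result grows only with the external extents, which is the scaling gap described above.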

Fix

Reformulate as a single contraction whose inner product is a Frobenius dot. The inner reduction is expressed with a phantom unit-extent result mode (a reserved label prefix, ⊗₁; detail::is_phantom_unit_label) so the result inner cell is a genuine order-≥1 tensor (TA has no order-0), and the dot reads the operand inner cells flat:

  • no operand carries the phantom mode (no reshape, arena-safe),
  • no inner-GEMM rank match, no order-0 tensor,
  • C0 is never built,
  • correct even when an inner extent depends on a contracted-outer index (each cell dots its own range).
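The phantom unit-extent idea can be sketched in isolation, under stated assumptions. The operand cells are read as flat buffers whatever their inner rank, the dot lands in an order-1 cell of extent 1, and that cell is later unwrapped to a scalar. `InnerCell`, `phantom_dot`, and `unwrap_unit` are hypothetical names, not TiledArray API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical inner cell: a shape plus flat storage (not a TiledArray type).
struct InnerCell {
  std::vector<std::size_t> extents;
  std::vector<double> data;
};

// Frobenius dot of two operand cells, read flat. Neither operand carries
// the unit mode; only the result does, as an order-1 cell of extent 1.
InnerCell phantom_dot(const InnerCell& x, const InnerCell& y) {
  assert(x.data.size() == y.data.size());
  double dot = 0.0;
  for (std::size_t i = 0; i < x.data.size(); ++i) dot += x.data[i] * y.data[i];
  return InnerCell{{1}, {dot}};
}

// Unwrap a unit-extent cell to the scalar it holds.
double unwrap_unit(const InnerCell& c) {
  assert(c.extents.size() == 1 && c.extents[0] == 1 && c.data.size() == 1);
  return c.data[0];
}
```

For example, two 2×2 cells holding {1,2,3,4} and {5,6,7,8} give a unit cell that unwraps to 70. Because the dot only needs equal element counts, each pair of cells can use its own range.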

Changes

  • util/annotation.h: phantom-unit label prefix + is_phantom_unit_label.
  • einsum/tiledarray.h: DeNest::True builds C(c;⊗₁) = A(…) * B(…;…,⊗₁), then unwraps the unit-extent inner cells to scalars.
  • tensor/arena_einsum.h: RegimeAInnerKind::phantom_dot (regime-A !e path), ArenaInnerShapeKind::unit_range, and arena_hadamard_phantom_dot (view-cell outer-Hadamard).
  • expressions/cont_engine.h: inner phantom-dot op in both the owning and view-cell (ArenaTensor) inner-op paths, for both outer-Contraction and outer-Hadamard regimes.
  • tests/einsum.cpp: adds the external-index (e-present) denest case alongside the existing pure-Hadamard and contracted-outer cases.
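As a rough illustration of what a reserved-prefix check can look like, the sketch below tests whether a label starts with the UTF-8 encoding of ⊗₁. It is not the code in util/annotation.h: `kPhantomPrefix` and `looks_like_phantom_unit_label` are hypothetical names, and both the exact prefix bytes and the signature are assumptions.

```cpp
#include <string_view>

// Assumed prefix: "⊗₁" (U+2297 followed by U+2081), written as UTF-8 bytes.
constexpr std::string_view kPhantomPrefix = "\xE2\x8A\x97\xE2\x82\x81";

// Hypothetical stand-in for the check the PR calls is_phantom_unit_label:
// true when the label begins with the reserved prefix.
constexpr bool looks_like_phantom_unit_label(std::string_view label) {
  return label.substr(0, kPhantomPrefix.size()) == kPhantomPrefix;
}

static_assert(looks_like_phantom_unit_label(kPhantomPrefix));
static_assert(!looks_like_phantom_unit_label("a"));
```

Reserving a prefix that ordinary index labels never use lets the engine recognize the phantom mode from the annotation alone.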

Validation

  • einsum_manual (incl. the new external-index denest) passes; full ta_test (np=1) clean.
  • MPQC c4h10 PNO-CCSD converges as before.
  • MPQC C8H18 PNO-CCSD: the motivating term drops 256 s / 26.6 GB → 0.25 s / 2.9 GB, and the run (previously incomplete) now finishes all iterations.

🤖 Generated with Claude Code

…educe

A denested ToT x ToT contraction (inner indices fully contracted, plain-T
result) was evaluated by einsum<DeNest::True> as expand-then-reduce: it formed
the full uncontracted product C0 (external x contracted-outer x inner) before
reducing. With a large contracted-outer index (e.g. a DF/RI index) this
materializes an enormous intermediate -- ~20 GB for one CSV-CC term in C8H18
PNO-CCSD (256 s, 26.6 GB peak) for a 412 KB result.

Reformulate as a single contraction whose inner product is a Frobenius dot.
The inner reduction is expressed with a phantom unit-extent result mode
(reserved label prefix U+2297, is_phantom_unit_label) so the result inner cell
is a genuine order->=1 tensor (TA has no order-0), and the dot reads operand
cells flat: no operand carries the phantom mode, no inner GEMM rank match, no
order-0, and C0 is never built. Correct even when an inner extent depends on a
contracted-outer index.

- util/annotation.h: phantom-unit label prefix + is_phantom_unit_label.
- einsum/tiledarray.h: DeNest::True builds C(c;U) = A(..) * B(..;..,U) then
  unwraps the unit-extent inner cells to scalars.
- tensor/arena_einsum.h: RegimeAInnerKind::phantom_dot, ArenaInnerShapeKind::
  unit_range, and arena_hadamard_phantom_dot (view cells).
- expressions/cont_engine.h: inner phantom-dot op in the owning and view-cell
  inner-op paths, for both outer-Contraction and outer-Hadamard regimes.
- tests/einsum.cpp: external-index (e-present) denest case.

C8H18 PNO-CCSD: the motivating term drops 256 s/26.6 GB -> 0.25 s/2.9 GB and
the run now completes; c4h10 converges as before.
@evaleev merged commit e22e652 into master May 26, 2026
9 checks passed
@evaleev deleted the evaleev/feature/denest-fused-contraction branch May 26, 2026 18:58