einsum: ToT×ToT→T denest via phantom-unit dot (avoid C0 blowup) #558
Merged
…educe

A denested ToT x ToT contraction (inner indices fully contracted, plain-T result) was evaluated by einsum<DeNest::True> as expand-then-reduce: it formed the full uncontracted product C0 (external x contracted-outer x inner) before reducing. With a large contracted-outer index (e.g. a DF/RI index) this materializes an enormous intermediate -- ~20 GB for one CSV-CC term in C8H18 PNO-CCSD (256 s, 26.6 GB peak) for a 412 KB result.

Reformulate as a single contraction whose inner product is a Frobenius dot. The inner reduction is expressed with a phantom unit-extent result mode (reserved label prefix U+2297, is_phantom_unit_label) so the result inner cell is a genuine order->=1 tensor (TA has no order-0), and the dot reads operand cells flat: no operand carries the phantom mode, no inner GEMM rank match, no order-0, and C0 is never built. Correct even when an inner extent depends on a contracted-outer index.

- util/annotation.h: phantom-unit label prefix + is_phantom_unit_label.
- einsum/tiledarray.h: DeNest::True builds C(c;U) = A(..) * B(..;..,U) then unwraps the unit-extent inner cells to scalars.
- tensor/arena_einsum.h: RegimeAInnerKind::phantom_dot, ArenaInnerShapeKind::unit_range, and arena_hadamard_phantom_dot (view cells).
- expressions/cont_engine.h: inner phantom-dot op in the owning and view-cell inner-op paths, for both outer-Contraction and outer-Hadamard regimes.
- tests/einsum.cpp: external-index (e-present) denest case.

C8H18 PNO-CCSD: the motivating term drops 256 s/26.6 GB -> 0.25 s/2.9 GB and the run now completes; c4h10 converges as before.
Problem
A denested `ToT × ToT` contraction (inner indices fully contracted, plain-T result) is evaluated by `einsum<DeNest::True>` as expand-then-reduce: it forms the full uncontracted product `C0` (external × contracted-outer × inner) before reducing it. When a contracted outer index is large (e.g. a DF/RI index), `C0` is enormous.

Concretely, in a CSV-CC PNO-CCSD term for C8H18, `I(i₃,i₁,i₂,Κ;a₄) * I(i₂,i₁,i₄,Κ;a₄) → I(i₄,i₃,i₂,i₁)`, `C0` reaches ~20 GB (256 s, 26.6 GB process peak) to produce a 412 KB result.

Fix
Reformulate as a single contraction whose inner product is a Frobenius dot. The inner reduction is expressed with a phantom unit-extent result mode (a reserved label prefix, `⊗₁`; `detail::is_phantom_unit_label`) so the result inner cell is a genuine order-≥1 tensor (TA has no order-0), and the dot reads the operand inner cells flat: `C0` is never built.

Changes
- `util/annotation.h`: phantom-unit label prefix + `is_phantom_unit_label`.
- `einsum/tiledarray.h`: `DeNest::True` builds `C(c;⊗₁) = A(…) * B(…;…,⊗₁)` then unwraps the unit-extent inner cells to scalars.
- `tensor/arena_einsum.h`: `RegimeAInnerKind::phantom_dot` (regime-A `!e` path), `ArenaInnerShapeKind::unit_range`, and `arena_hadamard_phantom_dot` (view-cell outer-Hadamard).
- `expressions/cont_engine.h`: inner phantom-dot op in both the owning and view-cell (`ArenaTensor`) inner-op paths, for both outer-Contraction and outer-Hadamard regimes.
- `tests/einsum.cpp`: adds the external-index (`e`-present) denest case alongside the existing pure-Hadamard and contracted-outer cases.

Validation
`einsum_manual` (incl. the new external-index denest) passes; full `ta_test` (np=1) clean.

🤖 Generated with Claude Code
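
Illustration

The difference between the two evaluation strategies described under Problem and Fix can be sketched outside TiledArray with plain `std::vector`s. This is a toy model under stated assumptions, not the code in this PR: `Cell`, `ToT`, `Expanded`, `expand_then_reduce`, `phantom_unit_dot`, and `unwrap` are hypothetical names, there is one external index per operand and one contracted outer index `k`, and the outer-Hadamard indices of the real term are left out. Inner cells are allowed to have a `k`-dependent extent.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not TiledArray types.
using Cell = std::vector<double>;            // one inner tensor, read flat
using ToT = std::vector<std::vector<Cell>>;  // t[e][k]: external e, contracted-outer k

struct Expanded {
  std::vector<double> result;  // row-major E1 x E2 plain-T result
  std::size_t c0_doubles;      // number of elements materialized in C0
};

// Expand-then-reduce: build C0(e1,e2,k;x) = A(e1,k;x) * B(e2,k;x),
// then sum every C0 cell over x and over k.
Expanded expand_then_reduce(const ToT& a, const ToT& b) {
  const std::size_t e1n = a.size(), e2n = b.size();
  std::vector<Cell> c0;  // the intermediate that blows up when k is large
  std::size_t c0_doubles = 0;
  for (std::size_t e1 = 0; e1 < e1n; ++e1)
    for (std::size_t e2 = 0; e2 < e2n; ++e2) {
      assert(a[e1].size() == b[e2].size());
      for (std::size_t k = 0; k < a[e1].size(); ++k) {
        const Cell& x = a[e1][k];
        const Cell& y = b[e2][k];
        assert(x.size() == y.size());
        Cell prod(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) prod[i] = x[i] * y[i];
        c0_doubles += prod.size();
        c0.push_back(std::move(prod));
      }
    }
  Expanded out{std::vector<double>(e1n * e2n, 0.0), c0_doubles};
  std::size_t cell = 0;  // C0 cells were stored in (e1, e2, k) order
  for (std::size_t e1 = 0; e1 < e1n; ++e1)
    for (std::size_t e2 = 0; e2 < e2n; ++e2)
      for (std::size_t k = 0; k < a[e1].size(); ++k, ++cell)
        out.result[e1 * e2n + e2] +=
            std::accumulate(c0[cell].begin(), c0[cell].end(), 0.0);
  return out;
}

// Fused form: each result cell is a unit-extent (size-1) tensor that
// accumulates the flat dot product of the operand cells; no C0 exists.
std::vector<Cell> phantom_unit_dot(const ToT& a, const ToT& b) {
  const std::size_t e2n = b.size();
  std::vector<Cell> c(a.size() * e2n, Cell(1, 0.0));
  for (std::size_t e1 = 0; e1 < a.size(); ++e1)
    for (std::size_t e2 = 0; e2 < e2n; ++e2) {
      assert(a[e1].size() == b[e2].size());
      for (std::size_t k = 0; k < a[e1].size(); ++k) {
        const Cell& x = a[e1][k];
        const Cell& y = b[e2][k];
        assert(x.size() == y.size());
        c[e1 * e2n + e2][0] +=
            std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
      }
    }
  return c;
}

// Unwrap the unit-extent inner cells to plain scalars.
std::vector<double> unwrap(const std::vector<Cell>& c) {
  std::vector<double> out;
  out.reserve(c.size());
  for (const Cell& cell : c) {
    assert(cell.size() == 1);
    out.push_back(cell[0]);
  }
  return out;
}
```

In this model the expanded path holds E1 × E2 × Σₖ n(k) doubles in `C0`, while the fused path only ever holds the E1 × E2 unit cells of the result, which is the same asymmetry the 20 GB intermediate versus 412 KB result figures describe.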