Note: This issue was drafted by an AI agent (Claude Code) during a profiling/investigation session with @FBumann. The measurements (memray C-level profiling, benchmarks, the 284× / 132 MB figures) and the git-history archaeology are real and reproducible, but the framing and proposed directions are a starting point for discussion — please sanity-check before acting on them.
LinearExpression.__matmul__ is (self * other).sum(dim=common_dims) (expressions.py). When other is a sparse matrix, the result still gets one _term per element of the contracted dimension — almost all of them zero-coefficient.
Where it bites
PyPSA's Kirchhoff Voltage Law constraint (pypsa/optimization/constraints.py, define_kirchhoff_voltage_constraints):
flow = m[f"{c}-s"].sel(...) # (snapshot, branch), nterm = 1
exprs.append(flow @ C_branch * 1e5) # C_branch: (branch, cycle), sparse
The cycle matrix C is sparse — each cycle contains only a handful of branches (~3), each branch sits in 1–2 cycles. But flow @ C_branch produces an expression with _term = n_branches, because the sum over branch stacks every branch into _term regardless of whether C[b,c] == 0.
Measured (852 branches, 268 cycles, 24 snapshots, ~3 branches/cycle)
|
_term |
result cells |
current flow @ C_branch |
852 |
5,480,064 |
same constraint via sparse entries + groupby(cycle).sum() |
3 |
19,296 |
284× fewer cells. Memray attributes the single largest allocation in a full SciGRID-DE create_model() (132 MB of a 351 MB peak) to the concat/merge of these densified KVL expressions — see companion issue on merge.
Reframe (proof it is inherently sparse)
A branch belongs to multiple cycles, so it is not a clean groupby over branches. But exploding C to its nonzero (branch, cycle, value) entries makes each entry belong to exactly one cycle — a clean partition — so the constraint is expressible as a sparse-entry groupby(cycle).sum() with nterm = max(branches per cycle). The prototype above does exactly this and is bit-equivalent.
Possible directions
Sibling of #745 (groupby padding) — same root cause, different entry point. Part of the dense-_term memory story.
LinearExpression.__matmul__is(self * other).sum(dim=common_dims)(expressions.py). Whenotheris a sparse matrix, the result still gets one_termper element of the contracted dimension — almost all of them zero-coefficient.Where it bites
PyPSA's Kirchhoff Voltage Law constraint (
pypsa/optimization/constraints.py,define_kirchhoff_voltage_constraints):The cycle matrix
Cis sparse — each cycle contains only a handful of branches (~3), each branch sits in 1–2 cycles. Butflow @ C_branchproduces an expression with_term = n_branches, because the sum overbranchstacks every branch into_termregardless of whetherC[b,c] == 0.Measured (852 branches, 268 cycles, 24 snapshots, ~3 branches/cycle)
_termflow @ C_branchgroupby(cycle).sum()284× fewer cells. Memray attributes the single largest allocation in a full SciGRID-DE
create_model()(132 MB of a 351 MB peak) to theconcat/mergeof these densified KVL expressions — see companion issue onmerge.Reframe (proof it is inherently sparse)
A branch belongs to multiple cycles, so it is not a clean
groupbyover branches. But explodingCto its nonzero(branch, cycle, value)entries makes each entry belong to exactly one cycle — a clean partition — so the constraint is expressible as a sparse-entrygroupby(cycle).sum()withnterm = max(branches per cycle). The prototype above does exactly this and is bit-equivalent.Possible directions
dot/@: when the constant operand has many zeros, drop zero-coefficient terms instead of materializing them (cf.densify_terms, but never building them in the first place)._termkernel (Umbrella: long-format / sparse_termkernel (dense-_termmemory cluster) #756), which subsumes this.Sibling of #745 (groupby padding) — same root cause, different entry point. Part of the dense-
_termmemory story.