Lazy expression merging to reduce memory usage and build time #12

FBumann · 2026-02-01T14:18:34Z

Summary

Introduces LazyLinearExpression, a subclass of LinearExpression that defers xr.concat along the _term dimension. Instead of immediately concatenating datasets with join="outer" (which creates massive dense padding when coordinate spaces differ), expressions are stored as a list of compact per-variable datasets and only materialized when needed.

This targets the core memory bottleneck in large models (e.g. PyPSA) where merge() of many expressions triggers dense outer-join padding across all coordinate dimensions.

Key changes

linopy/expressions.py

LazyLinearExpression class — stores expression parts as list[Dataset], defers materialization:
- flat property iterates parts independently, avoiding dense padding entirely (solver hot-path)
- to_polars() same per-part pattern for polars-based IO
- __add__/__sub__/__neg__ concatenate part lists without materializing
- _compact() merges parts sharing the same coordinate space using cheap join="override"
- rename()/diff() dispatch per-part, only applying to parts that contain the target dimension
- _materialize() falls back to standard dense merge when full Dataset access is needed
merge() function — returns LazyLinearExpression when merging along _term dimension instead of eagerly concatenating
Type checks — changed type(x) is LinearExpression to isinstance checks throughout, so LazyLinearExpression is recognized as a valid LinearExpression

linopy/objective.py

is_linear/is_quadratic properties use isinstance instead of identity checks

dev-scripts/profile_model_memory.py

Reusable benchmarking script with CLI args (--shape, --sparsity, --n-expr, --preset)
Outputs JSON with git branch/SHA, peak RSS, timing for cross-branch comparison
Designed to work with scalene for line-level memory profiling

Benchmark results (5 × 200×200×50, sparsity=0.2)

Scenario	Branch	build	build RSS	flat	flat RSS
Same-coords	this PR	0.05s	673 MB	0.18s	826 MB
Same-coords	master	0.36s	1043 MB	0.62s	1260 MB
Diff-coords	this PR	0.10s	1044 MB	0.18s	1178 MB
Diff-coords	master	0.24s	1037 MB	0.57s	1255 MB

Same-coords (all variables share dimensions — typical PyPSA pattern): 7× faster build, 35% less memory, 3.4× faster flat

Test plan

All 1187 existing tests pass (pytest test/)
Benchmarked against master and feature/defered-merge branches
Test with a real PyPSA model to validate end-to-end memory reduction

🤖 Generated with Claude Code

Two files changed: linopy/expressions.py: - Added LazyLinearExpression class (inherits LinearExpression) that stores a list of un-merged Dataset _parts instead of a single concatenated Dataset - Key overrides: data (lazy materialization), const, flat (iterates parts directly), __add__/__sub__/__neg__ (propagate laziness), all with fallback to parent when not lazy - Modified merge() to return LazyLinearExpression when dim == TERM_DIM and cls is a LinearExpression subclass - Protected lazy expressions from premature materialization in merge's data extraction and override detection linopy/objective.py: - Changed is_linear/is_quadratic from type(x) is LinearExpression identity checks to isinstance checks, so LazyLinearExpression is correctly identified as linear Performance (different-coordinate variables, 5 × 200×200×50): ┌───────────────────────┬─────────┬─────────┬───────────────┐ │ Metric │ Before │ After │ Change │ ├───────────────────────┼─────────┼─────────┼───────────────┤ │ Expression build time │ 0.31s │ 0.09s │ 3.4x faster │ ├───────────────────────┼─────────┼─────────┼───────────────┤ │ flat export time │ 0.57s │ 0.17s │ 3.4x faster │ ├───────────────────────┼─────────┼─────────┼───────────────┤ │ Peak RSS at flat │ 1337 MB │ 1186 MB │ -151 MB (11%) │ └───────────────────────┴─────────┴─────────┴───────────────┘ Same-coordinate variables see no regression — materialization occurs at to_constraint time with the same override join as before. Phase 2 (lazy constraints) would extend savings to that path.

1. _compact() — Groups parts by their coordinate signature and merges same-coord groups using join="override" (no padding). Keeps part count low after many chained additions. 2. to_polars() — Lazy override that converts each part to a polars DataFrame independently and concatenates, same pattern as flat. 3. rename() — Per-part dispatch that only renames dims/vars present in each part, avoiding errors for parts that lack the target dimension. 4. diff() — Per-part dispatch that applies diff only to parts containing the target dimension. 5. sel() and shift() — Fall back to materialized path since their semantics (indexing, fill values) need a consistent coordinate space. Still deferred (Phase 2+): - Lazy constraint propagation (to_constraint currently materializes) - Lazy _sum() for dimension reduction - These would require changes to the Constraint class and solver IO paths

coderabbitai · 2026-02-01T14:18:41Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

🔍 Trigger a full review

📝 Walkthrough

Walkthrough

This PR introduces a new memory profiling utility for Linopy models and extends the expression system with lazy evaluation support. A new LazyLinearExpression class defers data materialization during construction, optimizing memory usage while maintaining compatibility with existing APIs. The objective module's type checking is also refined to use isinstance checks for better polymorphism support.

Changes

Cohort / File(s)	Summary
Memory Profiling Utility `dev-scripts/profile_model_memory.py`	New standalone script for measuring peak RSS memory usage during Linopy model construction. Provides configurable presets for shape, sparsity, and expression counts, with git metadata collection and JSON output.
Lazy Expression Support `linopy/expressions.py`	Introduces LazyLinearExpression class that defers merging along the term dimension by storing per-expression Dataset parts. Enhances merge logic to detect and handle lazy expressions without forced materialization. Refactors operations (add, neg, sub, etc.) to return lazy instances when appropriate. Adjusts type checks from exact equality to isinstance for better polymorphism.
Type Checking Refinement `linopy/objective.py`	Updates is_linear and is_quadratic to use isinstance checks instead of exact type comparisons. Adds ValueError guard in to_matrix to enforce QuadraticExpression constraint before conversion.

Sequence Diagram

sequenceDiagram
    participant User
    participant LazyLinearExpr as LazyLinearExpression
    participant Merge as Merge Logic
    participant Data as Data Materialization
    participant Result as LinearExpression

    User->>LazyLinearExpr: Create with parts
    Note over LazyLinearExpr: Store parts, defer merge
    
    User->>LazyLinearExpr: Perform operations (__add__, __sub__)
    LazyLinearExpr->>LazyLinearExpr: Accumulate new parts
    LazyLinearExpr->>User: Return LazyLinearExpression
    
    User->>LazyLinearExpr: Access .data (first access)
    LazyLinearExpr->>Merge: Trigger merge on TERM_DIM
    Merge->>Merge: Detect lazy expressions
    Merge->>Merge: Flatten parts without dense padding
    Merge->>Data: Materialize to Dataset
    Data->>Result: Construct LinearExpression
    Result->>User: Return materialized expression

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hop and gather, parts so lazy,
Don't materialize, keep it hazy!
Merge when needed, data takes flight,
Memory profiled, expressions just right! 🎯

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description contains only the empty template with no narrative content explaining the changes, objectives, or rationale for the memory usage reduction feature.	Add a detailed description of what was changed and why, including the LazyLinearExpression feature, performance improvements, and how it reduces memory usage. Ensure all checklist items are addressed.
Docstring Coverage	⚠️ Warning	Docstring coverage is 54.84% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Feature/memory usage reduction' is vague and generic, using broad terminology that does not clearly communicate the specific technical change implemented.	Replace with a more specific title describing the main change, e.g., 'Introduce LazyLinearExpression for deferred materialization' or 'Add lazy evaluation support for linear expressions'.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/memmory-usage-reduction

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

FBumann · 2026-02-01T14:20:34Z

@coderabbitai review

coderabbitai · 2026-02-01T14:20:41Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@linopy/expressions.py`:
- Around line 2037-2060: The pl.concat call in to_polars (method to_polars on
the expression class handling self._parts) should allow concatenation of
DataFrames with different schemas; change the plain pl.concat(frames) to use
pl.concat(frames, how="diagonal_relaxed") so frames with different
coordinate-variable columns (e.g., presence/absence of "const") are merged
correctly before calling group_terms_polars and check_has_nulls_polars.

🧹 Nitpick comments (1)

dev-scripts/profile_model_memory.py (1)
31-52: Add type hints to the new profiling helpers.
Please annotate helper signatures (this function and the rest of the script) so mypy can validate them.
🔧 Example (apply similarly to the other helpers)
-def get_git_info():
+def get_git_info() -> dict[str, str]:
As per coding guidelines, **/*.py: Use type hints and mypy for type checking in Python files.

coderabbitai · 2026-02-01T14:33:21Z

linopy/expressions.py

+    def to_polars(self) -> pl.DataFrame:
+        """Convert the expression to a polars DataFrame without materializing."""
+        if not self.is_lazy:
+            return super().to_polars()
+
+        frames = []
+        for part in self._parts:
+            df = to_polars(part)
+            df = filter_nulls_polars(df)
+            if len(df):
+                frames.append(df)
+
+        if not frames:
+            return pl.DataFrame(
+                {
+                    "vars": pl.Series([], dtype=pl.Int64),
+                    "coeffs": pl.Series([], dtype=pl.Float64),
+                }
+            )
+
+        df = pl.concat(frames)
+        df = group_terms_polars(df)
+        check_has_nulls_polars(df, name=self.type)
+        return df


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the file structure and understand the context find . -type f -name "expressions.py" | head -5

Repository: FBumann/linopy

Length of output: 81

🏁 Script executed:

# Check the file and the specific method wc -l linopy/expressions.py

Repository: FBumann/linopy

Length of output: 84

🏁 Script executed:

# Let's examine the to_polars method and related code sed -n '2037,2060p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 800

🏁 Script executed:

# Look at the broader context - what is _parts and how are frames built sed -n '2000,2065p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 2216

🏁 Script executed:

# Find the to_polars and related helper functions grep -n "def to_polars" linopy/expressions.py | head -10

Repository: FBumann/linopy

Length of output: 259

🏁 Script executed:

# Let's also look for filter_nulls_polars and group_terms_polars grep -n "def filter_nulls_polars\|def group_terms_polars" linopy/expressions.py

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Search for these helper functions in the entire codebase grep -r "def filter_nulls_polars\|def group_terms_polars\|def check_has_nulls_polars" --include="*.py"

Repository: FBumann/linopy

Length of output: 295

🏁 Script executed:

# Let's look at the imports in expressions.py to understand where these functions come from head -100 linopy/expressions.py | grep -E "^import|^from"

Repository: FBumann/linopy

Length of output: 1107

🏁 Script executed:

# Check what to_polars(part) does - it's imported grep -n "from.*to_polars\|import.*to_polars" linopy/expressions.py

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Let's examine the entire context around line 2037 and understand _parts sed -n '1980,2100p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 4369

🏁 Script executed:

# Examine the helper functions in common.py grep -n "def filter_nulls_polars\|def group_terms_polars\|def check_has_nulls_polars" linopy/common.py

Repository: FBumann/linopy

Length of output: 256

🏁 Script executed:

# Get the implementations of these functions sed -n '1,2000p' linopy/common.py | grep -A 20 "def filter_nulls_polars"

Repository: FBumann/linopy

Length of output: 670

🏁 Script executed:

# Let's search for all these functions more carefully rg "def (filter_nulls_polars|group_terms_polars|check_has_nulls_polars)" -A 15 linopy/common.py

Repository: FBumann/linopy

Length of output: 1661

🏁 Script executed:

# Let's see what to_polars function does (the one called on part) grep -n "^def to_polars\|^ def to_polars" linopy/common.py

Repository: FBumann/linopy

Length of output: 120

🏁 Script executed:

# Check the entire common.py structure wc -l linopy/common.py

Repository: FBumann/linopy

Length of output: 79

🏁 Script executed:

# Let's find where to_polars is defined for Datasets grep -B 5 -A 20 "def to_polars" linopy/common.py | head -60

Repository: FBumann/linopy

Length of output: 767

🏁 Script executed:

# Let's understand the broadcast function and what columns are produced grep -n "def broadcast" linopy/common.py

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Look at how _parts are created and what they contain grep -B 5 -A 10 "_parts" linopy/expressions.py | head -80

Repository: FBumann/linopy

Length of output: 2762

🏁 Script executed:

# Search for tests of to_polars with LazyLinearExpression find . -name "*.py" -type f | xargs grep -l "to_polars" | grep test

Repository: FBumann/linopy

Length of output: 176

🏁 Script executed:

# Let's look at actual tests for to_polars grep -B 5 -A 20 "def test.*to_polars" --include="*.py" -r . | head -100

Repository: FBumann/linopy

Length of output: 6395

🏁 Script executed:

# Check if there are any issues or tests related to concat with different schemas grep -r "pl.concat" --include="*.py" linopy/ -B 3 -A 3

Repository: FBumann/linopy

Length of output: 4025

🏁 Script executed:

# Let's understand what columns the to_polars produces from a dataset sed -n '352,375p' linopy/common.py

Repository: FBumann/linopy

Length of output: 779

🏁 Script executed:

# Let's look more closely at the broadcast function to understand what columns are created grep -n "def broadcast" linopy/ -r --include="*.py"

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Check if there are any tests for LazyLinearExpression specifically grep -r "LazyLinearExpression" test/ --include="*.py"

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Let's see if there's a test that calls to_polars on LazyLinearExpression grep -B 10 -A 20 "to_polars" test/test_linear_expression.py | head -80

Repository: FBumann/linopy

Length of output: 2763

🏁 Script executed:

# Look at what variables a Dataset from a LazyLinearExpression part would have grep -B 10 -A 10 "TERM_DIM\|HELPER_DIMS" linopy/expressions.py | head -50

Repository: FBumann/linopy

Length of output: 1715

🏁 Script executed:

# Check the constants to understand what TERM_DIM and HELPER_DIMS are grep -n "TERM_DIM\|HELPER_DIMS" linopy/constants.py

Repository: FBumann/linopy

Length of output: 252

🏁 Script executed:

# Let's understand what broadcast does by looking at its implementation grep -n "def broadcast" linopy/ -r --include="*.py"

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Check if broadcast is imported from xarray grep -n "from xarray\|import.*broadcast" linopy/common.py | head -20

Repository: FBumann/linopy

Length of output: 316

🏁 Script executed:

# Let's create a test case to see if different parts produce different columns cat > /tmp/test_polars_concat.py << 'EOF' import sys sys.path.insert(0, '/repo') import numpy as np import pandas as pd import xarray as xr import polars as pl from linopy.common import to_polars # Create two datasets with different coordinate dimensions ds1 = xr.Dataset({ 'var': (['t', 'x'], np.random.randn(3, 2)) }, coords={ 't': np.arange(3), 'x': np.arange(2) }) ds2 = xr.Dataset({ 'var': (['t', 'y'], np.random.randn(3, 4)) }, coords={ 't': np.arange(3), 'y': np.arange(4) }) print("Dataset 1:") print(ds1) print("\nDataset 2:") print(ds2) df1 = to_polars(ds1) df2 = to_polars(ds2) print("\nDataFrame 1 columns:", df1.columns) print("DataFrame 1 shape:", df1.shape) print("\nDataFrame 2 columns:", df2.columns) print("DataFrame 2 shape:", df2.shape) print("\nSchemas match:", df1.schema == df2.schema) # Try to concat them try: result = pl.concat([df1, df2]) print("\nConcat succeeded!") print("Result columns:", result.columns) except Exception as e: print(f"\nConcat failed with error: {type(e).__name__}: {e}") EOF python /tmp/test_polars_concat.py

Repository: FBumann/linopy

Length of output: 217

🏁 Script executed:

# Let's also check what happens with the actual LazyLinearExpression scenario cat > /tmp/test_lazy_polars.py << 'EOF' import sys sys.path.insert(0, '/repo') import numpy as np import pandas as pd import xarray as xr from linopy import Model from linopy.expressions import LazyLinearExpression # Create a model m = Model() # Create variables with different coordinate spaces x = m.add_variables(name="x", coords=[("t", [0, 1, 2]), ("space", [0, 1])]) y = m.add_variables(name="y", coords=[("t", [0, 1, 2]), ("feature", [0, 1, 2])]) # Create expressions that would result in parts with different coord spaces expr1 = 2 * x expr2 = 3 * y # Sum over one dimension to create parts with different coords expr1_sum = expr1.sum("space") # dims: [t, _term] expr2_sum = expr2.sum("feature") # dims: [t, _term] # Try to manually create a LazyLinearExpression with parts from different coord spaces from linopy.expressions import LazyLinearExpression if hasattr(expr1_sum, 'data') and hasattr(expr2_sum, 'data'): parts = [expr1_sum.data, expr2_sum.data] lazy_expr = LazyLinearExpression(parts, m, parts=parts) print("LazyLinearExpression created") print("Part 1 dims:", parts[0].dims) print("Part 2 dims:", parts[1].dims) try: df = lazy_expr.to_polars() print("to_polars succeeded!") print("Result columns:", df.columns) except Exception as e: print(f"to_polars failed: {type(e).__name__}: {e}") EOF python /tmp/test_lazy_polars.py

Repository: FBumann/linopy

Length of output: 215

🏁 Script executed:

# Let's check what type of concat strategies are available in this Polars version python << 'EOF' import polars as pl print(f"Polars version: {pl.__version__}") # Check if diagonal concat is available help_str = str(pl.concat.__doc__) if "diagonal" in help_str.lower(): print("Diagonal concat is available") print("---") # Extract the how parameter info if "how" in help_str.lower(): print("'how' parameter found in documentation") else: print("Diagonal concat might not be available") EOF

Repository: FBumann/linopy

Length of output: 176

🏁 Script executed:

# Look at the constraints.py usage of pl.concat with diagonal_relaxed grep -B 10 -A 5 "pl.concat.*diagonal_relaxed" linopy/constraints.py

Repository: FBumann/linopy

Length of output: 912

🏁 Script executed:

# Let's examine the to_polars implementation in LinearExpression (non-lazy) to understand schema sed -n '1446,1480p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 1366

🏁 Script executed:

# Check what happens in the parent class's to_polars grep -B 5 -A 20 "class LinearExpression" linopy/expressions.py | head -40

Repository: FBumann/linopy

Length of output: 1071

🏁 Script executed:

# Let's look more carefully at the full to_polars method for LinearExpression sed -n '1446,1500p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 2119

🏁 Script executed:

# Let's check if there's any documentation about what columns to_polars produces grep -B 5 -A 30 "def to_polars.*Dataset" linopy/common.py

Repository: FBumann/linopy

Length of output: 1119

🏁 Script executed:

# Check what broadcast does - it's from xarray python3 << 'EOF' import xarray as xr help(xr.broadcast) EOF

Repository: FBumann/linopy

Length of output: 176

🏁 Script executed:

# Check the Polars version requirement in the project find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | xargs grep -l "polars" 2>/dev/null

Repository: FBumann/linopy

Length of output: 74

🏁 Script executed:

# Check the pyproject.toml for Polars version cat pyproject.toml | grep -A 5 "polars"

Repository: FBumann/linopy

Length of output: 163

🏁 Script executed:

# Let's look at how Polars concat behavior works by checking if diagonal_relaxed is used anywhere else grep -r "diagonal_relaxed" --include="*.py"

Repository: FBumann/linopy

Length of output: 165

🏁 Script executed:

# Let's trace through a specific scenario - what happens with different coord spaces # First, find how parts are created in LazyLinearExpression grep -B 5 -A 10 "__add__\|__mul__" linopy/expressions.py | grep -A 20 "class LazyLinearExpression" | head -50

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Look at the structure of a Dataset in a part - check what variables are typically included sed -n '1330,1350p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 597

🏁 Script executed:

# Search for where LazyLinearExpression gets created with parts grep -n "LazyLinearExpression(" linopy/expressions.py | head -10

Repository: FBumann/linopy

Length of output: 422

🏁 Script executed:

# Let's check what type hints are used in the to_polars method sed -n '2037,2060p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 800

🏁 Script executed:

# Check if type hints are used in related methods sed -n '1446,1475p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 1252

🏁 Script executed:

# Check the imports to see if there are type hint issues head -50 linopy/expressions.py | grep -E "from typing|import.*TYPE_CHECKING"

Repository: FBumann/linopy

Length of output: 114

🏁 Script executed:

# Let's examine if there's a test that demonstrates the issue or the fix find . -name "*.py" -type f -path "*/test/*" -exec grep -l "LazyLinearExpression.*to_polars\|to_polars.*lazy" {} \;

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Let's check if there are any existing tests for to_polars on lazy expressions grep -r "test.*to_polars" test/ --include="*.py" -A 10 | grep -i "lazy\|part"

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Check the entire LazyLinearExpression class to see if to_polars is the only method handling parts sed -n '1775,2200p' linopy/expressions.py | grep -E "def |to_polars|_parts"

Repository: FBumann/linopy

Length of output: 2450

🏁 Script executed:

# Let's check what variables are in a Dataset that becomes a part # Look at what gets stored in the Dataset for a part sed -n '1980,2050p' linopy/expressions.py | grep -A 30 "def __add__"

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Let's see what happens when parts are created with addition sed -n '1900,1980p' linopy/expressions.py | grep -B 20 "parts=self._parts + other._parts"

Repository: FBumann/linopy

Length of output: 897

🏁 Script executed:

# Check the actual error that would occur with pl.concat on mismatched schemas # Let's look for any existing issues or PRs mentioning this grep -r "schema\|concat" test/ --include="*.py" | grep -i "lazy\|polars" | head -20

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# Verify if type hints are properly used in the method sed -n '2037,2045p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 376

🏁 Script executed:

# Check the return type annotation for to_polars grep -B 2 "def to_polars" linopy/expressions.py | grep "->.*DataFrame"

Repository: FBumann/linopy

Length of output: 168

🏁 Script executed:

# Let's check what the actual Dataset structure looks like when broadcast is called # by examining what variables are typically in the data sed -n '352,370p' linopy/common.py

Repository: FBumann/linopy

Length of output: 588

🏁 Script executed:

# Check how const is handled in LazyLinearExpression sed -n '1825,1850p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 987

🏁 Script executed:

# Look at the __add__ method to see how const_override is managed sed -n '1900,1970p' linopy/expressions.py | grep -A 50 "if isinstance(other, LazyLinearExpression)"

Repository: FBumann/linopy

Length of output: 1601

🏁 Script executed:

# Check if const column is always added to frames in to_polars grep -B 5 -A 30 "def to_polars" linopy/common.py | grep -A 20 "def to_polars(ds"

Repository: FBumann/linopy

Length of output: 666

🏁 Script executed:

# Look at what to_polars returns from the parent class sed -n '1446,1475p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 1252

🏁 Script executed:

# Check if the issue could occur: does to_polars handle missing const? sed -n '1446,1475p' linopy/expressions.py | head -40

Repository: FBumann/linopy

Length of output: 1252

🏁 Script executed:

# Let's confirm by looking at what variables could be different # Check if parts always have the same set of variables sed -n '1900,1950p' linopy/expressions.py | grep -B 20 "parts=self._parts + other._parts"

Repository: FBumann/linopy

Length of output: 40

🏁 Script executed:

# The key is: when you have two Datasets from different expressions, do they always have the same variables? # Let's see what variables a Dataset from an expression contains grep -B 10 "coeffs\|vars" linopy/expressions.py | head -40

Repository: FBumann/linopy

Length of output: 1233

🏁 Script executed:

# Check if broadcast could produce different columns for different Datasets # Look at the common.py to_polars more carefully cat linopy/common.py | sed -n '352,375p'

Repository: FBumann/linopy

Length of output: 779

🏁 Script executed:

# Now let's verify the actual issue by checking what happens with a simple example # Are parts guaranteed to have the same variables? python3 << 'EOF' # Check the Polars concat documentation and default behavior import polars as pl # Create two DataFrames with different columns df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]}) df2 = pl.DataFrame({"a": [5, 6], "c": [7, 8]}) print("df1 columns:", df1.columns) print("df2 columns:", df2.columns) # Try to concat them try: result = pl.concat([df1, df2]) print("Default concat succeeded") print("Result columns:", result.columns) except Exception as e: print(f"Default concat failed: {type(e).__name__}: {e}") # Try with diagonal_relaxed try: result = pl.concat([df1, df2], how="diagonal_relaxed") print("\ndiagonal_relaxed concat succeeded") print("Result columns:", result.columns) print(result) except Exception as e: print(f"\ndiagonal_relaxed concat failed: {type(e).__name__}: {e}") EOF

Repository: FBumann/linopy

Length of output: 176

🏁 Script executed:

# Let me check if parts could have different variables by looking at how expressions with constants are handled sed -n '1900,1975p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 2781

🏁 Script executed:

# Check what FILL_VALUE tells us about expected variables grep -B 3 -A 3 "FILL_VALUE" linopy/expressions.py linopy/common.py

Repository: FBumann/linopy

Length of output: 2381

🏁 Script executed:

# Let's see if there's documentation on what variables a part Dataset contains sed -n '1775,1850p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 2803

🏁 Script executed:

# Check if the review's concern about different coord spaces matches how columns might differ # Look at the _compact method which groups by coord space sed -n '1980,2030p' linopy/expressions.py

Repository: FBumann/linopy

Length of output: 1809

🏁 Script executed:

# Final check - what happens when to_polars is called on Datasets with different coords? # Look at examples of how parts are used elsewhere grep -A 10 "for part in self._parts" linopy/expressions.py | head -50

Repository: FBumann/linopy

Length of output: 758

Use how="diagonal_relaxed" for pl.concat() to handle frames with different columns from different coord spaces.

When parts from different coordinate dimensions are converted to polars DataFrames, they may have different variable columns (e.g., some include const, others don't). pl.concat(frames) requires matching schemas; use pl.concat(frames, how="diagonal_relaxed") instead, which is already available in this codebase (see constraints.py line 1360).

✅ Suggested fix

- df = pl.concat(frames) + df = pl.concat(frames, how="diagonal_relaxed")

🤖 Prompt for AI Agents

In `@linopy/expressions.py` around lines 2037 - 2060, The pl.concat call in to_polars (method to_polars on the expression class handling self._parts) should allow concatenation of DataFrames with different schemas; change the plain pl.concat(frames) to use pl.concat(frames, how="diagonal_relaxed") so frames with different coordinate-variable columns (e.g., presence/absence of "const") are merged correctly before calling group_terms_polars and check_has_nulls_polars.

- In merge(), changed from xr.concat(const_arrays, join="outer").sum() to filtering out zero-valued const arrays before aligning. This eliminates the 6GB spike for the common case where consts are zero (which they are in story2's 2*x + 3*y). Current story2 results: ┌─────────────────────┬───────────┬─────────────┐ │ Step │ Master │ This branch │ ├─────────────────────┼───────────┼─────────────┤ │ after add_variables │ 13 MB │ 14 MB │ ├─────────────────────┼───────────┼─────────────┤ │ after 2*x + 3*y │ 12,013 MB │ 14 MB │ ├─────────────────────┼───────────┼─────────────┤ │ after .flat │ 18,973 MB │ 36 MB │ ├─────────────────────┼───────────┼─────────────┤ │ after total <= 1 │ 18,973 MB │ 5,777 MB │ └─────────────────────┴───────────┴─────────────┘ Build and flat are effectively solved — 858× and 527× reduction. The constraint path (<= 1) still materializes at 5.8 GB because to_constraint calls (self - rhs).data which triggers _materialize() → xr.concat(parts, join="outer").

┌───────────────────────┬───────────┬─────────────┬───────────┐ │ Step │ Master │ This branch │ Reduction │ ├───────────────────────┼───────────┼─────────────┼───────────┤ │ after 2*x + 3*y │ 12,013 MB │ 14 MB │ 858× │ ├───────────────────────┼───────────┼─────────────┼───────────┤ │ after .flat │ 18,973 MB │ 36 MB │ 527× │ ├───────────────────────┼───────────┼─────────────┼───────────┤ │ after total <= 1 │ 18,973 MB │ 14 MB │ 1,355× │ ├───────────────────────┼───────────┼─────────────┼───────────┤ │ after add_constraints │ — │ 15 MB │ — │ ├───────────────────────┼───────────┼─────────────┼───────────┤ │ after con.flat │ — │ 88 MB │ 215× │ └───────────────────────┴───────────┴─────────────┴───────────┘ Partially overlapping dims (shared time) ┌──────────────────┬───────────┬─────────────┬──────────────────────────────┐ │ Step │ Master │ This branch │ Reduction │ ├──────────────────┼───────────┼─────────────┼──────────────────────────────┤ │ after 2*x + 3*y │ 12,013 MB │ 14 MB │ 858× │ ├──────────────────┼───────────┼─────────────┼──────────────────────────────┤ │ after .flat │ 18,973 MB │ 36 MB │ 527× │ ├──────────────────┼───────────┼─────────────┼──────────────────────────────┤ │ after total <= 1 │ 18,973 MB │ 5,773 MB │ Materialization (shared dim) │ └──────────────────┴───────────┴─────────────┴──────────────────────────────┘ Same-coords (no regression) Build=0.05s/543MB, flat=0.41s/779MB — unchanged from before. What was implemented 1. LazyLinearExpression.to_constraint() — builds per-part constraint data when dims are fully disjoint; falls back to materialization when any dims overlap 2. Constraint._lazy_parts — stores per-part labeled constraint datasets, lazy .data materialization 3. Constraint.flat / to_polars — per-part iteration avoiding Cartesian product 4. add_constraints() in model.py — per-part label assignment and infinity check 5. Per-part sanitization — sanitize_zeros, sanitize_missings, sanitize_infinities operate on parts directly

Changes 1. Added import math at the top of the file. 2. New function _parts_are_coord_disjoint(parts) — checks if parts have non-overlapping coordinate values (not just dimension names). Requires all parts to have at least one non-helper dimension (scalars are excluded since they need broadcasting). 3. New function _try_redistribute(parts) — when one "broad" part spans the full coordinate space and the remaining "narrow" parts are pairwise coord-disjoint and perfectly tile the broad part, slices the broad part per narrow part's coordinates and concatenates along _term. Returns enriched parts that are coordinate-disjoint, or None if conditions aren't met. 4. Updated LinearExpression.to_constraint() — when rhs is a LazyLinearExpression, delegates to (self - rhs).to_constraint(sign, 0) so the lazy paths get a chance to fire. 5. Updated LazyLinearExpression.to_constraint() — added two new paths between the existing dim-disjoint check and the materialization fallback: - Coord-disjoint path: if parts already have non-overlapping coordinates, builds lazy constraints with per-part RHS slicing. - Redistribute path: if _try_redistribute succeeds, builds lazy constraints from the enriched parts. --- Regarding your PR #12 analysis — this implementation directly addresses concern #1 ("to_constraint only works for disjoint parts"). The new coord-disjoint and redistribute paths handle the case where parts share dimension names but cover different coordinate subsets, which is the common PyPSA pattern.

RobbieKiwi and others added 4 commits February 1, 2026 14:01

Add benchmark_sparsity_memory.py

434f80c

Update benchmarking script

a28dbce

FBumann mentioned this pull request Feb 1, 2026

Feature/lazy-linear-expession PyPSA/linopy#565

Closed

4 tasks

FBumann changed the title ~~Feature/memory usage reduction~~ Lazy expression merging to reduce memory usage and build time Feb 1, 2026

coderabbitai bot reviewed Feb 1, 2026

View reviewed changes

This was referenced Feb 1, 2026

Memory blowup from expression alignment and .sum() on masked variables #14

Open

merge() creates dense Cartesian product for disjoint dimensions #15

Open

_sum() stacks dead terms from masked variables #16

Open

FBumann added 6 commits February 1, 2026 15:57

fix: docstes

66d1978

fix: mypy

6b1d0ce

extract helpers

46cac72

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazy expression merging to reduce memory usage and build time #12

Lazy expression merging to reduce memory usage and build time #12

Uh oh!

FBumann commented Feb 1, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 1, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

FBumann commented Feb 1, 2026

Uh oh!

coderabbitai bot commented Feb 1, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Lazy expression merging to reduce memory usage and build time #12

Are you sure you want to change the base?

Lazy expression merging to reduce memory usage and build time #12

Uh oh!

Conversation

FBumann commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key changes

Benchmark results (5 × 200×200×50, sparsity=0.2)

Test plan

Uh oh!

coderabbitai bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

FBumann commented Feb 1, 2026

Uh oh!

coderabbitai bot commented Feb 1, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

FBumann commented Feb 1, 2026 •

edited

Loading

coderabbitai bot commented Feb 1, 2026 •

edited

Loading