
Conversation


@FBumann FBumann commented Feb 1, 2026

Summary

Introduces LazyLinearExpression, a subclass of LinearExpression that defers xr.concat along the _term dimension. Instead of immediately concatenating datasets with join="outer" (which creates massive dense padding when coordinate spaces differ), expressions are stored as a list of compact per-variable datasets and only materialized when needed.

This targets the core memory bottleneck in large models (e.g. PyPSA) where merge() of many expressions triggers dense outer-join padding across all coordinate dimensions.
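The padding problem can be reproduced in a few lines of plain xarray (an illustrative sketch, not linopy internals): concatenating two parts whose coordinate spaces are disjoint blows the result up to the product of both spaces, while a plain list of parts stays linear in their sizes.

```python
import numpy as np
import xarray as xr

# Two "expression parts" over disjoint coordinate spaces.
a = xr.Dataset({"coeffs": (("i",), np.ones(1000))}, coords={"i": np.arange(1000)})
b = xr.Dataset({"coeffs": (("j",), np.ones(1000))}, coords={"j": np.arange(1000)})

# Eager concat along a new _term dimension: xarray expands every variable to
# the union of all dimensions, so each part is padded out over i x j.
dense = xr.concat([a, b], dim="_term", join="outer")
print(dense["coeffs"].size)  # 2_000_000 cells instead of 2_000

# Deferred alternative: keep the compact parts; combine only when required.
parts = [a, b]
print(sum(p["coeffs"].size for p in parts))  # 2_000
```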

Key changes

linopy/expressions.py

  • LazyLinearExpression class — stores expression parts as list[Dataset], defers materialization:
    • flat property iterates parts independently, avoiding dense padding entirely (solver hot-path)
    • to_polars() follows the same per-part pattern for polars-based IO
    • __add__/__sub__/__neg__ concatenate part lists without materializing
    • _compact() merges parts sharing the same coordinate space using cheap join="override"
    • rename()/diff() dispatch per-part, only applying to parts that contain the target dimension
    • _materialize() falls back to standard dense merge when full Dataset access is needed
  • merge() function — returns LazyLinearExpression when merging along _term dimension instead of eagerly concatenating
  • Type checks — changed type(x) is LinearExpression to isinstance checks throughout, so LazyLinearExpression is recognized as a valid LinearExpression
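The part-list mechanics can be illustrated with a toy class (hypothetical names and data model, not the actual linopy API): arithmetic only touches the list, and the expensive dense view is built on demand.

```python
class LazyExpr:
    """Toy deferred expression: a list of compact parts, merged only on demand."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __add__(self, other):
        # Structural work only: no alignment, no padding.
        return LazyExpr(self.parts + other.parts)

    def __neg__(self):
        return LazyExpr([{**p, "coeff": -p["coeff"]} for p in self.parts])

    def __sub__(self, other):
        return self + (-other)

    def materialize(self):
        # The "dense" fallback view, built only when full access is needed.
        return sorted(self.parts, key=lambda p: p["var"])

e = LazyExpr([{"var": "x", "coeff": 2.0}]) - LazyExpr([{"var": "y", "coeff": 3.0}])
print(e.materialize())  # [{'var': 'x', 'coeff': 2.0}, {'var': 'y', 'coeff': -3.0}]
```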

linopy/objective.py

  • is_linear/is_quadratic properties use isinstance instead of identity checks
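The distinction matters because an identity check rejects subclasses (illustrative stand-in classes, not linopy's):

```python
class LinearExpression:
    pass

class LazyLinearExpression(LinearExpression):
    pass

expr = LazyLinearExpression()
print(type(expr) is LinearExpression)      # False: identity check rejects the subclass
print(isinstance(expr, LinearExpression))  # True: isinstance accepts it
```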

dev-scripts/profile_model_memory.py

  • Reusable benchmarking script with CLI args (--shape, --sparsity, --n-expr, --preset)
  • Outputs JSON with git branch/SHA, peak RSS, timing for cross-branch comparison
  • Designed to work with scalene for line-level memory profiling
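A stripped-down version of the measurement loop might look like this (a sketch assuming a Unix platform; the real script's CLI flags and JSON schema live in dev-scripts/profile_model_memory.py):

```python
import json
import resource
import time

def profile(fn) -> dict:
    """Run fn once and report wall time plus the process's peak RSS."""
    t0 = time.perf_counter()
    fn()
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return {"seconds": round(time.perf_counter() - t0, 3), "peak_ru_maxrss": peak}

result = profile(lambda: [bytearray(1024) for _ in range(10_000)])
print(json.dumps(result))
```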

Benchmark results (5 × 200×200×50, sparsity=0.2)

| Scenario    | Branch  | build | build RSS | flat  | flat RSS |
|-------------|---------|-------|-----------|-------|----------|
| Same-coords | this PR | 0.05s | 673 MB    | 0.18s | 826 MB   |
| Same-coords | master  | 0.36s | 1043 MB   | 0.62s | 1260 MB  |
| Diff-coords | this PR | 0.10s | 1044 MB   | 0.18s | 1178 MB  |
| Diff-coords | master  | 0.24s | 1037 MB   | 0.57s | 1255 MB  |

Same-coords (all variables share dimensions — typical PyPSA pattern): 7× faster build, 35% less memory, 3.4× faster flat

Test plan

  • All 1187 existing tests pass (pytest test/)
  • Benchmarked against master and feature/defered-merge branches
  • Test with a real PyPSA model to validate end-to-end memory reduction

🤖 Generated with Claude Code

RobbieKiwi and others added 4 commits February 1, 2026 14:01
  Two files changed:

  linopy/expressions.py:
  - Added LazyLinearExpression class (inherits LinearExpression) that stores a list of un-merged Dataset _parts instead of a single concatenated Dataset
  - Key overrides: data (lazy materialization), const, flat (iterates parts directly), __add__/__sub__/__neg__ (propagate laziness), all with fallback to parent when not lazy
  - Modified merge() to return LazyLinearExpression when dim == TERM_DIM and cls is a LinearExpression subclass
  - Protected lazy expressions from premature materialization in merge's data extraction and override detection

  linopy/objective.py:
  - Changed is_linear/is_quadratic from type(x) is LinearExpression identity checks to isinstance checks, so LazyLinearExpression is correctly identified as linear

  Performance (different-coordinate variables, 5 × 200×200×50):
  ┌───────────────────────┬─────────┬─────────┬───────────────┐
  │        Metric         │ Before  │  After  │    Change     │
  ├───────────────────────┼─────────┼─────────┼───────────────┤
  │ Expression build time │ 0.31s   │ 0.09s   │ 3.4x faster   │
  ├───────────────────────┼─────────┼─────────┼───────────────┤
  │ flat export time      │ 0.57s   │ 0.17s   │ 3.4x faster   │
  ├───────────────────────┼─────────┼─────────┼───────────────┤
  │ Peak RSS at flat      │ 1337 MB │ 1186 MB │ -151 MB (11%) │
  └───────────────────────┴─────────┴─────────┴───────────────┘
  Same-coordinate variables see no regression — materialization occurs at to_constraint time with the same override join as before. Phase 2 (lazy constraints) would extend savings to that path.
  1. _compact() — Groups parts by their coordinate signature and merges same-coord groups using join="override" (no padding). Keeps part count low after many chained additions.
  2. to_polars() — Lazy override that converts each part to a polars DataFrame independently and concatenates, same pattern as flat.
  3. rename() — Per-part dispatch that only renames dims/vars present in each part, avoiding errors for parts that lack the target dimension.
  4. diff() — Per-part dispatch that applies diff only to parts containing the target dimension.
  5. sel() and shift() — Fall back to materialized path since their semantics (indexing, fill values) need a consistent coordinate space.
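  The grouping step behind _compact() can be sketched with a simplified data model (plain dicts standing in for per-part Datasets; all names here are hypothetical):

```python
from collections import defaultdict

def coord_signature(part: dict) -> tuple:
    # A part's signature: its dimension names and sizes.
    return tuple(sorted(part["coords"].items()))

def compact(parts: list[dict]) -> list[dict]:
    """Merge parts that share a coordinate space; never pad across groups."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for p in parts:
        groups[coord_signature(p)].append(p)
    # Within a group the coordinates agree, so merging is a cheap "override" join.
    return [
        {"coords": group[0]["coords"], "terms": [t for p in group for t in p["terms"]]}
        for group in groups.values()
    ]

parts = [
    {"coords": {"t": 3}, "terms": ["2*x"]},
    {"coords": {"t": 3}, "terms": ["3*y"]},
    {"coords": {"s": 2}, "terms": ["z"]},
]
print(len(compact(parts)))  # 2: the two t-parts merged, the s-part kept apart
```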

  Still deferred (Phase 2+):
  - Lazy constraint propagation (to_constraint currently materializes)
  - Lazy _sum() for dimension reduction
  - These would require changes to the Constraint class and solver IO paths

coderabbitai bot commented Feb 1, 2026

📝 Walkthrough

This PR introduces a new memory profiling utility for Linopy models and extends the expression system with lazy evaluation support. A new LazyLinearExpression class defers data materialization during construction, optimizing memory usage while maintaining compatibility with existing APIs. The objective module's type checking is also refined to use isinstance checks for better polymorphism support.

Changes

Memory Profiling Utility (dev-scripts/profile_model_memory.py):
New standalone script for measuring peak RSS memory usage during Linopy model construction. Provides configurable presets for shape, sparsity, and expression counts, with git metadata collection and JSON output.

Lazy Expression Support (linopy/expressions.py):
Introduces LazyLinearExpression class that defers merging along the term dimension by storing per-expression Dataset parts. Enhances merge logic to detect and handle lazy expressions without forced materialization. Refactors operations (add, neg, sub, etc.) to return lazy instances when appropriate. Adjusts type checks from exact equality to isinstance for better polymorphism.

Type Checking Refinement (linopy/objective.py):
Updates is_linear and is_quadratic to use isinstance checks instead of exact type comparisons. Adds ValueError guard in to_matrix to enforce QuadraticExpression constraint before conversion.

Sequence Diagram

sequenceDiagram
    participant User
    participant LazyLinearExpr as LazyLinearExpression
    participant Merge as Merge Logic
    participant Data as Data Materialization
    participant Result as LinearExpression

    User->>LazyLinearExpr: Create with parts
    Note over LazyLinearExpr: Store parts, defer merge
    
    User->>LazyLinearExpr: Perform operations (__add__, __sub__)
    LazyLinearExpr->>LazyLinearExpr: Accumulate new parts
    LazyLinearExpr->>User: Return LazyLinearExpression
    
    User->>LazyLinearExpr: Access .data (first access)
    LazyLinearExpr->>Merge: Trigger merge on TERM_DIM
    Merge->>Merge: Detect lazy expressions
    Merge->>Merge: Flatten parts without dense padding
    Merge->>Data: Materialize to Dataset
    Data->>Result: Construct LinearExpression
    Result->>User: Return materialized expression

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hop and gather, parts so lazy,
Don't materialize, keep it hazy!
Merge when needed, data takes flight,
Memory profiled, expressions just right! 🎯

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check ⚠️ Warning: The description contains only the empty template with no narrative content explaining the changes, objectives, or rationale for the memory usage reduction feature. Resolution: add a detailed description of what was changed and why, including the LazyLinearExpression feature, performance improvements, and how it reduces memory usage, and ensure all checklist items are addressed.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 54.84%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check ❓ Inconclusive: The title 'Feature/memory usage reduction' is vague and generic, using broad terminology that does not clearly communicate the specific technical change implemented. Resolution: replace with a more specific title describing the main change, e.g., 'Introduce LazyLinearExpression for deferred materialization' or 'Add lazy evaluation support for linear expressions'.





FBumann commented Feb 1, 2026

@coderabbitai review


coderabbitai bot commented Feb 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@FBumann FBumann changed the title from "Feature/memory usage reduction" to "Lazy expression merging to reduce memory usage and build time" on Feb 1, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@linopy/expressions.py`:
- Around line 2037-2060: The pl.concat call in to_polars (method to_polars on
the expression class handling self._parts) should allow concatenation of
DataFrames with different schemas; change the plain pl.concat(frames) to use
pl.concat(frames, how="diagonal_relaxed") so frames with different
coordinate-variable columns (e.g., presence/absence of "const") are merged
correctly before calling group_terms_polars and check_has_nulls_polars.
🧹 Nitpick comments (1)
dev-scripts/profile_model_memory.py (1)

31-52: Add type hints to the new profiling helpers.
Please annotate helper signatures (this function and the rest of the script) so mypy can validate them.

🔧 Example (apply similarly to the other helpers)
-def get_git_info():
+def get_git_info() -> dict[str, str]:

As per coding guidelines, **/*.py: Use type hints and mypy for type checking in Python files.

Comment on lines +2037 to +2060
def to_polars(self) -> pl.DataFrame:
"""Convert the expression to a polars DataFrame without materializing."""
if not self.is_lazy:
return super().to_polars()

frames = []
for part in self._parts:
df = to_polars(part)
df = filter_nulls_polars(df)
if len(df):
frames.append(df)

if not frames:
return pl.DataFrame(
{
"vars": pl.Series([], dtype=pl.Int64),
"coeffs": pl.Series([], dtype=pl.Float64),
}
)

df = pl.concat(frames)
df = group_terms_polars(df)
check_has_nulls_polars(df, name=self.type)
return df

⚠️ Potential issue | 🟠 Major

Use how="diagonal_relaxed" for pl.concat() to handle frames with different columns from different coord spaces.

When parts from different coordinate dimensions are converted to polars DataFrames, they may have different variable columns (e.g., some include const, others don't). pl.concat(frames) requires matching schemas; use pl.concat(frames, how="diagonal_relaxed") instead, which is already available in this codebase (see constraints.py line 1360).

✅ Suggested fix
-        df = pl.concat(frames)
+        df = pl.concat(frames, how="diagonal_relaxed")

  - In merge(), changed from xr.concat(const_arrays, join="outer").sum() to filtering out zero-valued const arrays before aligning. This eliminates the 6GB spike for the common case where consts are zero (which they are in story2's 2*x + 3*y).
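  The fix can be sketched as follows (a hypothetical helper, assuming const parts arrive as DataArrays): filter before aligning, so the all-zero fast path never reaches the outer join.

```python
import numpy as np
import xarray as xr

def summed_const(const_arrays: list[xr.DataArray]) -> "xr.DataArray | int":
    """Sum constant terms, skipping the outer-join concat when all are zero."""
    nonzero = [c for c in const_arrays if bool((c != 0).any())]
    if not nonzero:
        return 0  # nothing to align, nothing to pad
    return xr.concat(nonzero, dim="_term", join="outer").sum("_term")

zeros = xr.DataArray(np.zeros(3), dims="i")
print(summed_const([zeros, zeros]))  # 0, no concat performed
```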

  Current story2 results:
  ┌─────────────────────┬───────────┬─────────────┐
  │        Step         │  Master   │ This branch │
  ├─────────────────────┼───────────┼─────────────┤
  │ after add_variables │ 13 MB     │ 14 MB       │
  ├─────────────────────┼───────────┼─────────────┤
  │ after 2*x + 3*y     │ 12,013 MB │ 14 MB       │
  ├─────────────────────┼───────────┼─────────────┤
  │ after .flat         │ 18,973 MB │ 36 MB       │
  ├─────────────────────┼───────────┼─────────────┤
  │ after total <= 1    │ 18,973 MB │ 5,777 MB    │
  └─────────────────────┴───────────┴─────────────┘
  Build and flat are effectively solved — 858× and 527× reduction. The constraint path (<= 1) still materializes at 5.8 GB because to_constraint calls (self - rhs).data, which triggers _materialize() → xr.concat(parts, join="outer").
  ┌───────────────────────┬───────────┬─────────────┬───────────┐
  │         Step          │  Master   │ This branch │ Reduction │
  ├───────────────────────┼───────────┼─────────────┼───────────┤
  │ after 2*x + 3*y       │ 12,013 MB │ 14 MB       │ 858×      │
  ├───────────────────────┼───────────┼─────────────┼───────────┤
  │ after .flat           │ 18,973 MB │ 36 MB       │ 527×      │
  ├───────────────────────┼───────────┼─────────────┼───────────┤
  │ after total <= 1      │ 18,973 MB │ 14 MB       │ 1,355×    │
  ├───────────────────────┼───────────┼─────────────┼───────────┤
  │ after add_constraints │ —         │ 15 MB       │ —         │
  ├───────────────────────┼───────────┼─────────────┼───────────┤
  │ after con.flat        │ —         │ 88 MB       │ 215×      │
  └───────────────────────┴───────────┴─────────────┴───────────┘
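The materialization cost is easy to reproduce standalone. Assuming two per-part datasets over disjoint dims (a simplified stand-in for the lazy parts), xr.concat along a new _term dim broadcasts every variable across the union of dimensions:

```python
import numpy as np
import xarray as xr

# Two compact parts over disjoint coordinate dimensions i and j.
x = xr.Dataset({"coeffs": (("i",), np.full(1000, 2.0))},
               coords={"i": np.arange(1000)})
y = xr.Dataset({"coeffs": (("j",), np.full(1000, 3.0))},
               coords={"j": np.arange(1000)})

# Materialization: 1000 + 1000 compact entries become a dense
# 2 x 1000 x 1000 block spanning the full Cartesian product.
dense = xr.concat([x, y], dim="_term", join="outer")
print(dense["coeffs"].size)  # 2000000
```

At PyPSA scale, with many parts and several shared dimensions, this blow-up is the multi-GB spike the lazy parts avoid.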
  Partially overlapping dims (shared time)
  ┌──────────────────┬───────────┬─────────────┬──────────────────────────────┐
  │       Step       │  Master   │ This branch │          Reduction           │
  ├──────────────────┼───────────┼─────────────┼──────────────────────────────┤
  │ after 2*x + 3*y  │ 12,013 MB │ 14 MB       │ 858×                         │
  ├──────────────────┼───────────┼─────────────┼──────────────────────────────┤
  │ after .flat      │ 18,973 MB │ 36 MB       │ 527×                         │
  ├──────────────────┼───────────┼─────────────┼──────────────────────────────┤
  │ after total <= 1 │ 18,973 MB │ 5,773 MB    │ Materialization (shared dim) │
  └──────────────────┴───────────┴─────────────┴──────────────────────────────┘
  Same-coords (no regression)

  Build=0.05s/543MB, flat=0.41s/779MB — unchanged from before.

  What was implemented

  1. LazyLinearExpression.to_constraint() — builds per-part constraint data when dims are fully disjoint; falls back to materialization when any dims overlap
  2. Constraint._lazy_parts — stores per-part labeled constraint datasets, lazy .data materialization
  3. Constraint.flat / to_polars — per-part iteration avoiding Cartesian product
  4. add_constraints() in model.py — per-part label assignment and infinity check
  5. Per-part sanitization — sanitize_zeros, sanitize_missings, sanitize_infinities operate on parts directly
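The per-part flat/to_polars pattern behind items 2-3 can be sketched as follows (a simplified pandas stand-in, not the actual linopy code path): flatten each part on its own coordinate space, then stack the small long-format frames, so the dense union of all dims never exists.

```python
import numpy as np
import pandas as pd
import xarray as xr

parts = [
    xr.Dataset({"coeffs": (("i",), np.array([2.0, 2.0]))},
               coords={"i": [0, 1]}),
    xr.Dataset({"coeffs": (("j",), np.array([3.0, 3.0, 3.0]))},
               coords={"j": [0, 1, 2]}),
]

# Each part flattens on its own coords; no cross-part alignment happens.
frames = [p.to_dataframe().reset_index() for p in parts]
flat = pd.concat(frames, ignore_index=True)  # 2 + 3 rows, not 2 * 3
print(len(flat))  # 5
```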
  Changes

  1. Added import math at the top of the file.
  2. New function _parts_are_coord_disjoint(parts) — checks if parts have non-overlapping coordinate values (not just dimension names). Requires all parts to have at least one
  non-helper dimension (scalars are excluded since they need broadcasting).
  3. New function _try_redistribute(parts) — when one "broad" part spans the full coordinate space and the remaining "narrow" parts are pairwise coord-disjoint and perfectly tile the broad part, it slices the broad part along each narrow part's coordinates and concatenates along _term. Returns enriched parts that are coordinate-disjoint, or None if the conditions aren't met.
  4. Updated LinearExpression.to_constraint() — when rhs is a LazyLinearExpression, delegates to (self - rhs).to_constraint(sign, 0) so the lazy paths get a chance to fire.
  5. Updated LazyLinearExpression.to_constraint() — added two new paths between the existing dim-disjoint check and the materialization fallback:
    - Coord-disjoint path: if parts already have non-overlapping coordinates, builds lazy constraints with per-part RHS slicing.
    - Redistribute path: if _try_redistribute succeeds, builds lazy constraints from the enriched parts.
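A simplified sketch of the _parts_are_coord_disjoint idea (one plausible reading of the check, omitting the scalar/non-helper-dimension guard described above): parts may share dimension names as long as, for each pair, some shared dim has no coordinate value in common.

```python
import xarray as xr

def parts_are_coord_disjoint(parts: list[xr.Dataset]) -> bool:
    """Hypothetical helper: True if no two parts overlap in coordinate space."""
    for ai in range(len(parts)):
        for bi in range(ai + 1, len(parts)):
            a, b = parts[ai], parts[bi]
            shared = set(a.dims) & set(b.dims)
            if not shared:
                continue  # dim-disjoint pairs are handled by the earlier check
            # Regions are disjoint iff they differ along at least one shared dim.
            if any(
                not (set(a[d].values.tolist()) & set(b[d].values.tolist()))
                for d in shared
            ):
                continue
            return False
    return True

p1 = xr.Dataset(coords={"time": [0, 1, 2]})
p2 = xr.Dataset(coords={"time": [3, 4, 5]})
p3 = xr.Dataset(coords={"time": [2, 3]})
print(parts_are_coord_disjoint([p1, p2]))  # True
print(parts_are_coord_disjoint([p1, p3]))  # False
```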

  ---
  Regarding your PR #12 analysis — this implementation directly addresses concern #1 ("to_constraint only works for disjoint parts"). The new coord-disjoint and redistribute paths handle the case where parts share dimension names but cover different coordinate subsets, which is the common PyPSA pattern.