Skip to content

fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722

Open
FBumann wants to merge 13 commits into
masterfrom
fix/bounds-coords-broadcast
Open

fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722
FBumann wants to merge 13 commits into
masterfrom
fix/bounds-coords-broadcast

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented May 23, 2026

Closes #706, #709. Partially addresses #723 (helper relocated to common.py; call-site audit deferred to follow-up — see issue comment for scope).

What changes

Principle. 0.7.0 made coords the source of truth for DataArray bounds in add_variables. This PR closes the two remaining gaps so the rule holds across every bound type and dimension order:

Implementation. _validate_dataarray_bounds and the downstream as_dataarray(arr, coords) call in add_variables collapse into a single helper, as_dataarray_in_coords (in linopy/common.py, alongside as_dataarray). It converts any input (pandas with named axes via to_xarray, otherwise via as_dataarray), validates the result against coords, expands missing dims, transposes to coords order, and reconstructs the coord variables in that order. expand_dims({}) and identity transpose are metadata-only no-ops, so scalar bounds and full-dim DataArray bounds already in coords order pay no extra cost.

Pandas conversion. A new _named_pandas_to_dataarray helper handles pandas with fully named axes (including MultiIndex on either axis) by stacking all column levels into the row index in one call: arr.stack(level=list(range(arr.columns.nlevels)), future_stack=True), followed by to_xarray(). Pandas with any unnamed axis falls through to as_dataarray.

Piecewise. Same family of fix: linopy.piecewise._broadcast_points built its expand_dims map from a set, giving a hash-randomized dim order across processes. Iterates expressions and dims in declaration order now.

Supersedes

Test plan

  • test/test_variable.py::TestAddVariablesBoundsWithCoords — 57/57. Includes:
    • test_bound_broadcast_missing_dim parametrized over DataArray, Series, DataFrame, Series-multiindex, DataFrame-multicolumns, DataFrame-multiindex.
    • test_dataarray_broadcast_missing_dim_order parametrized over scalar/DA bound combinations.
    • test_dataarray_coord_reorder for the same-values-different-order reindex.
    • test_pandas_bound_with_unnamed_axis_falls_through, test_unnamed_coords_short_circuit, test_dataarray_bound_with_multiindex_coord covering edge branches in the new helpers.
  • test/test_piecewise_constraints.py::test_broadcast_points_dim_order_follows_exprs.
  • Broader sweep: test_model.py, test_linear_expression.py, test_constraint.py, test_constraints.py, test_piecewise_constraints.py, test_variable_assignment.py, test_variables.py — 784 pass, no regressions.
  • Pre-commit (ruff/format/blackdoc/codespell) clean.

🤖 Generated with Claude Code

FBumann and others added 6 commits May 24, 2026 00:38
`add_variables` had two related bugs when `lower`/`upper` were arrays:

- pandas Series/DataFrame bounds missing a dimension in `coords` had
  the missing dimension silently dropped (#709), unlike DataArray
  bounds which were already broadcast.
- DataArray bounds missing a dimension were expanded with
  `DataArray.expand_dims`, which prepends new dimensions and produces
  a `coords`-mismatched dimension order in the resulting variable
  (#706). The order depended on the type of the bounds, so scalar
  bounds worked but two array bounds missing the same dimension did
  not.

Replace `_validate_dataarray_bounds` plus the downstream
`as_dataarray(..., coords)` call with a single helper
`_as_dataarray_in_coords`. It converts any input (pandas with named
axes via `to_xarray`, otherwise via `as_dataarray`), validates the
result against `coords`, expands missing dims, transposes to coords
order, and reconstructs the coord variables in that order.
`expand_dims` and `transpose` are no-ops when the array already
matches, so scalar / full-dim DataArray bounds keep their fast path.

Also fix `linopy.piecewise._broadcast_points`, which built the
`expand_dims` map from a `set`, producing a hash-randomized dimension
order across processes. Iterate expressions and dims in declaration
order instead.

Closes #706 and #709. Supersedes #710 and #719.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restate #706/#709's fix as a single principle in the docstring,
release note, and `_as_dataarray_in_coords` helper docstring:
when `coords` is provided to `add_variables`, it is the source of
truth for dimensions, dimension order, and coordinate values, and
`lower` / `upper` are broadcast and aligned to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0.7.0 already shipped "add_variables no longer ignores coords when
lower / upper are DataArrays". Recast the new bullet as extending
that fix to the remaining gaps (pandas bounds; dim order across
bound types) so the continuity is visible from the release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng gaps"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize test_bound_broadcast_missing_dim with three additional
  cases: Series with MultiIndex(time, colour), DataFrame with
  MultiIndex columns(space, colour), and DataFrame with MultiIndex
  index(time, space). Exercises the `while DataFrame: unstack()`
  loop and the MultiIndex branch of `_named_pandas_to_dataarray`.
- Add test_dataarray_coord_reorder for the same-values-different-order
  reindex branch (previously only the unequal-values raise was
  covered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Relocate `_as_dataarray_in_coords` and its helpers
(`_coords_to_dict`, `_named_pandas_to_dataarray`) from `model.py`
into `common.py`, alongside the existing `as_dataarray` they
parallel. Rename to `as_dataarray_in_coords` (no leading underscore)
since it is no longer file-local — other modules can import the
strict-coords variant when migrating call sites.

Pure relocation: no behavior change, no call-site changes beyond
`add_variables`'s import. Refs #723.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread linopy/model.py
self._check_valid_dim_names(data)

if mask is not None:
mask = as_dataarray(mask, coords=data.coords, dims=data.dims).astype(bool)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense to give mask the same treatment as the bounds to avoid the same pandas issue. Then mask could have the same type hint as lower/upper bounds and the user could for example pass pd.DataFrame for all 3

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would! But this is somewhat breaking (we had a FutureWarning in place, but still...). I will split this into a separate PR!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed?

Comment thread linopy/model.py Outdated
FBumann and others added 2 commits May 24, 2026 13:15
…anches

Replace the unstack-while-loop / split named-check structure with a
single up-front "all axes named" check and a single
``DataFrame.stack(level=list(range(nlevels)), future_stack=True)``
call that collapses all column levels into the row MultiIndex in
one shot. Same observable behaviour, fewer moving parts, no
defensive unreachable branches.

Add tests covering the unnamed-axis fall-through path, the
empty-coords short-circuit in ``as_dataarray_in_coords``, and the
``MultiIndex``-on-a-dim ``continue`` in the validation loop.
Together with the restructure these bring the new helper code to
full patch coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pandas allows any hashable in ``pd.Index.names`` (tuples, ints,
etc.), but only strings map cleanly to xarray dim names. Reject
anything non-string up front so the pandas falls back to
``as_dataarray`` instead of producing a DataArray with an awkward
non-string dim name that downstream validation would reject with a
confusing "extra dimensions" error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FBumann and others added 3 commits May 24, 2026 15:11
Inputs without their own meaningful labels — numpy arrays, polars
Series, pandas with unnamed axes — fell through ``as_dataarray_in_coords``
via a short-circuit return. That meant:

- The default ``dim_0`` / ``dim_1`` axis names from ``as_dataarray``
  leaked into the result, so a pandas Series without an index name
  combined with another bound carrying a named coord produced a
  spurious 2-D variable.
- Shape mismatches surfaced further downstream as confusing
  "coordinates do not match" errors against the auto-generated
  ``RangeIndex``.

The fall-through now: (a) defaults ``dims`` to coords' keys so axes
get labelled correctly; (b) runs the same validate / expand /
transpose path as labelled inputs; (c) re-assigns coords from
``expected`` on the resulting DataArray so positional inputs align
to coords by position. A shape mismatch surfaces as xarray's clear
``conflicting sizes`` from ``assign_coords``. MultiIndex coords are
left alone (re-assigning a PandasMultiIndex emits a FutureWarning).

Replaces the tautological ``test_pandas_bound_with_unnamed_axis_falls_through``
(which sneaked past by naming the coord ``"dim_0"`` to match the
auto-generated dim) with ``test_positional_bound_aligns_to_coords``
that asserts actual positional alignment across numpy / Series /
DataFrame, plus ``test_positional_bound_wrong_size_raises_clear_error``
for the shape-mismatch path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o_dict

``reformulate_sos1`` / ``reformulate_sos2`` built the coords for
the indicator variable as ``[var.coords[d] for d in var.dims]``,
which is a list of ``xarray.DataArray`` coord objects. The rest
of linopy passes ``coords`` as a list of ``pd.Index``. The mix
slipped through under the old short-circuit fall-through but
broke once the helper started defaulting ``dims`` from
``_coords_to_dict(coords)`` — non-``pd.Index`` entries were
silently dropped, so ``len(dims) < len(coords)`` and xarray
raised ``different number of dimensions on data and dims: 2 vs 1``.

Use ``var.indexes[d]`` instead — it returns the actual
``pd.Index`` (regular or MultiIndex) for the dim and preserves
structure that ``pd.Index(coord.values, ...)`` would flatten.

Also widen ``_coords_to_dict`` to accept any entry with a
``.name`` (xarray DataArrays included) so a future caller passing
mixed types doesn't silently lose coords. The reformulator fix
removes the only known producer of mixed-type coords; this is
belt-and-suspenders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the permissive ``getattr(c, "name", None)`` check with an
explicit allow-list: ``pd.Index`` (named or not — unnamed silently
skip as before) and unnamed sequences (``list`` / ``tuple`` /
``range`` / ``numpy.ndarray``). Any other type (notably
``xarray.DataArray``, but also ``pd.Series`` and friends) now
raises ``TypeError`` with a hint to pass ``variable.indexes[<dim>]``
instead. This would have caught the SOS-reformulator bug at the
source instead of letting it surface as a confusing xarray error
about mismatched dim counts ten frames down.

Drop ``DataArray`` from the matching ``coords`` type hints in
``model.py`` / ``expressions.py`` so the documented and runtime
type sets agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add_variables: DataArray bounds with missing dimensions yield wrong dimension order

2 participants