fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722
Open
FBumann wants to merge 13 commits into
Open
fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722FBumann wants to merge 13 commits into
FBumann wants to merge 13 commits into
Conversation
`add_variables` had two related bugs when `lower`/`upper` were arrays: - pandas Series/DataFrame bounds missing a dimension in `coords` had the missing dimension silently dropped (#709), unlike DataArray bounds which were already broadcast. - DataArray bounds missing a dimension were expanded with `DataArray.expand_dims`, which prepends new dimensions and produces a `coords`-mismatched dimension order in the resulting variable (#706). The order depended on the type of the bounds, so scalar bounds worked but two array bounds missing the same dimension did not. Replace `_validate_dataarray_bounds` plus the downstream `as_dataarray(..., coords)` call with a single helper `_as_dataarray_in_coords`. It converts any input (pandas with named axes via `to_xarray`, otherwise via `as_dataarray`), validates the result against `coords`, expands missing dims, transposes to coords order, and reconstructs the coord variables in that order. `expand_dims` and `transpose` are no-ops when the array already matches, so scalar / full-dim DataArray bounds keep their fast path. Also fix `linopy.piecewise._broadcast_points`, which built the `expand_dims` map from a `set`, producing a hash-randomized dimension order across processes. Iterate expressions and dims in declaration order instead. Closes #706 and #709. Supersedes #710 and #719. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restate #706/#709's fix as a single principle in the docstring, release note, and `_as_dataarray_in_coords` helper docstring: when `coords` is provided to `add_variables`, it is the source of truth for dimensions, dimension order, and coordinate values, and `lower` / `upper` are broadcast and aligned to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0.7.0 already shipped "add_variables no longer ignores coords when lower / upper are DataArrays". Recast the new bullet as extending that fix to the remaining gaps (pandas bounds; dim order across bound types) so the continuity is visible from the release notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng gaps" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize test_bound_broadcast_missing_dim with three additional cases: Series with MultiIndex(time, colour), DataFrame with MultiIndex columns(space, colour), and DataFrame with MultiIndex index(time, space). Exercises the `while DataFrame: unstack()` loop and the MultiIndex branch of `_named_pandas_to_dataarray`. - Add test_dataarray_coord_reorder for the same-values-different-order reindex branch (previously only the unequal-values raise was covered). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
Relocate `_as_dataarray_in_coords` and its helpers (`_coords_to_dict`, `_named_pandas_to_dataarray`) from `model.py` into `common.py`, alongside the existing `as_dataarray` they parallel. Rename to `as_dataarray_in_coords` (no leading underscore) since it is no longer file-local — other modules can import the strict-coords variant when migrating call sites. Pure relocation: no behavior change, no call-site changes beyond `add_variables`'s import. Refs #723. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RobbieKiwi
approved these changes
May 24, 2026
| self._check_valid_dim_names(data) | ||
|
|
||
| if mask is not None: | ||
| mask = as_dataarray(mask, coords=data.coords, dims=data.dims).astype(bool) |
Contributor
There was a problem hiding this comment.
Maybe it would make sense to give mask the same treatment as the bounds to avoid the same pandas issue. Then mask could have the same type hint as lower/upper bounds and the user could for example pass pd.DataFrame for all 3
Collaborator
Author
There was a problem hiding this comment.
It would! But this is somewhat breaking (we had a FutureWarning in place, but still...). I will split this into a separate PR!
…anches Replace the unstack-while-loop / split named-check structure with a single up-front "all axes named" check and a single ``DataFrame.stack(level=list(range(nlevels)), future_stack=True)`` call that collapses all column levels into the row MultiIndex in one shot. Same observable behaviour, fewer moving parts, no defensive unreachable branches. Add tests covering the unnamed-axis fall-through path, the empty-coords short-circuit in ``as_dataarray_in_coords``, and the ``MultiIndex``-on-a-dim ``continue`` in the validation loop. Together with the restructure these bring the new helper code to full patch coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pandas allows any hashable in ``pd.Index.names`` (tuples, ints, etc.), but only strings map cleanly to xarray dim names. Reject anything non-string up front so the pandas falls back to ``as_dataarray`` instead of producing a DataArray with an awkward non-string dim name that downstream validation would reject with a confusing "extra dimensions" error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
Inputs without their own meaningful labels — numpy arrays, polars Series, pandas with unnamed axes — fell through ``as_dataarray_in_coords`` via a short-circuit return. That meant: - The default ``dim_0`` / ``dim_1`` axis names from ``as_dataarray`` leaked into the result, so a pandas Series without an index name combined with another bound carrying a named coord produced a spurious 2-D variable. - Shape mismatches surfaced further downstream as confusing "coordinates do not match" errors against the auto-generated ``RangeIndex``. The fall-through now: (a) defaults ``dims`` to coords' keys so axes get labelled correctly; (b) runs the same validate / expand / transpose path as labelled inputs; (c) re-assigns coords from ``expected`` on the resulting DataArray so positional inputs align to coords by position. A shape mismatch surfaces as xarray's clear ``conflicting sizes`` from ``assign_coords``. MultiIndex coords are left alone (re-assigning a PandasMultiIndex emits a FutureWarning). Replaces the tautological ``test_pandas_bound_with_unnamed_axis_falls_through`` (which sneaked past by naming the coord ``"dim_0"`` to match the auto-generated dim) with ``test_positional_bound_aligns_to_coords`` that asserts actual positional alignment across numpy / Series / DataFrame, plus ``test_positional_bound_wrong_size_raises_clear_error`` for the shape-mismatch path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o_dict ``reformulate_sos1`` / ``reformulate_sos2`` built the coords for the indicator variable as ``[var.coords[d] for d in var.dims]``, which is a list of ``xarray.DataArray`` coord objects. The rest of linopy passes ``coords`` as a list of ``pd.Index``. The mix slipped through under the old short-circuit fall-through but broke once the helper started defaulting ``dims`` from ``_coords_to_dict(coords)`` — non-``pd.Index`` entries were silently dropped, so ``len(dims) < len(coords)`` and xarray raised ``different number of dimensions on data and dims: 2 vs 1``. Use ``var.indexes[d]`` instead — it returns the actual ``pd.Index`` (regular or MultiIndex) for the dim and preserves structure that ``pd.Index(coord.values, ...)`` would flatten. Also widen ``_coords_to_dict`` to accept any entry with a ``.name`` (xarray DataArrays included) so a future caller passing mixed types doesn't silently lose coords. The reformulator fix removes the only known producer of mixed-type coords; this is belt-and-suspenders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the permissive ``getattr(c, "name", None)`` check with an explicit allow-list: ``pd.Index`` (named or not — unnamed silently skip as before) and unnamed sequences (``list`` / ``tuple`` / ``range`` / ``numpy.ndarray``). Any other type (notably ``xarray.DataArray``, but also ``pd.Series`` and friends) now raises ``TypeError`` with a hint to pass ``variable.indexes[<dim>]`` instead. This would have caught the SOS-reformulator bug at the source instead of letting it surface as a confusing xarray error about mismatched dim counts ten frames down. Drop ``DataArray`` from the matching ``coords`` type hints in ``model.py`` / ``expressions.py`` so the documented and runtime type sets agree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #706, #709. Partially addresses #723 (helper relocated to
common.py; call-site audit deferred to follow-up — see issue comment for scope).What changes
Principle. 0.7.0 made
coordsthe source of truth forDataArraybounds inadd_variables. This PR closes the two remaining gaps so the rule holds across every bound type and dimension order:Series/DataFramebounds missing a dimension incoordsare now broadcast tocoordsinstead of being silently dropped (add_variables: pandas Series/DataFrame bounds with missing dimensions silently drop the dimension #709).coordsregardless of bound type. Previously the order of expanded array bounds depended on the type of the bound (scalar bounds keptcoordsorder; two array bounds missing the same dimension produced a prepended order, add_variables: DataArray bounds with missing dimensions yield wrong dimension order #706).Implementation.
_validate_dataarray_boundsand the downstreamas_dataarray(arr, coords)call inadd_variablescollapse into a single helper,as_dataarray_in_coords(inlinopy/common.py, alongsideas_dataarray). It converts any input (pandas with named axes viato_xarray, otherwise viaas_dataarray), validates the result againstcoords, expands missing dims, transposes to coords order, and reconstructs the coord variables in that order.expand_dims({})and identitytransposeare metadata-only no-ops, so scalar bounds and full-dimDataArraybounds already incoordsorder pay no extra cost.Pandas conversion. A new
_named_pandas_to_dataarrayhelper handles pandas with fully named axes (includingMultiIndexon either axis) by stacking all column levels into the row index in one call:arr.stack(level=list(range(arr.columns.nlevels)), future_stack=True), followed byto_xarray(). Pandas with any unnamed axis falls through toas_dataarray.Piecewise. Same family of fix:
linopy.piecewise._broadcast_pointsbuilt itsexpand_dimsmap from aset, giving a hash-randomized dim order across processes. Iterates expressions and dims in declaration order now.Supersedes
set(var.dims) == {…}and didn't catch the dim-order regression; rewritten to assert tuple-equal dim order, with additionalMultiIndexcoverage and a one-liner for the same-values-different-order reindex branch.Test plan
test/test_variable.py::TestAddVariablesBoundsWithCoords— 57/57. Includes:test_bound_broadcast_missing_dimparametrized overDataArray,Series,DataFrame,Series-multiindex,DataFrame-multicolumns,DataFrame-multiindex.test_dataarray_broadcast_missing_dim_orderparametrized over scalar/DA bound combinations.test_dataarray_coord_reorderfor the same-values-different-order reindex.test_pandas_bound_with_unnamed_axis_falls_through,test_unnamed_coords_short_circuit,test_dataarray_bound_with_multiindex_coordcovering edge branches in the new helpers.test/test_piecewise_constraints.py::test_broadcast_points_dim_order_follows_exprs.test_model.py,test_linear_expression.py,test_constraint.py,test_constraints.py,test_piecewise_constraints.py,test_variable_assignment.py,test_variables.py— 784 pass, no regressions.🤖 Generated with Claude Code