
refactor: unify as_dataarray; split broadcasting from coords validation #726

Open
FBumann wants to merge 1 commit into fix/mask-coords-broadcast from feat/unify-as-dataarray-coords


@FBumann (Collaborator) commented May 24, 2026

Closes #723. Stacked on #725 (which is stacked on #722).

What changes

as_dataarray_in_coords is folded into as_dataarray and split along the seam between "broadcast arr against coords" and "enforce the coords contract":

  • as_dataarray(arr, coords) — the broadcasting primitive. For every input type, the result is aligned with coords: positional inputs (numpy / unnamed pandas / scalar) are labeled by position, shared-dim coords are reindexed when values are equal in a different order, dims present in coords but not in arr are expanded, and the result is transposed to coords order. Extra dims and disagreeing value sets on shared dims pass through, so xarray broadcasting in expression arithmetic keeps working.
  • assert_compatible_with_coords(arr, coords) — the validation companion. Raises if arr introduces dims not in coords (was: as_dataarray_in_coords's extra-dim raise) or if a shared dim has disagreeing coord values (was: its "do not match" raise).
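The pass-through rule for the broadcasting primitive can be illustrated with a library-free sketch on a single dimension. `align_1d` is a hypothetical helper, not part of linopy; it models a labeled vector as a list of values plus a list of labels, and the length check for positional input is an assumption rather than documented behavior.

```python
def align_1d(values, labels, target_labels):
    """Align a 1-D labeled vector with `target_labels` (hypothetical helper).

    `labels=None` models a positional input (numpy array, unnamed pandas object).
    Returns a `(values, labels)` pair.
    """
    if labels is None:
        # Positional input: label by position.
        # Assumption: a length mismatch is treated as an error here.
        if len(values) != len(target_labels):
            raise ValueError("positional input length does not match coords")
        return list(values), list(target_labels)

    same_values = len(labels) == len(target_labels) and set(labels) == set(target_labels)
    if same_values:
        # Same values in a different order: reindex to the coords order.
        lookup = dict(zip(labels, values))
        return [lookup[label] for label in target_labels], list(target_labels)

    # Disagreeing value sets: pass through untouched so that downstream
    # alignment (or a separate validation step) can deal with them.
    return list(values), list(labels)


target = ["a", "b", "c"]
print(align_1d([1, 2, 3], None, target))             # positional input
print(align_1d([3, 1, 2], ["c", "a", "b"], target))  # reordered input
print(align_1d([1, 2], ["x", "y"], target))          # disagreeing labels
```

The third call returns its input unchanged rather than raising, which is the behavior the split relies on: raising is left to the validation companion.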

add_variables and add_constraints now call as_dataarray followed by assert_compatible_with_coords for lower / upper / mask. The previous as_dataarray_in_coords helper is deleted.

_coords_to_dict now filters MultiIndex level coords out of xarray.Coordinates inputs, so the new strict-by-default path treats e.g. station (and not its derived letter / num levels) as the dim. This was already a latent issue once strict semantics governed as_dataarray's coords arg.
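One way to picture the MultiIndex filtering is the simplified model below, where each coordinate is stored as `(dims, labels)` and only coordinates that index their own dimension are kept. `dimension_coords` and the sample data are hypothetical; the actual `_coords_to_dict` logic may differ.

```python
def dimension_coords(coords):
    """Keep only coordinates whose name equals their single dim.

    `coords` maps name -> (dims, labels), a simplified stand-in for
    xarray coordinates. MultiIndex level coords (e.g. `letter`, `num`)
    live on the parent dim (`station`), so they are dropped.
    """
    return {
        name: labels
        for name, (dims, labels) in coords.items()
        if dims == (name,)
    }


coords = {
    "station": (("station",), [("a", 1), ("a", 2), ("b", 1)]),
    "letter": (("station",), ["a", "a", "b"]),  # derived MultiIndex level
    "num": (("station",), [1, 2, 1]),           # derived MultiIndex level
    "time": (("time",), [0, 1]),
}
print(list(dimension_coords(coords)))
```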

Audit summary (the 12 call sites listed in #723)

| Call site | Strictness | Behavior |
| --- | --- | --- |
| model.py:705/706 lower/upper in add_variables | strict | as_dataarray + assert_compatible_with_coords |
| model.py:715 mask in add_variables | strict | same |
| model.py:979 mask in add_constraints | strict | same |
| expressions.py:584/613/1105/1668/2154 arithmetic | broadcast | call signatures unchanged; benefits from positional labeling + reindex for free, with no value-set raise on shared dims |
| variables.py:330 to_linexpr(coefficient) | broadcast | unchanged |
| expressions.py:341/2002/2030/2289, model.py:912/919/972, variables.py:1369 | no coords | unaffected |

Breaking changes (relative to PR #725's strict semantics)

  • as_dataarray no longer raises on shared-dim value-set mismatch. Disagreeing values are passed through for downstream xarray alignment to handle. Callers that want the old behavior should call assert_compatible_with_coords after the conversion (add_variables/add_constraints already do).
  • as_dataarray no longer raises on extra dims. Same migration: use assert_compatible_with_coords.
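For callers migrating to the explicit check, the strict contract amounts to two conditions: no dims outside `coords`, and equal coord values on shared dims. The sketch below models both objects as plain dicts mapping dim name to labels; `check_compatible` is a hypothetical stand-in, not linopy's `assert_compatible_with_coords`, and its error messages are illustrative.

```python
def check_compatible(arr_coords, coords):
    """Raise ValueError if `arr_coords` breaks the strict contract.

    Both arguments map dim name -> list of labels. The comparison is
    order-sensitive, assuming the conversion step has already reordered
    same-valued labels to match `coords`.
    """
    extra = [dim for dim in arr_coords if dim not in coords]
    if extra:
        raise ValueError(f"dimensions {extra} are not in coords {list(coords)}")
    for dim, labels in arr_coords.items():
        if list(labels) != list(coords[dim]):
            raise ValueError(f"coordinate values for dim {dim!r} do not match coords")


coords = {"time": [0, 1, 2], "station": ["a", "b"]}

check_compatible({"time": [0, 1, 2]}, coords)  # subset of dims: allowed

for bad in ({"scenario": [1, 2]}, {"time": [0, 1, 5]}):
    try:
        check_compatible(bad, coords)
    except ValueError as err:
        print(f"rejected: {err}")
```

Keeping the check separate from the conversion lets arithmetic call sites skip it entirely while `add_variables` and `add_constraints` opt in.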

Two existing tests updated to reflect the new "coords is source of truth, extras broadcast in" semantics:

  • test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned: extra coord entries now expand into the result.
  • test_dataarray_extra_dims: rewritten so the subset check fires (rather than the value-mismatch check from the old order).

Test plan

  • test/test_common.py — added five tests pinning the new split (extra-dim preservation, disjoint shared-dim values, assert_compatible_with_coords extra-dim raise, value-mismatch raise, subset-dims allowed).
  • Full suite: 3698 passed, 29 skipped (parity with base).
  • pre-commit run --all-files clean (ruff/format/blackdoc/codespell).
  • Microbenchmark (dev-scripts/benchmark_as_dataarray.py, untracked): flat timings vs the base branch on both add_variables-heavy (≈22 ms / 50–57 ms mean) and arithmetic-heavy (≈80–82 ms) workloads.

🤖 Generated with Claude Code

Closes #723. Folds the body of `as_dataarray_in_coords` into `as_dataarray`
and extracts the contract checks into `assert_compatible_with_coords`, so
linopy now has one broadcasting primitive and one validation companion.

`as_dataarray(arr, coords)` aligns the result against `coords` for every
input type: labels positional inputs (numpy / unnamed pandas / scalar) by
position, reindexes same-values-different-order, expands missing dims,
and transposes to coords order. Extra dims and disagreeing value sets on
shared dims pass through unchanged, so xarray broadcasting in expression
arithmetic keeps working.

`assert_compatible_with_coords(arr, coords)` enforces the strict contract
(`arr.dims ⊆ coords.dims`, plus exact coord-value equality on shared
dims). `add_variables` and `add_constraints` now call it after
`as_dataarray` for `lower` / `upper` / `mask`, replacing the deleted
`as_dataarray_in_coords` helper.

`_coords_to_dict` filters MultiIndex level coords out of
`xarray.Coordinates` inputs so the new strict-by-default path treats
`station` (and not its derived `letter` / `num` levels) as the dim.

Test suite: 3698 passed (no regressions). Two existing tests were
updated to reflect the new "coords is source of truth" semantics:
`test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned`
(extra coord entries now broadcast in) and
`test_dataarray_extra_dims` (now triggers the subset check rather than
the value-mismatch check).

Microbenchmark in dev-scripts/benchmark_as_dataarray.py shows flat
timings vs the base branch on both add_variables-heavy and arithmetic-
heavy workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>