
fix: guard join rebind against non-xorq backends (#242)#243

Merged
hussainsultan merged 1 commit into main from fix/issue-242-plain-ibis-join-backends on Apr 21, 2026

Conversation

@hachej (Collaborator) commented Apr 20, 2026

Summary

  • Joins on plain ibis backends that xorq can't wrap (BigQuery, Databricks, …) crashed inside _rebind_join_backends with ValueError: Don't know how to handle type <class 'ibis.expr.types.relations.Table'>, raised by xorq's walk_nodes.
  • Mirror the try/except guard already used by _unify_backends so the rebind is skipped for plain ibis trees, letting ibis execute the join natively.
  • Add a regression test that forces _ensure_xorq_table to return a plain ibis table (simulating BigQuery's unsupported wrapping path) and verifies join_one still executes end-to-end.

Fixes #242.
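The guard described above can be sketched as follows. These classes and functions are illustrative stand-ins, not BSL's or xorq's actual internals; `walk_nodes` here only mimics xorq's behavior of raising `ValueError` on expression types it can't wrap.

```python
class XorqTable:
    """Stands in for an expression tree xorq knows how to walk."""

class PlainIbisTable:
    """Stands in for a plain ibis table xorq cannot wrap (e.g. BigQuery)."""

def walk_nodes(expr):
    # Mimics xorq's walker: raises ValueError on unrecognized node types.
    if not isinstance(expr, XorqTable):
        raise ValueError(f"Don't know how to handle type {type(expr)}")
    return [expr]

def rebind_join_backends(expr):
    # Mirror the _unify_backends guard: if xorq can't walk the tree,
    # skip the rebind and let plain ibis execute the join natively.
    try:
        nodes = walk_nodes(expr)
    except ValueError:
        return expr
    # (The real code would rebind backends across the walked nodes here.)
    return expr
```

The key point is that the fallback returns the expression untouched, so join execution proceeds through plain ibis instead of crashing.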

Test plan

  • uv run pytest src/boring_semantic_layer/tests/test_plain_ibis.py — 12 passed, including new test_join_one_unsupported_backend.

🤖 Generated with Claude Code

_rebind_join_backends called walk_nodes unconditionally, which raises
ValueError on plain ibis expressions that xorq can't wrap (e.g.
BigQuery). Mirror the try/except already present in _unify_backends so
joins fall back to native ibis execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hussainsultan hussainsultan merged commit 9a75dc6 into main Apr 21, 2026
3 checks passed
mattfili added a commit to mattfili/boring-semantic-layer that referenced this pull request May 1, 2026
* fix: use hashing_tag instead of tag so BSL metadata contributes to content hash

Previously, .tag() produced Tag nodes that xorq strips before hashing,
meaning two models with different measures/dimensions on the same table
would hash identically. Switching to .hashing_tag() (HashingTag) ensures
the serialized BSL metadata is included in the content hash.

Adds tests verifying different measures produce different hashes and
identical models produce the same hash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
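The failure mode this commit fixes can be illustrated with a toy content hash. This is not xorq's API; `include_metadata=False` plays the role of `.tag()` (stripped before hashing) and `True` the role of `.hashing_tag()`.

```python
import hashlib
import json

# Toy model of "metadata stripped" vs "metadata included" hashing;
# names and payload shape are illustrative, not xorq's implementation.
def content_hash(table_sql, bsl_metadata, include_metadata):
    payload = {"sql": table_sql}
    if include_metadata:
        payload["bsl"] = bsl_metadata  # hashing_tag-style: metadata counts
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

With metadata stripped, two models with different measures over the same table hash identically; with metadata included, they diverge.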

* deps

* Combine boringdata#206 and boringdata#207: YAML calculated_measures + model-prefix fixes (boringdata#211)

* feat: support calculated_measures and .all() in YAML config

Add `calculated_measures` section to YAML schema for derived metrics
that reference other measures by name. Expressions are evaluated at
runtime against MeasureScope, which also enables `.all()` for
percent-of-total window aggregation patterns.

Closes boringdata#202, closes boringdata#115

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
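A hypothetical sketch of the YAML shape this enables; field names and expressions are illustrative, not the exact BSL schema.

```yaml
measures:
  revenue: _.amount.sum()
  order_count: _.order_id.nunique()

calculated_measures:
  # References other measures by name, evaluated against MeasureScope.
  avg_order_value: revenue / order_count
  # .all() enables percent-of-total window aggregation.
  revenue_pct_of_total: revenue / revenue.all()
```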

* fix: support model prefix in with_dimensions/with_measures after joins

Add prefix-aware column navigation to ColumnScope and MeasureScope so
that dimension/measure lambdas can use model prefixes (e.g.,
`t.flights.carrier`) after joins. Previously this only worked in
`.filter()`. Also adds _DimensionTableProxy for dimension evaluation
with merged dimension map awareness.

Closes boringdata#136

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: raise on invalid prefixed column/dimension lookups instead of silent fallback

Proxy classes (_ColumnPrefixProxy, _DimPrefixProxy) now raise AttributeError
when a prefixed name isn't found, preventing silent resolution to unrelated
unprefixed columns. Also preserves AttributeError from proxy evaluation in
Dimension.__call__ for proper dependency resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
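The raise-instead-of-fallback behavior can be sketched like this; the class is a stand-in for `_ColumnPrefixProxy`, with a plain dict in place of real ibis columns.

```python
# Minimal sketch of the prefix-proxy fix described above; not BSL's
# actual class, and column values are plain stand-ins.
class ColumnPrefixProxy:
    def __init__(self, prefix, columns):
        self._prefix = prefix
        self._columns = columns  # e.g. {"flights.carrier": <column>}

    def __getattr__(self, name):
        key = f"{self._prefix}.{name}"
        if key in self._columns:
            return self._columns[key]
        # Raise rather than silently resolving to an unrelated
        # unprefixed column of the same name.
        raise AttributeError(f"unknown prefixed column {key!r}")
```

Raising `AttributeError` is what lets the caller's dependency resolution distinguish "not on this model" from "resolves here".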

* fix: keep helpful dimension error formatting with proxy fallback

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Support plain ibis backends without xorq wrapping (boringdata#217)

Make BSL core features (query, filter, aggregate, join, execute) work
with any ibis backend, even those not registered in xorq (e.g. Databricks).

The xorq conversion in _ensure_xorq_table() now gracefully falls back to
plain ibis when the backend isn't xorq-supported. Profile loading also
falls back to ibis.connect() for unregistered backends.

In query.py, replace hard xorq.api imports with a _get_ibis_api() helper
that prefers xorq's vendored ibis when available but falls back to plain
ibis. This ensures filter expressions are built with the same ibis
implementation as the table they resolve against.
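The prefer-one-module-else-fall-back pattern can be sketched generically with `importlib`; in the real helper the candidates would be xorq's vendored ibis followed by plain `ibis`.

```python
import importlib

def get_first_importable(candidates):
    # Return the first module in `candidates` that imports cleanly;
    # mirrors a prefer-vendored-ibis, fall-back-to-plain-ibis helper.
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {candidates!r} could be imported")
```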

Serialization features (to_tagged/from_tagged) still require xorq.

Closes boringdata#216

Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add derived_dimensions for time dimensions (boringdata#215)

* feat: support derived_dimensions for time dimensions

* fix: update xorq API calls and test infrastructure for xorq 0.3.12

- Rename hashing_tag() → tag() across production code and tests (xorq API rename)
- Add SortKey arg→expr compatibility shim for ibis↔xorq parameter mismatch
- Update MCP tests for fastmcp v3 API (_tool_manager._tools → await get_tool)
- Add skip markers for optional deps (altair, plotly, mcp, malloy)
- Fix non-deterministic row ordering in unnest test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle both ibis 11 (expr) and ibis 12 (arg) SortKey attribute names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
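The compatibility shim amounts to a small attribute probe; per the commits above, ibis 11 exposes the SortKey operand as `expr` and ibis 12 as `arg`. This sketch is illustrative, not the actual BSL code.

```python
# Illustrative shim for an attribute renamed across versions: try the
# newer name first, then the older one.
def sort_key_operand(sort_key):
    for attr in ("arg", "expr"):
        if hasattr(sort_key, attr):
            return getattr(sort_key, attr)
    raise AttributeError("SortKey exposes neither 'arg' nor 'expr'")
```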

* chore: remove unnecessary test skip markers

Dev dependencies are installed — no need for conditional skips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add thin HTTP transport for BSL models (boringdata#214)

* Add HTTP server transport for BSL models

* feat: support per-dimension time grains in HTTP server API

Add a `time_grains` dict field to QueryRequest that maps dimension names
to individual TimeGrain values (e.g., {"order_date": "TIME_GRAIN_MONTH",
"ship_date": "TIME_GRAIN_QUARTER"}). This enables different time grains
per dimension in a single query, matching the capability from PR boringdata#212.

The existing `time_grain` field is preserved for backward compatibility.
A model validator prevents specifying both fields simultaneously. When
`time_grains` is provided, grain transformations are applied at the
server layer using model.with_dimensions() before calling query().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use simple grain names in time_grains API field

Accept "month", "quarter", etc. instead of "TIME_GRAIN_MONTH" —
matches the HTTP-friendly convention from PR boringdata#212 and normalizes
internally via _make_grain_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move time_grains support into core query()

Add time_grains param to query() for per-dimension grain control
(e.g., time_grains={"order_date": "TIME_GRAIN_MONTH", "ship_date": "TIME_GRAIN_QUARTER"}).
The server layer now just normalizes short names and passes through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: accept short grain names ("month") directly in core query()

query() now accepts both "month" and "TIME_GRAIN_MONTH" via
_normalize_grain(), so the server layer needs no conversion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
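The dual-spelling acceptance can be sketched as below; the grain set and the canonical output form are assumptions for this sketch, based only on the examples in the commits above.

```python
# Illustrative normalization: accept "month" and "TIME_GRAIN_MONTH" alike.
_GRAINS = {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}

def normalize_grain(value):
    v = value.lower()
    if v.startswith("time_grain_"):
        v = v[len("time_grain_"):]  # strip the long-form prefix
    if v not in _GRAINS:
        raise ValueError(f"unknown time grain: {value!r}")
    return f"TIME_GRAIN_{v.upper()}"
```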

* docs: update time grain docs for short names and per-dimension grains

- Core query() now accepts short grain names ("month") natively
- Server passes time_grains straight through, no normalization needed
- MCP tool now exposes time_grains parameter
- Updated all docs/prompts/help topics with short-form examples
  and time_grains (per-dimension) usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing time_grains param to second .query() method in expr.py

The SemanticJoin/SemanticAggregate .query() method was missing the
time_grains parameter — only the SemanticModel one had it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: exit with non-zero status when server deps are missing

cmd_serve now calls sys.exit(1) instead of bare return when
FastAPI/uvicorn are not installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.4.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.3.10

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: route dimension-only queries to source table on joins (boringdata#225)

* fix: route dimension-only queries to source table on joins (boringdata#224)

When a star-schema join is queried with only dimensions (no measures),
and all requested dimensions originate from a single joined table,
bypass the fact table and query the dimension table directly. This
ensures dimension members with zero matching fact rows are returned.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
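The routing decision above can be sketched as a small function; names are illustrative, and `dim_sources` stands in for a mapping from each dimension to its originating table.

```python
# Sketch of the dimension-only shortcut described above; not BSL's code.
def dimension_only_source(requested_dims, requested_measures, dim_sources):
    """Return the single dimension table to query directly, or None."""
    if requested_measures:
        return None  # measures need the fact table
    sources = {dim_sources[d] for d in requested_dims if d in dim_sources}
    if len(sources) == 1:
        # All requested dimensions come from one joined table: bypass the
        # fact table so members with zero matching fact rows still appear.
        return next(iter(sources))
    return None
```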

* fix: add logging, filter support, and comprehensive tests for dim-only shortcut

- Replace bare `except Exception: pass` with logger.debug so fallbacks
  are traceable
- Support pre-aggregation filters that only reference the target
  dimension table (filters on other tables correctly disable shortcut)
- Add 9 additional tests: fact-table dims, JSON dict filters (documents
  fallback behavior), derived dimensions, SQL verification, limit
  composition, method-chaining API, and filter edge cases
- Tighten type annotations on _dimension_only_source_table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.3.11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: deferred join_one — dimension lookups after aggregation (boringdata#220)

* feat: grain-aware join_one via is_entity dimensions (boringdata#218)

When both models declare is_entity dimensions and their entity sets
differ, join_one auto-upgrades to join_many so BSL's pre-aggregation
logic aligns the grains before joining. This prevents fan-out and
double-counting in multi-fact star schemas (e.g. monthly revenue +
daily hours) without requiring manual pre-aggregation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: deferred join_one — dimension lookups after aggregation (boringdata#64)

When a join_one dimension table's PK (is_entity) matches the join key
and no measures from that table are used, BSL now defers the join until
after aggregation. This avoids unnecessary join expansion before GROUP BY
for pure dimension lookups.

Also refines grain mismatch detection (boringdata#218) to skip pure dimension
tables (no measures), routing them to the deferred path instead of
the pre-aggregation path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent deferred join when filters reference dimension table

Filters on deferred dimension columns would be silently dropped,
producing incorrect results. Now _find_deferrable_joins checks if
any filter references the right table's columns and skips deferral
if so, falling back to the standard join-before-aggregate path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use original join predicate for deferred joins (composite key safety)

Instead of sorting left/right join keys independently and zipping them
(which can mismatch columns with different names), store and reuse the
original on= predicate for the post-aggregation LEFT JOIN. This
preserves the correct key pairing for composite joins.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: serialize join cardinality in hashing_tag metadata

Include cardinality ("one"/"many"/"cross") in the SemanticJoinOp
serialization so that join_one and join_many produce different hashes
in xorq's content-addressed naming. Also fix reconstruct to use the
correct join method based on serialized cardinality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle derived dimensions and add edge case tests

- Derived dimensions (e.g., t.name.upper()) on deferred tables are now
  materialized via mutate before the post-agg join, instead of being
  silently dropped.
- Add edge case tests: snowflake chained join_one, derived dimensions,
  and prefixed filters on deferred tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: prune unused dimension joins from queries (boringdata#228)

* feat: prune unused dimension joins from queries (boringdata#227)

When a query only references a subset of the joined tables (via dimension
keys or measure names), joins to unreferenced tables are now skipped.
This avoids expensive joins to dimension tables that contribute nothing
to the result — e.g. a pure SUM() on a fact table no longer joins 4
dimension tables.

The implementation extracts table prefixes from the query's keys and
aggs, passes them as parent_requirements to SemanticJoinOp.to_untagged(),
which recursively prunes right-side leaf tables not in the needed set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
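The prefix-extraction step can be sketched as follows; this is illustrative, assuming dotted names like `flights.carrier` carry the table prefix.

```python
# Sketch of extracting referenced table prefixes from a query's keys and
# aggregation names, per the pruning description above.
def referenced_prefixes(keys, aggs):
    prefixes = set()
    for name in [*keys, *aggs]:
        if "." in name:
            prefixes.add(name.split(".", 1)[0])
    return prefixes
```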

* fix: don't prune joins in pre-aggregation path

The pre-agg path needs all tables for dimension bridges — passing
parent_requirements=None ensures no pruning happens there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only prune left-join join_one, add edge case tests

Guard pruning by cardinality and join type: only prune join_one with
how="left". Inner joins act as row filters and must not be removed.
join_many/join_cross are never pruned (join_many is intercepted by
the pre-agg path; join_cross changes row counts).

Adds edge case tests:
- Inner join with orphan rows: pruning correctly preserves inner join
- Filter on dimension table: pruning correctly disabled
- SQL verification: pruned tables absent from compiled SQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
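The cardinality/join-type guard reduces to a small predicate (an illustrative restatement of the rule above, not the actual implementation):

```python
# Only an unreferenced join_one with how="left" is safe to drop:
# inner joins act as row filters, and join_many/join_cross change
# row counts, so neither may be pruned.
def may_prune(cardinality, how, is_referenced):
    return not is_referenced and cardinality == "one" and how == "left"
```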

* fix: preserve join pruning correctness

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: repair demo_bsl_v2 script

* feat: grain-aware join_one via is_entity dimensions (boringdata#219)

* fix: default missing cardinality to 'many' in join deserialization (boringdata#223)

Pre-existing tagged payloads serialized before cardinality was emitted
have no 'cardinality' key. Defaulting to 'one' would silently skip
pre-aggregation on former join_many joins. Default to 'many' instead
since it is the safe superset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: limit grain upgrades to multi-fact joins

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: cover read-only joined dimensions (boringdata#237)

Co-authored-by: boringdata <boringdata@users.noreply.github.com>

* chore: prepare v0.3.12 release

* feat: add MCP period comparison workflow (boringdata#238)

Co-authored-by: boringdata <boringdata@users.noreply.github.com>

* feat: expose BSL TagHandler via xorq.from_tag_node entry point (boringdata#235)

* feat: expose BSL TagHandler via xorq.from_tag_node entry point (XOR-295)

Move BSL-specific tag handling out of xorq: BSL now ships a TagHandler
(boring_semantic_layer.xorq_integration:bsl_tag_handler) and registers it
under the xorq.from_tag_node entry point group, so xorq's registry
discovers it automatically without hard-coded BSL knowledge.

- extract_metadata walks nested tag-node metadata down to the innermost
  SemanticTableOp and returns the sidecar dict (dimension/measure names).
- from_tag_node reconstructs the base SemanticModel from the innermost
  SemanticTableOp source, not the full query chain, so callers can issue
  fresh .query() calls.
- Pin xorq to 4b3d8033 via [tool.uv.sources] until a release ships the
  xorq.expr.builders TagHandler registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move xorq_integration to serialization.tag_handler

Rename the module to live next to the serialization code it depends on
(extract_xorq_metadata, reconstruct_bsl_operation, BSLSerializationContext,
thaw). The parent package now carries the "serialization" context, so the
module name can focus on its role — a TagHandler.

Update the xorq.from_tag_node entry point accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump xorq to >=0.3.19 and drop git pin

xorq 0.3.19 ships the xorq.expr.builders TagHandler registry, so the
temporary [tool.uv.sources] git pin is no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ghoersti <ghoersti@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: treat UnboundTable as a leaf node in _reconstruct_table

_reconstruct_table() walks the expression tree looking for leaf nodes
(Read, InMemoryTable, DatabaseTable) but did not recognize UnboundTable.
Add it to the set of recognized leaf node types, handled the same way as
DatabaseTable.

Closes boringdata#240

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: safe fallthrough in _reconstruct_table for unknown expression types

The fallthrough at the end of _reconstruct_table() still called
.to_expr() on xorq_expr which is already a Table expression. Return
it directly to avoid AttributeError on any unrecognized leaf node type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "fix: safe fallthrough in _reconstruct_table for unknown expression types"

This reverts commit 7bc3eae.

* fix: preserve join cardinality on round-trip, fix pre-agg grain for derived dims

_reconstruct_join always used join_many, losing the original cardinality.
This caused join_one models to take the pre-aggregation path after
to_tagged → from_tagged, which computed wrong grain for derived dimensions
(e.g. dep_time.truncate("M") used raw dep_time instead of the truncated value).

- Serialize cardinality in _extract_join metadata
- Restore correct join_one/join_many in _reconstruct_join based on cardinality
- Materialize derived dimensions on raw_tbl in pre-agg grain computation
- Add regression test for round-tripped join_one with derived dim + prefixed measure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard join rebind against non-xorq backends (boringdata#242) (boringdata#243)

_rebind_join_backends called walk_nodes unconditionally, which raises
ValueError on plain ibis expressions that xorq can't wrap (e.g.
BigQuery). Mirror the try/except already present in _unify_backends so
joins fall back to native ibis execution.

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add reemit to BSL TagHandler for catalog rebuild (boringdata#244)

* feat: add reemit to BSL TagHandler for catalog rebuild support

Register a `reemit` callable on `bsl_tag_handler` so that
`xorq catalog replay --rebuild` can re-emit BSL-tagged subtrees
with translated sources. The rebuild logic re-stamps the original
tag metadata onto the rebuilt source—no metadata manipulation needed
because BSL's tag captures the full query recipe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: guard reemit wiring against xorq without the field

CI installs xorq 0.3.19 which doesn't have `reemit` on TagHandler
yet. Conditionally pass it only when the field exists, and skip
rebuild tests when it's absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: install xorq from feat/catalog/replay-rebuild branch

Temporary: install xorq from the feature branch that has the
`reemit` field on TagHandler so rebuild tests run in CI instead
of being skipped. Revert once the xorq release ships.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: guard reemit against missing parent and add edge-case tests

Fail fast with a clear ValueError when tag_node.parent is None instead
of a bare AttributeError.  Add non-identity transform tests and a
missing-parent test to cover paths the original suite missed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: remove xorq branch pin from CI

xorq's ci-downstream-bsl workflow handles testing BSL against
in-flight xorq changes, so BSL CI doesn't need to pin a branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: metadata field on Dimension and Measure (boringdata#245) (boringdata#246)

Adds a free-form ``metadata: dict`` field to ``Dimension`` and ``Measure``
that is merged into ``to_json()`` output. Downstream consumers (LLM/MCP
tools, dashboards, docs) can attach domain-specific hints like ``format``,
``unit``, ``example_values`` or ``is_additive`` without subclassing.

- ``metadata`` defaults to ``{}`` and is excluded from equality/hash
- merged after base fields, so custom keys can override if desired
- preserved through ``with_dimensions`` / ``with_measures``
- YAML extended format accepts a ``metadata:`` block on dimensions,
  measures, and calculated measures
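The described semantics (defaults to `{}`, excluded from equality/hash, merged after base fields in `to_json()`) can be sketched with a dataclass stand-in, not BSL's actual `Measure` class:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Measure:
    name: str
    # compare=False excludes metadata from __eq__ and the generated __hash__.
    metadata: dict = field(default_factory=dict, compare=False)

    def to_json(self):
        base = {"name": self.name}
        # Merged after base fields, so custom keys may override them.
        base.update(self.metadata)
        return base
```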

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "feat: add MCP period comparison workflow (boringdata#238)"

This reverts upstream commit e79000e to keep our MCP layer focused on
the FastMCP 3.0 surface we maintain locally. The non-MCP query.py /
expr.py changes in that commit were tightly coupled to the period
comparison feature and not consumed elsewhere yet.

* fix: restore mcp-code-mode in dev extras + reconcile uv.lock

Upstream's pyproject.toml didn't include mcp-code-mode in the dev
group, so the merge dropped it. Our test_dependency_groups suite
enforces that every optional extra is reachable from dev. Re-add it
and regenerate uv.lock against the merged pyproject.

---------

Co-authored-by: Hussain Sultan <hussainz@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Julien Hurault <julien.hurault@sumeo.io>
Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: ghoersti <45543592+ghoersti@users.noreply.github.com>
Co-authored-by: ghoersti <ghoersti@users.noreply.github.com>
Co-authored-by: Dan Lovell <dlovell@gmail.com>
Co-authored-by: matthew fili <fili@matthews-MacBook-Pro.local>


Development

Successfully merging this pull request may close these issues.

ValueError when executing joins on non-xorq-vendored backends (e.g. BigQuery)

3 participants