
fix: guard join rebind against non-xorq backends (#242)#243

Merged
hussainsultan merged 1 commit into main from fix/issue-242-plain-ibis-join-backends on Apr 21, 2026

Conversation

@hachej (Collaborator) commented Apr 20, 2026

Summary

  • Joins on plain ibis backends that xorq can't wrap (BigQuery, Databricks, …) crashed inside _rebind_join_backends with ValueError: Don't know how to handle type <class 'ibis.expr.types.relations.Table'>, raised by xorq's walk_nodes.
  • Mirror the try/except guard already used by _unify_backends so the rebind is skipped for plain ibis trees, letting ibis execute the join natively.
  • Add a regression test that forces _ensure_xorq_table to return a plain ibis table (simulating BigQuery's unsupported wrapping path) and verifies join_one still executes end-to-end.

Fixes #242.
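The guard described above can be sketched as follows. These classes and functions are illustrative stand-ins, not BSL's or xorq's actual internals; `walk_nodes` here only mimics xorq's behavior of raising `ValueError` on expression types it can't wrap.

```python
class XorqTable:
    """Stands in for an expression tree xorq knows how to walk."""

class PlainIbisTable:
    """Stands in for a plain ibis table xorq cannot wrap (e.g. BigQuery)."""

def walk_nodes(expr):
    # Mimics xorq's walker: raises ValueError on unrecognized node types.
    if not isinstance(expr, XorqTable):
        raise ValueError(f"Don't know how to handle type {type(expr)}")
    return [expr]

def rebind_join_backends(expr):
    # Mirror the _unify_backends guard: if xorq can't walk the tree,
    # skip the rebind and let plain ibis execute the join natively.
    try:
        nodes = walk_nodes(expr)
    except ValueError:
        return expr
    # (The real code would rebind backends across the walked nodes here.)
    return expr
```

The key point is that the fallback returns the expression untouched, so join execution proceeds through plain ibis instead of crashing.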

Test plan

  • uv run pytest src/boring_semantic_layer/tests/test_plain_ibis.py — 12 passed, including new test_join_one_unsupported_backend.

🤖 Generated with Claude Code

_rebind_join_backends called walk_nodes unconditionally, which raises
ValueError on plain ibis expressions that xorq can't wrap (e.g.
BigQuery). Mirror the try/except already present in _unify_backends so
joins fall back to native ibis execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hussainsultan hussainsultan merged commit 9a75dc6 into main Apr 21, 2026
3 checks passed
mattfili added a commit to mattfili/boring-semantic-layer that referenced this pull request May 1, 2026
* fix: use hashing_tag instead of tag so BSL metadata contributes to content hash

Previously, .tag() produced Tag nodes that xorq strips before hashing,
meaning two models with different measures/dimensions on the same table
would hash identically. Switching to .hashing_tag() (HashingTag) ensures
the serialized BSL metadata is included in the content hash.

Adds tests verifying different measures produce different hashes and
identical models produce the same hash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
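The failure mode this commit fixes can be illustrated with a toy content hash. This is not xorq's API; `include_metadata=False` plays the role of `.tag()` (stripped before hashing) and `True` the role of `.hashing_tag()`.

```python
import hashlib
import json

# Toy model of "metadata stripped" vs "metadata included" hashing;
# names and payload shape are illustrative, not xorq's implementation.
def content_hash(table_sql, bsl_metadata, include_metadata):
    payload = {"sql": table_sql}
    if include_metadata:
        payload["bsl"] = bsl_metadata  # hashing_tag-style: metadata counts
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

With metadata stripped, two models with different measures over the same table hash identically; with metadata included, they diverge.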

* deps

* Combine boringdata#206 and boringdata#207: YAML calculated_measures + model-prefix fixes (boringdata#211)

* feat: support calculated_measures and .all() in YAML config

Add `calculated_measures` section to YAML schema for derived metrics
that reference other measures by name. Expressions are evaluated at
runtime against MeasureScope, which also enables `.all()` for
percent-of-total window aggregation patterns.

Closes boringdata#202, closes boringdata#115

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
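A hypothetical sketch of the YAML shape this enables; field names and expressions are illustrative, not the exact BSL schema.

```yaml
measures:
  revenue: _.amount.sum()
  order_count: _.order_id.nunique()

calculated_measures:
  # References other measures by name, evaluated against MeasureScope.
  avg_order_value: revenue / order_count
  # .all() enables percent-of-total window aggregation.
  revenue_pct_of_total: revenue / revenue.all()
```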

* fix: support model prefix in with_dimensions/with_measures after joins

Add prefix-aware column navigation to ColumnScope and MeasureScope so
that dimension/measure lambdas can use model prefixes (e.g.,
`t.flights.carrier`) after joins. Previously this only worked in
`.filter()`. Also adds _DimensionTableProxy for dimension evaluation
with merged dimension map awareness.

Closes boringdata#136

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: raise on invalid prefixed column/dimension lookups instead of silent fallback

Proxy classes (_ColumnPrefixProxy, _DimPrefixProxy) now raise AttributeError
when a prefixed name isn't found, preventing silent resolution to unrelated
unprefixed columns. Also preserves AttributeError from proxy evaluation in
Dimension.__call__ for proper dependency resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
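The raise-instead-of-fallback behavior can be sketched like this; the class is a stand-in for `_ColumnPrefixProxy`, with a plain dict in place of real ibis columns.

```python
# Minimal sketch of the prefix-proxy fix described above; not BSL's
# actual class, and column values are plain stand-ins.
class ColumnPrefixProxy:
    def __init__(self, prefix, columns):
        self._prefix = prefix
        self._columns = columns  # e.g. {"flights.carrier": <column>}

    def __getattr__(self, name):
        key = f"{self._prefix}.{name}"
        if key in self._columns:
            return self._columns[key]
        # Raise rather than silently resolving to an unrelated
        # unprefixed column of the same name.
        raise AttributeError(f"unknown prefixed column {key!r}")
```

Raising `AttributeError` is what lets the caller's dependency resolution distinguish "not on this model" from "resolves here".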

* fix: keep helpful dimension error formatting with proxy fallback

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Support plain ibis backends without xorq wrapping (boringdata#217)

Make BSL core features (query, filter, aggregate, join, execute) work
with any ibis backend, even those not registered in xorq (e.g. Databricks).

The xorq conversion in _ensure_xorq_table() now gracefully falls back to
plain ibis when the backend isn't xorq-supported. Profile loading also
falls back to ibis.connect() for unregistered backends.

In query.py, replace hard xorq.api imports with a _get_ibis_api() helper
that prefers xorq's vendored ibis when available but falls back to plain
ibis. This ensures filter expressions are built with the same ibis
implementation as the table they resolve against.
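The prefer-one-module-else-fall-back pattern can be sketched generically with `importlib`; in the real helper the candidates would be xorq's vendored ibis followed by plain `ibis`.

```python
import importlib

def get_first_importable(candidates):
    # Return the first module in `candidates` that imports cleanly;
    # mirrors a prefer-vendored-ibis, fall-back-to-plain-ibis helper.
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {candidates!r} could be imported")
```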

Serialization features (to_tagged/from_tagged) still require xorq.

Closes boringdata#216

Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add derived_dimensions for time dimensions (boringdata#215)

* feat: support derived_dimensions for time dimensions

* fix: update xorq API calls and test infrastructure for xorq 0.3.12

- Rename hashing_tag() → tag() across production code and tests (xorq API rename)
- Add SortKey arg→expr compatibility shim for ibis↔xorq parameter mismatch
- Update MCP tests for fastmcp v3 API (_tool_manager._tools → await get_tool)
- Add skip markers for optional deps (altair, plotly, mcp, malloy)
- Fix non-deterministic row ordering in unnest test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle both ibis 11 (expr) and ibis 12 (arg) SortKey attribute names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
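The compatibility shim amounts to a small attribute probe; per the commits above, ibis 11 exposes the SortKey operand as `expr` and ibis 12 as `arg`. This sketch is illustrative, not the actual BSL code.

```python
# Illustrative shim for an attribute renamed across versions: try the
# newer name first, then the older one.
def sort_key_operand(sort_key):
    for attr in ("arg", "expr"):
        if hasattr(sort_key, attr):
            return getattr(sort_key, attr)
    raise AttributeError("SortKey exposes neither 'arg' nor 'expr'")
```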

* chore: remove unnecessary test skip markers

Dev dependencies are installed — no need for conditional skips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add thin HTTP transport for BSL models (boringdata#214)

* Add HTTP server transport for BSL models

* feat: support per-dimension time grains in HTTP server API

Add a `time_grains` dict field to QueryRequest that maps dimension names
to individual TimeGrain values (e.g., {"order_date": "TIME_GRAIN_MONTH",
"ship_date": "TIME_GRAIN_QUARTER"}). This enables different time grains
per dimension in a single query, matching the capability from PR boringdata#212.

The existing `time_grain` field is preserved for backward compatibility.
A model validator prevents specifying both fields simultaneously. When
`time_grains` is provided, grain transformations are applied at the
server layer using model.with_dimensions() before calling query().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use simple grain names in time_grains API field

Accept "month", "quarter", etc. instead of "TIME_GRAIN_MONTH" —
matches the HTTP-friendly convention from PR boringdata#212 and normalizes
internally via _make_grain_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move time_grains support into core query()

Add time_grains param to query() for per-dimension grain control
(e.g., time_grains={"order_date": "TIME_GRAIN_MONTH", "ship_date": "TIME_GRAIN_QUARTER"}).
The server layer now just normalizes short names and passes through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: accept short grain names ("month") directly in core query()

query() now accepts both "month" and "TIME_GRAIN_MONTH" via
_normalize_grain(), so the server layer needs no conversion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
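The dual-spelling acceptance can be sketched as below; the grain set and the canonical output form are assumptions for this sketch, based only on the examples in the commits above.

```python
# Illustrative normalization: accept "month" and "TIME_GRAIN_MONTH" alike.
_GRAINS = {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}

def normalize_grain(value):
    v = value.lower()
    if v.startswith("time_grain_"):
        v = v[len("time_grain_"):]  # strip the long-form prefix
    if v not in _GRAINS:
        raise ValueError(f"unknown time grain: {value!r}")
    return f"TIME_GRAIN_{v.upper()}"
```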

* docs: update time grain docs for short names and per-dimension grains

- Core query() now accepts short grain names ("month") natively
- Server passes time_grains straight through, no normalization needed
- MCP tool now exposes time_grains parameter
- Updated all docs/prompts/help topics with short-form examples
  and time_grains (per-dimension) usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing time_grains param to second .query() method in expr.py

The SemanticJoin/SemanticAggregate .query() method was missing the
time_grains parameter — only the SemanticModel one had it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: exit with non-zero status when server deps are missing

cmd_serve now calls sys.exit(1) instead of bare return when
FastAPI/uvicorn are not installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.4.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.3.10

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: route dimension-only queries to source table on joins (boringdata#225)

* fix: route dimension-only queries to source table on joins (boringdata#224)

When a star-schema join is queried with only dimensions (no measures),
and all requested dimensions originate from a single joined table,
bypass the fact table and query the dimension table directly. This
ensures dimension members with zero matching fact rows are returned.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
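The routing decision above can be sketched as a small function; names are illustrative, and `dim_sources` stands in for a mapping from each dimension to its originating table.

```python
# Sketch of the dimension-only shortcut described above; not BSL's code.
def dimension_only_source(requested_dims, requested_measures, dim_sources):
    """Return the single dimension table to query directly, or None."""
    if requested_measures:
        return None  # measures need the fact table
    sources = {dim_sources[d] for d in requested_dims if d in dim_sources}
    if len(sources) == 1:
        # All requested dimensions come from one joined table: bypass the
        # fact table so members with zero matching fact rows still appear.
        return next(iter(sources))
    return None
```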

* fix: add logging, filter support, and comprehensive tests for dim-only shortcut

- Replace bare `except Exception: pass` with logger.debug so fallbacks
  are traceable
- Support pre-aggregation filters that only reference the target
  dimension table (filters on other tables correctly disable shortcut)
- Add 9 additional tests: fact-table dims, JSON dict filters (documents
  fallback behavior), derived dimensions, SQL verification, limit
  composition, method-chaining API, and filter edge cases
- Tighten type annotations on _dimension_only_source_table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.3.11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: deferred join_one — dimension lookups after aggregation (boringdata#220)

* feat: grain-aware join_one via is_entity dimensions (boringdata#218)

When both models declare is_entity dimensions and their entity sets
differ, join_one auto-upgrades to join_many so BSL's pre-aggregation
logic aligns the grains before joining. This prevents fan-out and
double-counting in multi-fact star schemas (e.g. monthly revenue +
daily hours) without requiring manual pre-aggregation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: deferred join_one — dimension lookups after aggregation (boringdata#64)

When a join_one dimension table's PK (is_entity) matches the join key
and no measures from that table are used, BSL now defers the join until
after aggregation. This avoids unnecessary join expansion before GROUP BY
for pure dimension lookups.

Also refines grain mismatch detection (boringdata#218) to skip pure dimension
tables (no measures), routing them to the deferred path instead of
the pre-aggregation path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent deferred join when filters reference dimension table

Filters on deferred dimension columns would be silently dropped,
producing incorrect results. Now _find_deferrable_joins checks if
any filter references the right table's columns and skips deferral
if so, falling back to the standard join-before-aggregate path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use original join predicate for deferred joins (composite key safety)

Instead of sorting left/right join keys independently and zipping them
(which can mismatch columns with different names), store and reuse the
original on= predicate for the post-aggregation LEFT JOIN. This
preserves the correct key pairing for composite joins.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: serialize join cardinality in hashing_tag metadata

Include cardinality ("one"/"many"/"cross") in the SemanticJoinOp
serialization so that join_one and join_many produce different hashes
in xorq's content-addressed naming. Also fix reconstruct to use the
correct join method based on serialized cardinality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle derived dimensions and add edge case tests

- Derived dimensions (e.g., t.name.upper()) on deferred tables are now
  materialized via mutate before the post-agg join, instead of being
  silently dropped.
- Add edge case tests: snowflake chained join_one, derived dimensions,
  and prefixed filters on deferred tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: prune unused dimension joins from queries (boringdata#228)

* feat: prune unused dimension joins from queries (boringdata#227)

When a query only references a subset of the joined tables (via dimension
keys or measure names), joins to unreferenced tables are now skipped.
This avoids expensive joins to dimension tables that contribute nothing
to the result — e.g. a pure SUM() on a fact table no longer joins 4
dimension tables.

The implementation extracts table prefixes from the query's keys and
aggs, passes them as parent_requirements to SemanticJoinOp.to_untagged(),
which recursively prunes right-side leaf tables not in the needed set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
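The prefix-extraction step can be sketched as follows; this is illustrative, assuming dotted names like `flights.carrier` carry the table prefix.

```python
# Sketch of extracting referenced table prefixes from a query's keys and
# aggregation names, per the pruning description above.
def referenced_prefixes(keys, aggs):
    prefixes = set()
    for name in [*keys, *aggs]:
        if "." in name:
            prefixes.add(name.split(".", 1)[0])
    return prefixes
```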

* fix: don't prune joins in pre-aggregation path

The pre-agg path needs all tables for dimension bridges — passing
parent_requirements=None ensures no pruning happens there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only prune left-join join_one, add edge case tests

Guard pruning by cardinality and join type: only prune join_one with
how="left". Inner joins act as row filters and must not be removed.
join_many/join_cross are never pruned (join_many is intercepted by
the pre-agg path; join_cross changes row counts).

Adds edge case tests:
- Inner join with orphan rows: pruning correctly preserves inner join
- Filter on dimension table: pruning correctly disabled
- SQL verification: pruned tables absent from compiled SQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
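The cardinality/join-type guard reduces to a small predicate (an illustrative restatement of the rule above, not the actual implementation):

```python
# Only an unreferenced join_one with how="left" is safe to drop:
# inner joins act as row filters, and join_many/join_cross change
# row counts, so neither may be pruned.
def may_prune(cardinality, how, is_referenced):
    return not is_referenced and cardinality == "one" and how == "left"
```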

* fix: preserve join pruning correctness

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: repair demo_bsl_v2 script

* feat: grain-aware join_one via is_entity dimensions (boringdata#219)

* fix: default missing cardinality to 'many' in join deserialization (boringdata#223)

Pre-existing tagged payloads serialized before cardinality was emitted
have no 'cardinality' key. Defaulting to 'one' would silently skip
pre-aggregation on former join_many joins. Default to 'many' instead
since it is the safe superset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: limit grain upgrades to multi-fact joins

---------

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: cover read-only joined dimensions (boringdata#237)

Co-authored-by: boringdata <boringdata@users.noreply.github.com>

* chore: prepare v0.3.12 release

* feat: add MCP period comparison workflow (boringdata#238)

Co-authored-by: boringdata <boringdata@users.noreply.github.com>

* feat: expose BSL TagHandler via xorq.from_tag_node entry point (boringdata#235)

* feat: expose BSL TagHandler via xorq.from_tag_node entry point (XOR-295)

Move BSL-specific tag handling out of xorq: BSL now ships a TagHandler
(boring_semantic_layer.xorq_integration:bsl_tag_handler) and registers it
under the xorq.from_tag_node entry point group, so xorq's registry
discovers it automatically without hard-coded BSL knowledge.

- extract_metadata walks nested tag-node metadata down to the innermost
  SemanticTableOp and returns the sidecar dict (dimension/measure names).
- from_tag_node reconstructs the base SemanticModel from the innermost
  SemanticTableOp source, not the full query chain, so callers can issue
  fresh .query() calls.
- Pin xorq to 4b3d8033 via [tool.uv.sources] until a release ships the
  xorq.expr.builders TagHandler registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move xorq_integration to serialization.tag_handler

Rename the module to live next to the serialization code it depends on
(extract_xorq_metadata, reconstruct_bsl_operation, BSLSerializationContext,
thaw). The parent package now carries the "serialization" context, so the
module name can focus on its role — a TagHandler.

Update the xorq.from_tag_node entry point accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump xorq to >=0.3.19 and drop git pin

xorq 0.3.19 ships the xorq.expr.builders TagHandler registry, so the
temporary [tool.uv.sources] git pin is no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ghoersti <ghoersti@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: treat UnboundTable as a leaf node in _reconstruct_table

_reconstruct_table() walks the expression tree looking for leaf nodes
(Read, InMemoryTable, DatabaseTable) but did not recognize UnboundTable.
Add it to the set of recognized leaf node types, handled the same way as
DatabaseTable.

Closes boringdata#240

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: safe fallthrough in _reconstruct_table for unknown expression types

The fallthrough at the end of _reconstruct_table() still called
.to_expr() on xorq_expr which is already a Table expression. Return
it directly to avoid AttributeError on any unrecognized leaf node type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "fix: safe fallthrough in _reconstruct_table for unknown expression types"

This reverts commit 7bc3eae.

* fix: preserve join cardinality on round-trip, fix pre-agg grain for derived dims

_reconstruct_join always used join_many, losing the original cardinality.
This caused join_one models to take the pre-aggregation path after
to_tagged → from_tagged, which computed wrong grain for derived dimensions
(e.g. dep_time.truncate("M") used raw dep_time instead of the truncated value).

- Serialize cardinality in _extract_join metadata
- Restore correct join_one/join_many in _reconstruct_join based on cardinality
- Materialize derived dimensions on raw_tbl in pre-agg grain computation
- Add regression test for round-tripped join_one with derived dim + prefixed measure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard join rebind against non-xorq backends (boringdata#242) (boringdata#243)

_rebind_join_backends called walk_nodes unconditionally, which raises
ValueError on plain ibis expressions that xorq can't wrap (e.g.
BigQuery). Mirror the try/except already present in _unify_backends so
joins fall back to native ibis execution.

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add reemit to BSL TagHandler for catalog rebuild (boringdata#244)

* feat: add reemit to BSL TagHandler for catalog rebuild support

Register a `reemit` callable on `bsl_tag_handler` so that
`xorq catalog replay --rebuild` can re-emit BSL-tagged subtrees
with translated sources. The rebuild logic re-stamps the original
tag metadata onto the rebuilt source—no metadata manipulation needed
because BSL's tag captures the full query recipe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: guard reemit wiring against xorq without the field

CI installs xorq 0.3.19 which doesn't have `reemit` on TagHandler
yet. Conditionally pass it only when the field exists, and skip
rebuild tests when it's absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: install xorq from feat/catalog/replay-rebuild branch

Temporary: install xorq from the feature branch that has the
`reemit` field on TagHandler so rebuild tests run in CI instead
of being skipped. Revert once the xorq release ships.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: guard reemit against missing parent and add edge-case tests

Fail fast with a clear ValueError when tag_node.parent is None instead
of a bare AttributeError.  Add non-identity transform tests and a
missing-parent test to cover paths the original suite missed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: remove xorq branch pin from CI

xorq's ci-downstream-bsl workflow handles testing BSL against
in-flight xorq changes, so BSL CI doesn't need to pin a branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: metadata field on Dimension and Measure (boringdata#245) (boringdata#246)

Adds a free-form ``metadata: dict`` field to ``Dimension`` and ``Measure``
that is merged into ``to_json()`` output. Downstream consumers (LLM/MCP
tools, dashboards, docs) can attach domain-specific hints like ``format``,
``unit``, ``example_values`` or ``is_additive`` without subclassing.

- ``metadata`` defaults to ``{}`` and is excluded from equality/hash
- merged after base fields, so custom keys can override if desired
- preserved through ``with_dimensions`` / ``with_measures``
- YAML extended format accepts a ``metadata:`` block on dimensions,
  measures, and calculated measures
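The described semantics (defaults to `{}`, excluded from equality/hash, merged after base fields in `to_json()`) can be sketched with a dataclass stand-in, not BSL's actual `Measure` class:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Measure:
    name: str
    # compare=False excludes metadata from __eq__ and the generated __hash__.
    metadata: dict = field(default_factory=dict, compare=False)

    def to_json(self):
        base = {"name": self.name}
        # Merged after base fields, so custom keys may override them.
        base.update(self.metadata)
        return base
```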

Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "feat: add MCP period comparison workflow (boringdata#238)"

This reverts upstream commit e79000e to keep our MCP layer focused on
the FastMCP 3.0 surface we maintain locally. The non-MCP query.py /
expr.py changes in that commit were tightly coupled to the period
comparison feature and not consumed elsewhere yet.

* fix: restore mcp-code-mode in dev extras + reconcile uv.lock

Upstream's pyproject.toml didn't include mcp-code-mode in the dev
group, so the merge dropped it. Our test_dependency_groups suite
enforces that every optional extra is reachable from dev. Re-add it
and regenerate uv.lock against the merged pyproject.

---------

Co-authored-by: Hussain Sultan <hussainz@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Julien Hurault <julien.hurault@sumeo.io>
Co-authored-by: boringdata <boringdata@users.noreply.github.com>
Co-authored-by: boringdata-agent <boringdata-agent@users.noreply.github.com>
Co-authored-by: ghoersti <45543592+ghoersti@users.noreply.github.com>
Co-authored-by: ghoersti <ghoersti@users.noreply.github.com>
Co-authored-by: Dan Lovell <dlovell@gmail.com>
Co-authored-by: matthew fili <fili@matthews-MacBook-Pro.local>


Development

Successfully merging this pull request may close these issues.

ValueError when executing joins on non-xorq-vendored backends (e.g. BigQuery)

3 participants