[pull] master from cube-js:master #523
Merged
… CubeScan (#10977)

* feat(cubesql): merge view joins on shared cube member into single CubeScan

Generalize the push-down-cube-join rewrite so that a join between two CubeScans (typically views) on a dimension that resolves to the same underlying cube member is merged into a single CubeScan, just like the existing __cubeJoinField cube-to-cube join. A view dimension keeps its original cube.dimension path in alias_member, which is used to detect that both sides of the equi-join reference the same shared key.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test(cubesql): cover view join merge on shared cube member

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test(cubesql): add group-by view join query for shared cube member

Mirror the motivating query exactly:

    SELECT c.customer_city, measure(o.revenue), measure(c.avg_age)
    FROM customers_view c
    LEFT JOIN orders_view o ON o.customer_city = c.customer_city
    GROUP BY 1

and assert it merges into a single grouped multi-fact CubeScan.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): respect inner/left/right join semantics for view joins

When merging a join between two views on a shared cube member, the downstream multi-fact query is rendered as a FULL OUTER JOIN over the shared key. To recover the requested join semantics, the rewrite now adds a measure 'set' filter on each side that must be present:

- INNER: both sides required
- LEFT: left side required
- RIGHT: right side required
- FULL: no extra filter

Branch presence is detected via a measure of the side (the grouping key is COALESCEd across sides downstream, so it cannot tell sides apart). Covered with left/inner group-by tests.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* refactor(cubesql): use join key (not a measure) for view-join presence filter

Detect side presence with the side's join-key dimension instead of an arbitrary measure.
The join key is always available and is the actual shared-key marker, avoiding the nullable-measure caveat and the case where a side has no selected measure.

- LEFT: left join key must be set
- RIGHT: right join key must be set
- INNER: both join keys must be set
- FULL: no extra filter

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): only merge view joins when the join key is fully within dimensions

Make the merge gate explicit: the entire join key must resolve to dimensions (or time dimensions) on both sides and to the same underlying cube member. A join key that touches a measure/segment/etc. is rejected and the join falls back to normal (non-merged) handling. Add a negative test that joining two views on measures is not merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix(cubesql): gate view-join test module with cfg(test); drop unused var

- Add #[cfg(test)] to the test_cube_join_views module so it is not compiled into non-test builds (fixes unresolved pretty_assertions and unused-import errors under clippy -D warnings and the native builds).
- Remove the unused right_filters_var from push_down_cube_join.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* chore: re-trigger CI (flaky Windows native + concurrency-canceled redshift)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): only merge view joins under aggregate grouping by the join key

Move the shared-member view-join merge out of push_down_cube_join (which runs on the always-ungrouped raw join) into a new rule that matches an Aggregate over the join. The merge now only fires when:

- the query is grouped (an Aggregate sits over the join), and
- the GROUP BY is exactly the shared join key.

This rejects ungrouped queries (e.g. SELECT * over the join) and queries that group by a non-join-key dimension, both of which would otherwise produce an incorrect multi-fact pushdown.
push_down_cube_join is restored to its original __cubeJoinField-only behavior.

Tests: grouped left/inner merge (with join-key set filters); ungrouped, group-by-mismatch, and measure-key joins are not merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): gate view-join merge on the Tesseract SQL planner

The shared-member view-join merge produces a multi-fact query that is only handled correctly by the Tesseract SQL planner (FULL OUTER stitch over the shared key). Add an `enable_tesseract_sql_planner` config option (read from CUBEJS_TESSERACT_SQL_PLANNER) and only fire the rewrite when it is enabled. Add a test that the merge does not happen when Tesseract is disabled.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document multi-fact queries via SQL API view joins

Document that joining two views on a dimension that resolves to the same underlying cube member (grouped by that key) triggers a multi-fact query in the SQL API, including the join-type semantics (inner/left/right/full) and the Tesseract requirement. Add the behavior to the multi-fact views page and a cross-referencing section in the SQL API joins reference.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix(cubesql): require all join-key columns on a side to share one cube/view

A CubeScan can expose members from multiple cubes/views, so enforce that every join-key column on each side resolves to the same cube/view. A mismatch would make the merged join hint ambiguous, so such joins are no longer merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* refactor(cubesql): build view-join presence filters only on successful merge; test RIGHT/FULL

Address review nits:

- Construct the join-semantics set filters and mutate subst inside the innermost iteration, right before returning true, so a false return never leaves a stale subst entry or orphan filter e-nodes.
- Add RIGHT JOIN and FULL JOIN tests to lock in the join-type table (right join key set filter; no filter for full).

(The composite-key single-cube-per-side check was already added in a prior commit.)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): MultiFactJoinWrapper for N-way view joins and filter push-down

Introduce a MultiFactJoinWrapper intermediate egraph node so the shared-member view-join rewrite is no longer a single aggregate-bound rule. The rewrite now splits into:

- shared-member-join-to-wrapper: Join(CubeScan, CubeScan) -> wrapper(CubeScan)
- shared-member-join-extend-wrapper: Join(wrapper(CubeScan), CubeScan) -> wrapper(CubeScan), enabling joins of 3+ views
- multi-fact-join-wrapper-filter-push-down: Filter(wrapper) -> wrapper(Filter), pushing WHERE/ON filters into the merged scan
- aggregate-multi-fact-join-wrapper: unwrap only when GROUP BY matches the recorded join key

The wrapper records the join key (as underlying cube members) so the finalize rule can verify the GROUP BY, while joins and filters compose beforehand.

Adds tests for 3-way and 4-way FULL joins, a WHERE filter, and an ON-clause filter, in addition to the existing 2-way LEFT/INNER/RIGHT/FULL coverage.
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document N-way view joins and filter support in the SQL API

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix(cubesql): address review nits on MultiFactJoinWrapper rewrite

- Remove duplicate #[allow(clippy::too_many_arguments)] on merge_shared_member_join
- Document the left-deep-only assumption of shared-member-join-extend-wrapper (right-associative a JOIN (b JOIN c) is not chained)
- Document that finalize only accepts plain-column GROUP BY (wrapped exprs like DATE_TRUNC fall back to standard handling)
- Add a 3-way LEFT join test pinning per-pass presence-filter accumulation through the extend-wrapper rule

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): support view joins on date_trunc / shared time dimensions

A join written directly on DATE_TRUNC (ON DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts)) is lowered by the planner to Filter(<eq>, CrossJoin(...)) rather than a column equi-join, so it never reached the shared-member merge. Add a shared-time-member-cross-join-to-wrapper rule that recognizes this shape, resolves both truncated columns to the same underlying time member at the same granularity, and merges into an INNER multi-fact CubeScan (both keys marked present). Grouping by DATE_TRUNC already worked via referenced-column collapse; the finalize comment is corrected accordingly.

Adds time dimensions to the test views and tests for join-on-raw-time + GROUP BY DATE_TRUNC (LEFT) and join-on-DATE_TRUNC + GROUP BY DATE_TRUNC (INNER).

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test(cubesql): cover composite-key view joins (multiple dimensions)

Add a shared customer_state dimension and tests for joining two views on a composite key (customer_city + customer_state) and grouping by both, plus a negative test that a partial GROUP BY (only one of the two join keys) does not merge.
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat(cubesql): support view joins on date_trunc combined with a dimension

The planner turns 'ON a.dim = b.dim AND DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts)' into Filter(<trunc eq>, Join(a.dim = b.dim, ...)). The column join becomes a MultiFactJoinWrapper; add a multi-fact-join-wrapper-absorb-time-key rule that folds the truncated time member into the wrapper's recorded join key (marking both time columns present, since a post-join equality is INNER on that key) so a query grouped by both the time dimension and the dimension merges into one multi-fact CubeScan.

Adds a test for the mixed DATE_TRUNC + dimension join.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix(cubesql): store join-key granularity and require it to match GROUP BY

Per review: the multi-fact stitch happens at the GROUP BY grain, so the join key's granularity must equal the GROUP BY granularity for a time member. Previously only the underlying member name was recorded, so joining on DATE_TRUNC('month', ...) while grouping by DATE_TRUNC('day', ...) merged and stitched at day grain, diverging from the written join.

Store join_members as (underlying member, Option<granularity>) on the wrapper (None for plain dimensions, Some(grain) for DATE_TRUNC time keys), and at finalize extract each GROUP BY expression's granularity from its original_expr (rather than referenced_expr, which drops the grain) and require the full (member, granularity) sets to match.

Replaces the raw-time-join test with a granularity-mismatch negative test.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs(cubesql): require grained join for time-dimension multi-fact merge

Align docs with the strict (member, grain) match: the join key granularity must equal the GROUP BY granularity, so a raw-time-column join does not pair with a DATE_TRUNC group-by.
Remove the now-unsupported 'join on the raw time column, group by DATE_TRUNC' example and document the requirement. Add a negative test pinning that the raw-time join + DATE_TRUNC group-by is not merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
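
The two gating rules described in the commits above (the join-type presence-filter table, and the strict (member, granularity) match between the join key and the GROUP BY) can be sketched in a few lines of Rust. This is an illustration only, not the cubesql implementation: `JoinType`, `required_sides`, `JoinMember`, and `group_by_matches_join_key` are hypothetical names chosen for this sketch, and the real rewrite operates on egraph nodes rather than plain vectors.

```rust
use std::collections::BTreeSet;

/// Join types distinguished by the commit messages (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Returns (left_key_must_be_set, right_key_must_be_set).
/// The merged multi-fact query is a FULL OUTER stitch, so a 'set' filter on a
/// side's join key is what narrows it back to the requested join semantics.
pub fn required_sides(join_type: JoinType) -> (bool, bool) {
    match join_type {
        JoinType::Inner => (true, true),
        JoinType::Left => (true, false),
        JoinType::Right => (false, true),
        JoinType::Full => (false, false),
    }
}

/// One join-key entry: underlying cube member plus optional granularity
/// (None for a plain dimension, Some(grain) for a DATE_TRUNC time key).
pub type JoinMember = (String, Option<String>);

/// The merge is finalized only when the GROUP BY covers exactly the recorded
/// join key, compared as sets of (member, granularity) pairs.
pub fn group_by_matches_join_key(join_key: &[JoinMember], group_by: &[JoinMember]) -> bool {
    let key: BTreeSet<&JoinMember> = join_key.iter().collect();
    let grouped: BTreeSet<&JoinMember> = group_by.iter().collect();
    key == grouped
}
```

Under these assumptions, a join on a month-truncated time member grouped by the same member at day grain compares as unequal and is not merged, and a GROUP BY on only one column of a composite key is rejected for the same reason.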
…lic surface (#11041)

* docs(api): add AI Endpoints chat spec to Platform API

- Add an "AI Endpoints" section to the Platform API tab with the Chat API (POST /chat/stream-chat-state), extracted from the ai-engineer OpenAPI spec into api-reference/chat.yaml via scripts/extract-chat.js.
- Document Bearer auth on the REST and SCIM APIs in authentication.mdx.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(api): add Core Data endpoints, restructure API tab, hide internal routes

Build out the API docs tab into three families (Core Data, AI, and Platform) and trim it to the curated public surface.

- Core Data API: extract POST /v1/load and GET /v1/meta from the in-repo cubejs-api-gateway/openspec.yml into api-reference/core-data.yaml via scripts/extract-core-data.js, with the deployment data-API host (cubecloudapp.dev, region + deployment) and Api-Key/JWT auth. Rename /v1/load to "JSON query" and hand-author the "SQL query" endpoint (POST /v1/cubesql), including an enum + default for the cache parameter.
- Reorder the API tab: Getting Started, Core Data Endpoints, AI Endpoints, Platform Endpoints. Flatten single-endpoint groups; rename the tab to "API"; rename SCIM groups to Users/Groups; drop the Changelog group for now.
- Hide internal/incomplete Platform routes (Groups, User Groups, Agents, Metadata, Users, User Attributes, Resource Policies, App Theme, AI Engineer) from docs.json and api.yaml, and add an EXCLUDE_OPERATIONS list to scripts/extract-api.mjs so a re-pull won't resurface them.
- Introduction: add a three-API overview with per-family auth, move Available endpoints (now covering all three APIs, with per-API base URLs) ahead of the client-library sections, and document both the Core Data (@cubejs-client/core) and Platform (@cube-dev/platform-client) client libraries.
- Authentication: document Bearer on REST/SCIM and repoint the SCIM base-URL link to #platform-api.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(api): consolidate client libraries section, broaden intro description

- Merge the Core Data and Platform client-library sections under a single "Client libraries" heading with one subsection each.
- Update the Introduction description to cover all three API families (Core Data, AI, Platform) instead of the Platform/SCIM-only framing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )