chore(tesseract): resolve join trees once per query #10982
Introduce JoinTree: a resolved join tree (BaseCube + compiled on_sql + multiplication_factor) built once by JoinTreeBuilder. MultiFactJoinGroups now caches Rc<JoinTree> instead of Rc<dyn JoinDefinition>, so join ON SQL is compiled a single time at group-build time rather than recompiled on every use during planning (make_join_logical_plan and collect_sub_query_dimensions now reuse the compiled on_sql). LogicalJoin is unchanged and assembled cheaply from a JoinTree. Move compile_join_condition into CommonUtils so both JoinTreeBuilder and resolve_join_members share it. Drop the lifetime from MultipliedMeasuresCollector in favor of Rc<JoinTree>.
Add JoinTreeCache, owned by QueryTools, keyed by JoinHints and storing (JoinKey, Rc<JoinTree>). build_groups resolves joins through it, so the same hints are not re-resolved across the many group builds in a single query — notably the per-pre-aggregation-candidate matching loop and the per-measure passes in full_key_aggregate_measures, which otherwise re-cross the JS bridge and recompile ON SQL for identical hints. get_or_build takes the build closure per call, so the cache holds no Rc<QueryTools> back-reference and forms no reference cycle. No invalidation: the join graph is immutable for a QueryTools lifetime.
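The cache shape described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `JoinHints`, `JoinKey`, and `JoinTree` structs here are hypothetical stand-ins with made-up fields, the `RefCell<HashMap<..>>` storage and the `String` error type are assumptions, and `demo_build_count` exists only to show the behaviour.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-ins for the PR's types; the real ones carry more data.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct JoinHints(Vec<String>);

#[derive(Clone, PartialEq, Eq, Debug)]
struct JoinKey(String);

#[derive(Debug)]
struct JoinTree {
    root: String,
    on_sql: Vec<String>, // ON conditions, compiled once when the tree is built
}

// Assumed storage: interior mutability so the cache can be used through `&self`.
#[derive(Default)]
struct JoinTreeCache {
    entries: RefCell<HashMap<JoinHints, (JoinKey, Rc<JoinTree>)>>,
}

impl JoinTreeCache {
    // The build closure is passed per call, so the cache stores no handle
    // back to its owner and cannot take part in a reference cycle.
    fn get_or_build<F>(
        &self,
        hints: &JoinHints,
        build: F,
    ) -> Result<(JoinKey, Rc<JoinTree>), String>
    where
        F: FnOnce() -> Result<(JoinKey, Rc<JoinTree>), String>,
    {
        // The shared borrow is a temporary and ends with this statement.
        let cached = self.entries.borrow().get(hints).cloned();
        if let Some(hit) = cached {
            return Ok(hit);
        }
        // No borrow is held while the (possibly expensive) build runs.
        let built = build()?;
        self.entries
            .borrow_mut()
            .insert(hints.clone(), built.clone());
        Ok(built)
    }
}

// Resolves the same hints twice and reports how often the builder ran.
fn demo_build_count() -> usize {
    let cache = JoinTreeCache::default();
    let hints = JoinHints(vec!["orders".to_string(), "customers".to_string()]);
    let mut builds = 0;
    for _ in 0..2 {
        let (_key, tree) = cache
            .get_or_build(&hints, || {
                builds += 1;
                Ok((
                    JoinKey("orders->customers".to_string()),
                    Rc::new(JoinTree {
                        root: "orders".to_string(),
                        on_sql: vec!["orders.customer_id = customers.id".to_string()],
                    }),
                ))
            })
            .expect("the build cannot fail in this sketch");
        assert_eq!(tree.root, "orders");
        assert_eq!(tree.on_sql.len(), 1);
    }
    builds
}
```

In this sketch the second lookup returns a clone of the cached `Rc<JoinTree>`, so the builder closure runs only once for the two requests.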
Claude finished @waralexrom's task in 1m 39s.
Code review
Overall this is a clean, well-scoped refactor. The split between
Notes & minor suggestions
No correctness, security, or testing concerns. Test coverage ( |
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

| Metric | master | #10982 | +/- |
|---|---|---|---|
| Coverage | 78.84% | 78.82% | -0.03% |
| Files | 470 | 470 | |
| Lines | 93390 | 93438 | +48 |
| Branches | 3466 | 3466 | |
| Hits | 73637 | 73649 | +12 |
| Misses | 19250 | 19286 | +36 |
| Partials | 503 | 503 | |
Explain why the lookup borrow is dropped before build() runs and must not be folded into a single entry() call.
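One plausible reading of this point, assuming the cache uses `RefCell` interior mutability (not confirmed on this page): a single `entry()` call would keep the mutable borrow alive while `build()` runs, so a build that touches the cache again would hit an already-borrowed `RefCell`. It would also leave no way to propagate a build error with `?`. The sketch below uses a hypothetical `Cache` over `u32` values, unrelated to the PR's types, to contrast the two shapes.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical toy cache, used only to contrast the two lookup shapes.
#[derive(Default)]
struct Cache {
    map: RefCell<HashMap<String, u32>>,
}

impl Cache {
    // Shape 1: the lookup borrow ends before `build` runs.
    fn get_or_build(&self, key: &str, build: impl FnOnce() -> u32) -> u32 {
        let hit = self.map.borrow().get(key).copied(); // borrow dropped here
        if let Some(v) = hit {
            return v;
        }
        let v = build(); // may safely re-enter the cache
        self.map.borrow_mut().insert(key.to_string(), v);
        v
    }

    // Shape 2: folded into `entry()`, the mutable borrow spans `build`.
    fn get_or_build_folded(&self, key: &str, build: impl FnOnce() -> u32) -> u32 {
        let mut map = self.map.borrow_mut();
        let v = *map.entry(key.to_string()).or_insert_with(build);
        v
    }
}

// A build that resolves another key through the same cache works with shape 1.
fn nested_ok() -> u32 {
    let cache = Cache::default();
    cache.get_or_build("outer", || cache.get_or_build("inner", || 1) + 1)
}

// With shape 2 the cache is still mutably borrowed inside `build`. The sketch
// probes with `try_borrow_mut` instead of re-entering, which would panic.
fn folded_reentry_is_rejected() -> bool {
    let cache = Cache::default();
    let mut rejected = false;
    cache.get_or_build_folded("outer", || {
        rejected = cache.map.try_borrow_mut().is_err();
        0
    });
    rejected
}
```

Here `nested_ok()` completes because no borrow is live during the inner call, while `folded_reentry_is_rejected()` observes that the map cannot be borrowed again from inside the folded build.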
Claude finished @waralexrom's task in 42s.
Code review (re-run on synchronize)
Verdict: LGTM, no blocking issues. 0 high, 0 medium, 2 low (carried over from prior review, non-blocking).
Details
What changed since last review
Still-open (non-blocking) notes carried over from prior review
Pure nits ( Quality / security / perf / tests / docs
Summary
Resolve a query's join graph once instead of repeatedly during planning. Introduces a `JoinTree` (resolved join with compiled ON SQL) that `MultiFactJoinGroups` caches, plus a per-query `JoinTreeCache` so identical join hints are not re-resolved across the many group builds in a single query.

Changes
- `JoinTree` (BaseCube + compiled `on_sql` + `multiplication_factor`), built once by `JoinTreeBuilder`; `MultiFactJoinGroups` now stores `Rc<JoinTree>` instead of `Rc<dyn JoinDefinition>`, so ON SQL is compiled once at group-build time rather than recompiled in `make_join_logical_plan` / `collect_sub_query_dimensions`. `LogicalJoin` is unchanged and assembled cheaply from a `JoinTree`.
- Move `compile_join_condition` into `CommonUtils` (shared by `JoinTreeBuilder` and `resolve_join_members`); drop the lifetime from `MultipliedMeasuresCollector` in favor of `Rc<JoinTree>`.
- `JoinTreeCache` (owned by `QueryTools`, keyed by `JoinHints`, storing `(JoinKey, Rc<JoinTree>)`); `build_groups` resolves through it. Eliminates repeated identical resolution, notably the per-pre-aggregation-candidate matching loop and the per-measure passes in `full_key_aggregate_measures`, which otherwise re-cross the JS bridge and recompile ON SQL. No invalidation needed: the join graph is immutable for a `QueryTools` lifetime.

Testing
- `cargo check` clean, no warnings.
- `multi_fact`, `subquery_dim_with_multi_fact`, `subquery_dim_with_multiplied_measure`, `transitive_joins`, `view_multi_fact`.
- `cargo fmt --check` clean.