Skip to content

refactor(cubestore): Introduce QueryResult enum for SqlService execute#10817

Merged
ovr merged 5 commits intomasterfrom
refactor/cubestore-stream-part-1
May 5, 2026
Merged

refactor(cubestore): Introduce QueryResult enum for SqlService execute#10817
ovr merged 5 commits intomasterfrom
refactor/cubestore-stream-part-1

Conversation

@ovr
Copy link
Copy Markdown
Member

@ovr ovr commented May 5, 2026

No description provided.

@github-actions github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code labels May 5, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @ovr's task in 2m 34s —— View job


Code Review: QueryResult enum for SqlService execute

  • Gather context and understand the PR
  • Review rust/cubestore/cubestore/src/sql/mod.rs (main changes)
  • Review remaining changed files
  • Post comprehensive review

Summary

This PR introduces a QueryResult enum that wraps the return type of SqlService::exec_query and exec_query_with_context, changing them from Result<Arc<DataFrame>, CubeError> to Result<QueryResult, CubeError>. The enum has two variants: Frame (existing materialized behavior) and Stream (new, for lazy RecordBatch streaming). This is a preparatory refactor — "part 1" of a streaming initiative — where all existing code paths still use Frame, and callers now call .collect().await to get back the Arc<DataFrame>.

Overall Assessment

This is a solid, well-structured refactor. The approach of introducing the enum first with only the Frame variant in use, then wiring in Stream in a follow-up, is the right way to split this kind of change. The mechanical updates across 10 files are consistent and correct.

Code Quality

Strengths:

  • Clean enum design with appropriate Debug, PartialEq, and From impls
  • collect() correctly uses spawn_blocking for the CPU-bound batches_to_dataframe conversion
  • All callers are updated consistently — no missed spots
  • The BasicSqlClient adapter in cubestore-sql-tests cleanly bridges the new API to the old SqlClient trait

Suggestions (see inline comments):

  • Duplicate methods: schema() and get_schema() are identical — remove one
  • Unused From impls: The From<DataFrame> and From<Arc<DataFrame>> impls are defined but never used. All 22+ call sites manually write QueryResult::Frame(Arc::new(...)). Using .into() would significantly reduce verbosity
  • Empty frame helper: Ok(QueryResult::Frame(Arc::new(DataFrame::new(vec![], vec![])))) appears ~10 times. A QueryResult::empty() constructor would clean this up considerably
  • PartialEq contract: The impl returns false for any Stream comparison, violating reflexivity. This won't cause issues today, but consider documenting this or removing the impl

Security

No security concerns. This is a pure internal refactoring — no new input handling, authentication changes, or external API surface changes.

Performance

No performance regressions. The Frame variant path is a zero-cost wrapper (just enum wrapping + unwrapping). The Stream path's collect() correctly offloads batches_to_dataframe to a blocking thread pool via spawn_blocking, which is the right call since that conversion is CPU-bound.

One thing to watch in part 2: the MySQL handler (mysql/mod.rs) and HTTP handler (http/mod.rs) both eagerly .collect() the result. When Stream results are introduced, these will be natural candidates for true streaming to the client protocol.

Testing

All existing tests are updated with the .collect().await pattern. The mechanical nature of these changes (119 .collect().await additions in mod.rs alone) is the expected consequence of this refactoring approach. The test assertions still verify the same data — correctness is preserved.

No new tests are needed for this PR specifically, since Stream is not yet constructed anywhere. Part 2 should include tests that exercise the Stream → collect() path.

Documentation

The PR description is empty — would be helpful to add a brief note about the motivation (streaming support) and that this is part 1 of a multi-PR effort. This context helps future readers understand why Stream exists but is unused.

Inline Comments Posted

I've left 6 inline comments on specific code locations with detailed suggestions.


Comment thread rust/cubestore/cubestore/src/sql/mod.rs
Comment on lines +140 to +148
impl PartialEq for QueryResult {
fn eq(&self, other: &Self) -> bool {
match (self, other) {
(QueryResult::Frame(a), QueryResult::Frame(b)) => a == b,
// Streaming results carry a live BoxStream and aren't comparable.
_ => false,
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PartialEq impl silently returns false for any comparison involving a Stream variant — including Stream vs Stream. This is correct behavior (streams aren't comparable), but it breaks the reflexivity contract of PartialEq (x == x should be true).

This won't cause issues today since Stream is never constructed yet, but once it is, any assert_eq! on a Stream result will fail unexpectedly. Consider either:

  1. Not implementing PartialEq at all (compile errors are better than silent wrong answers)
  2. Adding a doc comment explaining the intentional contract violation

Comment thread rust/cubestore/cubestore/src/sql/mod.rs
Comment thread rust/cubestore/cubestore/src/mysql/mod.rs
Comment thread rust/cubestore/cubestore/src/sql/mod.rs Outdated
Comment thread rust/cubestore/cubestore/src/sql/mod.rs
@ovr ovr marked this pull request as ready for review May 5, 2026 13:53
@ovr ovr requested a review from a team as a code owner May 5, 2026 13:53
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @ovr's task in 3m 15s —— View job


Code Review: QueryResult enum for SqlService execute (Post-revision)

  • Gather context and review latest commits
  • Review current state of rust/cubestore/cubestore/src/sql/mod.rs
  • Review remaining changed files
  • Post comprehensive review

Summary

This PR introduces a QueryResult enum wrapping the return type of SqlService::exec_query and exec_query_with_context, changing from Result<Arc<DataFrame>, CubeError> to Result<QueryResult, CubeError>. The enum has two variants: Frame (all existing code paths) and Stream (new, for lazy RecordBatch streaming). This is part 1 of a streaming initiative — callers now call .collect().await to materialize results.

Since the first review, the author addressed suggestions with 3 follow-up commits:

  • 21c4c2f — Adopted .into() and DataFrame::empty() across SqlServiceImpl, significantly reducing verbosity
  • c0794cc — Removed the duplicate get_schema() method on QueryResult
  • 470b660 — Changed test closures to return Result<(), CubeError> with proper error propagation in the test harness

Overall Assessment

This is a well-executed, clean refactor. The enum-first approach (introducing QueryResult with only Frame in use, wiring Stream in a follow-up) is the right way to split this. The follow-up commits addressed the first review's key suggestions effectively.

Code Quality

Strengths:

  • Clean enum design with Debug, PartialEq (with documented limitation for Stream), and From impls
  • collect() correctly uses spawn_blocking for CPU-bound batches_to_dataframe
  • All callers updated consistently — no missed spots across 15 files
  • BasicSqlClient in cubestore-sql-tests cleanly bridges the new QueryResult API to the existing SqlClient trait
  • The .into() adoption is thorough in mod.rs — all 30+ call sites use it
  • DataFrame::empty() eliminates repetitive DataFrame::new(vec![], vec![]) in the main SQL service

Minor suggestions (non-blocking):

  • cachestore.rs still has 8 instances of Arc::new(DataFrame::new(vec![], vec![])) that could use DataFrame::empty() for consistency (see inline comment). These are internal methods returning Arc<DataFrame>, not QueryResult, so it's cosmetic
  • QueryResult::schema() is defined but currently has no callers — presumably reserved for part 2 when Stream variants need schema access without collecting (see inline comment)
  • The PartialEq comment from the first review was addressed — the _ => false branch now has a clarifying comment about streams not being comparable

Test Harness Improvement

The Result<(), CubeError> change to test functions (config/mod.rs) is a nice ergonomic win beyond the core refactor:

match timeout_at(Instant::now() + timeout, test_fn(services.clone())).await {
    Err(_) => panic!("Test timed out after {} seconds", timeout.as_secs()),
    Ok(Err(e)) => panic!("Test failed: {}", e.display_with_backtrace()),
    Ok(Ok(())) => {}
}

This means test failures now surface with full backtraces instead of plain unwrap() panics. The pattern is applied consistently across start_test, start_migration_test, start_test_worker, and start_test_with_options. Some existing tests still use .unwrap() instead of ? inside their closures (e.g., cleanup.rs), which could be converted in a follow-up for full benefit.

Security

No concerns. Pure internal refactoring — no new input handling, auth changes, or API surface changes.

Performance

No regressions. The Frame path is zero-cost (enum wrap/unwrap). The Stream path's collect() correctly offloads batches_to_dataframe to a blocking thread pool. The MySQL and HTTP handlers eagerly .collect() — these are natural candidates for true streaming in part 2.

Testing

All 120 .collect().await additions in tests are mechanical and correct. Test assertions remain unchanged. No new tests needed since Stream is not yet constructed. Part 2 should include tests exercising the Stream → collect() path.

Nits

  • Commit message c0794cc has a typo: "methopd" → "method"

Comment thread rust/cubestore/cubestore/src/sql/mod.rs
Comment thread rust/cubestore/cubestore/src/remotefs/cleanup.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.09%. Comparing base (87f4dc1) to head (0bfa3eb).
⚠️ Report is 3 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (87f4dc1) and HEAD (0bfa3eb). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (87f4dc1) HEAD (0bfa3eb)
cubesql 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10817       +/-   ##
===========================================
- Coverage   78.79%   58.09%   -20.70%     
===========================================
  Files         470      216      -254     
  Lines       92135    16899    -75236     
  Branches     3424     3426        +2     
===========================================
- Hits        72597     9818    -62779     
+ Misses      19035     6578    -12457     
  Partials      503      503               
Flag Coverage Δ
cube-backend 58.09% <ø> (+<0.01%) ⬆️
cubesql ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ovr ovr merged commit 2baf0bc into master May 5, 2026
56 checks passed
@ovr ovr deleted the refactor/cubestore-stream-part-1 branch May 5, 2026 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cube store Issues relating to Cube Store rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants