perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x) #10783

Merged
ovr merged 5 commits into master from feat/rest-api-columnar-format
Apr 30, 2026

@ovr (Member) commented Apr 30, 2026

Build a VanillaColumnPlan once per request and walk it per row, instead of redoing alias->member, annotation, and member-name parsing per cell.

Pre-size the row IndexMap to skip incremental rehashes during fill.

Benchmark on my Apple M3 Max (TransformedData::transform vanilla path, cells/sec):

| cols × rows   | before    | after      | speedup |
| ------------- | --------- | ---------- | ------- |
| 8 × 1,000     | 5.51 Mc/s | 16.36 Mc/s | 2.97×   |
| 8 × 10,000    | 5.66 Mc/s | 17.45 Mc/s | 3.08×   |
| 8 × 50,000    | 5.16 Mc/s | 15.58 Mc/s | 3.02×   |
| 8 × 100,000   | 5.28 Mc/s | 16.47 Mc/s | 3.12×   |
| 16 × 1,000    | 5.70 Mc/s | 17.19 Mc/s | 3.02×   |
| 16 × 10,000   | 5.57 Mc/s | 17.52 Mc/s | 3.15×   |
| 16 × 50,000   | 5.60 Mc/s | 17.25 Mc/s | 3.08×   |
| 16 × 100,000  | 5.47 Mc/s | 17.39 Mc/s | 3.18×   |
| 32 × 1,000    | 5.76 Mc/s | 16.91 Mc/s | 2.94×   |
| 32 × 10,000   | 5.66 Mc/s | 16.25 Mc/s | 2.87×   |
| 32 × 50,000   | 5.90 Mc/s | 16.48 Mc/s | 2.79×   |
| 32 × 100,000  | 5.69 Mc/s | 16.80 Mc/s | 2.95×   |
| 64 × 1,000    | 5.05 Mc/s | 17.45 Mc/s | 3.45×   |
| 64 × 10,000   | 5.48 Mc/s | 16.54 Mc/s | 3.02×   |
| 64 × 50,000   | 5.71 Mc/s | 16.99 Mc/s | 2.97×   |
| 64 × 100,000  | 5.95 Mc/s | 16.80 Mc/s | 2.82×   |

Average ~3.0× throughput improvement (5.6 -> 16.8 Mcells/s). On a typical server I expect it to be 6-8×.
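
For readers who have not opened the diff, the plan-once idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `ColumnPlan`, `build_plan`, and `transform_row` are hypothetical names, the row values are plain `i64`, and a std `HashMap` stands in for the `IndexMap` the PR uses. It is not the actual `VanillaColumnPlan` code.

```rust
use std::collections::HashMap;

/// Hypothetical per-column plan entry. Everything that can be resolved
/// once per request is stored here as a borrowed slice.
struct ColumnPlan<'a> {
    alias: &'a str,  // key of the value in the raw result row
    member: &'a str, // output key, resolved from the alias once
}

/// Resolve every alias up front. An unknown alias fails here,
/// before any row is processed.
fn build_plan<'a>(
    aliases: &'a [String],
    alias_to_member: &'a HashMap<String, String>,
) -> Result<Vec<ColumnPlan<'a>>, String> {
    aliases
        .iter()
        .map(|alias| {
            alias_to_member
                .get(alias)
                .map(|member| ColumnPlan {
                    alias: alias.as_str(),
                    member: member.as_str(),
                })
                .ok_or_else(|| format!("unknown alias: {alias}"))
        })
        .collect()
}

/// Per-row work is a walk over the plan into one pre-sized map.
/// (std HashMap is an assumption here; the PR pre-sizes an IndexMap.)
fn transform_row<'a>(
    plan: &[ColumnPlan<'a>],
    row: &HashMap<String, i64>,
) -> HashMap<&'a str, i64> {
    let mut out = HashMap::with_capacity(plan.len());
    for col in plan {
        if let Some(value) = row.get(col.alias) {
            out.insert(col.member, *value);
        }
    }
    out
}
```

A caller would run `build_plan` once per request and then call `transform_row` for each row, so the alias lookup cost is paid per column rather than per cell.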

@github-actions bot added the rust label (Pull requests that update Rust code) Apr 30, 2026

@claude bot (Contributor) commented Apr 30, 2026

Claude finished @ovr's task in 3m 23s.

Code Review: PR #10783

  • Gather context and read changed files
  • Review query_result_transform.rs: core optimization
  • Review transform.rs: benchmark changes
  • Post comprehensive review

Summary

Excellent performance optimization. The core idea, building a VanillaColumnPlan once per request and walking it per row, is the textbook approach for amortizing per-row setup. The implementation is clean and correct.

Correctness

I verified the refactoring preserves behavior across all code paths:

  • map_or(true, ...) → is_some_and(...) inversion: Logically equivalent. When dimensions is None, both paths proceed with granularity tracking. When Some and the base member is found, both skip it.
  • split().collect::<Vec<_>>() → match_indices(): Produces identical base_member and granularity values. The old format!("{}.{}", path[0], path[1]) and the new &member_name[..second] yield the same string. Verified for edge cases: no dots, one dot, two dots, three+ dots.
  • Error paths: build_vanilla_column_plan now performs alias lookup and annotation validation up front, so errors that previously surfaced on the first row now surface before any row processing. Existing error tests are correctly updated to test at plan-build time instead.
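
The two equivalences above can be checked with a small standalone sketch. The function names and signatures below are hypothetical and are not copied from query_result_transform.rs; in particular, rejecting anything other than exactly two dots is an assumption based on the edge cases listed in this review.

```rust
/// Old style: split on '.', collect into a Vec, rebuild the base member.
fn parse_with_split(member_name: &str) -> Option<(String, String)> {
    let path: Vec<&str> = member_name.split('.').collect();
    if path.len() == 3 {
        Some((format!("{}.{}", path[0], path[1]), path[2].to_string()))
    } else {
        None
    }
}

/// New style: find the dots with match_indices and slice the input,
/// so no Vec and no String are allocated.
fn parse_with_match_indices(member_name: &str) -> Option<(&str, &str)> {
    let mut dots = member_name.match_indices('.').map(|(index, _)| index);
    let _first = dots.next()?;
    let second = dots.next()?;
    if dots.next().is_some() {
        return None; // three or more dots: not cube.member.granularity
    }
    Some((&member_name[..second], &member_name[second + 1..]))
}

/// Old predicate: track unless the base member is an explicitly requested dimension.
fn tracks_old(dimensions: Option<&[String]>, base_member: &str) -> bool {
    dimensions.map_or(true, |dims| !dims.iter().any(|d| d == base_member))
}

/// New predicate: the inverted form using is_some_and.
fn tracks_new(dimensions: Option<&[String]>, base_member: &str) -> bool {
    !dimensions.is_some_and(|dims| dims.iter().any(|d| d == base_member))
}
```

For an input such as "Orders.createdAt.day", both parsers return the base member "Orders.createdAt" and the granularity "day"; inputs with zero, one, or three dots are rejected by both.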

What's Good

  1. Lifetime-based borrowing: VanillaColumnPlan borrows &str slices from the request-scoped maps, so the plan lives exactly as long as needed with zero extra allocations per row.
  2. HashMap<&str, ...> for minimal_granularities: borrows from the plan's base_member slices instead of allocating String keys per row.
  3. match_indices for dot-counting: avoids the Vec<&str> allocation from the old split().collect().
  4. Pre-sized IndexMap: with_capacity(plan.len() + 1) eliminates incremental rehashes. The +1 for the optional compareDateRange/blending key is a nice touch.
  5. Thorough new unit tests for compute_vanilla_granularity_track: covers all granularity levels, no-dot/one-dot/three-dot rejection, and the dimension-already-requested skip logic.
  6. Improved benchmarks: parameterizing by column count × row count with cells/sec throughput gives much better coverage than the old fixed-column approach.
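
Point 2 can be illustrated with a minimal sketch. It assumes that "minimal" means the lowest numeric level seen per base member and uses made-up level values; `minimal_levels` is a hypothetical helper, not a function from the PR.

```rust
use std::collections::HashMap;

/// Keep, per base member, the smallest granularity level seen.
/// The keys borrow from the input slices, so no String is allocated per entry.
fn minimal_levels<'a>(columns: &[(&'a str, u8)]) -> HashMap<&'a str, u8> {
    let mut minimal: HashMap<&'a str, u8> = HashMap::new();
    for &(base_member, level) in columns {
        minimal
            .entry(base_member)
            .and_modify(|current| *current = (*current).min(level))
            .or_insert(level);
    }
    minimal
}
```

Because the keys are `&'a str`, the returned map cannot outlive the data it borrows from, which is the same constraint that ties the plan to the request scope.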

Minor Suggestions

  • Visibility: VanillaGranularityTrack is pub but compute_vanilla_granularity_track is private, so external callers can never obtain one. pub(crate) would be more precise (see inline comment).
  • Future opportunity: The same plan-based pattern could be applied to the compact path (get_compact_row), which still does per-row alias lookups in resolve_members_to_aliases. Not for this PR, but worth noting.
  • Future micro-opt: A has_granularity_tracking flag on the plan could skip the HashMap allocation entirely for requests with no time-dimension granularity columns.
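
One possible shape for the has_granularity_tracking suggestion is sketched below. All types and names here are hypothetical and simplified (a column carries only its optional granularity info); the real plan struct has more fields, and whether the flag pays off would need to be measured.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified column entry.
struct Column<'a> {
    /// Some((base_member, level)) only for time-dimension columns with a granularity.
    granularity: Option<(&'a str, u8)>,
}

/// Hypothetical plan carrying a flag computed once at build time.
struct Plan<'a> {
    columns: Vec<Column<'a>>,
    has_granularity_tracking: bool,
}

fn build_plan<'a>(columns: Vec<Column<'a>>) -> Plan<'a> {
    let has_granularity_tracking = columns.iter().any(|c| c.granularity.is_some());
    Plan { columns, has_granularity_tracking }
}

/// Returns None without creating a map when the plan has nothing to track.
fn track_row<'a>(plan: &Plan<'a>) -> Option<HashMap<&'a str, u8>> {
    if !plan.has_granularity_tracking {
        return None;
    }
    let mut minimal: HashMap<&'a str, u8> = HashMap::new();
    for (base_member, level) in plan.columns.iter().filter_map(|c| c.granularity) {
        minimal
            .entry(base_member)
            .and_modify(|current| *current = (*current).min(level))
            .or_insert(level);
    }
    Some(minimal)
}
```

With this shape, a request without time-dimension granularity columns pays for one boolean check per row instead of building a tracker.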

Security / Safety

No concerns. The change is purely internal data-structure optimization with no new external inputs, no unsafe code, and no API surface changes.

Verdict

LGTM. Clean refactoring with a ~3× throughput improvement and no behavioral changes. The one inline suggestion (pub(crate) on VanillaGranularityTrack) is a minor nit.

@ovr changed the title from "perf(cubeorchestrator): Improve performance of get_vanilla_row (3x-8x)" to "perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x-8x)" Apr 30, 2026
@ovr changed the title from "perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x-8x)" to "perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x)" Apr 30, 2026
Comment thread rust/cubeorchestrator/src/query_result_transform.rs
Comment thread rust/cubeorchestrator/src/query_result_transform.rs
Comment thread rust/cubeorchestrator/src/query_result_transform.rs Outdated
Comment thread rust/cubeorchestrator/benches/transform.rs
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

@codecov bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.99%. Comparing base (1f749cf) to head (5b1da61).

❗ There is a different number of reports uploaded between BASE (1f749cf) and HEAD (5b1da61).

HEAD has 1 upload less than BASE

| Flag    | BASE (1f749cf) | HEAD (5b1da61) |
| ------- | -------------- | -------------- |
| cubesql | 1              | 0              |

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10783       +/-   ##
===========================================
- Coverage   78.74%   57.99%   -20.75%     
===========================================
  Files         469      215      -254     
  Lines       92130    16776    -75354     
  Branches     3388     3388               
===========================================
- Hits        72548     9730    -62818     
+ Misses      19089     6553    -12536     
  Partials      493      493               

| Flag         | Coverage Δ     |
| ------------ | -------------- |
| cube-backend | 57.99% <ø> (ø) |
| cubesql      | ?              |


ovr added 3 commits April 30, 2026 16:41
| cols × rows  | after      | time Δ |
| ------------ | ---------- | ------ |
| 8 × 1,000    | 19.77 Mc/s | −14.7% |
| 8 × 10,000   | 19.13 Mc/s | −12.2% |
| 8 × 50,000   | 19.26 Mc/s | −14.2% |
| 8 × 100,000  | 18.67 Mc/s | −11.7% |
| 16 × 1,000   | 21.04 Mc/s | −15.7% |
| 16 × 10,000  | 20.62 Mc/s | −15.5% |
| 16 × 50,000  | 20.00 Mc/s | −13.6% |
| 16 × 100,000 | 20.46 Mc/s | −18.1% |
| 32 × 1,000   | 21.02 Mc/s | −16.0% |
| 32 × 10,000  | 20.80 Mc/s | −15.2% |
| 32 × 50,000  | 19.92 Mc/s | −13.1% |
| 32 × 100,000 | 20.72 Mc/s | −16.5% |
| 64 × 1,000   | 21.22 Mc/s | −18.9% |
| 64 × 10,000  | 20.27 Mc/s | −16.9% |
| 64 × 50,000  | 20.78 Mc/s | −16.4% |
| 64 × 100,000 | 20.86 Mc/s | −18.7% |

Cumulative vanilla speedup vs. pre-optimization baseline: ~3.7×
(5.5 -> ~20 Mcells/s).
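
The tables in this thread mix two units, a relative time change (time Δ, in percent) and a throughput multiplier (speedup). Assuming the usual definitions, where time Δ is (new time − old time) / old time and speedup is old time / new time, the conversion can be sketched as below. The helper names are made up for illustration and are not part of the benchmark code.

```rust
/// Convert a relative time change in percent (e.g. -66.8) into a throughput multiplier.
fn speedup_from_time_delta(delta_percent: f64) -> f64 {
    1.0 / (1.0 + delta_percent / 100.0)
}

/// Convert a throughput multiplier (e.g. 3.0) into a relative time change in percent.
fn time_delta_from_speedup(speedup: f64) -> f64 {
    (1.0 / speedup - 1.0) * 100.0
}

/// Throughput in millions of cells per second for a cols × rows result set.
fn mcells_per_sec(cols: u64, rows: u64, seconds: f64) -> f64 {
    (cols * rows) as f64 / seconds / 1e6
}
```

Under these definitions, the −66.8% in the PR title corresponds to a speedup of about 3.01, which agrees with the "3x" quoted next to it.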
Adds a second bench function `bench_transform_time_scenarios` that
exercises four time-dimension shapes at a fixed 16 cols × 100k rows:

- no_time_dim: baseline (matches existing 16×100k matrix entry)
- one_time_dim_day: one time dim with known granularity (level=4)
- one_time_dim_custom_granularity: unknown granularity, uses
  DEFAULT_LEVEL_FOR_UNKNOWN
- two_time_dims: two distinct base members in the granularity tracker

| scenario           | compact     | columnar    | vanilla     |
| ------------------ | ----------- | ----------- | ----------- |
| no_time_dim        | 15.83 Mc/s  | 14.37 Mc/s  | 20.39 Mc/s  |
| one_time_dim_day   | 10.07 Mc/s  |  9.76 Mc/s  | 10.61 Mc/s  |
| custom_granularity | 10.61 Mc/s  | 10.41 Mc/s  | 10.80 Mc/s  |
| two_time_dims      |  8.14 Mc/s  |  7.86 Mc/s  |  7.35 Mc/s  |

@ovr merged commit 0c2cb89 into master Apr 30, 2026
31 of 32 checks passed
@ovr deleted the feat/rest-api-columnar-format branch April 30, 2026 15:20