perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x) by ovr · Pull Request #10783 · cube-js/cube

ovr · 2026-04-30T13:21:54Z

Build a VanillaColumnPlan once per request and walk it per row, instead of redoing alias->member, annotation, and member-name parsing per cell.

Pre-size the row IndexMap to skip incremental rehashes during fill.
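Below is a minimal sketch of the plan-once / walk-per-row pattern. It assumes a simplified model in which each column alias maps to exactly one member name. `ColumnPlanEntry`, `build_plan`, and `build_row` are hypothetical names, not the PR's code, and a `Vec` stands in for the `IndexMap` so the example needs only `std`.

```rust
use std::collections::HashMap;

/// Hypothetical per-column plan entry, resolved once per request.
struct ColumnPlanEntry<'a> {
    /// Position of the column in each raw row.
    source_index: usize,
    /// Member name borrowed from request-scoped data (no per-row allocation).
    member_name: &'a str,
}

/// Build the plan once: resolve every alias up front, so an unknown alias
/// fails before any row is processed.
fn build_plan<'a>(
    columns: &[String],
    alias_to_member: &'a HashMap<String, String>,
) -> Result<Vec<ColumnPlanEntry<'a>>, String> {
    columns
        .iter()
        .enumerate()
        .map(|(i, alias)| {
            alias_to_member
                .get(alias)
                .map(|member| ColumnPlanEntry { source_index: i, member_name: member.as_str() })
                .ok_or_else(|| format!("unknown alias: {alias}"))
        })
        .collect()
}

/// Walk the plan for one row: no alias lookups or name parsing per cell.
/// Assumes every row has at least as many cells as the plan references.
fn build_row<'a>(plan: &[ColumnPlanEntry<'a>], row: &[String]) -> Vec<(&'a str, String)> {
    // Pre-size the output so filling it never reallocates.
    let mut out = Vec::with_capacity(plan.len());
    for entry in plan {
        out.push((entry.member_name, row[entry.source_index].clone()));
    }
    out
}
```

All per-request work (alias resolution, error checks) happens in `build_plan`. The per-row loop only indexes and copies.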

Benchmark on my Apple M3 Max (TransformedData::transform vanilla path, cells/sec):

cols × rows	before	after	speedup
8 × 1,000	5.51 Mc/s	16.36 Mc/s	2.97×
8 × 10,000	5.66 Mc/s	17.45 Mc/s	3.08×
8 × 50,000	5.16 Mc/s	15.58 Mc/s	3.02×
8 × 100,000	5.28 Mc/s	16.47 Mc/s	3.12×
16 × 1,000	5.70 Mc/s	17.19 Mc/s	3.02×
16 × 10,000	5.57 Mc/s	17.52 Mc/s	3.15×
16 × 50,000	5.60 Mc/s	17.25 Mc/s	3.08×
16 × 100,000	5.47 Mc/s	17.39 Mc/s	3.18×
32 × 1,000	5.76 Mc/s	16.91 Mc/s	2.94×
32 × 10,000	5.66 Mc/s	16.25 Mc/s	2.87×
32 × 50,000	5.90 Mc/s	16.48 Mc/s	2.79×
32 × 100,000	5.69 Mc/s	16.80 Mc/s	2.95×
64 × 1,000	5.05 Mc/s	17.45 Mc/s	3.45×
64 × 10,000	5.48 Mc/s	16.54 Mc/s	3.02×
64 × 50,000	5.71 Mc/s	16.99 Mc/s	2.97×
64 × 100,000	5.95 Mc/s	16.80 Mc/s	2.82×

Average throughput improvement: ~3.0× (5.6 -> 16.8 Mcells/s). On a typical server, the speedup is expected (not measured here) to be 6-8x.
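As a quick check on the summary line, the snippet below averages the 16 before/after rows of the table and converts the throughput ratio into a wall-time change. The conversion assumes the same amount of work (the same number of cells) in both runs, so that time scales as the inverse of throughput.

```rust
/// Throughput in Mcells/s from the benchmark table, in row order.
const BEFORE: [f64; 16] = [
    5.51, 5.66, 5.16, 5.28, 5.70, 5.57, 5.60, 5.47,
    5.76, 5.66, 5.90, 5.69, 5.05, 5.48, 5.71, 5.95,
];
const AFTER: [f64; 16] = [
    16.36, 17.45, 15.58, 16.47, 17.19, 17.52, 17.25, 17.39,
    16.91, 16.25, 16.48, 16.80, 17.45, 16.54, 16.99, 16.80,
];

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// A throughput speedup `s` means each cell takes 1/s of the old time,
/// so the relative change in wall time is 1/s - 1 (negative = faster).
fn time_delta(speedup: f64) -> f64 {
    1.0 / speedup - 1.0
}
```

With these numbers, `mean(&AFTER) / mean(&BEFORE)` is about 3.02×, and `time_delta` of that is about −66.9%. This is close to the −66.8% in the PR title.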


claude · 2026-04-30T13:22:21Z

Claude finished @ovr's task in 3m 23s.

Code Review: PR #10783

Gather context and read changed files
Review query_result_transform.rs — core optimization
Review transform.rs — benchmark changes
Post comprehensive review

Summary

Excellent performance optimization. The core idea — build a VanillaColumnPlan once per request and walk it per row — is the textbook approach for amortizing per-row setup. The implementation is clean and correct.

Correctness

I verified the refactoring preserves behavior across all code paths:

map_or(true, ...) → is_some_and(...) inversion: Logically equivalent. When dimensions is None, both paths proceed with granularity tracking. When Some and the base member is found, both skip it.
split().collect::<Vec<_>>() → match_indices(): Produces identical base_member and granularity values. The old format!("{}.{}", path[0], path[1]) and the new &member_name[..second] yield the same string. Verified for edge cases: no dots, one dot, two dots, three+ dots.
Error paths: build_vanilla_column_plan now performs alias lookup and annotation validation up front, so errors that previously surfaced on the first row now surface before any row processing. Existing error tests are correctly updated to test at plan-build time instead.
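The two equivalences claimed above can be checked mechanically. The helpers below are hypothetical stand-ins modeled on the review's description, not the PR's actual code. In particular, the rule that a time member must have exactly two dots (`Cube.dimension.granularity`) is an assumption, based on the no-dot/one-dot/three-dot rejection cases the review mentions.

```rust
/// Old style: allocate a Vec of segments, then rebuild the base member with format!.
fn split_time_member_old(member_name: &str) -> Option<(String, String)> {
    let path: Vec<&str> = member_name.split('.').collect();
    if path.len() != 3 {
        return None;
    }
    Some((format!("{}.{}", path[0], path[1]), path[2].to_string()))
}

/// New style: find the dot positions with match_indices and slice; no allocation.
fn split_time_member(member_name: &str) -> Option<(&str, &str)> {
    let mut dots = member_name.match_indices('.').map(|(i, _)| i);
    dots.next()?; // first dot must exist
    let second = dots.next()?;
    if dots.next().is_some() {
        return None; // three or more dots
    }
    Some((&member_name[..second], &member_name[second + 1..]))
}

/// Old guard: track unless the base member is already a requested dimension.
fn should_track_old(dimensions: Option<&[&str]>, base: &str) -> bool {
    dimensions.map_or(true, |d| !d.contains(&base))
}

/// New guard: the same condition expressed with is_some_and plus a negation.
fn should_track_new(dimensions: Option<&[&str]>, base: &str) -> bool {
    !dimensions.is_some_and(|d| d.contains(&base))
}
```

`&member_name[..second]` yields the same text as `format!("{}.{}", path[0], path[1])`, because both cover everything before the second dot.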

What's Good

Lifetime-based borrowing — VanillaColumnPlan borrows &str slices from the request-scoped maps, so the plan lives exactly as long as needed with zero extra allocations per row.
HashMap<&str, ...> for minimal_granularities — borrows from the plan's base_member slices instead of allocating String keys per row.
match_indices for dot-counting — avoids the Vec<&str> allocation from the old split().collect().
Pre-sized IndexMap — with_capacity(plan.len() + 1) eliminates incremental rehashes. The +1 for the optional compareDateRange/blending key is a nice touch.
Thorough new unit tests for compute_vanilla_granularity_track — covers all granularity levels, no-dot/one-dot/three-dot rejection, and the dimension-already-requested skip logic.
Improved benchmarks — parameterizing by column count × row count with cells/sec throughput gives much better coverage than the old fixed-column approach.

Minor Suggestions

Visibility: VanillaGranularityTrack is pub but compute_vanilla_granularity_track is private — external callers can never obtain one. pub(crate) would be more precise (see inline comment).
Future opportunity: The same plan-based pattern could be applied to the compact path (get_compact_row), which still does per-row alias lookups in resolve_members_to_aliases. Not for this PR, but worth noting.
Future micro-opt: A has_granularity_tracking flag on the plan could skip the HashMap allocation entirely for requests with no time-dimension granularity columns.
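This sketch combines two points from the review:

- keying the minimal-granularity map by `&str` slices borrowed from the plan;
- a precomputed `has_granularity_tracking` flag that lets the per-row pass skip the map entirely.

`GranularityPlan` and its fields are hypothetical, and the numeric levels are illustrative. The rule that each base member takes the value of its smallest-level column is an assumption, made only so that the per-row work depends on row data.

```rust
use std::collections::HashMap;

/// Hypothetical plan fragment for time-dimension columns, built once per request.
struct GranularityPlan<'a> {
    /// (base member, illustrative granularity level, column index in the row).
    time_columns: Vec<(&'a str, u8, usize)>,
    /// Computed once when the plan is built; lets rows skip tracking entirely.
    has_granularity_tracking: bool,
}

impl<'a> GranularityPlan<'a> {
    fn new(time_columns: Vec<(&'a str, u8, usize)>) -> Self {
        let has_granularity_tracking = !time_columns.is_empty();
        Self { time_columns, has_granularity_tracking }
    }

    /// Per-row pass: for each base member, keep the (level, value) pair with the
    /// smallest level. Keys borrow from the plan, so no String is allocated per row.
    fn minimal_granularities<'r>(
        &self,
        row: &'r [String],
    ) -> Option<HashMap<&'a str, (u8, &'r str)>> {
        if !self.has_granularity_tracking {
            return None; // no map is built for requests without time dimensions
        }
        let mut min_levels: HashMap<&'a str, (u8, &'r str)> =
            HashMap::with_capacity(self.time_columns.len());
        for &(base_member, level, idx) in &self.time_columns {
            let value = row[idx].as_str();
            min_levels
                .entry(base_member)
                .and_modify(|cur| {
                    if level < cur.0 {
                        *cur = (level, value);
                    }
                })
                .or_insert((level, value));
        }
        Some(min_levels)
    }
}
```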

Security / Safety

No concerns. The change is purely internal data-structure optimization with no new external inputs, no unsafe code, and no API surface changes.

Verdict

LGTM. Clean refactoring with a ~3× throughput improvement and no behavioral changes. The one inline suggestion (pub(crate) on VanillaGranularityTrack) is a minor nit.

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

codecov · 2026-04-30T13:46:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.99%. Comparing base (1f749cf) to head (5b1da61).

❗ There is a different number of reports uploaded between BASE (1f749cf) and HEAD (5b1da61).

HEAD has 1 upload less than BASE

Flag	BASE (1f749cf)	HEAD (5b1da61)
cubesql	1	0

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #10783       +/-   ##
===========================================
- Coverage   78.74%   57.99%   -20.75%     
===========================================
  Files         469      215      -254     
  Lines       92130    16776    -75354     
  Branches     3388     3388               
===========================================
- Hits        72548     9730    -62818     
+ Misses      19089     6553    -12536     
  Partials      493      493

Flag	Coverage Δ
cube-backend	`57.99% <ø> (ø)`
cubesql	`?`

Flags with carried forward coverage won't be shown.


cols × rows	after	time Δ
8 × 1,000	19.77 Mc/s	−14.7%
8 × 10,000	19.13 Mc/s	−12.2%
8 × 50,000	19.26 Mc/s	−14.2%
8 × 100,000	18.67 Mc/s	−11.7%
16 × 1,000	21.04 Mc/s	−15.7%
16 × 10,000	20.62 Mc/s	−15.5%
16 × 50,000	20.00 Mc/s	−13.6%
16 × 100,000	20.46 Mc/s	−18.1%
32 × 1,000	21.02 Mc/s	−16.0%
32 × 10,000	20.80 Mc/s	−15.2%
32 × 50,000	19.92 Mc/s	−13.1%
32 × 100,000	20.72 Mc/s	−16.5%
64 × 1,000	21.22 Mc/s	−18.9%
64 × 10,000	20.27 Mc/s	−16.9%
64 × 50,000	20.78 Mc/s	−16.4%
64 × 100,000	20.86 Mc/s	−18.7%

Cumulative vanilla speedup vs. pre-optimization baseline: ~3.7× (5.5 -> ~20 Mcells/s).

Adds a second bench function `bench_transform_time_scenarios` that exercises four time-dimension shapes at a fixed 16 cols × 100k rows:

- no_time_dim: baseline (matches existing 16×100k matrix entry)
- one_time_dim_day: one time dim with known granularity (level=4)
- one_time_dim_custom_granularity: unknown granularity, uses DEFAULT_LEVEL_FOR_UNKNOWN
- two_time_dims: two distinct base members in the granularity tracker

scenario	compact	columnar	vanilla
no_time_dim	15.83 Mc/s	14.37 Mc/s	20.39 Mc/s
one_time_dim_day	10.07 Mc/s	9.76 Mc/s	10.61 Mc/s
custom_granularity	10.61 Mc/s	10.41 Mc/s	10.80 Mc/s
two_time_dims	8.14 Mc/s	7.86 Mc/s	7.35 Mc/s

github-actions Bot added the rust Pull requests that update Rust code label Apr 30, 2026

ovr changed the title ~~perf(cubeorchestrator): Improve performance of get_vanilla_row (3x-8x)~~ perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x-8x) Apr 30, 2026

ovr changed the title ~~perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x-8x)~~ perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x) Apr 30, 2026

vercel Bot deployed to Preview April 30, 2026 13:24

claude Bot reviewed Apr 30, 2026

Comment thread rust/cubeorchestrator/src/query_result_transform.rs

claude Bot reviewed Apr 30, 2026

Comment thread rust/cubeorchestrator/src/query_result_transform.rs

claude Bot reviewed Apr 30, 2026

Comment thread rust/cubeorchestrator/src/query_result_transform.rs Outdated

claude Bot reviewed Apr 30, 2026

Comment thread rust/cubeorchestrator/benches/transform.rs

Update rust/cubeorchestrator/src/query_result_transform.rs

5b1da61

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

ovr added 3 commits April 30, 2026 16:41

chore: fmt

775e07b

waralexrom approved these changes Apr 30, 2026

ovr merged commit 0c2cb89 into master Apr 30, 2026
31 of 32 checks passed

ovr deleted the feat/rest-api-columnar-format branch April 30, 2026 15:20

ovr merged 5 commits into master from feat/rest-api-columnar-format
