Skip to content

perf(cubeorchestrator): Improve columnar transform (−77%, 4.4x)#10786

Merged
ovr merged 2 commits intomasterfrom
perf/query-result-columnar-transform
Apr 30, 2026
Merged

perf(cubeorchestrator): Improve columnar transform (−77%, 4.4x)#10786
ovr merged 2 commits intomasterfrom
perf/query-result-columnar-transform

Conversation

@ovr
Copy link
Copy Markdown
Member

@ovr ovr commented Apr 30, 2026

Bench on Apple M3 Max (TransformedData::transform/columnar):

shape time delta speed-up
c32_r1000 −77.3% 4.4x
c64_r1000 −78.5% 4.6x
c32_r10000 −72.2% 3.6x
c64_r10000 −49.6% 2.0x
c32_r50000 −40.7% 1.7x
c64_r50000 −35.2% 1.5x
c32_r100000 −29.0% 1.4x
c64_r100000 −27.7% 1.4x

Bench on Apple M3 Max (`TransformedData::transform/columnar`):

| shape         | time delta | speed-up |
| ------------- | ---------- | -------- |
| c32_r1000     | −77.3%     | 4.4x     |
| c64_r1000     | −78.5%     | 4.6x     |
| c32_r10000    | −72.2%     | 3.6x     |
| c64_r10000    | −49.6%     | 2.0x     |
| c32_r50000    | −40.7%     | 1.7x     |
| c64_r50000    | −35.2%     | 1.5x     |
| c32_r100000   | −29.0%     | 1.4x     |
| c64_r100000   | −27.7%     | 1.4x     |
@github-actions github-actions Bot added the rust Pull requests that update Rust code label Apr 30, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Claude finished @ovr's task in 3m 21s —— View job


Review of PR #10786

  • Gather context and read the diff
  • Review code quality and correctness
  • Check for performance, security, and edge cases
  • Post detailed review

Verdict: This is a clean, well-structured performance optimization. The plan-based approach mirrors the existing VanillaPlan pattern and the code reads well. A few observations below.

What the PR does

The old columnar path ran two passes over the data:

  1. Row-major transform — called get_compact_row per row, repeating hash lookups (annotation, members_to_alias_map, columns_pos) for every cell.
  2. Transposetranspose_to_columns converted the row-major result to column-major, allocating an intermediate Vec<Vec<DBResponsePrimitive>>.

The new code builds a ColumnarColumnPlan once (amortizing all hash lookups), then materializes columns directly from the source rows. This eliminates both the intermediate allocation and the per-cell lookup overhead — which explains the impressive gains on smaller datasets where lookup cost dominates.

Correctness

I verified the new build_columnar_plan logic against the old get_compact_row + transpose_to_columns path:

  • Regular members: Both resolve annotation → alias → column_pos → transform_value. Equivalent.
  • CompareDateRangeQuery tail: Old code skipped the synthetic member in the loop (no annotation), then unconditionally appended the date range value. New code detects !resolved && is_last and emits Constant(get_date_range_value(...)). Equivalent — and slightly better: get_date_range_value is called once at plan time instead of once per row.
  • BlendingQuery tail: Same pattern. Both use get_blending_response_key, both look up annotation[alias] for member_type (the quirky behavior is correctly preserved and documented in the comment at line 556–558).
  • Fallback for unresolved non-tail members: Old code silently skipped (pushed nothing → positional misalignment if multiple members failed). New code pushes NullFilled, maintaining 1:1 member-to-column correspondence. This is actually a correctness improvement for that edge case, though in practice get_members should guarantee all non-synthetic members resolve.

The existing tests (test_*_columnar) validate that columnar output matches compact output cell-by-cell for regular, compare-date-range, and blending queries — good coverage of the behavioral equivalence.

Suggestions (non-blocking)

  1. transpose_to_columns is now dead in production — it's only referenced in two unit tests and a doc comment. Consider marking it #[cfg(test)] or removing it in a follow-up to keep the public surface clean.

  2. Minor: avoidable .cloned() allocation (query_result_transform.rs:598):

    let cell = row
        .get(*index)
        .cloned()
        .unwrap_or(DBResponseValue::Primitive(DBResponsePrimitive::Null));
    out.push(transform_value(cell, plan_entry.member_type));

    transform_value takes ownership, so the clone is necessary here given the current signature. But if transform_value were ever refactored to accept &DBResponseValue, this could avoid cloning for non-time columns that just pass through. Not actionable now — just a note for future profiling.

  3. Constant variant clones per row (query_result_transform.rs:605–607):

    ColumnarColumnSource::Constant(v) => {
        out.resize(row_count, v.clone());
    }

    Vec::resize clones for each new element. For DBResponsePrimitive::String this means N string allocations. This is still better than the old code (which called get_date_range_valuejoin() per row), so it's a net improvement. A potential micro-optimization would be to pre-allocate and use std::iter::repeat_with(|| v.clone()), but resize is idiomatic and the constant-column case is rare — not worth the added complexity.

Summary

Excellent perf PR. The architecture follows established patterns (VanillaPlan), the behavioral equivalence is thoroughly tested, and the edge-case handling is marginally improved. Ship it.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.75%. Comparing base (54a3351) to head (ddee10f).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10786   +/-   ##
=======================================
  Coverage   78.74%   78.75%           
=======================================
  Files         469      469           
  Lines       92130    92130           
  Branches     3388     3388           
=======================================
+ Hits        72549    72553    +4     
+ Misses      19088    19084    -4     
  Partials      493      493           
Flag Coverage Δ
cube-backend 57.99% <ø> (ø)
cubesql 83.37% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ovr ovr merged commit 9e8a6ad into master Apr 30, 2026
88 of 91 checks passed
@ovr ovr deleted the perf/query-result-columnar-transform branch April 30, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants