perf(cubeorchestrator): Improve vanilla transform with StructuredObje… #10788

Draft
ovr wants to merge 1 commit into master from perf/query-result-vanilla-structured-object

ovr (Member) commented Apr 30, 2026

…ct (−65%, 2.9x)

Replace per-row `IndexMap<String, DBResponsePrimitive>` in `get_vanilla_row`
with a new `StructuredObject` that stores values in a position-aligned `Vec`
and shares its key list across the whole result set via
`Arc<StructuredObjectShape>`. Keys live once per result set instead of being
cloned N×M times.

Build the shape inside `build_vanilla_plan` (regular columns + granularity
base members + `compareDateRange`/blending tail keys, deduped) and stash
precomputed output positions on the plan, so `get_vanilla_row` writes via
`set_by_position` with no per-row hash lookups or string allocations.

Vanilla benchmark deltas on Apple M3 Max (criterion, cells/sec; all p < 0.05):

| Scenario                                         | Before     | After     | Δ time    |
| ------------------------------------------------ | ---------- | --------- | --------- |
| vanilla/c08_r1000                                | 469 µs     | 152 µs    | −67.5%    |
| vanilla/c08_r10000                               | 6.0 ms     | 1.56 ms   | −74.0%    |
| vanilla/c08_r50000                               | 24.9 ms    | 7.75 ms   | −68.9%    |
| vanilla/c08_r100000                              | 50.0 ms    | 16.05 ms  | −67.9%    |
| vanilla/c16_r1000                                | 856 µs     | 309 µs    | −63.9%    |
| vanilla/c16_r10000                               | 8.79 ms    | 3.07 ms   | −65.1%    |
| vanilla/c16_r50000                               | 43.6 ms    | 15.62 ms  | −64.2%    |
| vanilla/c32_r10000                               | 17.78 ms   | 6.4 ms    | −63.9%    |
| vanilla/c32_r100000                              | 171.7 ms   | 61.5 ms   | −64.2%    |
| vanilla/c64_r1000                                | 3.38 ms    | 1.15 ms   | −66.3%    |
| vanilla/c64_r10000                               | 34.0 ms    | 11.45 ms  | −66.4%    |
| vanilla/c64_r50000                               | 173.3 ms   | 59.2 ms   | −65.9%    |
| vanilla/c64_r100000                              | 340.2 ms   | 116.1 ms  | −65.9%    |
| scenarios/vanilla/no_time_dim/c16_r100000        | 87.0 ms    | 31.7 ms   | −63.5%    |
| scenarios/vanilla/one_time_dim_day/c16_r100000   | 173.1 ms   | 93.4 ms   | −46.1%    |
| scenarios/vanilla/one_time_dim_custom_granularity/c16_r100000 | 177.1 ms | 93.6 ms | −47.1% |
| scenarios/vanilla/two_time_dims/c16_r100000      | 257.1 ms   | 142.8 ms  | −44.5%    |
github-actions bot added the rust label (Pull requests that update Rust code) on Apr 30, 2026

claude bot (Contributor) commented Apr 30, 2026

Claude finished @ovr's task in 3m 2s.


Code Review: StructuredObject for vanilla transform

  • Gather context and read changed files
  • Review structured_object.rs (new file)
  • Review query_result_transform.rs (modifications)
  • Review lib.rs (module declaration)
  • Post comprehensive review

Overall: This is a well-executed performance optimization. The core insight — share the key list once via Arc<StructuredObjectShape> instead of cloning N×M strings — is clean and the benchmark numbers (44–74% improvement) are impressive and consistent across all scenarios. The code is correct and well-structured.

Architecture & Design

The 4-pass plan construction (build_vanilla_plan) is a smart approach:

  1. Regular columns → shape positions
  2. Granularity base members → deduped into shape
  3. Query-type tail keys (compareDateRange, blending) → appended to shape
  4. Final column plan assembly with precomputed positions

This eliminates all per-row hash lookups, string allocations, and key cloning. The StructuredObject abstraction is reusable and could benefit other transform paths (compact, columnar) in the future.
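The shared-shape idea described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the PR's code: `Shape`, `Row`, and `build_rows` are hypothetical stand-ins for `StructuredObjectShape`, `StructuredObject`, and the plan/row functions, and `i64` stands in for `DBResponsePrimitive`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Ordered, deduplicated key list shared by every row of a result set.
/// (Hypothetical stand-in for the PR's StructuredObjectShape.)
#[derive(Debug, Default)]
pub struct Shape {
    keys: Vec<String>,
    index: HashMap<String, usize>,
}

impl Shape {
    /// Returns the position of `key`, appending it if it is not present yet.
    pub fn insert(&mut self, key: &str) -> usize {
        if let Some(&i) = self.index.get(key) {
            return i;
        }
        let i = self.keys.len();
        self.keys.push(key.to_string());
        self.index.insert(key.to_string(), i);
        i
    }

    pub fn len(&self) -> usize {
        self.keys.len()
    }
}

/// One row: values aligned by position with the keys of the shared shape.
#[derive(Debug)]
pub struct Row<V> {
    shape: Arc<Shape>,
    values: Vec<V>,
}

impl<V: Default + Clone> Row<V> {
    /// Creates a row whose slots all hold `V::default()`.
    pub fn with_shape_default(shape: Arc<Shape>) -> Self {
        let values = vec![V::default(); shape.len()];
        Self { shape, values }
    }

    /// Writes by precomputed position. Panics if `idx` is out of bounds.
    pub fn set_by_position(&mut self, idx: usize, value: V) {
        self.values[idx] = value;
    }

    /// Key-based lookup, for callers that do not hold a position.
    pub fn get(&self, key: &str) -> Option<&V> {
        self.shape.index.get(key).map(|&i| &self.values[i])
    }
}

/// Builds the shape and output positions once, then fills each row by position.
pub fn build_rows(columns: &[&str], db_rows: &[Vec<i64>]) -> Vec<Row<i64>> {
    let mut shape = Shape::default();
    // "Plan" step: one hash lookup per column, done once per result set.
    let positions: Vec<usize> = columns.iter().map(|c| shape.insert(c)).collect();
    let shape = Arc::new(shape);

    db_rows
        .iter()
        .map(|db_row| {
            // Per-row cost: one refcount bump plus positional writes.
            let mut row = Row::with_shape_default(Arc::clone(&shape));
            for (&pos, &value) in positions.iter().zip(db_row) {
                row.set_by_position(pos, value);
            }
            row
        })
        .collect()
}
```

In this sketch the key strings are allocated once in `Shape`, and every row holds only an `Arc` pointer and a `Vec` of values, which is the property the benchmark numbers are attributed to.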

Correctness

  • Granularity tracking: Correctly migrated from &str-keyed HashMap to usize-keyed, with the base-member write still happening after the column loop. Semantics preserved.
  • CompareDateRange: Always inserted into shape for the right query type, always set. Correct.
  • Blending: response_idx correctly uses position() (not insert()) to find only pre-existing columns. See inline comment for a minor behavioral nuance around null vs. absent keys.
  • set_by_position safety: All callers derive indices from the shape builder during plan construction, so out-of-bounds panics are not reachable in practice.
  • DBResponsePrimitive::Default: Adding Null as the default is correct and enables with_shape_default.

Items flagged (inline comments posted)

| Priority | Location | Issue |
| -------- | -------- | ----- |
| Low | query_result_transform.rs:708-714 | Blending key is now always present in the output (as null), where previously it was absent when the response column was missing; a minor semantic change, verify downstream consumers |
| Nit | structured_object.rs:111-115 | Consider #[inline] on set_by_position / get; hot-path methods in a separate module may not be inlined without LTO |
| Nit | query_result_transform.rs:660 | Arc::clone(&plan.shape) is more idiomatic than .clone() for signaling a cheap refcount bump |
| Nit | structured_object.rs:56-67 | Minor clone ordering in insert(), for ownership clarity |

Testing

  • StructuredObject has solid unit tests: shape dedup/order, insert/get, iteration order, serialization, equality.
  • Existing vanilla row tests are properly adapted to the new API.
  • Granularity track tests updated for the new (base_member, level) return type.
  • The blending and compare-date-range paths through get_vanilla_row don't have dedicated unit tests (they're covered by integration/benchmark scenarios). Consider adding targeted unit tests for these paths in a follow-up if not already covered elsewhere.

Security & Memory Safety

No concerns. No unsafe code, no user-controlled indices, no unbounded allocations. The Arc sharing is correctly scoped to the result set lifetime.


Looks good to merge. The only item worth a quick sanity check before merging is the blending null-vs-absent behavioral change (inline comment), which is likely a no-op for actual consumers but worth confirming.

Comment on lines 708 to 714
 QueryType::BlendingQuery => {
-    let blending_key = get_blending_query_key(query.time_dimensions.as_ref())?;
-    let response_key = get_blending_response_key(query.time_dimensions.as_ref())?;
-
-    if let Some(value) = row.get(&response_key) {
-        row.insert(blending_key, value.clone());
+    if let (Some(key_idx), Some(response_idx)) =
+        (plan.blending_key_index, plan.blending_response_index)
+    {
+        let value = row.values()[response_idx].clone();
+        row.set_by_position(key_idx, value);
     }

Subtle behavioral change worth noting: The old code only inserted the blending key into the row if row.get(&response_key) returned Some(value) — meaning the response column was actually populated. If db_row was shorter than expected and the response column wasn't set, the blending key was simply absent from the output.

Now, because the shape always includes blending_key (inserted unconditionally in pass 3), the output will always serialize "time.<granularity>": null for BlendingQuery when the response column has no value — instead of the key being absent entirely.

This is likely fine since null vs. absent is usually equivalent downstream, but worth confirming with consumers. If it matters, note that guarding the write alone would not restore the old behavior, because the slot already holds the default Null; the key would have to be skipped at serialization time or left out of the shape.
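The null-versus-absent difference can be reproduced with a small stand-alone sketch. Everything here is an assumption for illustration: the key names `response` and `blended`, the `Option<i64>` value type, and the hand-written `render` function are hypothetical and do not mirror the PR's serializer.

```rust
/// Map-style row (stand-in for the old IndexMap): keys exist only once
/// they are inserted, so a missing response column leaves both keys absent.
pub fn map_style(response: Option<i64>) -> String {
    let mut row: Vec<(&str, i64)> = Vec::new();
    if let Some(v) = response {
        row.push(("response", v));
        row.push(("blended", v));
    }
    render(row.iter().map(|&(k, v)| (k, Some(v))))
}

/// Shape-style row: both keys are part of the fixed shape, and unset
/// slots keep their default (None, rendered as null).
pub fn shape_style(response: Option<i64>) -> String {
    let keys = ["response", "blended"];
    let mut values: [Option<i64>; 2] = [None, None];
    if let Some(v) = response {
        values[0] = Some(v);
        values[1] = Some(v); // copy by position
    }
    render(keys.iter().copied().zip(values))
}

/// Renders cells as a compact JSON-like object.
fn render<'a>(cells: impl Iterator<Item = (&'a str, Option<i64>)>) -> String {
    let body: Vec<String> = cells
        .map(|(k, v)| match v {
            Some(n) => format!("\"{k}\":{n}"),
            None => format!("\"{k}\":null"),
        })
        .collect();
    format!("{{{}}}", body.join(","))
}
```

With a populated response both functions produce the same object. With a missing response, `map_style(None)` yields `{}` while `shape_style(None)` yields `{"response":null,"blended":null}`.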

Comment on lines +111 to +115
/// Fast-path setter for callers that already know the position (e.g. via a plan).
/// Panics if `idx >= len()`.
pub fn set_by_position(&mut self, idx: usize, value: V) {
    self.values[idx] = value;
}

Nit: Since this is the performance-critical hot path (called per-cell), consider adding #[inline] here and on get() / insert(). The benefit may be small: a module boundary inside one crate does not by itself block inlining, and methods that are generic over V are instantiated in the calling crate anyway. The attribute mainly matters for non-generic functions called from another crate without LTO. For methods this small an explicit #[inline] costs little, but it is a hint to the compiler, not a guarantee.

Suggested change
-/// Fast-path setter for callers that already know the position (e.g. via a plan).
-/// Panics if `idx >= len()`.
-pub fn set_by_position(&mut self, idx: usize, value: V) {
-    self.values[idx] = value;
-}
+#[inline]
+pub fn set_by_position(&mut self, idx: usize, value: V) {
+    self.values[idx] = value;
+}

-    // +1 to cover the optional tail entry (compareDateRange / blending key).
-    let mut row = IndexMap::with_capacity(plan.columns.len() + 1);
+) -> Result<StructuredObject> {
+    let mut row: StructuredObject = StructuredObject::with_shape_default(plan.shape.clone());

Nit: plan.shape.clone() is just Arc::clone (refcount bump), so this is correct and cheap. But for readability in a performance-focused PR, Arc::clone(&plan.shape) makes the intent more explicit — it signals "this is a cheap refcount clone, not a deep copy" to future readers.

Minor style preference, not blocking.
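A short sketch of the point being made: both spellings do the same thing on an `Arc`, and neither copies the underlying data. The function name and the `orders.count` key are hypothetical.

```rust
use std::sync::Arc;

/// Clones an Arc with both spellings and reports the resulting strong
/// count and whether the two clones point at the same allocation.
pub fn refcount_after_clones() -> (usize, bool) {
    let shape: Arc<Vec<String>> = Arc::new(vec!["orders.count".to_string()]);

    let a = shape.clone(); // method syntax: resolves to Arc's Clone, a refcount bump
    let b = Arc::clone(&shape); // explicit form: same operation, clearer intent

    // Three handles (shape, a, b) share one allocation.
    (Arc::strong_count(&shape), Arc::ptr_eq(&a, &b))
}
```

The explicit form only helps readers; the generated behavior is identical.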

 // FIXME: For now custom granularities are not supported, only common ones.
 // There is no granularity type/class implementation in rust yet.
-let mut minimal_granularities: HashMap<&str, (u8, DBResponsePrimitive)> = HashMap::new();
+let mut minimal_granularities: HashMap<usize, (u8, DBResponsePrimitive)> = HashMap::new();

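Nice improvement: keying the HashMap by usize instead of &str means each lookup hashes a fixed-size integer rather than a variable-length string. Assuming this is the standard library HashMap with its default hasher, the hash is not an identity function (the default is SipHash), but it is still cheaper than hashing a string, so this should be a meaningful win on the granularity tracking path.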
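A minimal sketch of position-keyed tracking, under stated assumptions: the function `track_minimal` is hypothetical, `String` stands in for `DBResponsePrimitive`, and the rule "the lowest level wins, first seen wins on ties" is an assumption about what "minimal granularity" means, not something confirmed by the PR.

```rust
use std::collections::HashMap;

/// For each base-member position, keeps the value seen with the lowest
/// granularity level. Input cells are (position, level, value).
pub fn track_minimal(cells: &[(usize, u8, &str)]) -> HashMap<usize, (u8, String)> {
    let mut minimal: HashMap<usize, (u8, String)> = HashMap::new();

    for &(pos, level, value) in cells {
        // Keep the existing entry when it is at least as fine as the new one.
        let keep_existing =
            matches!(minimal.get(&pos), Some((best, _)) if *best <= level);
        if !keep_existing {
            minimal.insert(pos, (level, value.to_string()));
        }
    }

    minimal
}
```

Because the map key is the precomputed output position, the result can be written back with a positional setter after the column loop, with no string keys involved.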

Comment on lines +56 to +67
pub fn insert(&mut self, key: impl Into<String>) -> usize {
    let key = key.into();

    if let Some(&i) = self.index.get(&key) {
        return i;
    }

    let i = self.keys.len();
    self.index.insert(key.clone(), i);
    self.keys.push(key);
    i
}

The key.into() + key.clone() pattern allocates once for the into() conversion (when the caller passes a borrowed string) and once more for the clone, because both the HashMap and the Vec store an owned String. With this layout one clone is unavoidable, so the suggestion below only changes which container receives the clone:

Suggested change
-pub fn insert(&mut self, key: impl Into<String>) -> usize {
-    let key = key.into();
-    if let Some(&i) = self.index.get(&key) {
-        return i;
-    }
-    let i = self.keys.len();
-    self.index.insert(key.clone(), i);
-    self.keys.push(key);
-    i
-}
+pub fn insert(&mut self, key: impl Into<String>) -> usize {
+    let key = key.into();
+    if let Some(&i) = self.index.get(&key) {
+        return i;
+    }
+    let i = self.keys.len();
+    self.keys.push(key.clone());
+    self.index.insert(key, i);
+    i
+}

This way the clone() goes to keys and the original String moves into index. Both containers need an owned String, so the two versions are functionally identical and cost the same; the reordering only changes which container receives the original. Truly a nit, not blocking.
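If the second allocation ever mattered, one alternative (not proposed in the PR, shown only as a sketch) is to store the key as `Arc<str>` in both containers so the string data is allocated once and shared. `SharedKeyShape` is a hypothetical name, and `insert` here takes `&str` rather than `impl Into<String>`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Shape variant where the Vec and the HashMap share each key's allocation.
#[derive(Default)]
pub struct SharedKeyShape {
    keys: Vec<Arc<str>>,
    index: HashMap<Arc<str>, usize>,
}

impl SharedKeyShape {
    /// Returns the position of `key`, appending it if it is new.
    pub fn insert(&mut self, key: &str) -> usize {
        // Arc<str> implements Borrow<str>, so a &str lookup needs no allocation.
        if let Some(&i) = self.index.get(key) {
            return i;
        }
        let i = self.keys.len();
        let key: Arc<str> = Arc::from(key); // single allocation for the string data
        self.keys.push(Arc::clone(&key)); // refcount bump, no copy
        self.index.insert(key, i);
        i
    }

    /// Keys in insertion order.
    pub fn keys(&self) -> impl Iterator<Item = &str> {
        self.keys.iter().map(|k| &**k)
    }
}
```

Since the shape is built once per result set rather than per row, this trade-off is unlikely to be measurable; the sketch only shows what "avoiding the clone" would actually require.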


codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.75%. Comparing base (22694eb) to head (ac406e0).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10788       +/-   ##
===========================================
+ Coverage   57.99%   78.75%   +20.75%     
===========================================
  Files         215      469      +254     
  Lines       16776    92130    +75354     
  Branches     3388     3388               
===========================================
+ Hits         9730    72553    +62823     
- Misses       6553    19084    +12531     
  Partials      493      493               
| Flag         | Coverage Δ      |
| ------------ | --------------- |
| cube-backend | 57.99% <ø> (ø)  |
| cubesql      | 83.37% <ø> (?)  |

Flags with carried forward coverage won't be shown.
