perf(cubeorchestrator): Improve vanilla transform with StructuredObje…#10788
perf(cubeorchestrator): Improve vanilla transform with StructuredObje…#10788ovr wants to merge 1 commit into
Conversation
…ct (−65%, 2.9x) Replace per-row `IndexMap<String, DBResponsePrimitive>` in `get_vanilla_row` with a new `StructuredObject` that stores values in a position-aligned `Vec` and shares its key list across the whole result set via `Arc<StructuredObjectShape>`. Keys live once per result set instead of being cloned N×M times. Build the shape inside `build_vanilla_plan` (regular columns + granularity base members + `compareDateRange`/blending tail keys, deduped) and stash precomputed output positions on the plan, so `get_vanilla_row` writes via `set_by_position` with no per-row hash lookups or string allocations. Vanilla benchmark deltas on Apple M3 Max (criterion, cells/sec; all p < 0.05): | Scenario | Before | After | Δ time | | ------------------------------------------------ | ---------- | --------- | --------- | | vanilla/c08_r1000 | 469 µs | 152 µs | −67.5% | | vanilla/c08_r10000 | 6.0 ms | 1.56 ms | −74.0% | | vanilla/c08_r50000 | 24.9 ms | 7.75 ms | −68.9% | | vanilla/c08_r100000 | 50.0 ms | 16.05 ms | −67.9% | | vanilla/c16_r1000 | 856 µs | 309 µs | −63.9% | | vanilla/c16_r10000 | 8.79 ms | 3.07 ms | −65.1% | | vanilla/c16_r50000 | 43.6 ms | 15.62 ms | −64.2% | | vanilla/c32_r10000 | 17.78 ms | 6.4 ms | −63.9% | | vanilla/c32_r100000 | 171.7 ms | 61.5 ms | −64.2% | | vanilla/c64_r1000 | 3.38 ms | 1.15 ms | −66.3% | | vanilla/c64_r10000 | 34.0 ms | 11.45 ms | −66.4% | | vanilla/c64_r50000 | 173.3 ms | 59.2 ms | −65.9% | | vanilla/c64_r100000 | 340.2 ms | 116.1 ms | −65.9% | | scenarios/vanilla/no_time_dim/c16_r100000 | 87.0 ms | 31.7 ms | −63.5% | | scenarios/vanilla/one_time_dim_day/c16_r100000 | 173.1 ms | 93.4 ms | −46.1% | | scenarios/vanilla/one_time_dim_custom_granularity/c16_r100000 | 177.1 ms | 93.6 ms | −47.1% | | scenarios/vanilla/two_time_dims/c16_r100000 | 257.1 ms | 142.8 ms | −44.5% |
|
Claude finished @ovr's task in 3m 2s —— View job Code Review:
|
| Priority | Location | Issue |
|---|---|---|
| Low | query_result_transform.rs:708-714 |
Blending key now always present in output (as null) vs. previously absent when response column missing — minor semantic change, verify downstream consumers |
| Nit | structured_object.rs:111-115 |
Consider #[inline] on set_by_position / get — hot-path methods in a separate module may not be inlined without LTO |
| Nit | query_result_transform.rs:660 |
Arc::clone(&plan.shape) is more idiomatic than .clone() for signaling cheap refcount bump |
| Nit | structured_object.rs:56-67 |
Minor clone ordering in insert() — ownership clarity |
Testing
StructuredObjecthas solid unit tests: shape dedup/order, insert/get, iteration order, serialization, equality.- Existing vanilla row tests are properly adapted to the new API.
- Granularity track tests updated for the new
(base_member, level)return type. - The blending and compare-date-range paths through
get_vanilla_rowdon't have dedicated unit tests (they're covered by integration/benchmark scenarios). Consider adding targeted unit tests for these paths in a follow-up if not already covered elsewhere.
Security & Memory Safety
No concerns. No unsafe code, no user-controlled indices, no unbounded allocations. The Arc sharing is correctly scoped to the result set lifetime.
Looks good to merge. The only item worth a quick sanity check before merging is the blending null-vs-absent behavioral change (inline comment), which is likely a no-op for actual consumers but worth confirming.
| QueryType::BlendingQuery => { | ||
| let blending_key = get_blending_query_key(query.time_dimensions.as_ref())?; | ||
| let response_key = get_blending_response_key(query.time_dimensions.as_ref())?; | ||
|
|
||
| if let Some(value) = row.get(&response_key) { | ||
| row.insert(blending_key, value.clone()); | ||
| if let (Some(key_idx), Some(response_idx)) = | ||
| (plan.blending_key_index, plan.blending_response_index) | ||
| { | ||
| let value = row.values()[response_idx].clone(); | ||
| row.set_by_position(key_idx, value); | ||
| } |
There was a problem hiding this comment.
Subtle behavioral change worth noting: The old code only inserted the blending key into the row if row.get(&response_key) returned Some(value) — meaning the response column was actually populated. If db_row was shorter than expected and the response column wasn't set, the blending key was simply absent from the output.
Now, because the shape always includes blending_key (inserted unconditionally in pass 3), the output will always serialize "time.<granularity>": null for BlendingQuery when the response column has no value — instead of the key being absent entirely.
This is likely fine since null vs. absent is usually equivalent downstream, but worth confirming with consumers. If it matters, you could guard with a check that the response value isn't the default Null before writing.
| /// Fast-path setter for callers that already know the position (e.g. via a plan). | ||
| /// Panics if `idx >= len()`. | ||
| pub fn set_by_position(&mut self, idx: usize, value: V) { | ||
| self.values[idx] = value; | ||
| } |
There was a problem hiding this comment.
Nit: Since this is the performance-critical hot path (called per-cell), consider adding #[inline] here and on get() / insert(). These methods are defined in a separate module and the compiler may not inline them across crate/module boundaries without LTO. Given how small they are, explicit #[inline] is virtually free and ensures the optimization regardless of build profile.
| /// Fast-path setter for callers that already know the position (e.g. via a plan). | |
| /// Panics if `idx >= len()`. | |
| pub fn set_by_position(&mut self, idx: usize, value: V) { | |
| self.values[idx] = value; | |
| } | |
| #[inline] | |
| pub fn set_by_position(&mut self, idx: usize, value: V) { | |
| self.values[idx] = value; | |
| } |
| // +1 to cover the optional tail entry (compareDateRange / blending key). | ||
| let mut row = IndexMap::with_capacity(plan.columns.len() + 1); | ||
| ) -> Result<StructuredObject> { | ||
| let mut row: StructuredObject = StructuredObject::with_shape_default(plan.shape.clone()); |
There was a problem hiding this comment.
Nit: plan.shape.clone() is just Arc::clone (refcount bump), so this is correct and cheap. But for readability in a performance-focused PR, Arc::clone(&plan.shape) makes the intent more explicit — it signals "this is a cheap refcount clone, not a deep copy" to future readers.
Minor style preference, not blocking.
| // FIXME: For now custom granularities are not supported, only common ones. | ||
| // There is no granularity type/class implementation in rust yet. | ||
| let mut minimal_granularities: HashMap<&str, (u8, DBResponsePrimitive)> = HashMap::new(); | ||
| let mut minimal_granularities: HashMap<usize, (u8, DBResponsePrimitive)> = HashMap::new(); |
There was a problem hiding this comment.
Nice improvement: keying the HashMap by usize instead of &str eliminates per-lookup string hashing. Since usize hashing is essentially free (identity), this is a meaningful win on the granularity tracking path.
| pub fn insert(&mut self, key: impl Into<String>) -> usize { | ||
| let key = key.into(); | ||
|
|
||
| if let Some(&i) = self.index.get(&key) { | ||
| return i; | ||
| } | ||
|
|
||
| let i = self.keys.len(); | ||
| self.index.insert(key.clone(), i); | ||
| self.keys.push(key); | ||
| i | ||
| } |
There was a problem hiding this comment.
The key.into() + key.clone() pattern allocates once for the into() conversion and clones once for the HashMap insert vs the Vec push. You could avoid the clone by inserting into the HashMap first (taking ownership) and pushing a second owned copy, but a cleaner approach is:
| pub fn insert(&mut self, key: impl Into<String>) -> usize { | |
| let key = key.into(); | |
| if let Some(&i) = self.index.get(&key) { | |
| return i; | |
| } | |
| let i = self.keys.len(); | |
| self.index.insert(key.clone(), i); | |
| self.keys.push(key); | |
| i | |
| } | |
| pub fn insert(&mut self, key: impl Into<String>) -> usize { | |
| let key = key.into(); | |
| if let Some(&i) = self.index.get(&key) { | |
| return i; | |
| } | |
| let i = self.keys.len(); | |
| self.keys.push(key.clone()); | |
| self.index.insert(key, i); | |
| i |
This way the clone() goes to keys (which only needs a read-only copy) and the original owned String goes to index (which needs ownership). Functionally identical, just slightly clearer about who needs ownership. Truly a nit — not blocking.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10788 +/- ##
===========================================
+ Coverage 57.99% 78.75% +20.75%
===========================================
Files 215 469 +254
Lines 16776 92130 +75354
Branches 3388 3388
===========================================
+ Hits 9730 72553 +62823
- Misses 6553 19084 +12531
Partials 493 493
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ct (−65%, 2.9x)
Replace per-row
IndexMap<String, DBResponsePrimitive>inget_vanilla_rowwith a newStructuredObjectthat stores values in a position-alignedVecand shares its key list across the whole result set viaArc<StructuredObjectShape>. Keys live once per result set instead of being cloned N×M times.Build the shape inside
build_vanilla_plan(regular columns + granularity base members +compareDateRange/blending tail keys, deduped) and stash precomputed output positions on the plan, soget_vanilla_rowwrites viaset_by_positionwith no per-row hash lookups or string allocations.Vanilla benchmark deltas on Apple M3 Max (criterion, cells/sec; all p < 0.05):