perf(cubeorchestrator): Improve performance of get_vanilla_row (−66.8%, 3x) #10783
Build a VanillaColumnPlan once per request and walk it per row, instead of redoing alias->member, annotation, and member-name parsing per cell. Pre-size the row IndexMap to skip incremental rehashes during fill.

Benchmark on my Apple M3 Max (TransformedData::transform vanilla path, cells/sec):

| cols × rows  | before    | after      | speedup |
| ------------ | --------- | ---------- | ------- |
| 8 × 1,000    | 5.51 Mc/s | 16.36 Mc/s | 2.97×   |
| 8 × 10,000   | 5.66 Mc/s | 17.45 Mc/s | 3.08×   |
| 8 × 50,000   | 5.16 Mc/s | 15.58 Mc/s | 3.02×   |
| 8 × 100,000  | 5.28 Mc/s | 16.47 Mc/s | 3.12×   |
| 16 × 1,000   | 5.70 Mc/s | 17.19 Mc/s | 3.02×   |
| 16 × 10,000  | 5.57 Mc/s | 17.52 Mc/s | 3.15×   |
| 16 × 50,000  | 5.60 Mc/s | 17.25 Mc/s | 3.08×   |
| 16 × 100,000 | 5.47 Mc/s | 17.39 Mc/s | 3.18×   |
| 32 × 1,000   | 5.76 Mc/s | 16.91 Mc/s | 2.94×   |
| 32 × 10,000  | 5.66 Mc/s | 16.25 Mc/s | 2.87×   |
| 32 × 50,000  | 5.90 Mc/s | 16.48 Mc/s | 2.79×   |
| 32 × 100,000 | 5.69 Mc/s | 16.80 Mc/s | 2.95×   |
| 64 × 1,000   | 5.05 Mc/s | 17.45 Mc/s | 3.45×   |
| 64 × 10,000  | 5.48 Mc/s | 16.54 Mc/s | 3.02×   |
| 64 × 50,000  | 5.71 Mc/s | 16.99 Mc/s | 2.97×   |
| 64 × 100,000 | 5.95 Mc/s | 16.80 Mc/s | 2.82×   |

Average ~3.0× throughput improvement (5.6 -> 16.8 Mcells/s), on the server it will be 6-8x.
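To illustrate the idea described above, here is a minimal sketch of "plan once, walk per row". It is not the PR's code: `ColumnPlan`, `build_plan`, and `transform_row` are hypothetical names, the cell values are plain strings, and `std::collections::HashMap` stands in for `IndexMap` so that the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical per-column instructions, resolved once per request.
#[derive(Debug, Clone, PartialEq)]
struct ColumnPlan {
    /// Position of the column in each raw row.
    index: usize,
    /// Output key for the cell, already resolved from the SQL alias.
    member: String,
}

/// Resolve alias -> member once, up front, instead of once per cell.
/// Aliases without a known member are skipped in this sketch.
fn build_plan(aliases: &[&str], alias_to_member: &HashMap<&str, &str>) -> Vec<ColumnPlan> {
    aliases
        .iter()
        .enumerate()
        .filter_map(|(index, alias)| {
            alias_to_member.get(*alias).map(|member| ColumnPlan {
                index,
                member: (*member).to_string(),
            })
        })
        .collect()
}

/// Walk the precomputed plan for one row. The map is created with its
/// final capacity, so inserting the cells does not force it to grow
/// and rehash part-way through the fill.
fn transform_row(plan: &[ColumnPlan], row: &[String]) -> HashMap<String, String> {
    let mut out = HashMap::with_capacity(plan.len());
    for col in plan {
        if let Some(value) = row.get(col.index) {
            out.insert(col.member.clone(), value.clone());
        }
    }
    out
}
```

A caller would invoke `build_plan` once per request and then `transform_row(&plan, &row)` for every row, so the per-cell work shrinks to an index lookup and an insert.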
Claude finished @ovr's task in 3m 23s
Code Review: PR #10783
Summary
Excellent performance optimization. The core idea: build a …
Correctness
I verified the refactoring preserves behavior across all code paths:
What's Good
Minor Suggestions
Security / Safety
No concerns. The change is purely internal data-structure optimization with no new external inputs, no unsafe code, and no API surface changes.
Verdict
LGTM. Clean refactoring with a ~3× throughput improvement and no behavioral changes. The one inline suggestion (…
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Codecov Report
All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #10783 +/- ##
===========================================
- Coverage 78.74% 57.99% -20.75%
===========================================
Files 469 215 -254
Lines 92130 16776 -75354
Branches 3388 3388
===========================================
- Hits 72548 9730 -62818
+ Misses 19089 6553 -12536
Partials 493 493
| cols × rows  | after      | time Δ |
| ------------ | ---------- | ------ |
| 8 × 1,000    | 19.77 Mc/s | −14.7% |
| 8 × 10,000   | 19.13 Mc/s | −12.2% |
| 8 × 50,000   | 19.26 Mc/s | −14.2% |
| 8 × 100,000  | 18.67 Mc/s | −11.7% |
| 16 × 1,000   | 21.04 Mc/s | −15.7% |
| 16 × 10,000  | 20.62 Mc/s | −15.5% |
| 16 × 50,000  | 20.00 Mc/s | −13.6% |
| 16 × 100,000 | 20.46 Mc/s | −18.1% |
| 32 × 1,000   | 21.02 Mc/s | −16.0% |
| 32 × 10,000  | 20.80 Mc/s | −15.2% |
| 32 × 50,000  | 19.92 Mc/s | −13.1% |
| 32 × 100,000 | 20.72 Mc/s | −16.5% |
| 64 × 1,000   | 21.22 Mc/s | −18.9% |
| 64 × 10,000  | 20.27 Mc/s | −16.9% |
| 64 × 50,000  | 20.78 Mc/s | −16.4% |
| 64 × 100,000 | 20.86 Mc/s | −18.7% |

Cumulative vanilla speedup vs. pre-optimization baseline: ~3.7× (5.5 -> ~20 Mcells/s).
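The tables report both throughput multipliers and relative time changes, and the two are related by simple arithmetic when the amount of work is fixed. The sketch below shows that conversion; the function names are hypothetical and not part of the PR.

```rust
/// Convert a relative change in wall time (e.g. -0.15 for "−15%")
/// into a throughput multiplier, assuming the same number of cells.
fn speedup_from_time_delta(time_delta: f64) -> f64 {
    1.0 / (1.0 + time_delta)
}

/// Inverse direction: a throughput multiplier (e.g. 3.0 for "3×")
/// expressed as a relative change in wall time.
fn time_delta_from_speedup(speedup: f64) -> f64 {
    1.0 / speedup - 1.0
}
```

Under this relation a 3× speedup is roughly a 67% reduction in time, and a time Δ of −15% is a further speedup of about 1.18×.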
Adds a second bench function `bench_transform_time_scenarios` that exercises four time-dimension shapes at a fixed 16 cols × 100k rows:

- no_time_dim: baseline (matches existing 16×100k matrix entry)
- one_time_dim_day: one time dim with known granularity (level=4)
- one_time_dim_custom_granularity: unknown granularity, uses DEFAULT_LEVEL_FOR_UNKNOWN
- two_time_dims: two distinct base members in the granularity tracker

| scenario           | compact    | columnar   | vanilla    |
| ------------------ | ---------- | ---------- | ---------- |
| no_time_dim        | 15.83 Mc/s | 14.37 Mc/s | 20.39 Mc/s |
| one_time_dim_day   | 10.07 Mc/s | 9.76 Mc/s  | 10.61 Mc/s |
| custom_granularity | 10.61 Mc/s | 10.41 Mc/s | 10.80 Mc/s |
| two_time_dims      | 8.14 Mc/s  | 7.86 Mc/s  | 7.35 Mc/s  |
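For readers unfamiliar with the Mc/s unit used in these tables, the following sketch shows one way to derive it from a timed run. It assumes that cells are counted as cols × rows and uses only `std::time`; the PR's actual bench harness is not shown here, and `mcells_per_sec` and `time_it` are hypothetical helpers.

```rust
use std::time::{Duration, Instant};

/// Throughput in millions of cells per second for one timed run,
/// where the number of cells is assumed to be cols * rows.
fn mcells_per_sec(cols: usize, rows: usize, elapsed: Duration) -> f64 {
    let cells = (cols * rows) as f64;
    cells / elapsed.as_secs_f64() / 1e6
}

/// Time a single invocation of `f` with a monotonic clock.
fn time_it<F: FnMut()>(mut f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}
```

For example, transforming 16 cols × 100,000 rows in 100 ms would be reported as 16 Mc/s. A real benchmark would repeat the measurement many times and aggregate the results rather than trust a single run.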