perf: Use `Arc<str>` instead of `Cow<&'a>` in the analyzer #9824

comphead · 2024-03-27T15:40:04Z

Which issue does this PR close?

Rationale for this change

The analyzer mostly works with immutable strings, so we may want to use Arc<str>, which performs better than Cow<'a, str> for immutable strings
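
A minimal sketch of why that can be the case, assuming the hot path mostly clones names that are already owned. The function names and the "my_table" value are made up for illustration and are not DataFusion code:

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Clones a shared name; only the reference count changes, the bytes are not copied.
fn clone_shared(name: &Arc<str>) -> Arc<str> {
    Arc::clone(name)
}

/// Clones an owned `Cow`; this allocates a new `String` and copies the bytes.
fn clone_owned(name: &Cow<'static, str>) -> Cow<'static, str> {
    name.clone()
}

fn main() {
    let shared: Arc<str> = Arc::from("my_table");
    let shared2 = clone_shared(&shared);
    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&shared, &shared2));

    let owned: Cow<'static, str> = Cow::Owned(String::from("my_table"));
    let owned2 = clone_owned(&owned);
    // The clone has its own buffer, at a different address.
    assert_ne!(owned.as_ptr(), owned2.as_ptr());
    assert_eq!(owned, owned2);
}
```

A `Cow::Borrowed` clone is also cheap, so the gain only shows up where the `Cow` ends up owned, which is presumably common once references have to outlive the SQL text they came from.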

What changes are included in this PR?

Replace datatype

Are these changes tested?

yes

Are there any user-facing changes?

comphead · 2024-03-27T15:40:40Z

@alamb do you run sql planner benchmark vs baseline?

comphead · 2024-03-27T23:33:48Z

With Arc<String>

logical_select_one_from_700
                        time:   [554.07 µs 558.95 µs 563.83 µs]
                        change: [-8.0915% -7.3335% -6.5955%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

physical_select_one_from_700
                        time:   [2.2570 ms 2.2633 ms 2.2695 ms]
                        change: [-12.030% -11.648% -11.261%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking logical_select_all_from_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, or reduce sample count to 70.
logical_select_all_from_1000
                        time:   [67.969 ms 68.259 ms 68.548 ms]
                        change: [-0.2127% +0.4030% +0.9935%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

Benchmarking physical_select_all_from_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 48.5s, or reduce sample count to 10.
physical_select_all_from_1000
                        time:   [489.95 ms 490.95 ms 491.99 ms]
                        change: [-0.7288% -0.4071% -0.0631%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

logical_trivial_join_low_numbered_columns
                        time:   [562.12 µs 574.36 µs 585.82 µs]
                        change: [-8.8467% -7.8261% -6.8917%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low severe
  1 (1.00%) high mild

logical_trivial_join_high_numbered_columns
                        time:   [621.58 µs 625.02 µs 628.50 µs]
                        change: [-6.5314% -5.9528% -5.3728%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

logical_aggregate_with_join
                        time:   [816.72 µs 822.19 µs 827.20 µs]
                        change: [-8.8637% -8.3159% -7.7789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

physical_plan_tpch_q1   time:   [3.7234 ms 3.7387 ms 3.7540 ms]
                        change: [-2.8314% -2.3141% -1.7672%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q2   time:   [5.7200 ms 5.7401 ms 5.7604 ms]
                        change: [-8.4496% -8.0030% -7.5362%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking physical_plan_tpch_q3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, enable flat sampling, or reduce sample count to 40.
physical_plan_tpch_q3   time:   [1.9747 ms 1.9844 ms 1.9939 ms]
                        change: [-7.7369% -7.1870% -6.6937%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking physical_plan_tpch_q4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q4   time:   [1.6579 ms 1.6718 ms 1.6862 ms]
                        change: [-5.6940% -4.8575% -3.9896%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q5   time:   [2.8634 ms 2.8781 ms 2.8929 ms]
                        change: [-7.8068% -7.2595% -6.7050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking physical_plan_tpch_q6: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q6   time:   [1.1392 ms 1.1436 ms 1.1482 ms]
                        change: [-4.6954% -4.0361% -3.3698%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q7   time:   [4.0553 ms 4.0677 ms 4.0804 ms]
                        change: [-7.6730% -7.2340% -6.7920%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q8   time:   [5.8935 ms 5.9138 ms 5.9343 ms]
                        change: [-9.1448% -8.6347% -8.1282%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q9   time:   [4.4677 ms 4.4835 ms 4.4998 ms]
                        change: [-7.6792% -7.2431% -6.7645%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_plan_tpch_q10  time:   [2.9195 ms 2.9303 ms 2.9413 ms]
                        change: [-7.6631% -7.1036% -6.5909%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q11  time:   [2.3232 ms 2.3311 ms 2.3393 ms]
                        change: [-6.2374% -5.7992% -5.3430%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

physical_plan_tpch_q12  time:   [2.0727 ms 2.0801 ms 2.0881 ms]
                        change: [-5.2569% -4.7417% -4.2173%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

Benchmarking physical_plan_tpch_q13: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q13  time:   [1.2721 ms 1.2762 ms 1.2804 ms]
                        change: [-6.9536% -6.4074% -5.8920%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q14: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q14  time:   [1.6768 ms 1.6809 ms 1.6852 ms]
                        change: [-15.647% -13.021% -10.477%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q16  time:   [2.4060 ms 2.4128 ms 2.4197 ms]
                        change: [-11.865% -11.222% -10.595%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

physical_plan_tpch_q17  time:   [2.2082 ms 2.2186 ms 2.2297 ms]
                        change: [-5.8705% -5.3824% -4.9079%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q18  time:   [2.4494 ms 2.4596 ms 2.4697 ms]
                        change: [-7.3244% -6.7553% -6.1892%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) low mild
  2 (2.00%) high mild

physical_plan_tpch_q19  time:   [5.4409 ms 5.4581 ms 5.4756 ms]
                        change: [-5.0691% -4.5998% -4.1102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_plan_tpch_q20  time:   [2.9386 ms 2.9486 ms 2.9589 ms]
                        change: [-6.3729% -5.9270% -5.4502%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q21  time:   [4.3011 ms 4.3159 ms 4.3312 ms]
                        change: [-9.1963% -8.7694% -8.3255%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_plan_tpch_q22  time:   [2.1718 ms 2.1784 ms 2.1854 ms]
                        change: [-4.9862% -4.3892% -3.8289%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Benchmarking physical_plan_tpch_all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, or reduce sample count to 70.
physical_plan_tpch_all  time:   [66.636 ms 66.820 ms 67.003 ms]
                        change: [-8.6094% -8.1990% -7.7800%] (p = 0.00 < 0.05)
                        Performance has improved.

logical_plan_tpch_all   time:   [12.334 ms 12.394 ms 12.452 ms]
                        change: [-4.1030% -3.4164% -2.7061%] (p = 0.00 < 0.05)
                        Performance has improved.

comphead · 2024-03-28T00:20:01Z

@alamb @jayzhan211 I can see some improvements with Arc<String>. Let me know, folks, if you want to proceed with this

jayzhan211 · 2024-03-28T09:20:23Z

Is it possible to replace with Arc<str>?

comphead · 2024-03-28T15:05:50Z

Is it possible to replace with Arc<str>?

I'll try, but IMHO I don't expect much difference: those strings are mostly immutable, and there should not be much difference between Vec<u8> (which backs String) and [u8] (which backs str)

alamb · 2024-03-28T17:14:47Z

Is it possible to replace with Arc<str>?

I'll try, but IMHO I don't expect much difference: those strings are mostly immutable, and there should not be much difference between Vec<u8> (which backs String) and [u8] (which backs str)

I agree there is not a lot of practical difference -- but I think Arc<str> is the more standard choice (and it does avoid one level of indirection):

For Arc<str> it is Arc -> [u8]
For Arc<String> it is Arc -> String/Vec -> [u8]
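
A small sketch of the two layouts, measured in machine words. The sizes it reports are what current compilers produce and are not a language guarantee; `handle_words` is a helper invented for this illustration:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Returns the handle sizes in machine words: (Arc<str>, Arc<String>, String).
fn handle_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Arc<str>>() / word,
        size_of::<Arc<String>>() / word,
        size_of::<String>() / word,
    )
}

fn main() {
    let (arc_str, arc_string, string) = handle_words();
    println!("Arc<str>: {arc_str}, Arc<String>: {arc_string}, String: {string} (in words)");

    let a: Arc<str> = Arc::from("col");
    let b: Arc<String> = Arc::new(String::from("col"));
    // Arc<str>: one hop from the handle to the bytes.
    let _bytes_a: &[u8] = a.as_bytes();
    // Arc<String>: handle -> String header (ptr, capacity, len) -> bytes.
    let _bytes_b: &[u8] = b.as_bytes();
}
```

The Arc<str> handle is a fat pointer (address plus length), so it is one word larger than the Arc<String> handle, but reading the bytes takes a single pointer hop instead of two.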

alamb · 2024-03-28T17:15:36Z

@alamb @jayzhan211 I can see some improvements with Arc<String>. Let me know, folks, if you want to proceed with this

I think we should proceed 🚀

What I think we should do is get version 37 released so we can proceed.

comphead · 2024-03-28T17:25:09Z

Yes, it's one more level of indirection. I'll try to go with Arc<str>; let's see how many changes it will take

alamb · 2024-03-28T20:35:40Z

Most of the time, you can make an Arc<str> like Arc::from(my_str_variable) (which I didn't fully appreciate when I first encountered the pattern. I think @crepererum showed me that one back in the day)
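
For reference, a minimal sketch of that pattern using only the standard library `From` impls. The helper names and the "lineitem" value are placeholders for illustration:

```rust
use std::sync::Arc;

/// Builds a shared name from a borrowed string slice (copies the bytes once).
fn shared_from_str(name: &str) -> Arc<str> {
    Arc::from(name)
}

/// Builds a shared name from an owned String (the bytes are copied into the Arc allocation).
fn shared_from_string(name: String) -> Arc<str> {
    Arc::from(name)
}

fn main() {
    let my_str_variable = "lineitem";
    let a = shared_from_str(my_str_variable);
    let b = shared_from_string(format!("{my_str_variable}_tmp"));
    // `.into()` works too when the target type is known.
    let c: Arc<str> = my_str_variable.into();
    assert_eq!(&*a, "lineitem");
    assert_eq!(&*b, "lineitem_tmp");
    assert_eq!(a, c);
}
```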

comphead · 2024-03-29T22:04:07Z

oops

comphead · 2024-03-29T23:39:34Z

Results with Arc<str> are below. I'd say the Arc<String> numbers looked slightly better

logical_select_one_from_700
                        time:   [580.12 µs 585.06 µs 589.73 µs]
                        change: [-6.3590% -5.7328% -5.0564%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_select_one_from_700
                        time:   [2.3826 ms 2.3916 ms 2.4006 ms]
                        change: [-7.0877% -6.6384% -6.1781%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking logical_select_all_from_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, or reduce sample count to 60.
logical_select_all_from_1000
                        time:   [70.414 ms 70.720 ms 71.037 ms]
                        change: [+3.3675% +4.0240% +4.6863%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking physical_select_all_from_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 50.0s, or reduce sample count to 10.
physical_select_all_from_1000
                        time:   [506.06 ms 507.48 ms 508.95 ms]
                        change: [+2.5742% +2.9462% +3.3299%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

logical_trivial_join_low_numbered_columns
                        time:   [587.37 µs 592.07 µs 595.97 µs]
                        change: [-7.8608% -6.8978% -6.0233%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

logical_trivial_join_high_numbered_columns
                        time:   [630.89 µs 632.24 µs 633.53 µs]
                        change: [-5.9321% -5.4946% -5.0383%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild

logical_aggregate_with_join
                        time:   [823.32 µs 827.32 µs 830.69 µs]
                        change: [-8.0075% -7.5928% -7.1907%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild

physical_plan_tpch_q1   time:   [3.7308 ms 3.7435 ms 3.7561 ms]
                        change: [-2.6559% -2.1877% -1.7373%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q2   time:   [5.7711 ms 5.7942 ms 5.8175 ms]
                        change: [-7.6441% -7.1368% -6.6634%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

Benchmarking physical_plan_tpch_q3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.0s, enable flat sampling, or reduce sample count to 40.
physical_plan_tpch_q3   time:   [1.9749 ms 1.9794 ms 1.9844 ms]
                        change: [-7.4539% -6.9879% -6.5380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking physical_plan_tpch_q4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q4   time:   [1.6496 ms 1.6537 ms 1.6581 ms]
                        change: [-6.7606% -6.0924% -5.4526%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

physical_plan_tpch_q5   time:   [2.8363 ms 2.8451 ms 2.8541 ms]
                        change: [-8.7362% -8.3231% -7.8946%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

Benchmarking physical_plan_tpch_q6: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q6   time:   [1.1385 ms 1.1419 ms 1.1455 ms]
                        change: [-4.6024% -3.9168% -3.1923%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q7   time:   [4.0771 ms 4.0917 ms 4.1067 ms]
                        change: [-7.1486% -6.6862% -6.1909%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_plan_tpch_q8   time:   [6.0481 ms 6.0697 ms 6.0910 ms]
                        change: [-6.7553% -6.2264% -5.7076%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q9   time:   [4.5076 ms 4.5234 ms 4.5394 ms]
                        change: [-6.8838% -6.4167% -5.9600%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_plan_tpch_q10  time:   [2.9380 ms 2.9475 ms 2.9574 ms]
                        change: [-7.1054% -6.5586% -6.0542%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

physical_plan_tpch_q11  time:   [2.3309 ms 2.3379 ms 2.3450 ms]
                        change: [-5.9245% -5.5252% -5.1065%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

physical_plan_tpch_q12  time:   [2.0939 ms 2.1058 ms 2.1193 ms]
                        change: [-4.2735% -3.5686% -2.8289%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking physical_plan_tpch_q13: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q13  time:   [1.3206 ms 1.3255 ms 1.3305 ms]
                        change: [-3.3886% -2.7905% -2.2341%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking physical_plan_tpch_q14: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.8s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q14  time:   [1.7025 ms 1.7112 ms 1.7211 ms]
                        change: [-14.099% -11.398% -8.8080%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q16  time:   [2.4419 ms 2.4517 ms 2.4616 ms]
                        change: [-10.500% -9.7909% -9.1054%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

physical_plan_tpch_q17  time:   [2.2205 ms 2.2303 ms 2.2406 ms]
                        change: [-5.3559% -4.8812% -4.3619%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q18  time:   [2.4850 ms 2.4959 ms 2.5070 ms]
                        change: [-5.9772% -5.3788% -4.8142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q19  time:   [5.5130 ms 5.5335 ms 5.5539 ms]
                        change: [-3.7923% -3.2835% -2.7258%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

physical_plan_tpch_q20  time:   [2.9745 ms 2.9860 ms 2.9981 ms]
                        change: [-5.2189% -4.7349% -4.2429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

physical_plan_tpch_q21  time:   [4.4102 ms 4.4247 ms 4.4394 ms]
                        change: [-6.9218% -6.4708% -6.0108%] (p = 0.00 < 0.05)
                        Performance has improved.

physical_plan_tpch_q22  time:   [2.1663 ms 2.1753 ms 2.1852 ms]
                        change: [-5.1750% -4.5239% -3.8635%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking physical_plan_tpch_all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, or reduce sample count to 70.
physical_plan_tpch_all  time:   [67.608 ms 67.925 ms 68.253 ms]
                        change: [-7.2306% -6.6810% -6.1284%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

logical_plan_tpch_all   time:   [12.434 ms 12.540 ms 12.636 ms]
                        change: [-3.4299% -2.2768% -1.3315%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild

comphead · 2024-04-01T16:10:55Z

Latest Arc<String> consistently improves 5-7%, in very good cases even 14%

logical_select_one_from_700
                        time:   [576.23 µs 577.70 µs 579.05 µs]
                        change: [-6.5082% -5.9265% -5.3353%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

physical_select_one_from_700
                        time:   [2.1965 ms 2.2016 ms 2.2071 ms]
                        change: [-14.414% -14.055% -13.708%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking logical_select_all_from_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, or reduce sample count to 70.
logical_select_all_from_1000
                        time:   [65.417 ms 65.721 ms 66.037 ms]
                        change: [-3.9689% -3.3293% -2.7003%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking physical_select_all_from_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 47.2s, or reduce sample count to 10.
physical_select_all_from_1000
                        time:   [469.29 ms 470.48 ms 471.69 ms]
                        change: [-4.8974% -4.5601% -4.1916%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

logical_trivial_join_low_numbered_columns
                        time:   [606.50 µs 608.04 µs 609.58 µs]
                        change: [-4.6959% -4.1899% -3.6424%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

logical_trivial_join_high_numbered_columns
                        time:   [634.81 µs 637.24 µs 639.53 µs]
                        change: [-4.7751% -4.3011% -3.8219%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

logical_aggregate_with_join
                        time:   [833.85 µs 835.68 µs 837.53 µs]
                        change: [-7.1013% -6.7186% -6.3543%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild

physical_plan_tpch_q1   time:   [3.5415 ms 3.5513 ms 3.5618 ms]
                        change: [-7.6268% -7.2103% -6.8134%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q2   time:   [5.5515 ms 5.5835 ms 5.6185 ms]
                        change: [-11.115% -10.514% -9.8843%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking physical_plan_tpch_q3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q3   time:   [2.0002 ms 2.0202 ms 2.0393 ms]
                        change: [-6.7269% -5.8744% -5.0422%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking physical_plan_tpch_q4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q4   time:   [1.6361 ms 1.6525 ms 1.6692 ms]
                        change: [-7.7611% -6.8830% -6.0578%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q5   time:   [2.8525 ms 2.8742 ms 2.8964 ms]
                        change: [-8.1109% -7.3859% -6.6532%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking physical_plan_tpch_q6: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q6   time:   [1.1371 ms 1.1439 ms 1.1512 ms]
                        change: [-4.8222% -4.1028% -3.3634%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q7   time:   [4.0255 ms 4.0757 ms 4.1341 ms]
                        change: [-8.3332% -7.0531% -5.6627%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q8   time:   [5.6720 ms 5.6937 ms 5.7166 ms]
                        change: [-12.538% -12.035% -11.493%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q9   time:   [4.2870 ms 4.2995 ms 4.3133 ms]
                        change: [-11.457% -11.051% -10.665%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q10  time:   [2.8187 ms 2.8270 ms 2.8359 ms]
                        change: [-10.879% -10.380% -9.9175%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q11  time:   [2.2546 ms 2.2614 ms 2.2689 ms]
                        change: [-9.0121% -8.6145% -8.1883%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q12  time:   [2.0119 ms 2.0176 ms 2.0239 ms]
                        change: [-8.0297% -7.6041% -7.1867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q13: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q13  time:   [1.2584 ms 1.2620 ms 1.2663 ms]
                        change: [-7.9586% -7.4495% -6.9731%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q14: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q14  time:   [1.6386 ms 1.6432 ms 1.6482 ms]
                        change: [-17.798% -15.212% -12.713%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q16  time:   [2.3289 ms 2.3359 ms 2.3439 ms]
                        change: [-14.684% -14.050% -13.412%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

physical_plan_tpch_q17  time:   [2.1194 ms 2.1255 ms 2.1323 ms]
                        change: [-9.7062% -9.3504% -8.9638%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q18  time:   [2.3301 ms 2.3373 ms 2.3448 ms]
                        change: [-11.888% -11.391% -10.910%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

physical_plan_tpch_q19  time:   [5.2007 ms 5.2191 ms 5.2392 ms]
                        change: [-9.2573% -8.7779% -8.2815%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

physical_plan_tpch_q20  time:   [2.8280 ms 2.8368 ms 2.8466 ms]
                        change: [-9.9140% -9.4954% -9.0702%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

physical_plan_tpch_q21  time:   [4.1423 ms 4.1551 ms 4.1689 ms]
                        change: [-12.567% -12.169% -11.735%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

physical_plan_tpch_q22  time:   [2.0965 ms 2.1030 ms 2.1100 ms]
                        change: [-8.2668% -7.7013% -7.1833%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking physical_plan_tpch_all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, or reduce sample count to 70.
physical_plan_tpch_all  time:   [64.117 ms 64.298 ms 64.491 ms]
                        change: [-12.073% -11.663% -11.246%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

logical_plan_tpch_all   time:   [12.591 ms 12.638 ms 12.686 ms]
                        change: [-2.1208% -1.5156% -0.8282%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

@alamb @jayzhan211 which way do you guys prefer?

crepererum · 2024-04-02T08:41:17Z

The technical difference between the two is:

Arc<String> stores pointer to allocation, the allocation capacity and the used length (because String is mutable, like Vec)
Arc<str> stores pointer to allocation (which is always trimmed) and the length

So Arc<String> is wasteful since you pay for a mutable data structure that you cannot mutate anymore. Same goes for Arc<Vec<T>> which should always be Arc<[T]>.
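The layout difference described above can be checked directly. The following is a minimal sketch, not code from this PR; the function name `handle_sizes` is made up for illustration, and the sizes it reports reflect how current Rust compilers lay these types out rather than a documented guarantee.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Returns the in-memory sizes (in bytes) of the three types under discussion.
/// `handle_sizes` is a hypothetical helper, not part of DataFusion.
pub fn handle_sizes() -> (usize, usize, usize) {
    (
        // Thin pointer to a heap block that holds the refcounts plus a String.
        size_of::<Arc<String>>(),
        // The String itself: pointer + capacity + length. With Arc<String> this
        // lives inside the Arc's heap block and points at a second allocation.
        size_of::<String>(),
        // Fat pointer: address + length. The bytes sit next to the refcounts,
        // so there is one allocation and no capacity field.
        size_of::<Arc<str>>(),
    )
}
```

On a 64-bit target this is expected to return `(8, 24, 16)`: reaching the bytes through `Arc<String>` takes two pointer hops and carries an unused capacity field, while `Arc<str>` needs one hop.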

If you measure a perf difference, then this is probably because one (or more) of these points hold:

noise
compiler optimizer bug
the trimming from String -> Arc<str> is too costly. However it saves memory, so you get memory efficiency in return

That said, since Arc<str> also doesn't regress performance, I would opt for the more memory-efficient implementation (Arc<str>).

alamb · 2024-04-02T13:15:48Z

@alamb @jayzhan211 which way do you guys prefer?

I prefer Arc<str> for the reasons mentioned by @crepererum #9824 (comment)

alamb · 2024-04-02T13:16:35Z

I am feeling DataFusion planning time is going to be very much improved in Version 38.0.0 🚀 -- I'll check this PR out later today. Very excited

comphead · 2024-04-02T15:43:04Z

Modified to Arc<str> as discussed. I think the better performance of Arc<String> is still explained by String being used more often than str across the analyzer, so the conversion is cheaper
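One way to see why the conversion cost could differ is to compare buffer addresses before and after wrapping an owned `String`. This is a sketch under the assumption that the analyzer mostly starts from owned `String` values; both function names are hypothetical and nothing here is taken from the PR.

```rust
use std::sync::Arc;

/// Wraps an owned String in Arc<String> and reports whether the text buffer
/// was reused. Hypothetical helper for illustration only.
pub fn arc_string_reuses_buffer(s: String) -> bool {
    let before = s.as_ptr();
    // Only the (pointer, capacity, length) header moves into the new Arc
    // allocation; the text bytes stay where they were.
    let shared: Arc<String> = Arc::new(s);
    shared.as_ptr() == before
}

/// Converts an owned String into Arc<str> and reports whether the text buffer
/// was reused. Hypothetical helper for illustration only.
pub fn arc_str_reuses_buffer(s: String) -> bool {
    let before = s.as_ptr();
    // A new block is allocated for refcounts + bytes, the bytes are copied
    // into it, and the original String buffer is then freed.
    let shared: Arc<str> = Arc::from(s);
    shared.as_ptr() == before
}
```

So `Arc<String>` pays a small allocation and no copy, while `String -> Arc<str>` pays an allocation plus a copy of the text. Whether that explains the measured difference here is not established by this sketch.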

comphead · 2024-04-02T15:44:02Z

datafusion/common/src/schema_reference.rs

@@ -38,9 +33,9 @@ impl SchemaReference<'_> {
    }
 }

-pub type OwnedSchemaReference = SchemaReference<'static>;
+pub type OwnedSchemaReference = SchemaReference;


Gonna deprecate this type in a following PR

comphead · 2024-04-02T15:44:29Z

datafusion/common/src/table_reference.rs

@@ -246,24 +240,7 @@ impl<'a> TableReference<'a> {
    /// Converts directly into an [`OwnedTableReference`] by cloning
    /// the underlying data.
    pub fn to_owned_reference(&self) -> OwnedTableReference {


Gonna deprecate this method in a following PR

alamb

TLDR is I think this is awesome -- thank you @comphead. Our planning speed is going to be very sweet indeed.

Modified to Arc<str> as discussed. I think the better performance of Arc<String> is still explained by String being used more often than str across the analyzer, so the conversion is cheaper

Yeah, this is really strange. I think the issue is exactly as you describe, which is that using an Arc<String> can simply take the string that was allocated by

Latest Arc consistently improves 5-7%, in very good cases even 14%

I am sorry, I think I missed this comment. I would like to change my vote and say Arc<String>, though I realize you already changed the PR once, so I think it would also be ok to merge this as is

alamb · 2024-04-02T19:20:03Z

I am also running the benchmarks and I am going to try to make a test PR that uses impl Into<Arc<str>> to see if the Rust compiler is smart enough to make that faster. Will update here with results

Update: testing with #9916

ps. it would be sweet to see what this does for the TPC-DS benchmarks #9907

comphead · 2024-04-02T19:29:21Z

I'll merge it as is, and will create another one with Arc<String> again, so we can see if there are more benefits

alamb · 2024-04-02T19:30:44Z

I'll merge it as is, and will create another one with Arc<String> again, so we can see if there are more benefits

Sounds good -- we can also potentially switch to using Arc<String> as a follow on if we run a benchmark and show it going faster

alamb · 2024-04-02T21:16:58Z

Update: it turns out that I got significant performance improvements by using impl Into<Arc<str>> -- #9916 🚀

comphead · 2024-04-02T21:23:01Z

Update: it turns out that I got significant performance improvements by using impl Into<Arc<str>> -- #9916 🚀

Yes, my very first implementation took impl Into<String> instead of &str. It looks like the compiler in this case can find the optimal construction and avoid reallocation
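A plausible reading of why an `impl Into<Arc<str>>` parameter helps is that a caller who already holds an `Arc<str>` can hand it over without any copy, whereas a `&str` parameter forces a fresh allocation on every call. The sketch below illustrates that idea only; the `Name` type and its methods are hypothetical and are not the signatures changed in #9916.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference type that stores a shared name.
pub struct Name {
    pub value: Arc<str>,
}

impl Name {
    /// Takes a borrowed &str, so the text must always be copied into a
    /// newly allocated Arc, even if the caller already had an Arc<str>.
    pub fn from_borrowed(value: &str) -> Self {
        Self { value: Arc::from(value) }
    }

    /// Accepts &str, String, or an existing Arc<str>. For Arc<str> the
    /// conversion is the identity, so only a refcount bump is paid by the
    /// caller (via Arc::clone) and no bytes are copied.
    pub fn new(value: impl Into<Arc<str>>) -> Self {
        Self { value: value.into() }
    }
}
```

Calling `Name::new(Arc::clone(&shared))` yields a value that shares the caller's allocation, while `Name::from_borrowed(&shared)` produces an independent copy of the same text.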

github-actions bot added sql SQL Planner core Core DataFusion crate substrait Changes to the substrait crate labels Mar 27, 2024

comphead mentioned this pull request Mar 27, 2024

Replace Cow<str> in TableReference to Arc<str> #9764

Closed

comphead marked this pull request as draft March 27, 2024 15:41

comphead closed this Mar 28, 2024

comphead force-pushed the dev0 branch from fe30507 to 81c96fc Compare March 28, 2024 23:49

comphead reopened this Mar 29, 2024

github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate substrait Changes to the substrait crate and removed sql SQL Planner core Core DataFusion crate substrait Changes to the substrait crate labels Mar 29, 2024

comphead force-pushed the dev0 branch from 65a7ef8 to 1e99002 Compare March 29, 2024 23:43

comphead added 2 commits April 1, 2024 17:44

Arc<str>

5f7708e

Arc<String> for TableReference

24d3c5b

merge

b92768a

comphead force-pushed the dev0 branch from 2086ca2 to b92768a Compare April 2, 2024 01:01

comphead changed the title ~~[WIP] Analyzer: Arc<String> vs Cow~~ perf: Use Arc<String> instead of Cow<&'a> in the analyzer Apr 2, 2024

comphead marked this pull request as ready for review April 2, 2024 01:28

comphead requested review from alamb and jayzhan211 April 2, 2024 01:28

alamb mentioned this pull request Apr 2, 2024

DataFusion weekly project plan (Andrew Lamb) - April 1, 2024 #9899

Closed

7 tasks

Arc<str>

0eae4fd

comphead commented Apr 2, 2024

comphead changed the title ~~perf: Use Arc<String> instead of Cow<&'a> in the analyzer~~ perf: Use Arc<str> instead of Cow<&'a> in the analyzer Apr 2, 2024

alamb approved these changes Apr 2, 2024

alamb mentioned this pull request Apr 2, 2024

Improve planning speed using impl Into<Arc<str>> to create Arc<str> rather than &str #9916

Merged

comphead merged commit f51fda5 into apache:main Apr 2, 2024
27 checks passed

alamb mentioned this pull request Apr 3, 2024

Add TPCH-DS planning benchmark #9907

Merged

comphead mentioned this pull request Apr 4, 2024

Remove OwnedTableReference and OwnedSchemaReference #9933

Merged
