refactor: Reduce string allocations in Expr::display_name (use write instead of format!) #10454

erratic-pattern · 2024-05-11T00:15:11Z

Just a small refactor that should, in theory, reduce string allocations and thus benefit concurrent throughput by reducing allocator lock contention. However, when running concurrency benchmarks on my M3 Max I only saw a minor improvement in InfluxDB (about 40 additional queries per second with 3 threads, but no change in overall curve as # of threads increases).

I don't have any strong opinion on whether or not this should be merged in, but I might as well submit it as a PR.
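For readers skimming the thread, here is a minimal sketch of the idea, not the actual DataFusion code. The tiny `Expr` enum and both functions below are simplified stand-ins invented for illustration. The `format!` version allocates a fresh `String` at every level of the expression tree, while the `write!` version appends to a single caller-owned buffer.

```rust
use std::fmt::{self, Write};

// Hypothetical, heavily simplified expression type (not DataFusion's Expr).
enum Expr {
    Column(String),
    Not(Box<Expr>),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

// Allocation-heavy style: every recursive call returns a new String,
// and format! allocates another one to combine the pieces.
fn create_name(e: &Expr) -> String {
    match e {
        Expr::Column(c) => c.clone(),
        Expr::Not(inner) => format!("NOT {}", create_name(inner)),
        Expr::Binary(l, op, r) => {
            format!("{} {} {}", create_name(l), op, create_name(r))
        }
    }
}

// Write-based style: all pieces are appended to one shared writer,
// so no intermediate Strings are created.
fn write_name<W: Write>(w: &mut W, e: &Expr) -> fmt::Result {
    match e {
        Expr::Column(c) => w.write_str(c),
        Expr::Not(inner) => {
            w.write_str("NOT ")?;
            write_name(w, inner)
        }
        Expr::Binary(l, op, r) => {
            write_name(w, l)?;
            write!(w, " {op} ")?;
            write_name(w, r)
        }
    }
}
```

A caller would create one `String`, pass `&mut` to it into `write_name`, and use the result, so the only allocations left are the buffer's own growth.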

alamb

I think it is a great idea to speed up planning by avoiding string allocations in Expr::display_name @erratic-pattern -- thank you 🙏

I left some suggestions for small additional improvements, but I also think this PR could be merged as is.

I am going to run my planning benchmarks too, to see if we can measure any difference here.

alamb · 2024-05-11T11:03:48Z

datafusion/expr/src/expr.rs

+fn write_function_name<'a>(
+    w: &'a mut (dyn Write + 'a),


Rather than using dynamic dispatch I think you can make this just a normal generic function and make the code simpler and likely more performant

Something like

fn write_function_name<W: Write>(
    w: &mut W,

I actually tried it locally and it seems to work well. Here is a PR erratic-pattern#1 to this branch for your consideration

erratic-pattern · 2024-05-11

I didn't find a meaningful difference in performance here vs monomorphic types (I tried &mut String explicitly in my testing, which is similar code to the generic here). In fact Formatter in the standard library also has an internal dyn Write, though it has an actual need for ad-hoc polymorphism whereas we don't. I am guessing it gets optimized in most cases, but it certainly wouldn't hurt to make it explicitly generic so I agree with this change.
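To make the two options discussed here concrete, the following is a rough sketch under some assumptions: `Write` is taken to be `std::fmt::Write`, and the helper names `write_name_dyn` and `write_name_generic`, along with their argument lists, are hypothetical rather than the signatures in the PR. The bodies are identical; only the way the writer is passed differs.

```rust
use std::fmt::{self, Write};

// Dynamic dispatch: the writer is a trait object, so each call goes
// through a vtable unless the optimizer can devirtualize it.
fn write_name_dyn(w: &mut dyn Write, fun: &str, args: &[&str]) -> fmt::Result {
    write!(w, "{fun}(")?;
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            w.write_str(", ")?;
        }
        w.write_str(arg)?;
    }
    w.write_str(")")
}

// Static dispatch: the function is monomorphized for each concrete
// writer type (for example String), so calls can be inlined.
fn write_name_generic<W: Write>(w: &mut W, fun: &str, args: &[&str]) -> fmt::Result {
    write!(w, "{fun}(")?;
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            w.write_str(", ")?;
        }
        w.write_str(arg)?;
    }
    w.write_str(")")
}
```

Both variants accept a `&mut String` at the call site. The generic form also drops the explicit lifetime that the `dyn` signature in the diff needed.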

alamb · 2024-05-11T11:05:45Z

datafusion/expr/src/expr.rs

@@ -1693,10 +1708,9 @@ pub(crate) fn create_name(e: &Expr) -> Result<String> {
                if let Some(char) = escape_char {
                    format!("CHAR '{char}'")
                } else {
-                    "".to_string()
+                    "".to_owned()


I think you could avoid this allocation (and the format! above it).
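One possible shape for this, as a sketch only: the function names, parameters, and exact output format below are hypothetical and do not reproduce `create_name`. Instead of building a temporary suffix `String` (or an empty one) and interpolating it, the optional part is written straight to the output only when it exists.

```rust
use std::fmt::{self, Write};

// Before (simplified): a temporary String is built for the suffix in
// both branches, then a second String is allocated for the result.
fn like_name_alloc(expr: &str, pattern: &str, escape_char: Option<char>) -> String {
    let s = if let Some(c) = escape_char {
        format!(" CHAR '{c}'")
    } else {
        "".to_owned()
    };
    format!("{expr} LIKE {pattern}{s}")
}

// After (simplified): everything is appended to the caller's writer,
// and the None case writes nothing at all.
fn write_like_name<W: Write>(
    w: &mut W,
    expr: &str,
    pattern: &str,
    escape_char: Option<char>,
) -> fmt::Result {
    write!(w, "{expr} LIKE {pattern}")?;
    if let Some(c) = escape_char {
        write!(w, " CHAR '{c}'")?;
    }
    Ok(())
}
```

Note that an empty `String` does not allocate on the heap by itself, so the larger saving in this sketch comes from dropping the `format!` in the `Some` branch and the final combining `format!`.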

alamb · 2024-05-11T11:06:00Z

datafusion/expr/src/expr.rs

@@ -1717,111 +1732,118 @@ pub(crate) fn create_name(e: &Expr) -> Result<String> {
                if let Some(char) = escape_char {
                    format!("CHAR '{char}'")
                } else {
-                    "".to_string()
+                    "".to_owned()


same comment as above -- let's get rid of all the extra allocations

alamb · 2024-05-11T18:56:50Z

Thanks @erratic-pattern -- I took the liberty of merging the branch up from main to resolve a merge conflict as well

alamb · 2024-05-11T20:03:00Z

Wow -- according to my benchmarks this change makes a non-trivial difference in performance. We just keep driving these numbers down.

group                                         main                                   reduce-string-allocations-no-cow
-----                                         ----                                   --------------------------------
logical_aggregate_with_join                   1.01  1214.3±63.26µs        ? ?/sec    1.00  1197.2±15.52µs        ? ?/sec
logical_plan_tpcds_all                        1.01    158.7±1.76ms        ? ?/sec    1.00    157.8±1.40ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.0±0.18ms        ? ?/sec    1.00     17.0±0.22ms        ? ?/sec
logical_select_all_from_1000                  1.05     18.8±0.14ms        ? ?/sec    1.00     18.0±0.11ms        ? ?/sec
logical_select_one_from_700                   1.01    816.2±8.81µs        ? ?/sec    1.00   807.9±10.28µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   757.9±23.06µs        ? ?/sec    1.01    761.9±8.19µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    749.3±8.82µs        ? ?/sec    1.00    747.1±6.33µs        ? ?/sec
physical_plan_tpcds_all                       1.04   1354.5±6.54ms        ? ?/sec    1.00   1296.4±9.46ms        ? ?/sec
physical_plan_tpch_all                        1.05     92.7±1.15ms        ? ?/sec    1.00     88.5±1.09ms        ? ?/sec
physical_plan_tpch_q1                         1.10      5.2±0.06ms        ? ?/sec    1.00      4.7±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.07      4.5±0.06ms        ? ?/sec    1.00      4.2±0.09ms        ? ?/sec
physical_plan_tpch_q11                        1.06      3.9±0.06ms        ? ?/sec    1.00      3.7±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.12      3.1±0.06ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec
physical_plan_tpch_q13                        1.07      2.2±0.03ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.09      2.7±0.04ms        ? ?/sec    1.00      2.5±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.07      3.8±0.06ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q17                        1.05      3.5±0.06ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.03      4.0±0.06ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q19                        1.12      6.4±0.09ms        ? ?/sec    1.00      5.8±0.09ms        ? ?/sec
physical_plan_tpch_q2                         1.03      7.8±0.09ms        ? ?/sec    1.00      7.6±0.08ms        ? ?/sec
physical_plan_tpch_q20                        1.07      4.7±0.08ms        ? ?/sec    1.00      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q21                        1.02      6.2±0.08ms        ? ?/sec    1.00      6.1±0.07ms        ? ?/sec
physical_plan_tpch_q22                        1.07      3.4±0.05ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.06      3.2±0.06ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
physical_plan_tpch_q4                         1.02      2.3±0.05ms        ? ?/sec    1.00      2.3±0.06ms        ? ?/sec
physical_plan_tpch_q5                         1.01      4.4±0.07ms        ? ?/sec    1.00      4.4±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.07  1603.4±29.06µs        ? ?/sec    1.00  1494.7±42.11µs        ? ?/sec
physical_plan_tpch_q7                         1.04      5.7±0.08ms        ? ?/sec    1.00      5.5±0.10ms        ? ?/sec
physical_plan_tpch_q8                         1.02      7.3±0.08ms        ? ?/sec    1.00      7.2±0.07ms        ? ?/sec
physical_plan_tpch_q9                         1.03      5.7±0.09ms        ? ?/sec    1.00      5.5±0.08ms        ? ?/sec
physical_select_all_from_1000                 1.04     61.2±0.31ms        ? ?/sec    1.00     59.1±0.34ms        ? ?/sec
physical_select_one_from_700                  1.04      3.7±0.05ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec

alamb · 2024-05-12T10:02:59Z

🚀

erratic-pattern · 2024-05-12T22:11:09Z

Nice! Those results look a lot better than what I found on my laptop. Very hard to get consistent benchmark results on a personal computer when there's so much process scheduling noise

alamb · 2024-05-13T09:47:52Z

> Very hard to get consistent benchmark results on a personal computer when there's so much process scheduling noise

Yeah, I have a GCP VM on which I run the benchmarks.

github-actions bot added the logical-expr Logical plan and expressions label May 11, 2024

alamb mentioned this pull request May 11, 2024

Use static dispatch for write erratic-pattern/arrow-datafusion#1

Merged

alamb changed the title ~~refactor: use Write instead of format! to implement display_name~~ refactor: use Reduce string allocations in Expr::display_name (use write instead of format! to implement display_name) May 11, 2024

alamb changed the title ~~refactor: use Reduce string allocations in Expr::display_name (use write instead of format! to implement display_name)~~ refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) May 11, 2024

alamb reviewed May 11, 2024

alamb approved these changes May 11, 2024

erratic-pattern changed the title ~~refactor: use Reduce string allocations in Expr::display_name (use write instead of format!)~~ refactor: Reduce string allocations in Expr::display_name (use write instead of format!) May 11, 2024

github-actions bot added the optimizer Optimizer rules label May 11, 2024

alamb approved these changes May 11, 2024

erratic-pattern and others added 3 commits May 11, 2024 15:01

refactor: use Write instead of format! to implement display_name

3077fff

Use static dispatch for write

c763260

remove more allocations

1d8ecd1

erratic-pattern force-pushed the adam/reduce-string-allocations-no-cow branch from 63a1656 to 1d8ecd1 Compare May 11, 2024 19:01

erratic-pattern requested a review from alamb May 11, 2024 19:02

alamb merged commit 8cc92a9 into apache:main May 12, 2024
23 checks passed

This was referenced May 13, 2024

DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 #10482

Closed

Make CommonSubexprEliminate faster by stop copying so many strings #10426

Open
