Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Epic] A collection of issues to improve planning performance / speed / efficiency #5637

Open
6 of 15 tasks
alamb opened this issue Mar 18, 2023 · 10 comments
Open
6 of 15 tasks
Labels
enhancement New feature or request

Comments

@alamb alamb added the enhancement New feature or request label Mar 18, 2023
@alamb alamb changed the title [Epic] A collection of issues to improve planning speed / efficiency [Epic] A collection of issues to improve slow planning speed / efficiency Sep 29, 2023
@karlovnv
Copy link

Also I'd like to consider replace list in DFSchema by case_insensitive_hashmap or something similar in order to get value with O(1) complexity instead of O(N).

As I understand, now complexity is O(N^2) due two loops of iterations (datafusion_common::dfschema::DFSchema::index_of_column_by_name and datafusion_common::table_reference::TableReference::resolved_eq)

@alamb
Copy link
Contributor Author

alamb commented Sep 29, 2023

Yes, I think there is a lot of room for improvement (though we need to be careful about taking on crate dependencies that might not have a good long term maintenance story)

@alamb alamb changed the title [Epic] A collection of issues to improve slow planning speed / efficiency [Epic] A collection of issues to improve planning performance / speed / efficiency Nov 3, 2023
@alamb
Copy link
Contributor Author

alamb commented Mar 16, 2024

Here are some other recent discussions about how to improve planning speed:

@alamb
Copy link
Contributor Author

alamb commented Apr 11, 2024

An update here. Thanks to a bunch of work by @haohuaijin @matthewmturner @jayzhan211 @peter-toth @jackwener and myself, the planning speed on 38.0.0 is looking to be quite a bit better 20%-700% better in many cases. I am fairly confident there is still another factor of 2 to be had by completing #9637, which I expect to complete over the next few weeks

+ critcmp main 37.0.0
group                                         37.0.0                                 main
-----                                         ------                                 ----
logical_aggregate_with_join                   1.05  1271.9±10.23µs        ? ?/sec    1.00  1210.1±16.14µs        ? ?/sec
logical_plan_tpcds_all                        1.07    167.2±1.37ms        ? ?/sec    1.00    156.4±0.88ms        ? ?/sec
logical_plan_tpch_all                         1.01     17.2±0.18ms        ? ?/sec    1.00     17.0±0.15ms        ? ?/sec
logical_select_all_from_1000                  4.84     93.5±0.41ms        ? ?/sec    1.00     19.3±0.10ms        ? ?/sec
logical_select_one_from_700                   1.00   751.6±12.41µs        ? ?/sec    1.06    795.9±8.14µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.06   795.8±10.91µs        ? ?/sec    1.00    750.1±8.38µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.04   764.2±18.21µs        ? ?/sec    1.00   737.4±18.35µs        ? ?/sec
physical_plan_tpcds_all                       1.46       2.2±0.01s        ? ?/sec    1.00   1479.1±3.64ms        ? ?/sec
physical_plan_tpch_all                        1.35    134.5±0.81ms        ? ?/sec    1.00     99.6±0.77ms        ? ?/sec
physical_plan_tpch_q1                         1.43      7.7±0.06ms        ? ?/sec    1.00      5.4±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.38      6.4±0.05ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.24      5.1±0.03ms        ? ?/sec    1.00      4.1±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.25      4.1±0.02ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q13                        1.22      2.7±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.22      3.5±0.02ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.33      5.3±0.02ms        ? ?/sec    1.00      4.0±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.29      4.9±0.03ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q18                        1.33      5.5±0.06ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.29     10.1±0.09ms        ? ?/sec    1.00      7.9±0.05ms        ? ?/sec
physical_plan_tpch_q2                         1.44     12.3±0.09ms        ? ?/sec    1.00      8.5±0.06ms        ? ?/sec
physical_plan_tpch_q20                        1.32      6.4±0.05ms        ? ?/sec    1.00      4.9±0.02ms        ? ?/sec
physical_plan_tpch_q21                        1.41      9.5±0.03ms        ? ?/sec    1.00      6.8±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.29      4.7±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.27      4.2±0.03ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.39      3.4±0.02ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.27      6.1±0.06ms        ? ?/sec    1.00      4.8±0.03ms        ? ?/sec
physical_plan_tpch_q6                         1.17      2.1±0.01ms        ? ?/sec    1.00  1752.6±12.06µs        ? ?/sec
physical_plan_tpch_q7                         1.39      8.7±0.08ms        ? ?/sec    1.00      6.2±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.52     12.2±0.08ms        ? ?/sec    1.00      8.0±0.03ms        ? ?/sec
physical_plan_tpch_q9                         1.53      9.2±0.06ms        ? ?/sec    1.00      6.0±0.05ms        ? ?/sec
physical_select_all_from_1000                 7.42    683.8±1.12ms        ? ?/sec    1.00     92.2±0.49ms        ? ?/sec
physical_select_one_from_700                  1.12      4.2±0.02ms        ? ?/sec    1.00      3.7±0.04ms        ? ?/sec

I compared 37.0.0 (with the tpcds benchmark) on this branch: https://github.com/alamb/arrow-datafusion/tree/alamb/37_bench

Comparison


```shell
set -x -e
## This script tests planning speed of 37.0.0 against the speed on planning on main
git fetch -p apache
git fetch -p alamb

# remove old test runs
rm -rf target/criterion/

# use a version of 37 with the tpcds benchmarks
BRANCH_NAME="37.0.0"
git checkout alamb/37_bench
git reset --hard alamb/alamb/37_bench
cargo update

cargo bench --bench sql_planner -- --save-baseline ${BRANCH_NAME}

echo "** Comparing to main"
git checkout main
git reset --hard apache/main
cargo update
cargo bench --bench sql_planner -- --save-baseline main

critcmp main ${BRANCH_NAME}

@karlovnv
Copy link

@alamb Hi, amazing work have been done! It's became much more speedy.

But it seems that the complexity of algorithms is still O(n^2)
Here we have graph avg query execution time over the number of columns:

image

@alamb
Copy link
Contributor Author

alamb commented Apr 26, 2024

I agree there are still places that are N^2 in the number of columns.

With @haohuaijin 's great work in #9595 I think adding an index (perhaps computed on demand) to DFSchema might be more tractable to do without causing performance regressions for smaller column counts.

It would be great if someone wanted to give that a try

@matthewmturner
Copy link
Contributor

We recently updated to the latest Datafusion and we've seen our planning time go from ~20ms to ~10ms! Great job on this.

@alamb
Copy link
Contributor Author

alamb commented Apr 30, 2024

We recently updated to the latest Datafusion and we've seen our planning time go from ~20ms to ~10ms! Great job on this.

That is great to hear --- thanks for the report @matthewmturner

BTW I think there is still significant improvement to be had by completing #9637. I don't think we'll get it all done by 38.0.0 but I think we'll improve it some more

@alamb
Copy link
Contributor Author

alamb commented May 3, 2024

Current progress

group                                         37.0.0                                 main
-----                                         ------                                 ----
logical_aggregate_with_join                   1.07  1281.7±14.79µs        ? ?/sec    1.00  1198.9±14.54µs        ? ?/sec
logical_plan_tpcds_all                        1.09    172.4±1.92ms        ? ?/sec    1.00    157.5±1.68ms        ? ?/sec
logical_plan_tpch_all                         1.06     17.7±0.24ms        ? ?/sec    1.00     16.7±0.18ms        ? ?/sec
logical_select_all_from_1000                  5.17     96.2±0.48ms        ? ?/sec    1.00     18.6±0.14ms        ? ?/sec
logical_select_one_from_700                   1.00    739.2±8.71µs        ? ?/sec    1.09   809.4±35.23µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.06   797.4±34.31µs        ? ?/sec    1.00    750.7±8.05µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.02    752.9±6.78µs        ? ?/sec    1.00    738.4±9.07µs        ? ?/sec
physical_plan_tpcds_all                       1.65       2.2±0.01s        ? ?/sec    1.00   1340.8±8.03ms        ? ?/sec
physical_plan_tpch_all                        1.54    139.9±1.25ms        ? ?/sec    1.00     90.8±1.36ms        ? ?/sec
physical_plan_tpch_q1                         1.59      8.2±0.06ms        ? ?/sec    1.00      5.1±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.53      6.6±0.06ms        ? ?/sec    1.00      4.3±0.09ms        ? ?/sec
physical_plan_tpch_q11                        1.36      5.3±0.10ms        ? ?/sec    1.00      3.9±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.42      4.3±0.10ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_tpch_q13                        1.33      2.8±0.04ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.35      3.7±0.07ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.46      5.5±0.09ms        ? ?/sec    1.00      3.7±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.46      5.1±0.08ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.44      5.7±0.09ms        ? ?/sec    1.00      3.9±0.09ms        ? ?/sec
physical_plan_tpch_q19                        1.65     10.3±0.09ms        ? ?/sec    1.00      6.3±0.09ms        ? ?/sec
physical_plan_tpch_q2                         1.62     12.6±0.10ms        ? ?/sec    1.00      7.8±0.11ms        ? ?/sec
physical_plan_tpch_q20                        1.46      6.7±0.09ms        ? ?/sec    1.00      4.6±0.08ms        ? ?/sec
physical_plan_tpch_q21                        1.58      9.9±0.09ms        ? ?/sec    1.00      6.2±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.47      5.0±0.07ms        ? ?/sec    1.00      3.4±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.40      4.4±0.06ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q4                         1.50      3.5±0.06ms        ? ?/sec    1.00      2.3±0.06ms        ? ?/sec
physical_plan_tpch_q5                         1.43      6.4±0.10ms        ? ?/sec    1.00      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q6                         1.37      2.1±0.05ms        ? ?/sec    1.00  1572.0±16.95µs        ? ?/sec
physical_plan_tpch_q7                         1.60      8.9±0.11ms        ? ?/sec    1.00      5.6±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.72     12.5±0.09ms        ? ?/sec    1.00      7.3±0.11ms        ? ?/sec
physical_plan_tpch_q9                         1.71      9.5±0.10ms        ? ?/sec    1.00      5.6±0.10ms        ? ?/sec
physical_select_all_from_1000                 11.52   701.4±1.22ms        ? ?/sec    1.00     60.9±0.32ms        ? ?/sec
physical_select_one_from_700                  1.17      4.2±0.06ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec

50% faster for tpcds and tpch planning

physical_plan_tpcds_all                       1.65       2.2±0.01s        ? ?/sec    1.00   1340.8±8.03ms        ? ?/sec
physical_plan_tpch_all                        1.54    139.9±1.25ms        ? ?/sec    1.00     90.8±1.36ms        ? ?/sec

Note I expect another 30-40% combined savings between #10356 and #10209 and #9873

@alamb
Copy link
Contributor Author

alamb commented May 15, 2024

Here is where we currently stand with planning performance compared to 37 and 38

Highlight: TPC-DS 76% faster planning, TPCH 64% faster

group                                         37.0.0                                 38.0.0                                 main
-----                                         ------                                 ------                                 ----
physical_plan_tpcds_all                       1.76       2.2±0.01s        ? ?/sec    1.06  1322.5±10.01ms        ? ?/sec    1.00  1253.2±10.31ms        ? ?/sec
physical_plan_tpch_all                        1.64    140.6±2.49ms        ? ?/sec    1.04     89.5±1.43ms        ? ?/sec    1.00     85.8±1.53ms        ? ?/sec

Highlight: SELECT * .. with 1000 columns is 11x faster

group                                         37.0.0                                 38.0.0                                 main
-----                                         ------                                 ------                                 ----
physical_select_all_from_1000                 11.37   689.9±1.24ms        ? ?/sec    1.00     60.7±0.29ms        ? ?/sec    1.01     61.4±0.34ms        ? ?/sec
physical_select_one_from_700                  1.17      4.1±0.04ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec    1.01      3.6±0.06ms        ? ?/sec
++ critcmp main 38.0.0 37.0.0
group                                         37.0.0                                 38.0.0                                 main
-----                                         ------                                 ------                                 ----
logical_aggregate_with_join                   1.27  1275.1±14.29µs        ? ?/sec    1.22  1219.9±21.06µs        ? ?/sec    1.00  1003.7±14.68µs        ? ?/sec
logical_plan_tpcds_all                        1.14    171.6±2.25ms        ? ?/sec    1.05    157.8±1.74ms        ? ?/sec    1.00    150.9±1.77ms        ? ?/sec
logical_plan_tpch_all                         1.08     17.9±0.35ms        ? ?/sec    1.02     17.0±0.17ms        ? ?/sec    1.00     16.6±0.18ms        ? ?/sec
logical_select_all_from_1000                  5.05     94.5±0.86ms        ? ?/sec    1.00     18.7±0.10ms        ? ?/sec    1.01     18.9±0.16ms        ? ?/sec
logical_select_one_from_700                   1.00   750.1±11.67µs        ? ?/sec    1.08   810.9±32.07µs        ? ?/sec    1.08   811.4±16.40µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.05   793.2±10.89µs        ? ?/sec    1.01    762.2±8.23µs        ? ?/sec    1.00   756.0±12.71µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.03   762.6±12.18µs        ? ?/sec    1.00   741.5±10.89µs        ? ?/sec    1.00    740.0±7.52µs        ? ?/sec
physical_plan_tpcds_all                       1.76       2.2±0.01s        ? ?/sec    1.06  1322.5±10.01ms        ? ?/sec    1.00  1253.2±10.31ms        ? ?/sec
physical_plan_tpch_all                        1.64    140.6±2.49ms        ? ?/sec    1.04     89.5±1.43ms        ? ?/sec    1.00     85.8±1.53ms        ? ?/sec
physical_plan_tpch_q1                         1.81      8.2±0.08ms        ? ?/sec    1.10      5.0±0.13ms        ? ?/sec    1.00      4.5±0.06ms        ? ?/sec
physical_plan_tpch_q10                        1.65      6.7±0.09ms        ? ?/sec    1.07      4.3±0.06ms        ? ?/sec    1.00      4.0±0.05ms        ? ?/sec
physical_plan_tpch_q11                        1.49      5.3±0.11ms        ? ?/sec    1.07      3.8±0.08ms        ? ?/sec    1.00      3.5±0.07ms        ? ?/sec
physical_plan_tpch_q12                        1.59      4.3±0.07ms        ? ?/sec    1.12      3.0±0.05ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec
physical_plan_tpch_q13                        1.39      2.8±0.04ms        ? ?/sec    1.02      2.1±0.03ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.51      3.6±0.06ms        ? ?/sec    1.12      2.7±0.08ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.63      5.5±0.07ms        ? ?/sec    1.09      3.7±0.13ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_tpch_q17                        1.57      5.1±0.07ms        ? ?/sec    1.07      3.5±0.10ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.56      5.7±0.10ms        ? ?/sec    1.06      3.9±0.12ms        ? ?/sec    1.00      3.7±0.06ms        ? ?/sec
physical_plan_tpch_q19                        1.93     10.6±0.09ms        ? ?/sec    1.11      6.1±0.14ms        ? ?/sec    1.00      5.5±0.10ms        ? ?/sec
physical_plan_tpch_q2                         1.71     12.8±0.07ms        ? ?/sec    1.03      7.7±0.15ms        ? ?/sec    1.00      7.5±0.12ms        ? ?/sec
physical_plan_tpch_q20                        1.61      6.9±0.13ms        ? ?/sec    1.02      4.4±0.07ms        ? ?/sec    1.00      4.3±0.12ms        ? ?/sec
physical_plan_tpch_q21                        1.69     10.0±0.16ms        ? ?/sec    1.03      6.1±0.12ms        ? ?/sec    1.00      5.9±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.56      4.9±0.13ms        ? ?/sec    1.04      3.3±0.06ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.49      4.4±0.10ms        ? ?/sec    1.07      3.2±0.11ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
physical_plan_tpch_q4                         1.58      3.5±0.05ms        ? ?/sec    1.05      2.3±0.07ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.49      6.3±0.10ms        ? ?/sec    1.05      4.4±0.09ms        ? ?/sec    1.00      4.2±0.07ms        ? ?/sec
physical_plan_tpch_q6                         1.48      2.1±0.05ms        ? ?/sec    1.08  1553.5±30.77µs        ? ?/sec    1.00  1435.2±12.73µs        ? ?/sec
physical_plan_tpch_q7                         1.71      9.1±0.08ms        ? ?/sec    1.05      5.6±0.08ms        ? ?/sec    1.00      5.3±0.07ms        ? ?/sec
physical_plan_tpch_q8                         1.84     12.7±0.17ms        ? ?/sec    1.05      7.3±0.12ms        ? ?/sec    1.00      6.9±0.11ms        ? ?/sec
physical_plan_tpch_q9                         1.86      9.7±0.08ms        ? ?/sec    1.04      5.4±0.10ms        ? ?/sec    1.00      5.2±0.08ms        ? ?/sec
physical_select_all_from_1000                 11.37   689.9±1.24ms        ? ?/sec    1.00     60.7±0.29ms        ? ?/sec    1.01     61.4±0.34ms        ? ?/sec
physical_select_one_from_700                  1.17      4.1±0.04ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec    1.01      3.6±0.06ms        ? ?/sec

Test script

Details

set -x -e
## This script tests planning speed of 37.0.0 against the speed on planning on main
git fetch -p apache
git fetch -p alamb

# remove old test runs
rm -rf target/criterion/

# Compare version 38
git checkout 38.0.0
cargo update
cargo bench --bench sql_planner -- --save-baseline "38.0.0"

# use a version of 37 with the tpcds benchmarks
git checkout alamb/37_bench
git reset --hard alamb/alamb/37_bench
cargo update
cargo bench --bench sql_planner -- --save-baseline "37.0.0"


echo "** Comparing to main"
git checkout main
git reset --hard apache/main
cargo update
cargo bench --bench sql_planner -- --save-baseline main

critcmp main "38.0.0" "37.0.0"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants