@rluvaton
Member

@rluvaton rluvaton commented Dec 5, 2025

Which issue does this PR close?

N/A

Rationale for this change

Make aggregation faster.

Currently, even after we have established that no row is equal, we keep iterating over the remaining arrays.

What changes are included in this PR?

Track the number of rows that are still equal and break out of the loop once that count reaches 0.
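A minimal sketch of the idea, not the actual DataFusion code: `compare_column` and `equal_rows` are hypothetical names, and plain `u64` slices stand in for the real group-by column types. It keeps a running count of rows that may still be equal and stops visiting columns once that count is zero.

```rust
/// Hypothetical stand-in for one group-by column: compares row pairs and
/// clears `equal_to[i]` when the values differ. Returns how many flags it cleared.
fn compare_column(lhs: &[u64], rhs: &[u64], equal_to: &mut [bool]) -> usize {
    let mut cleared = 0;
    for i in 0..equal_to.len() {
        if equal_to[i] && lhs[i] != rhs[i] {
            equal_to[i] = false;
            cleared += 1;
        }
    }
    cleared
}

/// Compares rows column by column, stopping once no row can still be equal.
/// Returns the number of columns actually visited.
fn equal_rows(columns: &[(Vec<u64>, Vec<u64>)], equal_to: &mut [bool]) -> usize {
    // Rows that are still candidates for equality.
    let mut remaining = equal_to.iter().filter(|&&e| e).count();
    let mut visited = 0;
    for (lhs, rhs) in columns {
        if remaining == 0 {
            break; // nothing left to compare, skip the remaining columns
        }
        visited += 1;
        remaining -= compare_column(lhs, rhs, equal_to);
    }
    visited
}

fn main() {
    // The first column already differs on every row, so the other two are skipped.
    let columns = vec![
        (vec![1, 2, 3], vec![9, 9, 9]),
        (vec![1, 2, 3], vec![1, 2, 3]),
        (vec![1, 2, 3], vec![1, 2, 3]),
    ];
    let mut equal_to = vec![true; 3];
    let visited = equal_rows(&columns, &mut equal_to);
    println!("visited {visited} of {} columns", columns.len());
}
```

In this sketch the count is decremented by whatever each column clears, so the check before every column is a single integer comparison rather than a scan of the flags.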

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No. Although the group-by implementation structs are public, they were made public only so that benchmarks can access them.

@github-actions bot added the physical-plan label (Changes to the physical-plan crate) on Dec 5, 2025
@rluvaton
Member Author

rluvaton commented Dec 5, 2025

run benchmarks

@rluvaton
Member Author

rluvaton commented Dec 5, 2025

run benchmarks aggregate_query_sql

@alamb-ghbot

🤖 Hi @rluvaton, thanks for the request (#19107 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
  • Criterion: case_when, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: aggregate_query_sql.

@rluvaton added the performance label (Make DataFusion faster) on Dec 5, 2025
@rluvaton
Member Author

rluvaton commented Dec 5, 2025

Show benchmark queue

@alamb-ghbot

🤖 Hi @rluvaton, you asked to view the benchmark queue (#19107 (comment)).

Job User Benchmarks Comment
19103_3616616845.sh rluvaton default https://github.com/apache/datafusion/pull/19103#issuecomment-3616616845
19107_3616626757.sh rluvaton default https://github.com/apache/datafusion/pull/19107#issuecomment-3616626757

@alamb-ghbot

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing skip-already-unique (0ef2853) to a5fc3c7 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and skip-already-unique
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ skip-already-unique ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2686.57 ms │          2748.56 ms │ no change │
│ QQuery 1     │  1314.69 ms │          1252.14 ms │ no change │
│ QQuery 2     │  2463.34 ms │          2424.01 ms │ no change │
│ QQuery 3     │  1130.12 ms │          1143.24 ms │ no change │
│ QQuery 4     │  2274.44 ms │          2247.24 ms │ no change │
│ QQuery 5     │ 28358.31 ms │         28535.43 ms │ no change │
│ QQuery 6     │  3991.36 ms │          3986.79 ms │ no change │
│ QQuery 7     │  3608.81 ms │          3785.56 ms │ no change │
└──────────────┴─────────────┴─────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 45827.64ms │
│ Total Time (skip-already-unique)   │ 46122.96ms │
│ Average Time (HEAD)                │  5728.46ms │
│ Average Time (skip-already-unique) │  5765.37ms │
│ Queries Faster                     │          0 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          8 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ skip-already-unique ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.17 ms │             2.64 ms │  1.21x slower │
│ QQuery 1     │    50.24 ms │            48.13 ms │     no change │
│ QQuery 2     │   133.28 ms │           133.25 ms │     no change │
│ QQuery 3     │   152.68 ms │           152.44 ms │     no change │
│ QQuery 4     │  1096.32 ms │          1098.68 ms │     no change │
│ QQuery 5     │  1474.88 ms │          1486.36 ms │     no change │
│ QQuery 6     │     2.17 ms │             2.22 ms │     no change │
│ QQuery 7     │    53.61 ms │            54.79 ms │     no change │
│ QQuery 8     │  1408.88 ms │          1429.18 ms │     no change │
│ QQuery 9     │  1846.91 ms │          1858.01 ms │     no change │
│ QQuery 10    │   357.30 ms │           355.27 ms │     no change │
│ QQuery 11    │   403.68 ms │           408.44 ms │     no change │
│ QQuery 12    │  1321.98 ms │          1357.68 ms │     no change │
│ QQuery 13    │  2001.58 ms │          2039.37 ms │     no change │
│ QQuery 14    │  1256.70 ms │          1256.50 ms │     no change │
│ QQuery 15    │  1238.02 ms │          1215.86 ms │     no change │
│ QQuery 16    │  2659.69 ms │          2690.42 ms │     no change │
│ QQuery 17    │  2619.53 ms │          2683.48 ms │     no change │
│ QQuery 18    │  5556.84 ms │          4923.72 ms │ +1.13x faster │
│ QQuery 19    │   119.95 ms │           119.58 ms │     no change │
│ QQuery 20    │  1903.48 ms │          1901.28 ms │     no change │
│ QQuery 21    │  2204.19 ms │          2159.40 ms │     no change │
│ QQuery 22    │  3895.84 ms │          3741.36 ms │     no change │
│ QQuery 23    │ 13040.27 ms │         12247.72 ms │ +1.06x faster │
│ QQuery 24    │   209.33 ms │           198.21 ms │ +1.06x faster │
│ QQuery 25    │   460.07 ms │           463.91 ms │     no change │
│ QQuery 26    │   209.43 ms │           202.02 ms │     no change │
│ QQuery 27    │  2836.46 ms │          2719.01 ms │     no change │
│ QQuery 28    │ 23739.59 ms │         23306.49 ms │     no change │
│ QQuery 29    │   959.54 ms │           950.32 ms │     no change │
│ QQuery 30    │  1333.59 ms │          1300.57 ms │     no change │
│ QQuery 31    │  1360.94 ms │          1312.21 ms │     no change │
│ QQuery 32    │  5040.35 ms │          5167.73 ms │     no change │
│ QQuery 33    │  5873.71 ms │          6006.40 ms │     no change │
│ QQuery 34    │  6002.89 ms │          5922.17 ms │     no change │
│ QQuery 35    │  1909.84 ms │          1902.67 ms │     no change │
│ QQuery 36    │   117.21 ms │           117.15 ms │     no change │
│ QQuery 37    │    53.17 ms │            51.66 ms │     no change │
│ QQuery 38    │   115.44 ms │           116.58 ms │     no change │
│ QQuery 39    │   191.66 ms │           192.02 ms │     no change │
│ QQuery 40    │    41.21 ms │            40.09 ms │     no change │
│ QQuery 41    │    39.06 ms │            37.69 ms │     no change │
│ QQuery 42    │    31.91 ms │            32.39 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 95325.59ms │
│ Total Time (skip-already-unique)   │ 93405.04ms │
│ Average Time (HEAD)                │  2216.87ms │
│ Average Time (skip-already-unique) │  2172.21ms │
│ Queries Faster                     │          3 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         39 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ skip-already-unique ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 141.17 ms │           138.46 ms │     no change │
│ QQuery 2     │  29.19 ms │            27.70 ms │ +1.05x faster │
│ QQuery 3     │  39.43 ms │            39.35 ms │     no change │
│ QQuery 4     │  30.92 ms │            28.43 ms │ +1.09x faster │
│ QQuery 5     │  88.89 ms │            88.18 ms │     no change │
│ QQuery 6     │  19.68 ms │            19.63 ms │     no change │
│ QQuery 7     │ 233.90 ms │           236.16 ms │     no change │
│ QQuery 8     │  33.64 ms │            34.16 ms │     no change │
│ QQuery 9     │ 102.79 ms │           108.46 ms │  1.06x slower │
│ QQuery 10    │  65.75 ms │            63.72 ms │     no change │
│ QQuery 11    │  17.28 ms │            17.02 ms │     no change │
│ QQuery 12    │  50.93 ms │            51.62 ms │     no change │
│ QQuery 13    │  46.72 ms │            46.69 ms │     no change │
│ QQuery 14    │  13.70 ms │            13.82 ms │     no change │
│ QQuery 15    │  24.30 ms │            24.72 ms │     no change │
│ QQuery 16    │  24.76 ms │            24.80 ms │     no change │
│ QQuery 17    │ 151.86 ms │           152.82 ms │     no change │
│ QQuery 18    │ 278.93 ms │           280.13 ms │     no change │
│ QQuery 19    │  38.18 ms │            37.51 ms │     no change │
│ QQuery 20    │  52.01 ms │            49.31 ms │ +1.05x faster │
│ QQuery 21    │ 339.39 ms │           336.99 ms │     no change │
│ QQuery 22    │  17.76 ms │            17.60 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1841.20ms │
│ Total Time (skip-already-unique)   │ 1837.29ms │
│ Average Time (HEAD)                │   83.69ms │
│ Average Time (skip-already-unique) │   83.51ms │
│ Queries Faster                     │         3 │
│ Queries Slower                     │         1 │
│ Queries with No Change             │        18 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@rluvaton
Member Author

rluvaton commented Dec 5, 2025

@alamb, can you run aggregate_query_sql? It should show better results, since this improvement only benefits high-cardinality data where the earlier columns already rule out every row.

@rluvaton
Member Author

rluvaton commented Dec 7, 2025

run benchmark aggregate_query_sql

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing skip-already-unique (59fe21c) to 6746007 diff
BENCH_NAME=aggregate_query_sql
BENCH_COMMAND=cargo bench --bench aggregate_query_sql
BENCH_FILTER=
BENCH_BRANCH_NAME=skip-already-unique
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                         main                                    skip-already-unique
-----                                                                         ----                                    -------------------
aggregate_query_approx_percentile_cont_on_f32                                 1.00      3.8±0.21ms        ? ?/sec     1.01      3.9±0.23ms        ? ?/sec
aggregate_query_approx_percentile_cont_on_u64                                 1.00      4.1±0.13ms        ? ?/sec     1.02      4.2±0.22ms        ? ?/sec
aggregate_query_distinct_median                                               1.00      2.8±0.07ms        ? ?/sec     1.02      2.8±0.15ms        ? ?/sec
aggregate_query_group_by                                                      1.00  1535.1±40.21µs        ? ?/sec     1.01  1551.5±53.74µs        ? ?/sec
aggregate_query_group_by_u64 15 12                                            1.00  1442.3±34.46µs        ? ?/sec     1.03  1478.7±37.56µs        ? ?/sec
aggregate_query_group_by_u64_multiple_keys                                    1.00      4.1±0.28ms        ? ?/sec     1.01      4.1±0.30ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions       1.01      2.2±0.14ms        ? ?/sec     1.00      2.1±0.10ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions    1.00      2.6±0.18ms        ? ?/sec     1.00      2.6±0.18ms        ? ?/sec
aggregate_query_group_by_with_filter                                          1.01  1448.6±36.38µs        ? ?/sec     1.00  1437.4±17.21µs        ? ?/sec
aggregate_query_group_by_with_filter_u64 15 12                                1.00  1421.1±24.12µs        ? ?/sec     1.00  1415.9±16.11µs        ? ?/sec
aggregate_query_no_group_by 15 12                                             1.00   726.7±17.00µs        ? ?/sec     1.00   729.3±19.14µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_narrow                             1.00  1176.5±19.16µs        ? ?/sec     1.01  1185.2±18.34µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_wide                               1.00  1997.2±114.74µs        ? ?/sec    1.00      2.0±0.12ms        ? ?/sec
aggregate_query_no_group_by_min_max_f64                                       1.00   681.8±10.61µs        ? ?/sec     1.00    682.8±8.94µs        ? ?/sec
first_last_ignore_nulls                                                       1.00      2.2±0.06ms        ? ?/sec     1.01      2.2±0.11ms        ? ?/sec
first_last_many_columns                                                       1.00      2.2±0.10ms        ? ?/sec     1.00      2.2±0.09ms        ? ?/sec
first_last_one_column                                                         1.00  1847.2±67.11µs        ? ?/sec     1.01  1863.1±63.65µs        ? ?/sec

@rluvaton
Member Author

rluvaton commented Dec 7, 2025

This shows no improvement. However, the high-cardinality benchmarks currently group by only 2 columns, so only the second column can be skipped. We need many more group-by columns for the change to be beneficial.
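To make the column-count argument concrete, here is a small sketch under the assumption that the early exit can only skip columns that come after the one where the equal count first reaches zero. `skipped_columns` is a hypothetical helper, not part of DataFusion.

```rust
/// Number of column comparisons the early exit avoids when the equal count
/// first reaches zero after comparing column `zero_after` (1-based).
fn skipped_columns(num_columns: usize, zero_after: usize) -> usize {
    num_columns.saturating_sub(zero_after)
}

fn main() {
    // Best case: the first column already rules out every row.
    for n in [2, 4, 8, 16] {
        println!("{n} group-by columns: at most {} skipped", skipped_columns(n, 1));
    }
}
```

With 2 group-by columns the best case skips a single column, which matches the observation that the existing benchmarks leave little work to save.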

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing skip-already-unique (59fe21c) to 6746007 diff
BENCH_NAME=aggregate_query_sql
BENCH_COMMAND=cargo bench --bench aggregate_query_sql
BENCH_FILTER=
BENCH_BRANCH_NAME=skip-already-unique
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                         main                                    skip-already-unique
-----                                                                         ----                                    -------------------
aggregate_query_approx_percentile_cont_on_f32                                 1.00      3.8±0.17ms        ? ?/sec     1.01      3.8±0.21ms        ? ?/sec
aggregate_query_approx_percentile_cont_on_u64                                 1.00      4.2±0.21ms        ? ?/sec     1.00      4.2±0.24ms        ? ?/sec
aggregate_query_distinct_median                                               1.00      2.7±0.04ms        ? ?/sec     1.00      2.7±0.07ms        ? ?/sec
aggregate_query_group_by                                                      1.00  1534.5±27.39µs        ? ?/sec     1.01  1543.4±36.34µs        ? ?/sec
aggregate_query_group_by_u64 15 12                                            1.00  1462.9±37.74µs        ? ?/sec     1.00  1456.3±33.35µs        ? ?/sec
aggregate_query_group_by_u64_multiple_keys                                    1.00      4.1±0.23ms        ? ?/sec     1.01      4.1±0.26ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions       1.00      2.2±0.11ms        ? ?/sec     1.00      2.1±0.11ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions    1.00      2.6±0.15ms        ? ?/sec     1.01      2.6±0.16ms        ? ?/sec
aggregate_query_group_by_with_filter                                          1.00  1434.5±21.58µs        ? ?/sec     1.00  1440.2±23.87µs        ? ?/sec
aggregate_query_group_by_with_filter_u64 15 12                                1.00  1410.9±22.14µs        ? ?/sec     1.01  1423.7±17.05µs        ? ?/sec
aggregate_query_no_group_by 15 12                                             1.01   731.2±18.84µs        ? ?/sec     1.00   725.9±12.59µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_narrow                             1.00  1178.9±18.59µs        ? ?/sec     1.01  1185.6±22.57µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_wide                               1.00  1981.7±105.51µs        ? ?/sec    1.01  1997.5±106.49µs        ? ?/sec
aggregate_query_no_group_by_min_max_f64                                       1.00    680.0±9.49µs        ? ?/sec     1.06   718.8±35.89µs        ? ?/sec
first_last_ignore_nulls                                                       1.00      2.2±0.11ms        ? ?/sec     1.01      2.2±0.10ms        ? ?/sec
first_last_many_columns                                                       1.00      2.2±0.07ms        ? ?/sec     1.00      2.2±0.09ms        ? ?/sec
first_last_one_column                                                         1.00  1850.5±59.71µs        ? ?/sec     1.00  1855.8±61.79µs        ? ?/sec
