perf: skip hash aggregation multi group by for already not found #19107

rluvaton · 2025-12-05T12:08:01Z

Which issue does this PR close?

N/A

Rationale for this change

Make aggregation faster.

Currently, even after we have determined that no rows are equal, we continue iterating over the remaining arrays.

What changes are included in this PR?

Track the number of rows that are still equal, and break once it reaches 0.
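
For readers unfamiliar with the technique, here is a minimal sketch of the early exit, not the actual DataFusion code. `Column`, `vectorized_equal_to`, and `compare_all_columns` are hypothetical names used only for illustration, and the columns are plain `i64` vectors rather than Arrow arrays. Each column pass reports how many candidate pairs are still equal, and the loop over columns stops as soon as that count reaches 0.

```rust
/// Hypothetical stand-in for one group-by column: the values already stored
/// for existing groups and the values of the incoming rows.
struct Column {
    group_values: Vec<i64>,
    input_values: Vec<i64>,
}

impl Column {
    /// Compares each candidate pair (lhs_rows[i], rhs_rows[i]) that is still
    /// marked equal and clears the flag on a mismatch.
    /// Returns how many pairs are still equal after this column.
    fn vectorized_equal_to(
        &self,
        lhs_rows: &[usize],
        rhs_rows: &[usize],
        equal_to: &mut [bool],
    ) -> usize {
        let mut still_equal = 0;
        for ((&l, &r), eq) in lhs_rows.iter().zip(rhs_rows).zip(equal_to.iter_mut()) {
            if *eq {
                *eq = self.group_values[l] == self.input_values[r];
                if *eq {
                    still_equal += 1;
                }
            }
        }
        still_equal
    }
}

/// Runs the comparison column by column and breaks once no pair is equal.
/// Returns the final flags and the number of columns actually visited.
fn compare_all_columns(
    columns: &[Column],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
) -> (Vec<bool>, usize) {
    let mut equal_to = vec![true; lhs_rows.len()];
    let mut visited = 0;
    for column in columns {
        visited += 1;
        let remaining = column.vectorized_equal_to(lhs_rows, rhs_rows, &mut equal_to);
        if remaining == 0 {
            // Nothing can match any more, so later columns cannot change the result.
            break;
        }
    }
    (equal_to, visited)
}
```

In this sketch, if the first of three columns already mismatches for every candidate pair, only one column is visited and every flag ends up `false`.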

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No. Although the group-by implementation structs are public, they were made public only so that benchmarks can access them.

rluvaton · 2025-12-05T12:08:22Z

run benchmarks

rluvaton · 2025-12-05T12:08:53Z

run benchmarks aggregate_query_sql

alamb-ghbot · 2025-12-05T12:08:58Z

🤖 Hi @rluvaton, thanks for the request (#19107 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
Criterion: case_when, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: aggregate_query_sql.

rluvaton · 2025-12-05T12:41:57Z

Show benchmark queue

alamb-ghbot · 2025-12-05T12:42:33Z

🤖 Hi @rluvaton, you asked to view the benchmark queue (#19107 (comment)).

Job	User	Benchmarks	Comment
`19103_3616616845.sh`	rluvaton	default	`https://github.com/apache/datafusion/pull/19103#issuecomment-3616616845`
`19107_3616626757.sh`	rluvaton	default	`https://github.com/apache/datafusion/pull/19107#issuecomment-3616626757`

alamb-ghbot · 2025-12-05T12:45:18Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing skip-already-unique (0ef2853) to a5fc3c7 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb-ghbot · 2025-12-05T13:22:36Z

🤖: Benchmark completed

Details

Comparing HEAD and skip-already-unique
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ skip-already-unique ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2686.57 ms │          2748.56 ms │ no change │
│ QQuery 1     │  1314.69 ms │          1252.14 ms │ no change │
│ QQuery 2     │  2463.34 ms │          2424.01 ms │ no change │
│ QQuery 3     │  1130.12 ms │          1143.24 ms │ no change │
│ QQuery 4     │  2274.44 ms │          2247.24 ms │ no change │
│ QQuery 5     │ 28358.31 ms │         28535.43 ms │ no change │
│ QQuery 6     │  3991.36 ms │          3986.79 ms │ no change │
│ QQuery 7     │  3608.81 ms │          3785.56 ms │ no change │
└──────────────┴─────────────┴─────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 45827.64ms │
│ Total Time (skip-already-unique)   │ 46122.96ms │
│ Average Time (HEAD)                │  5728.46ms │
│ Average Time (skip-already-unique) │  5765.37ms │
│ Queries Faster                     │          0 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          8 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ skip-already-unique ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.17 ms │             2.64 ms │  1.21x slower │
│ QQuery 1     │    50.24 ms │            48.13 ms │     no change │
│ QQuery 2     │   133.28 ms │           133.25 ms │     no change │
│ QQuery 3     │   152.68 ms │           152.44 ms │     no change │
│ QQuery 4     │  1096.32 ms │          1098.68 ms │     no change │
│ QQuery 5     │  1474.88 ms │          1486.36 ms │     no change │
│ QQuery 6     │     2.17 ms │             2.22 ms │     no change │
│ QQuery 7     │    53.61 ms │            54.79 ms │     no change │
│ QQuery 8     │  1408.88 ms │          1429.18 ms │     no change │
│ QQuery 9     │  1846.91 ms │          1858.01 ms │     no change │
│ QQuery 10    │   357.30 ms │           355.27 ms │     no change │
│ QQuery 11    │   403.68 ms │           408.44 ms │     no change │
│ QQuery 12    │  1321.98 ms │          1357.68 ms │     no change │
│ QQuery 13    │  2001.58 ms │          2039.37 ms │     no change │
│ QQuery 14    │  1256.70 ms │          1256.50 ms │     no change │
│ QQuery 15    │  1238.02 ms │          1215.86 ms │     no change │
│ QQuery 16    │  2659.69 ms │          2690.42 ms │     no change │
│ QQuery 17    │  2619.53 ms │          2683.48 ms │     no change │
│ QQuery 18    │  5556.84 ms │          4923.72 ms │ +1.13x faster │
│ QQuery 19    │   119.95 ms │           119.58 ms │     no change │
│ QQuery 20    │  1903.48 ms │          1901.28 ms │     no change │
│ QQuery 21    │  2204.19 ms │          2159.40 ms │     no change │
│ QQuery 22    │  3895.84 ms │          3741.36 ms │     no change │
│ QQuery 23    │ 13040.27 ms │         12247.72 ms │ +1.06x faster │
│ QQuery 24    │   209.33 ms │           198.21 ms │ +1.06x faster │
│ QQuery 25    │   460.07 ms │           463.91 ms │     no change │
│ QQuery 26    │   209.43 ms │           202.02 ms │     no change │
│ QQuery 27    │  2836.46 ms │          2719.01 ms │     no change │
│ QQuery 28    │ 23739.59 ms │         23306.49 ms │     no change │
│ QQuery 29    │   959.54 ms │           950.32 ms │     no change │
│ QQuery 30    │  1333.59 ms │          1300.57 ms │     no change │
│ QQuery 31    │  1360.94 ms │          1312.21 ms │     no change │
│ QQuery 32    │  5040.35 ms │          5167.73 ms │     no change │
│ QQuery 33    │  5873.71 ms │          6006.40 ms │     no change │
│ QQuery 34    │  6002.89 ms │          5922.17 ms │     no change │
│ QQuery 35    │  1909.84 ms │          1902.67 ms │     no change │
│ QQuery 36    │   117.21 ms │           117.15 ms │     no change │
│ QQuery 37    │    53.17 ms │            51.66 ms │     no change │
│ QQuery 38    │   115.44 ms │           116.58 ms │     no change │
│ QQuery 39    │   191.66 ms │           192.02 ms │     no change │
│ QQuery 40    │    41.21 ms │            40.09 ms │     no change │
│ QQuery 41    │    39.06 ms │            37.69 ms │     no change │
│ QQuery 42    │    31.91 ms │            32.39 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 95325.59ms │
│ Total Time (skip-already-unique)   │ 93405.04ms │
│ Average Time (HEAD)                │  2216.87ms │
│ Average Time (skip-already-unique) │  2172.21ms │
│ Queries Faster                     │          3 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         39 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ skip-already-unique ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 141.17 ms │           138.46 ms │     no change │
│ QQuery 2     │  29.19 ms │            27.70 ms │ +1.05x faster │
│ QQuery 3     │  39.43 ms │            39.35 ms │     no change │
│ QQuery 4     │  30.92 ms │            28.43 ms │ +1.09x faster │
│ QQuery 5     │  88.89 ms │            88.18 ms │     no change │
│ QQuery 6     │  19.68 ms │            19.63 ms │     no change │
│ QQuery 7     │ 233.90 ms │           236.16 ms │     no change │
│ QQuery 8     │  33.64 ms │            34.16 ms │     no change │
│ QQuery 9     │ 102.79 ms │           108.46 ms │  1.06x slower │
│ QQuery 10    │  65.75 ms │            63.72 ms │     no change │
│ QQuery 11    │  17.28 ms │            17.02 ms │     no change │
│ QQuery 12    │  50.93 ms │            51.62 ms │     no change │
│ QQuery 13    │  46.72 ms │            46.69 ms │     no change │
│ QQuery 14    │  13.70 ms │            13.82 ms │     no change │
│ QQuery 15    │  24.30 ms │            24.72 ms │     no change │
│ QQuery 16    │  24.76 ms │            24.80 ms │     no change │
│ QQuery 17    │ 151.86 ms │           152.82 ms │     no change │
│ QQuery 18    │ 278.93 ms │           280.13 ms │     no change │
│ QQuery 19    │  38.18 ms │            37.51 ms │     no change │
│ QQuery 20    │  52.01 ms │            49.31 ms │ +1.05x faster │
│ QQuery 21    │ 339.39 ms │           336.99 ms │     no change │
│ QQuery 22    │  17.76 ms │            17.60 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1841.20ms │
│ Total Time (skip-already-unique)   │ 1837.29ms │
│ Average Time (HEAD)                │   83.69ms │
│ Average Time (skip-already-unique) │   83.51ms │
│ Queries Faster                     │         3 │
│ Queries Slower                     │         1 │
│ Queries with No Change             │        18 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

rluvaton · 2025-12-05T13:26:09Z

@alamb, can you run aggregate_query_sql? It should show better results, since this improvement only benefits high-cardinality data where the earlier columns already rule out every candidate row.

rluvaton · 2025-12-07T11:23:27Z

run benchmark aggregate_query_sql

alamb-ghbot · 2025-12-07T11:23:35Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing skip-already-unique (59fe21c) to 6746007 diff
BENCH_NAME=aggregate_query_sql
BENCH_COMMAND=cargo bench --bench aggregate_query_sql
BENCH_FILTER=
BENCH_BRANCH_NAME=skip-already-unique
Results will be posted here when complete

alamb-ghbot · 2025-12-07T11:54:35Z

🤖: Benchmark completed

Details

group                                                                         main                                    skip-already-unique
-----                                                                         ----                                    -------------------
aggregate_query_approx_percentile_cont_on_f32                                 1.00      3.8±0.21ms        ? ?/sec     1.01      3.9±0.23ms        ? ?/sec
aggregate_query_approx_percentile_cont_on_u64                                 1.00      4.1±0.13ms        ? ?/sec     1.02      4.2±0.22ms        ? ?/sec
aggregate_query_distinct_median                                               1.00      2.8±0.07ms        ? ?/sec     1.02      2.8±0.15ms        ? ?/sec
aggregate_query_group_by                                                      1.00  1535.1±40.21µs        ? ?/sec     1.01  1551.5±53.74µs        ? ?/sec
aggregate_query_group_by_u64 15 12                                            1.00  1442.3±34.46µs        ? ?/sec     1.03  1478.7±37.56µs        ? ?/sec
aggregate_query_group_by_u64_multiple_keys                                    1.00      4.1±0.28ms        ? ?/sec     1.01      4.1±0.30ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions       1.01      2.2±0.14ms        ? ?/sec     1.00      2.1±0.10ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions    1.00      2.6±0.18ms        ? ?/sec     1.00      2.6±0.18ms        ? ?/sec
aggregate_query_group_by_with_filter                                          1.01  1448.6±36.38µs        ? ?/sec     1.00  1437.4±17.21µs        ? ?/sec
aggregate_query_group_by_with_filter_u64 15 12                                1.00  1421.1±24.12µs        ? ?/sec     1.00  1415.9±16.11µs        ? ?/sec
aggregate_query_no_group_by 15 12                                             1.00   726.7±17.00µs        ? ?/sec     1.00   729.3±19.14µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_narrow                             1.00  1176.5±19.16µs        ? ?/sec     1.01  1185.2±18.34µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_wide                               1.00  1997.2±114.74µs        ? ?/sec    1.00      2.0±0.12ms        ? ?/sec
aggregate_query_no_group_by_min_max_f64                                       1.00   681.8±10.61µs        ? ?/sec     1.00    682.8±8.94µs        ? ?/sec
first_last_ignore_nulls                                                       1.00      2.2±0.06ms        ? ?/sec     1.01      2.2±0.11ms        ? ?/sec
first_last_many_columns                                                       1.00      2.2±0.10ms        ? ?/sec     1.00      2.2±0.09ms        ? ?/sec
first_last_one_column                                                         1.00  1847.2±67.11µs        ? ?/sec     1.01  1863.1±63.65µs        ? ?/sec

rluvaton · 2025-12-07T12:07:21Z

This shows no improvement. Currently, though, the high-cardinality benchmarks group by only 2 columns, so only the second column can be skipped. We need many more columns for this to be beneficial.
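
To make the column-count argument concrete, here is a rough counting model, offered as an illustration rather than a measurement. `column_passes` is a hypothetical helper that counts per-column comparison passes for one batch with and without the early exit. It says nothing about how expensive each pass is.

```rust
/// Counts per-column comparison passes for one batch under a simplified model.
///
/// `num_columns` is the number of group-by columns. `exhausted_after` is the
/// 1-based index of the column after which no candidate row is still equal,
/// or `None` if some candidate survives every column.
/// Returns (passes without early exit, passes with early exit).
fn column_passes(num_columns: usize, exhausted_after: Option<usize>) -> (usize, usize) {
    let with_exit = match exhausted_after {
        // Stop right after the column that eliminated the last candidate.
        Some(k) => k.min(num_columns),
        // Some candidate is still equal, so every column must be visited.
        None => num_columns,
    };
    (num_columns, with_exit)
}
```

Under this model, `column_passes(2, Some(1))` gives `(2, 1)`, so at most one pass is saved per batch, while `column_passes(10, Some(1))` gives `(10, 1)`, saving nine.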

alamb-ghbot · 2025-12-07T12:25:53Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing skip-already-unique (59fe21c) to 6746007 diff
BENCH_NAME=aggregate_query_sql
BENCH_COMMAND=cargo bench --bench aggregate_query_sql
BENCH_FILTER=
BENCH_BRANCH_NAME=skip-already-unique
Results will be posted here when complete

alamb-ghbot · 2025-12-07T12:57:01Z

🤖: Benchmark completed

Details

group                                                                         main                                    skip-already-unique
-----                                                                         ----                                    -------------------
aggregate_query_approx_percentile_cont_on_f32                                 1.00      3.8±0.17ms        ? ?/sec     1.01      3.8±0.21ms        ? ?/sec
aggregate_query_approx_percentile_cont_on_u64                                 1.00      4.2±0.21ms        ? ?/sec     1.00      4.2±0.24ms        ? ?/sec
aggregate_query_distinct_median                                               1.00      2.7±0.04ms        ? ?/sec     1.00      2.7±0.07ms        ? ?/sec
aggregate_query_group_by                                                      1.00  1534.5±27.39µs        ? ?/sec     1.01  1543.4±36.34µs        ? ?/sec
aggregate_query_group_by_u64 15 12                                            1.00  1462.9±37.74µs        ? ?/sec     1.00  1456.3±33.35µs        ? ?/sec
aggregate_query_group_by_u64_multiple_keys                                    1.00      4.1±0.23ms        ? ?/sec     1.01      4.1±0.26ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions       1.00      2.2±0.11ms        ? ?/sec     1.00      2.1±0.11ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions    1.00      2.6±0.15ms        ? ?/sec     1.01      2.6±0.16ms        ? ?/sec
aggregate_query_group_by_with_filter                                          1.00  1434.5±21.58µs        ? ?/sec     1.00  1440.2±23.87µs        ? ?/sec
aggregate_query_group_by_with_filter_u64 15 12                                1.00  1410.9±22.14µs        ? ?/sec     1.01  1423.7±17.05µs        ? ?/sec
aggregate_query_no_group_by 15 12                                             1.01   731.2±18.84µs        ? ?/sec     1.00   725.9±12.59µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_narrow                             1.00  1178.9±18.59µs        ? ?/sec     1.01  1185.6±22.57µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_wide                               1.00  1981.7±105.51µs        ? ?/sec    1.01  1997.5±106.49µs        ? ?/sec
aggregate_query_no_group_by_min_max_f64                                       1.00    680.0±9.49µs        ? ?/sec     1.06   718.8±35.89µs        ? ?/sec
first_last_ignore_nulls                                                       1.00      2.2±0.11ms        ? ?/sec     1.01      2.2±0.10ms        ? ?/sec
first_last_many_columns                                                       1.00      2.2±0.07ms        ? ?/sec     1.00      2.2±0.09ms        ? ?/sec
first_last_one_column                                                         1.00  1850.5±59.71µs        ? ?/sec     1.00  1855.8±61.79µs        ? ?/sec

perf: skip hash aggregation multi group by for already not found

0ef2853

github-actions bot added the physical-plan Changes to the physical-plan crate label Dec 5, 2025

rluvaton added the performance Make DataFusion faster label Dec 5, 2025

clippy

ed5254b

Merge branch 'main' into skip-already-unique

59fe21c
