What happens?
I was looking at the grouped aggregation benchmark referenced in "Parallel Grouped Aggregation in DuckDB" and found a dramatic performance issue (so large that I suspect it is a bug), triggered simply by using 64bit values while keeping everything else about the benchmark the same.
Performance drops by a full order of magnitude (~10x) across the board.

To Reproduce
Run the grouped aggregation benchmark with the following minor change to the experiment() function:

```python
# scale by a value that will force the underlying column dtypes to 64bit
scaling_factor = 5_000_000_000
g1 = np.repeat(np.arange(0, int(n_groups / 2)), int(n_rows / (n_groups / 2))) * scaling_factor
g2 = np.tile(np.arange(0, 2), int(n_rows / 2)) * scaling_factor
d = np.random.randint(0, n_rows, n_rows) * scaling_factor
```

This changes nothing about how much data is generated; it just scales the test values to guarantee that 64bit column dtypes are applied. The result is an unexpected performance falloff.
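For context on why this works (a minimal sketch; `scaling_factor` is taken from the snippet above, the small array is purely illustrative): the multiplier is larger than 2**32, so any non-zero product is guaranteed to exceed the 32-bit integer range and must be stored as 64-bit.

```python
import numpy as np

# scaling_factor from the benchmark change above: larger than 2**32,
# so any non-zero product cannot fit in a 32-bit integer.
scaling_factor = 5_000_000_000

small = np.arange(1, 5, dtype=np.int64)  # illustrative stand-in for g1/g2/d
scaled = small * scaling_factor

# Every scaled value exceeds the int32 maximum, forcing 64-bit storage.
assert scaled.min() > np.iinfo(np.int32).max
print(scaled.dtype)  # int64
```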
Results
Characterizing the slowdown under 32bit/64bit regimes, with polars as a reference implementation (all times in seconds):
| rows | groups | dd:32 | dd:64 | dd:slowdown | pl:32 | pl:64 | pl:slowdown |
|---|---|---|---|---|---|---|---|
| 10,000,000 | 1,000 | 0.0140 | 0.588 | 42.0x | 0.032 | 0.088 | 2.8x |
| 10,000,000 | 10,000 | 0.0520 | 0.664 | 12.8x | 0.031 | 0.089 | 2.9x |
| 10,000,000 | 100,000 | 0.0680 | 0.599 | 8.8x | 0.038 | 0.098 | 2.6x |
| 10,000,000 | 1,000,000 | 0.1280 | 1.481 | 11.6x | 0.090 | 0.183 | 2.0x |
| 10,000,000 | 10,000,000 | 0.1980 | 2.830 | 14.3x | 0.228 | 0.307 | 1.3x |
| 100,000,000 | 1,000 | 0.2130 | 4.145 | 19.5x | 0.319 | 0.841 | 2.6x |
| 100,000,000 | 100,000 | 0.5490 | 4.530 | 8.3x | 0.415 | 0.978 | 2.4x |
| 100,000,000 | 1,000,000 | 0.8060 | 7.525 | 9.3x | 0.841 | 1.893 | 2.3x |
| 100,000,000 | 10,000,000 | 1.5920 | 16.932 | 10.6x | 1.609 | 2.984 | 1.9x |
| 100,000,000 | 100,000,000 | 2.7520 | 31.819 | 11.6x | 3.729 | 5.861 | 1.6x |
Moving to 64bit...
- DuckDB (dd in the table above) slows down by ~10x; this degree of drop-off is not proportional to the extra work being done.
- Polars (pl in the table above) slows down by ~2x; this is broadly what you'd expect, given that 64bit values occupy twice the memory of 32bit values.
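The ~2x expectation for a memory-bandwidth-bound workload can be sanity-checked directly (a trivial sketch; the array size is arbitrary):

```python
import numpy as np

n = 1_000_000  # arbitrary size, chosen for illustration

a32 = np.zeros(n, dtype=np.int32)
a64 = np.zeros(n, dtype=np.int64)

# 64-bit values occupy exactly twice the memory of 32-bit values, so a
# memory-bound aggregation would be expected to slow down by roughly 2x
# (as polars does above) -- not by an order of magnitude.
assert a64.nbytes == 2 * a32.nbytes
print(a32.nbytes, a64.nbytes)  # 4000000 8000000
```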
[Chart: y-axis is time in seconds (lower is better); x-axis is the rows/groups bucket from the table above]
OS: macOS (12.4)
DuckDB Version: 0.4.0
DuckDB Client: Python
Full Name: Alexander Beedie
Have you tried this on the latest master branch?
I agree (no - tested on latest officially released package, v0.4.0)
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
I agree
alexander-beedie changed the title from "Huge performance drop-off (~10x) aggregating 64 bit int vs 32 bit int groups" to "Unexpected performance drop-off (~10x) aggregating 64bit int vs 32bit int groups" on Jul 26, 2022
Thanks for the report! This data set seems to trigger a worst-case behavior in our hash function, causing all the data to be handled in the same hash partition. Modifying the hash function fixes the issue. This is likely because the data all consists of very large numbers that are evenly spaced out. We should definitely have a look at fixing this, although I doubt this is a common data distribution in practice.
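The failure mode described here can be illustrated with a toy example (purely illustrative; this is not DuckDB's actual hash function or partitioning scheme): if partitions were derived from the low bits of the key, then keys that are all multiples of a stride divisible by the partition count would collapse into a single partition. Multiplying by a large odd constant and taking the high bits instead (Fibonacci hashing) mixes that structure away.

```python
N_PARTITIONS = 64       # a power of two, so low bits can be taken with a mask
STRIDE = 5_000_000_000  # the benchmark's scaling_factor; divisible by 64

keys = [k * STRIDE for k in range(1000)]

# Weak scheme: partition = low bits of the raw key. Since 64 divides
# 5_000_000_000, every key has zero low bits -> everything in partition 0.
weak = {k & (N_PARTITIONS - 1) for k in keys}

# Fibonacci hashing: multiply by a large odd constant (mod 2**64) and
# take the HIGH bits, which spreads evenly spaced keys across partitions.
def fib_hash(x: int, bits: int = 6) -> int:
    return ((x * 0x9E3779B97F4A7C15) & ((1 << 64) - 1)) >> (64 - bits)

strong = {fib_hash(k) for k in keys}

print(len(weak), len(strong))  # 1 partition vs. many
```

The toy "weak" scheme collapses exactly because the data is "very large numbers that are evenly spaced out", matching the worst-case behavior described above.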
> We should definitely have a look at fixing this...

That would be great!

> ...although I doubt this is a common data distribution in practice.

Hopefully not; I might try some other workloads to see if this can be triggered in other data regimes, because when it does occur it seems to hit hard, heh. Thanks! :)