Weird result in corner case with external aggregation #57814

Open
CurtizJ opened this issue Dec 13, 2023 · 1 comment
Labels
potential bug To be reviewed by developers and confirmed/rejected.


CurtizJ commented Dec 13, 2023

Consider the following SQL script:

drop table if exists data;
drop table if exists dist;

set max_bytes_before_external_group_by = 1;
set group_by_two_level_threshold_bytes = 1;

set max_untracked_memory = '1Mi';
set memory_profiler_step = '1Mi';

create table data (key String) Engine=Memory;
create table dist (key LowCardinality(String)) engine=Distributed(test_cluster_two_shards, currentDatabase(), data);
insert into data values ('foo');

select * from dist group by key;

The last query non-deterministically returns the following result, which is wrong (the key is duplicated):

┌─key─┐
│ foo │
└─────┘
┌─key─┐
│ foo │
└─────┘

In the logs we can see that identical keys go to different buckets:

2023.12.13 14:18:23.348050 [ 2207426 ] {8c005767-91d3-44ae-8d67-c5b4ba610f79} <Trace> Aggregator: Merging partially aggregated blocks (bucket = 174).
2023.12.13 14:18:23.348077 [ 2207426 ] {8c005767-91d3-44ae-8d67-c5b4ba610f79} <Debug> Aggregator: Merged partially aggregated blocks for bucket #174. Got 1 rows, 1.00 B from 1 source rows in 1.7239e-05 sec. (58008.005 rows/sec., 56.65 KiB/sec.)

2023.12.13 14:18:23.348123 [ 2207477 ] {8c005767-91d3-44ae-8d67-c5b4ba610f79} <Trace> Aggregator: Merging partially aggregated blocks (bucket = 222).
2023.12.13 14:18:23.348153 [ 2207477 ] {8c005767-91d3-44ae-8d67-c5b4ba610f79} <Debug> Aggregator: Merged partially aggregated blocks for bucket #222. Got 1 rows, 1.00 B from 1 source rows in 1.9668e-05 sec. (50844.011 rows/sec., 49.65 KiB/sec.)

Unfortunately, I couldn't reproduce it with a more realistic example. Also note that if LowCardinality(String) is changed to String, the result is correct, so there is probably some issue with the conversion. However, the same aggregation method is selected on both shards: Aggregation method: key_string.

CurtizJ added the "potential bug" label Dec 13, 2023
nickitat commented

A difference in hashing methods can cause this: #42630
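
To make the hashing hypothesis concrete, here is a minimal Python sketch of how a mismatch could produce the duplicated row. It is not ClickHouse code: the two hash functions (CRC32 and SHA-256), the count of 256 buckets, and the helper names bucket_crc32, bucket_sha256 and merge_by_bucket are all assumptions chosen for illustration. The only behavior it models is that partially aggregated blocks are merged one bucket at a time, so the same key placed in two different buckets is never compared with itself.

```python
import hashlib
import zlib

NUM_BUCKETS = 256  # assumed bucket count for this sketch


def bucket_crc32(key: str) -> int:
    """Bucket from the top 8 bits of CRC32 (stand-in for one hashing method)."""
    return (zlib.crc32(key.encode()) >> 24) % NUM_BUCKETS


def bucket_sha256(key: str) -> int:
    """Bucket from the first byte of SHA-256 (stand-in for a different hashing method)."""
    return hashlib.sha256(key.encode()).digest()[0] % NUM_BUCKETS


def merge_by_bucket(partial_blocks):
    """Merge partially aggregated (bucket, keys) blocks one bucket at a time.

    Keys are deduplicated only within a bucket, never across buckets.
    """
    merged = {}
    for bucket, keys in partial_blocks:
        merged.setdefault(bucket, set()).update(keys)
    return [key for bucket in sorted(merged) for key in sorted(merged[bucket])]


# Both sources hash 'foo' the same way: one bucket, one output row.
consistent = merge_by_bucket([
    (bucket_crc32("foo"), ["foo"]),
    (bucket_crc32("foo"), ["foo"]),
])

# The sources disagree on the hash: 'foo' lands in two buckets and is emitted twice.
inconsistent = merge_by_bucket([
    (bucket_crc32("foo"), ["foo"]),
    (bucket_sha256("foo"), ["foo"]),
])

print(consistent)
print(inconsistent)
```

Feeding the bucket numbers from the logs directly, merge_by_bucket([(174, ["foo"]), (222, ["foo"])]), gives the same duplicated output in this model, which matches the result reported above.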
