
OOM for select count(distinct ) query #47520

Closed
rikuiki opened this issue Mar 13, 2023 · 5 comments

Comments

@rikuiki

rikuiki commented Mar 13, 2023

select count(distinct ) query produces OOM

I created a table from external TSV files as a MergeTree:

set max_memory_usage = 8000000000;
set max_memory_usage_for_user = 8000000000;
set max_bytes_before_external_group_by = 10000000;
set max_bytes_before_external_sort = 10000000;

create table if not exists t ENGINE = MergeTree() order by v1 as
select *
from file('/usr/proj/tsv-*', TabSeparated, 'v1 text, v2 text, v3 text, seq bigint');

The table has 3B rows.

I run the following query: select count(distinct v1) from t;

I receive the following error:

Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 7.45 GiB (attempt to allocate chunk of 6291456 bytes), maximum: 7.45 GiB.: While executing AggregatingTransform. (MEMORY_LIMIT_EXCEEDED)

@rikuiki rikuiki added the potential bug To be reviewed by developers and confirmed/rejected. label Mar 13, 2023
@Alex-Cheng
Contributor

8000000000 / 1024/1024/1024 = 7.45058059692383

That is correct: 8000000000 bytes is exactly the 7.45 GiB limit shown in the error. You may want to increase the settings max_memory_usage and max_memory_usage_for_user.
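For instance, a minimal sketch of raising the limits per session (the 16 GB figure is an illustrative assumption, not a recommendation):

```sql
-- Illustrative values only: double the per-query and per-user limits.
-- 16000000000 bytes / 1024^3 ≈ 14.90 GiB
SET max_memory_usage = 16000000000;
SET max_memory_usage_for_user = 16000000000;

SELECT count(DISTINCT v1) FROM t;
```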

@rikuiki
Author

rikuiki commented Mar 13, 2023

I am wondering whether ClickHouse is supposed to start doing an external GROUP BY much earlier because of the max_bytes_before_external_group_by = 10MB setting, and therefore not produce OOM.

@den-crane
Contributor

den-crane commented Mar 13, 2023

Try select uniq(v1) from t;
uniq uses less RAM because it is an approximate function.

> earlier because of max_bytes_before_external_group_by = 10MB setting and don't produce OOM.

I guess the problem is that max_bytes_before_external_group_by is not applicable to count(distinct v1).
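One way to see this (a sketch; it assumes the default count_distinct_implementation = uniqExact) is to ask ClickHouse how it rewrites the query:

```sql
-- count(DISTINCT x) is rewritten according to the
-- count_distinct_implementation setting, which defaults to uniqExact.
EXPLAIN SYNTAX SELECT count(DISTINCT v1) FROM t;
-- The rewritten query calls uniqExact(v1), whose hash-table state grows
-- with the number of distinct values and is not spilled to disk by
-- max_bytes_before_external_group_by.
```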

@Alex-Cheng
Contributor

Alex-Cheng commented Mar 14, 2023

External aggregation writes interim aggregated results into temporary files and merges them bit by bit. Setting max_bytes_before_external_group_by alone is not enough to trigger it; two-level aggregation must also be enabled. To enable two-level aggregation, set group_by_two_level_threshold and add a GROUP BY clause to your query. If the query has no GROUP BY clause, the aggregated result is a single row, and in that case external aggregation cannot help save memory.

Try running the query below:

set max_memory_usage = 8000000000;
set max_memory_usage_for_user = 8000000000;
set max_bytes_before_external_group_by = 10000000;
set max_bytes_before_external_sort = 10000000;

set group_by_two_level_threshold = 16;

create table if not exists t ENGINE = MergeTree() order by v1 as
select *
from file('/usr/proj/tsv-*', TabSeparated, 'v1 text, v2 text, v3 text, seq bigint');

select count(distinct v1) from t group by seq % 32;

The uniq function is not 'exact distinct'; it uses a special statistical algorithm that requires only a small amount of memory, which is why uniq works. uniqExact uses a lot of memory, and I guess you can reproduce the OOM issue by running uniqExact.
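If that guess is right, the difference should be visible by running both forms side by side (a sketch, not verified on this dataset):

```sql
SELECT uniq(v1) FROM t;       -- fixed-size sketch state, bounded memory
SELECT uniqExact(v1) FROM t;  -- exact hash table; expected to hit
                              -- MEMORY_LIMIT_EXCEEDED like count(distinct)
```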

@den-crane den-crane added question Question? question-answered and removed potential bug To be reviewed by developers and confirmed/rejected. labels Mar 14, 2023
@den-crane
Contributor

den-crane commented Mar 14, 2023

As I mentioned, max_bytes_before_external_group_by does not affect uniqExact itself.

max_threads = 1 may help.
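A sketch of that suggestion (each query thread builds its own aggregation state, so fewer threads should mean lower peak memory, at the cost of speed):

```sql
SET max_threads = 1;  -- one aggregation state instead of one per thread
SELECT count(DISTINCT v1) FROM t;
```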
