Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROUP BY LowCardinality(String/FixedString) perfomance #15005

Open
UnamedRus opened this issue Sep 18, 2020 · 4 comments
Open

GROUP BY LowCardinality(String/FixedString) perfomance #15005

UnamedRus opened this issue Sep 18, 2020 · 4 comments

Comments

@UnamedRus
Copy link
Contributor

UnamedRus commented Sep 18, 2020

How to reproduce
Clickhouse server version 20.8.2.3

CREATE TABLE test_group_by (key UInt32, key_a String, key_b String, key_c LowCardinality(String), key_d LowCardinality(String), key_e FixedString(14), key_f FixedString(14), key_g LowCardinality(FixedString(14)), key_h LowCardinality(FixedString(14)) ) ENGINE=MergeTree() PARTITION BY tuple() ORDER BY key;
INSERT INTO test_group_by SELECT number % 40000 , ('number key ' || toString(number % 400)) AS a, ('number key ' || toString(number % 20)) AS b, a, b, a, b, a, b FROM numbers(30000000) ;

Single key

String
SELECT count() FROM test_group_by GROUP BY key_a  FORMAT Null;
0 rows in set. Elapsed: 0.045 sec. Processed 30.00 million rows, 681.75 MB (665.30 million rows/s., 15.12 GB/s.)
key_string

LC(String)
SELECT count() FROM test_group_by GROUP BY key_c  FORMAT Null;
0 rows in set. Elapsed: 0.018 sec. Processed 30.00 million rows, 65.09 MB (1.71 billion rows/s., 3.71 GB/s.)
Aggregation method: low_cardinality_key_string

FixedString
SELECT count() FROM test_group_by GROUP BY key_f  FORMAT Null;
0 rows in set. Elapsed: 0.030 sec. Processed 30.00 million rows, 420.00 MB (1.00 billion rows/s., 14.04 GB/s.)
Aggregation method: keys128

LC(FixedString)
SELECT count() FROM test_group_by GROUP BY key_h  FORMAT Null;
0 rows in set. Elapsed: 0.031 sec. Processed 30.00 million rows, 30.17 MB (981.24 million rows/s., 986.91 MB/s.)
Aggregation method: low_cardinality_keys128 

LC(FixedString) two times slower than LC(String) and it looks like happens because of:
low_cardinality_keys128 instead of low_cardinality_key_fixed_string
FixedString is using keys128 instead of key_fixed_string too.

Other results are expected.

Multiple key

String
SELECT count() FROM test_group_by GROUP BY key_a, key_b  FORMAT Null;
0 rows in set. Elapsed: 0.120 sec. Processed 30.00 million rows, 1.33 GB (249.21 million rows/s., 11.02 GB/s.)
Aggregation method: serialized

LC(String)
SELECT count() FROM test_group_by GROUP BY key_c, key_d  FORMAT Null;
0 rows in set. Elapsed: 0.112 sec. Processed 30.00 million rows, 95.34 MB (268.63 million rows/s., 853.75 MB/s.)
Aggregation method: serialized

FixedString
SELECT count() FROM test_group_by GROUP BY key_e, key_f  FORMAT Null;
0 rows in set. Elapsed: 0.046 sec. Processed 30.00 million rows, 840.01 MB (647.31 million rows/s., 18.13 GB/s.)
Aggregation method: keys256

LC(FixedString)
SELECT count() FROM test_group_by GROUP BY key_g, key_h  FORMAT Null;
0 rows in set. Elapsed: 0.044 sec. Processed 30.00 million rows, 95.31 MB (684.91 million rows/s., 2.18 GB/s.)
Aggregation method: low_cardinality_keys256

Clickhouse does have a special aggregate hashmap for single LowCardinality and can benefit greatly from it, but doesn't have that kind of map for multiple Low Cardinality keys or LC keys and fixed length values. And now you can't just use LC(FixedString) and put it in the table ORDER BY because of issue with PK and LC(FixedString).

Additional context
On real table (SSB benchmark), difference between multiple LC(String) and LC(FixedString) keys even more and can be close to 4-5 times (120ms-340ms vs 50ms) and LC(String) has great scatter between repeating requests.

@nickitat
Copy link
Member

@UnamedRus seems like it is fixed?

@UnamedRus
Copy link
Contributor Author

UnamedRus commented Apr 30, 2022

Don't think so.

#29131 (comment)

@nickitat
Copy link
Member

nickitat commented Apr 30, 2022

SELECT count() FROM test_group_by GROUP BY key_g, key_h FORMAT Null; uses low_cardinality_keys256, so afaiu only SELECT count() FROM test_group_by GROUP BY key_c, key_d FORMAT Null; is subject to possible optimization.
more context: #3562 (comment)

@UnamedRus
Copy link
Contributor Author

UnamedRus commented Aug 16, 2023

Related #53406

#53406 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants