You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
How to reproduce
Clickhouse server version 20.8.2.3
CREATE TABLE test_group_by (key UInt32, key_a String, key_b String, key_c LowCardinality(String), key_d LowCardinality(String), key_e FixedString(14), key_f FixedString(14), key_g LowCardinality(FixedString(14)), key_h LowCardinality(FixedString(14)) ) ENGINE=MergeTree() PARTITION BY tuple() ORDER BY key;
INSERT INTO test_group_by SELECT number % 40000 , ('number key ' || toString(number % 400)) AS a, ('number key ' || toString(number % 20)) AS b, a, b, a, b, a, b FROM numbers(30000000) ;
Single key
String
SELECT count() FROM test_group_by GROUP BY key_a FORMAT Null;
0 rows in set. Elapsed: 0.045 sec. Processed 30.00 million rows, 681.75 MB (665.30 million rows/s., 15.12 GB/s.)
key_string
LC(String)
SELECT count() FROM test_group_by GROUP BY key_c FORMAT Null;
0 rows in set. Elapsed: 0.018 sec. Processed 30.00 million rows, 65.09 MB (1.71 billion rows/s., 3.71 GB/s.)
Aggregation method: low_cardinality_key_string
FixedString
SELECT count() FROM test_group_by GROUP BY key_f FORMAT Null;
0 rows in set. Elapsed: 0.030 sec. Processed 30.00 million rows, 420.00 MB (1.00 billion rows/s., 14.04 GB/s.)
Aggregation method: keys128
LC(FixedString)
SELECT count() FROM test_group_by GROUP BY key_h FORMAT Null;
0 rows in set. Elapsed: 0.031 sec. Processed 30.00 million rows, 30.17 MB (981.24 million rows/s., 986.91 MB/s.)
Aggregation method: low_cardinality_keys128
LC(FixedString) two times slower than LC(String) and it looks like happens because of: low_cardinality_keys128 instead of low_cardinality_key_fixed_string
FixedString is using keys128 instead of key_fixed_string too.
Other results are expected.
Multiple key
String
SELECT count() FROM test_group_by GROUP BY key_a, key_b FORMAT Null;
0 rows in set. Elapsed: 0.120 sec. Processed 30.00 million rows, 1.33 GB (249.21 million rows/s., 11.02 GB/s.)
Aggregation method: serialized
LC(String)
SELECT count() FROM test_group_by GROUP BY key_c, key_d FORMAT Null;
0 rows in set. Elapsed: 0.112 sec. Processed 30.00 million rows, 95.34 MB (268.63 million rows/s., 853.75 MB/s.)
Aggregation method: serialized
FixedString
SELECT count() FROM test_group_by GROUP BY key_e, key_f FORMAT Null;
0 rows in set. Elapsed: 0.046 sec. Processed 30.00 million rows, 840.01 MB (647.31 million rows/s., 18.13 GB/s.)
Aggregation method: keys256
LC(FixedString)
SELECT count() FROM test_group_by GROUP BY key_g, key_h FORMAT Null;
0 rows in set. Elapsed: 0.044 sec. Processed 30.00 million rows, 95.31 MB (684.91 million rows/s., 2.18 GB/s.)
Aggregation method: low_cardinality_keys256
Clickhouse does have a special aggregate hashmap for single LowCardinality and can benefit greatly from it, but doesn't have that kind of map for multiple Low Cardinality keys or LC keys and fixed length values. And now you can't just use LC(FixedString) and put it in the table ORDER BY because of issue with PK and LC(FixedString).
Additional context
On real table (SSB benchmark), difference between multiple LC(String) and LC(FixedString) keys even more and can be close to 4-5 times (120ms-340ms vs 50ms) and LC(String) has great scatter between repeating requests.
The text was updated successfully, but these errors were encountered:
SELECT count() FROM test_group_by GROUP BY key_g, key_h FORMAT Null; uses low_cardinality_keys256, so afaiu only SELECT count() FROM test_group_by GROUP BY key_c, key_d FORMAT Null; is subject to possible optimization.
more context: #3562 (comment)
How to reproduce
Clickhouse server version 20.8.2.3
Single key
LC(FixedString) two times slower than LC(String) and it looks like happens because of:
low_cardinality_keys128 instead of low_cardinality_key_fixed_string
FixedString is using keys128 instead of key_fixed_string too.
Other results are expected.
Multiple key
Clickhouse does have a special aggregate hashmap for single LowCardinality and can benefit greatly from it, but doesn't have that kind of map for multiple Low Cardinality keys or LC keys and fixed length values. And now you can't just use LC(FixedString) and put it in the table ORDER BY because of issue with PK and LC(FixedString).
Additional context
On real table (SSB benchmark), difference between multiple LC(String) and LC(FixedString) keys even more and can be close to 4-5 times (120ms-340ms vs 50ms) and LC(String) has great scatter between repeating requests.
The text was updated successfully, but these errors were encountered: