
cache dictionary request the same key lot of times #51762

Closed
filimonov opened this issue Jul 4, 2023 · 2 comments · Fixed by #51853
filimonov (Contributor) commented:
When a single block of data processed by dictGet has many duplicate keys, ClickHouse will construct a suboptimal query against the dictionary source, like

SELECT ... FROM dict_source WHERE `id` IN (1000, 1000, ..., 1000, 1000, 1000);

instead of simple

SELECT ... FROM dict_source WHERE `id` IN (1000);

Repro:

DROP TABLE IF EXISTS dict_source;
DROP DICTIONARY IF EXISTS test_cache_dict;

CREATE TABLE dict_source engine = Log AS SELECT number as id, toString(number) as value FROM numbers(10000);


CREATE DICTIONARY IF NOT EXISTS test_cache_dict (
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(
    CLICKHOUSE(
        host '127.0.0.2'
        DB currentDatabase()
        TABLE 'dict_source'
    )
)
LAYOUT(
    CACHE(SIZE_IN_CELLS 10000)
)
LIFETIME(MIN 1 MAX 100);


SELECT dictGet(test_cache_dict, 'value', materialize(toUInt64(1000))) FROM numbers_mt(1000) SETTINGS max_block_size = 50, max_threads = 4 FORMAT Null;

system flush logs;
SELECT event_time, query FROM system.query_log WHERE event_time > now() - 300 and has(tables, currentDatabase() || '.dict_source') and type = 'QueryFinish' and query_kind='Select' ORDER BY event_time DESC LIMIT 10;


SELECT dict_key, count(query_id) number_of_queries_to_source, sum(count_per_query) as sum_key_requests, max(count_per_query) as max_key_requests_per_query FROM (
SELECT  arrayJoin(splitByRegexp(',\s+', extract(query, 'IN \((.*)\);'))) as dict_key, query_id, count() count_per_query FROM system.query_log WHERE event_time > now() - 300 and has(tables, currentDatabase() || '.dict_source') and type = 'QueryFinish' and query_kind='Select' GROUP BY dict_key, query_id) GROUP BY dict_key FORMAT PrettyCompactMonoBlock;
filimonov (Contributor, Author) commented:

It is a regression between 21.3 and 21.4, most probably introduced during the refactoring in #20595.

@filimonov filimonov added the comp-dictionary Dictionaries label Jul 4, 2023
@kitaisreal kitaisreal self-assigned this Jul 4, 2023
diegov commented Jul 4, 2023:

Is the fix going to be applied to the query that is issued against the source of the dictionary? Or would it apply to the queue, e.g. by making tryPushToUpdateQueueOrThrow a no-op if the key is already in the queue to be updated?

We've had issues with dictionary update queue sizes due to a very large number of dictGet operations, which often add the same key to the update queue over and over until the first update task populates the value.
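The queue-side variant diegov describes can be sketched as follows: a push becomes a no-op when the key is already pending, so repeated dictGet misses for one key add at most one update task. This is an illustrative Python sketch; the class and method names are hypothetical, not ClickHouse's actual CacheDictionaryUpdateQueue API.

```python
from collections import deque

class UpdateQueue:
    """Bounded update queue that deduplicates pending keys (sketch)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.queue = deque()
        self.pending = set()  # keys currently awaiting an update task

    def try_push(self, key):
        if key in self.pending:
            return  # already queued: pushing again is a no-op
        if len(self.queue) >= self.max_size:
            raise RuntimeError("update queue is full")
        self.queue.append(key)
        self.pending.add(key)

    def pop_for_update(self):
        # The update task takes the key; later misses may enqueue it again.
        key = self.queue.popleft()
        self.pending.discard(key)
        return key

q = UpdateQueue(max_size=2)
for _ in range(1000):
    q.try_push(1000)  # repeated dictGet misses for the same key
print(len(q.queue))
# 1
```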
