Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query cache is not isolated across different databases #64136

Closed
realyota opened this issue May 20, 2024 · 3 comments · Fixed by #64199
Closed

Query cache is not isolated across different databases #64136

realyota opened this issue May 20, 2024 · 3 comments · Fixed by #64199

Comments

@realyota
Copy link
Contributor

Tested on 24.3.2.23 (official build).

Describe the unexpected behaviour
Query cache is used for the same tables created in different databases when no database alias is used.

How to reproduce

Prepare debug environment and data:

CREATE DATABASE test ENGINE = Atomic;

CREATE TABLE default.test
(
    `a` UInt64,
    `b` UUID,
    `c` UInt32 DEFAULT xxHash32(b),
    `d` DateTime,
    `error_count` UInt8,
    INDEX idx_error error_count TYPE minmax GRANULARITY 1
)
ENGINE = ReplacingMergeTree(d)
PARTITION BY toYYYYMM(d)
ORDER BY (a, c)
TTL d + toIntervalMonth(1)
SETTINGS index_granularity = 6, ttl_only_drop_parts = 1;

CREATE TABLE test.test AS default.test;

INSERT INTO test.test SELECT number, generateUUIDv4(), number, '2024-05-20 14:00:00'::DateTime, 2 FROM numbers(10e7);

ALTER TABLE default.test ATTACH PARTITION '202405' FROM test.test;

Now test query cache:

USE default;

SELECT a % 100 ids, max(b) uuids, sum(error_count) ec FROM test GROUP BY ids ORDER BY ids SETTINGS use_query_cache=1;

100 rows in set. Elapsed: 11.742 sec. Processed 75.49 million rows, 1.89 GB (6.43 million rows/s., 160.73 MB/s.)

SELECT * FROM system.query_cache;
      ┌─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─result_size─┬─stale─┬─shared─┬─compressed─┬──────────expires_at─┬────────────key_hash─┐
1. │ SELECT a % 100 AS ids, max(b) AS uuids, sum(error_count) AS ec FROM test GROUP BY ids ORDER BY ids ASC SETTINGS use_query_cache = 1 │        3328 │     0 │      0 │          1 │ 2024-05-20 14:09:44 │ 5831134195536845086 │
   └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┴───────┴────────┴────────────┴─────────────────────┴─────────────────────┘

USE test;

SELECT a % 100 ids, max(b) uuids, sum(error_count) ec FROM test GROUP BY ids ORDER BY ids SETTINGS use_query_cache=1;

100 rows in set. Elapsed: 0.001 sec.

SELECT * FROM system.query_cache;
   
┌─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─result_size─┬─stale─┬─shared─┬─compressed─┬──────────expires_at─┬────────────key_hash─┐
1. │ SELECT a % 100 AS ids, max(b) AS uuids, sum(error_count) AS ec FROM test GROUP BY ids ORDER BY ids ASC SETTINGS use_query_cache = 1 │        3328 │     0 │      0 │          1 │ 2024-05-20 14:09:44 │ 5831134195536845086 │
   └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┴───────┴────────┴────────────┴─────────────────────┴─────────────────────┘

Expected behavior
I expect the query cache to be aware that I'm selecting the data from another table (same table name in another database).

@den-crane
Copy link
Contributor

Why this example is so over complicated ? It reproduces with any query https://fiddle.clickhouse.com/7ae271e9-e866-4187-a6d6-e84e89a8910e

@den-crane
Copy link
Contributor

den-crane commented May 20, 2024

@rschu1ze result is incorrect with the cache enabled https://fiddle.clickhouse.com/b79e57c1-e687-4985-9489-17eb1ff20a94

In some sense this is a CVE, because an unauthorized user can peek a data.

@rschu1ze
Copy link
Member

rschu1ze commented May 20, 2024

Even more minimal repro: https://fiddle.clickhouse.com/6661f07d-1372-4088-a0ea-9ff0d20090f3

Yes, the user can peek data from tables/columns without having SELECT privileges (assigned directly or by a role).

The source of the problem is that the query cache is based on comparing query ASTs. The same identifier in two ASTs can mean different things ...

The right fix would be to compare the representation with resolved identifiers produced by the Analyzer. It would be a bigger rewrite, however.

A quickfix would be to introduce a privilege for running queries with the query cache. This basically recognizes that the query cache usage can be a security risk (this would also be documented) and must be explicitly enabled by assigning the corresponding privilege. Such a privilege would be similar in spirit to the INTROSPECTION privilege.

Not sure what is the best route here, let me discuss internally.

EDIT: After discussion, we'll go with the quickfix (#64199, will also be backported), and for master I'll switch the query cache to use the Analyzer-representation with resolved identifiers (#64223).

rschu1ze added a commit to rschu1ze/ClickHouse that referenced this issue May 21, 2024
rschu1ze added a commit that referenced this issue May 24, 2024
rschu1ze added a commit that referenced this issue May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants