
ClickHouse ignores settings from subquery #63078

Open
rjoelnorgren opened this issue Apr 27, 2024 · 3 comments
@rjoelnorgren

Describe what's wrong
The SETTINGS clause from a subquery does not apply to the query.

How to reproduce
No subquery

SELECT *
FROM numbers(10)
SETTINGS optimize_read_in_order = 0

-- Query id: d8f9702d-2f45-4976-a1c0-f0401bcf9593

SELECT Settings
FROM system.query_log
WHERE query_id = 'd8f9702d-2f45-4976-a1c0-f0401bcf9593'
LIMIT 1

--    ┌─Settings───────────────────────┐
-- 1. │ {'optimize_read_in_order':'0'} │
--    └────────────────────────────────┘

With subquery

SELECT *
FROM
(
    SELECT *
    FROM numbers(10)
    SETTINGS optimize_read_in_order = 0
)

-- Query id: 98714456-aa89-48b9-a5e0-1ebadc866aaa

SELECT Settings
FROM system.query_log
WHERE query_id = '98714456-aa89-48b9-a5e0-1ebadc866aaa'
LIMIT 1


--    ┌─Settings─┐
-- 1. │ {}       │
--    └──────────┘

Expected behavior

Settings are applied to the subquery.

rjoelnorgren added the potential bug label Apr 27, 2024
@alexey-milovidov (Member)

Settings are scoped. If a setting is specified in a subquery, it applies to the scope of the subquery.
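
As a sketch of what that scoping implies for the original numbers(10) example (assuming the goal is for the setting to apply to the whole query), the SETTINGS clause can be attached to the outermost SELECT instead:

-- Sketch: SETTINGS on the outermost SELECT, so it applies to the full query
-- scope rather than only to the inner subquery.
SELECT *
FROM
(
    SELECT *
    FROM numbers(10)
)
SETTINGS optimize_read_in_order = 0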

alexey-milovidov self-assigned this Apr 28, 2024
alexey-milovidov removed the potential bug label Apr 28, 2024
@rjoelnorgren (Author)

@alexey-milovidov, sorry, my example was not an actual reproduction. Below is the more involved example in which I found this to be a problem: optimize_read_in_order = 0 gives a ~10x performance improvement for this query pattern, but the setting appears not to be applied in the subquery variant. This is problematic for some patterns used in https://github.com/ClickHouse/dbt-clickhouse.

no subquery

SELECT
    item_id,
    location_id,
    created_at,
    metric - lagInFrame(metric) OVER prior AS delta
FROM my_table
FINAL
WHERE (created_at >= (now() - toIntervalDay(7))) AND (created_at <= now())
WINDOW prior AS (PARTITION BY item_id, location_id ORDER BY created_at ASC RANGE BETWEEN 86400 PRECEDING AND 1 PRECEDING)
SETTINGS optimize_read_in_order = 0
FORMAT `Null`

-- optimize_read_in_order = 0
-- 0 rows in set. Elapsed: 4.400 sec. Processed 682.14 million rows, 12.28 GB (155.02 million rows/s., 2.79 GB/s.)
-- Peak memory usage: 10.99 GiB.

-- optimize_read_in_order = 1
-- 0 rows in set. Elapsed: 46.819 sec. Processed 682.14 million rows, 12.28 GB (14.57 million rows/s., 262.26 MB/s.)
-- Peak memory usage: 676.24 MiB.

subquery

SELECT *
FROM
(
    SELECT
        item_id,
        location_id,
        created_at,
        metric - lagInFrame(metric) OVER prior AS delta
    FROM my_table
    FINAL
    WHERE (created_at >= (now() - toIntervalDay(7))) AND (created_at <= now())
    WINDOW prior AS (PARTITION BY item_id, location_id ORDER BY created_at ASC RANGE BETWEEN 86400 PRECEDING AND 1 PRECEDING)
    SETTINGS optimize_read_in_order = 0
)
FORMAT `Null`

-- optimize_read_in_order = 0
-- 0 rows in set. Elapsed: 45.771 sec. Processed 682.14 million rows, 12.28 GB (14.90 million rows/s., 268.26 MB/s.)
-- Peak memory usage: 688.16 MiB.

-- optimize_read_in_order = 1
-- 0 rows in set. Elapsed: 45.059 sec. Processed 682.14 million rows, 12.28 GB (15.14 million rows/s., 272.50 MB/s.)
-- Peak memory usage: 692.87 MiB.
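
A sketch of the same query with the SETTINGS clause hoisted to the outer SELECT (not timed here; the assumption is that the outer scope is where the setting needs to be visible for the read-in-order optimization to be disabled):

-- Sketch (untimed): SETTINGS moved to the outer query so it is not confined
-- to the subquery scope.
SELECT *
FROM
(
    SELECT
        item_id,
        location_id,
        created_at,
        metric - lagInFrame(metric) OVER prior AS delta
    FROM my_table
    FINAL
    WHERE (created_at >= (now() - toIntervalDay(7))) AND (created_at <= now())
    WINDOW prior AS (PARTITION BY item_id, location_id ORDER BY created_at ASC RANGE BETWEEN 86400 PRECEDING AND 1 PRECEDING)
)
SETTINGS optimize_read_in_order = 0
FORMAT `Null`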

ddl

CREATE TABLE my_table
(
    `item_id` UInt64 CODEC(Delta(8), ZSTD(1)),
    `location_id` UInt16 CODEC(Delta(2), ZSTD(1)),
    `metric` SimpleAggregateFunction(max, UInt32) CODEC(T64, ZSTD(1)),
    `created_at` DateTime CODEC(Delta(4), ZSTD(1))
)
ENGINE = AggregatingMergeTree
PARTITION BY toStartOfWeek(created_at)
ORDER BY (item_id, location_id, created_at)
SETTINGS index_granularity = 8192

@italodamato

Yes, I've also just noticed the same thing using optimize_aggregation_in_order = 1 in a subquery.
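
A sketch of the same workaround for that setting, assuming a hypothetical aggregation over the my_table schema above (the query itself is illustrative, not from the report):

-- Hedged sketch: hypothetical aggregation query; the setting is placed on the
-- outer SELECT so it is not confined to the subquery's scope.
SELECT item_id, max_metric
FROM
(
    SELECT item_id, max(metric) AS max_metric
    FROM my_table
    GROUP BY item_id
)
SETTINGS optimize_aggregation_in_order = 1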
