Identical subqueries (or CTE) execute only once. (condition pushdown) #21992
Comments
#23539 (comment) can help
I have a similar problem: I have a performant two-statement query (using a temp table) that I can't convert to a one-statement query because of how ClickHouse plans the CTE query.
CREATE TEMPORARY TABLE temp AS
SELECT d FROM dist_table WHERE e IN (
SELECT f from local_table
);
SELECT *
FROM dist_table
WHERE a IN (
SELECT b FROM local_table WHERE c IN (temp)
);
If I try to convert this two-statement query to a one-statement query, I use
WITH temp AS (
SELECT d FROM dist_table WHERE e IN (
SELECT f from local_table
)
)
SELECT *
FROM dist_table
WHERE a IN (
SELECT b FROM local_table WHERE c IN (temp)
);
ClickHouse's query planner "inlines" the query here. As a workaround I tried @UnamedRus's trick of using
WITH (
SELECT groupArray(d) FROM dist_table WHERE e IN (
SELECT f from local_table
)
) AS temp
SELECT *
FROM dist_table
WHERE a IN (
SELECT b FROM local_table WHERE c IN (select arrayJoin(temp))
)
SETTINGS distributed_product_mode='allow';
The query plan begins by running the CTE query once on the cluster (which is what we want), but then inside the main query the CTE gets rebuilt again on every node... which I think might be expected given
In summary, I can't seem to find any workaround to make a performant one-statement query that matches my two-statement query. @amosbird's suggestion of adding a
WITH temp AS MATERIALIZED (
SELECT d FROM dist_table WHERE e IN (
SELECT f from local_table
)
)
SELECT *
FROM dist_table
WHERE a IN (
SELECT b FROM local_table WHERE c IN (temp)
);
Is there any plan to implement materialized CTEs? If not, is there an issue/discussion where it's explained why not? I'm asking about the simple case where there is just one ClickHouse instance, so the trick of using GLOBAL JOIN to force temporary caching doesn't work, but I still want to avoid the round trip of first writing a temporary table.
Plans yes, but no ETA.
I've opened a separate feature request for that.
After the #2301 fix, ClickHouse executes identical subqueries from a single query level only once. But this doesn't work when the subquery is used in a WHERE condition and that condition is pushed down into the inner query.
Use case
Describe the solution you'd like
ClickHouse should either not push down that kind of condition, or execute the subquery in it only once.
Describe alternatives you've considered
Disable predicate optimization by hand:
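A minimal sketch of what this workaround might look like, assuming the setting meant here is `enable_optimize_predicate_expression`. The tables `events` and `allowed_users` are hypothetical and not taken from the original report:

```sql
-- Hypothetical tables: events(user_id, ts), allowed_users(user_id).
-- With predicate pushdown enabled, the outer WHERE condition (including its
-- IN subquery) may be copied into the inner query, as described above.
SELECT user_id, cnt
FROM
(
    SELECT user_id, count() AS cnt
    FROM events
    GROUP BY user_id
)
WHERE user_id IN (SELECT user_id FROM allowed_users)
-- Assumption: turning the optimization off keeps the condition in the outer query only.
SETTINGS enable_optimize_predicate_expression = 0;
```

The trade-off is that the inner query is no longer filtered early, so it may process more rows than it would with pushdown enabled.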