Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect results (ignoring WHERE) when using JOIN combined with NOT IN #25368

Closed
Peter-Sh opened this issue Jun 16, 2021 · 4 comments · Fixed by #25370
Closed

Incorrect results (ignoring WHERE) when using JOIN combined with NOT IN #25368

Peter-Sh opened this issue Jun 16, 2021 · 4 comments · Fixed by #25370
Assignees
Labels
bug Confirmed user-visible misbehaviour in official release comp-joins JOINs major

Comments

@Peter-Sh
Copy link

Peter-Sh commented Jun 16, 2021

I get incorrect results as if particular WHERE condition is ignored when there is JOIN and NOT IN condition (empty set) in a query. I've created as minimal example as I could.


# Test dataset
CREATE TABLE test
(
    `t` UInt8,
    `flag` UInt8,
    `id` UInt8
)
ENGINE = MergeTree
PARTITION BY t
ORDER BY (t, id)
SETTINGS index_granularity = 8192;

INSERT INTO test VALUES (1,0,1),(1,0,2),(1,0,3),(1,0,4),(1,0,5),(1,0,6),(1,1,7),(0,0,7);

# Explore dataset

SELECT * FROM test

┌─t─┬─flag─┬─id─┐
│ 0 │    0 │  7 │
└───┴──────┴────┘
┌─t─┬─flag─┬─id─┐
│ 1 │    0 │  1 │
│ 1 │    0 │  2 │
│ 1 │    0 │  3 │
│ 1 │    0 │  4 │
│ 1 │    0 │  5 │
│ 1 │    0 │  6 │
│ 1 │    1 │  7 │
└───┴──────┴────┘

# Problematic query

SELECT id, flag FROM test t1
INNER JOIN  (SELECT DISTINCT id FROM test) AS t2 ON t1.id = t2.id
WHERE flag = 0 and t = 1 AND id NOT IN (SELECT 1 WHERE 0)

┌─id─┬─flag─┐
│  1 │    0 │
│  2 │    0 │
│  3 │    0 │
│  4 │    0 │
│  5 │    0 │
│  6 │    0 │
│  7 │    1 │ <---- this row should NOT be in result set because of flag = 0 condition in WHERE
└────┴──────┘

# If I remove NOT IN part (which should not affect anything because it is an empty set) I got correct results

SELECT id, flag FROM test t1
INNER JOIN  (SELECT DISTINCT id FROM test) AS t2 ON t1.id = t2.id
WHERE flag = 0 and t = 1
┌─id─┬─flag─┐
│  1 │    0 │
│  2 │    0 │
│  3 │    0 │
│  4 │    0 │
│  5 │    0 │
│  6 │    0 │
└────┴──────┘

Original query was a way more complex, didn't use self join and NOT IN wasn't empty set, I've boiled it down to above simple example while trying to figure out why it's happening. It seems that all things matters here: table and query structure and dataset.

Yes, it is reproducing in current release with default settings.
21.6.4.26 - reproduced (current release)
21.5.6.6 - reproduced

Does NOT reproduce in 20.10.3.30

@Peter-Sh Peter-Sh added the bug Confirmed user-visible misbehaviour in official release label Jun 16, 2021
@den-crane
Copy link
Contributor

den-crane commented Jun 16, 2021

Check with set query_plan_enable_optimizations=0

@den-crane
Copy link
Contributor

reproduced with (set optimize_move_to_prewhere=1, compile_expressions=1, query_plan_enable_optimizations=1)

@Peter-Sh
Copy link
Author

Yes, setting set query_plan_enable_optimizations=0 make this bug to disappear.

@Peter-Sh
Copy link
Author

Wow, not only clickhouse itself is very fast, but its devs seems to be somehow vectorized and very fast too! :) Thanks for quick responses and fixes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release comp-joins JOINs major
Projects
None yet
4 participants