Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect 'where' filtration after full join #20497

Closed
DimasKovas opened this issue Feb 15, 2021 · 3 comments · Fixed by #20622
Closed

Incorrect 'where' filtration after full join #20497

DimasKovas opened this issue Feb 15, 2021 · 3 comments · Fixed by #20622
Assignees
Labels
bug Confirmed user-visible misbehaviour in official release comp-joins JOINs

Comments

@DimasKovas
Copy link
Contributor

Describe the bug
Strange behavior with 'where' condition after join. Some rows are discarded by mistake.

How to reproduce

  • Which ClickHouse server version to use
    Latest version in Arcadia.

  • CREATE TABLE statements for all tables involved

create table t0 (a Int64) engine = MergeTree() primary key a;
create table t1 (a Int64) engine = MergeTree() primary key a;
  • Sample data for all these tables
insert into t0 values (1), (2);
insert into t1 values (1), (3);
  • Queries to run that lead to unexpected result
select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type = 'first';
select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type in ('both', 'first', 'second');

Expected behavior
Result should contain 1 row in first query and 3 in second one.

Additional context
First two queries and the last one return correct results. Others do not. The queries differ only in 'where' condition.

di.man.yp-c.yandex.net :) select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a)

SELECT multiIf((a != 0) AND (t1.a != 0), 'both', (a != 0) AND (t1.a = 0), 'first', (a = 0) AND (t1.a != 0), 'second', 'error') AS type
FROM 
(
    SELECT
        t0.a,
        t1.a
    FROM t0
    FULL OUTER JOIN t1 USING (a)
)

Query id: 52e62625-d357-4fa4-babe-805332ab1dfd

┌─type──┐
│ both  │
│ first │
└───────┘
┌─type───┐
│ second │
└────────┘

3 rows in set. Elapsed: 0.019 sec.

di.man.yp-c.yandex.net :) select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type = 'both'

SELECT multiIf((a != 0) AND (t1.a != 0), 'both', (a != 0) AND (t1.a = 0), 'first', (a = 0) AND (t1.a != 0), 'second', 'error') AS type
FROM 
(
    SELECT
        t0.a,
        t1.a
    FROM t0
    FULL OUTER JOIN t1 USING (a)
)
WHERE type = 'both'

Query id: 4ac91465-c66b-4aed-89cd-c6fbea58ff57

┌─type─┐
│ both │
└──────┘

1 rows in set. Elapsed: 0.028 sec.

di.man.yp-c.yandex.net :) select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type = 'first'

SELECT multiIf((a != 0) AND (t1.a != 0), 'both', (a != 0) AND (t1.a = 0), 'first', (a = 0) AND (t1.a != 0), 'second', 'error') AS type
FROM 
(
    SELECT
        t0.a,
        t1.a
    FROM t0
    FULL OUTER JOIN t1 USING (a)
)
WHERE type = 'first'

Query id: 5e90a617-d996-4fde-ace2-9a43e5c96906

Ok.

0 rows in set. Elapsed: 0.028 sec. 

di.man.yp-c.yandex.net :) select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type = 'second'

SELECT multiIf((a != 0) AND (t1.a != 0), 'both', (a != 0) AND (t1.a = 0), 'first', (a = 0) AND (t1.a != 0), 'second', 'error') AS type
FROM 
(
    SELECT
        t0.a,
        t1.a
    FROM t0
    FULL OUTER JOIN t1 USING (a)
)
WHERE type = 'second'

Query id: a673bf99-0cfe-4782-912f-3132b150fd83

Ok.

0 rows in set. Elapsed: 0.027 sec. 

di.man.yp-c.yandex.net :) select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type in ('both', 'first', 'second')

SELECT multiIf((a != 0) AND (t1.a != 0), 'both', (a != 0) AND (t1.a = 0), 'first', (a = 0) AND (t1.a != 0), 'second', 'error') AS type
FROM 
(
    SELECT
        t0.a,
        t1.a
    FROM t0
    FULL OUTER JOIN t1 USING (a)
)
WHERE type IN ('both', 'first', 'second')

Query id: 8cc4b758-6b0d-41b3-8dc9-3932c46c00ce

┌─type──┐
│ both  │
│ first │
└───────┘

2 rows in set. Elapsed: 0.028 sec.

di.man.yp-c.yandex.net :) select multiIf(a != 0 and t1.a != 0, 'both', a != 0 and t1.a = 0, 'first', a = 0 and t1.a != 0, 'second', 'error') as type from (select t0.a, t1.a from t0 full join t1 using a) where type not in ('both', 'first')

SELECT multiIf((a != 0) AND (t1.a != 0), 'both', (a != 0) AND (t1.a = 0), 'first', (a = 0) AND (t1.a != 0), 'second', 'error') AS type
FROM 
(
    SELECT
        t0.a,
        t1.a
    FROM t0
    FULL OUTER JOIN t1 USING (a)
)
WHERE type NOT IN ('both', 'first')

Query id: e8f174d1-f2e4-424b-9ad0-2fa233ceb0b0

┌─type───┐
│ second │
└────────┘

1 rows in set. Elapsed: 0.034 sec. 
@DimasKovas DimasKovas added the bug Confirmed user-visible misbehaviour in official release label Feb 15, 2021
@filimonov filimonov added the comp-joins JOINs label Feb 16, 2021
@vdimir
Copy link
Member

vdimir commented Feb 17, 2021

Error reproduced with setting enable_optimize_predicate_expression = 1 (enabled by default). We are going to fix it.

@kolechenkov
Copy link
Contributor

Is there a ETA of issue solving?

vdimir added a commit to vdimir/ClickHouse that referenced this issue Mar 24, 2021
@vdimir
Copy link
Member

vdimir commented Mar 24, 2021

@kolechenkov

Workaround is to disable optimizing predicate expression SET enable_optimize_predicate_expression = 0.

We are working on fix in #20622 I hope it will be merged soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release comp-joins JOINs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants