Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected results when using natural right join #7879

Closed
suyZhong opened this issue Nov 27, 2023 · 13 comments
Closed

Unexpected results when using natural right join #7879

suyZhong opened this issue Nov 27, 2023 · 13 comments

Comments

@suyZhong
Copy link

Considering the test case below.

CREATE TABLE t0(c0 INT, c1 INT);
CREATE TABLE t1(c0 INT);

INSERT INTO t0(c0, c1) VALUES (1, 2);
INSERT INTO t1( c0) VALUES (3);

SELECT * FROM t0 NATURAL RIGHT JOIN t1; -- 3 <null>
SELECT (c0 IN (c0, c1)) FROM t0 NATURAL RIGHT JOIN t1; -- <true>
SELECT * FROM t0 NATURAL RIGHT JOIN t1 WHERE ((c0 IN (c0, c1))); 
-- Expected: 3 <null>
-- Actual: empty

The third SELECT returns an empty result, which is surprising: If the result of second query is true, the value of the IN expression should be true, and thus the third query should return the row of the JOIN , that is 3 <null>, same as the first query.

I found this in version 6.0.0.154 where I built from source code ab37234

@dyemanov
Copy link
Member

WHERE condition is applied after a join. (c0 IN (c0, c1)) means (c0 = c0 OR c0 = c1) and NULL = <something> can never be true.

@mrotteveel
Copy link
Member

Why do you expect 3 NULL values, that will never happen. Do you understand what a NATURAL JOIN does? To quote from the Language reference:

a natural join performs an automatic equi-join on all the columns that have the same name in the left and right table.

That means it is equivalent to a join with USING (c0) or ON t0.c0 = t1.c0, which evaluates to false, so there is no row in the result.

@dyemanov
Copy link
Member

dyemanov commented Nov 27, 2023

It appears that the second query wrongly returns TRUE then.

@dyemanov
Copy link
Member

@mrotteveel It's RIGHT join, so the false join condition does not eliminate the rows from the output.

@dyemanov
Copy link
Member

Firebird v4 returns {3, null} for the third query, while Firebird v5/v6 return no rows.

@dyemanov
Copy link
Member

dyemanov commented Nov 27, 2023

IIRC, every unqualified (without an alias) field reference inside a select list of A NATURAL JOIN B is replaced with COALESCE(A, B) under the hood. So the expression actually becomes COALESCE(T0.C0, T1.C0) IN (COALESCE(T0.C0, T1.C0), T1.C1) inside the select list and thus evaluates to TRUE. If fields are qualified properly, the result is different:

SELECT (t1.c0 IN (t0.c0, t0.c1)) FROM t0 NATURAL RIGHT JOIN t1
-- null

(in all FB versions)

and this:

SELECT * FROM t0 NATURAL RIGHT JOIN t1 WHERE ((t1.c0 IN (t0.c0, t0.c1)));

returns no rows even in v4.

@mrotteveel
Copy link
Member

mrotteveel commented Nov 27, 2023

WHERE condition is applied after a join. (c0 IN (c0, c1)) means (c0 = c0 OR c0 = c1) and NULL = <something> can never be true.

UNKNOWN OR TRUE is TRUE

or more correctly TRUE OR UNKNOWN is TRUE

@mrotteveel
Copy link
Member

mrotteveel commented Nov 27, 2023

It appears that the second query wrongly returns TRUE then.

See my previous comment, that is the correct result: c0 in (c0, c1) is c0 = c0 OR c0 = c1, is 3 = 3 OR 3 = NULL, is TRUE OR UNKNOWN, is TRUE.

@mrotteveel
Copy link
Member

Interestingly enough, the equivalent using = ANY does produce the right result:

SELECT * FROM t0 NATURAL RIGHT JOIN t1 WHERE c0 = any(select c0 from rdb$database union all select c1 from rdb$database);

@dyemanov
Copy link
Member

It appears that the second query wrongly returns TRUE then.

See my previous comment, that is the correct result: c0 in (c0, c1) is c0 = c0 OR c0 = c1, is 3 = 3 OR 3 = NULL, is TRUE OR UNKNOWN, is TRUE.

Yes, if c0 is wrapped with COALESCE. I implied c0 is actually t0.c0 when was writing it's incorrect, as NULL = NULL => UNKNOWN.

@mrotteveel
Copy link
Member

It appears that the second query wrongly returns TRUE then.

See my previous comment, that is the correct result: c0 in (c0, c1) is c0 = c0 OR c0 = c1, is 3 = 3 OR 3 = NULL, is TRUE OR UNKNOWN, is TRUE.

Yes, if c0 is wrapped with COALESCE. I implied c0 is actually t0.c0 when was writing it's incorrect, as NULL = NULL => UNKNOWN.

Sure, but IIRC there should be no t0.c0 or t1.c0 after a natural join, there should only be a single c0 of the join result.

@dyemanov
Copy link
Member

dyemanov commented Nov 27, 2023

Explained plan shows that RIGHT JOIN was changed to INNER JOIN in v5/v6, due to WHERE condition not caring about NULLs. This looks wrong. Maybe the COALESCE replacement happens after this check. I will look closer.

@dyemanov
Copy link
Member

Should be fixed in the next snapshot (tomorrow).

@mrotteveel mrotteveel changed the title Unexpected Results when Using Natural Right Join Unexpected results when using natural right join Dec 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment