Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result for a JOIN predicate t1.c0 IN (t0.c0) and WHERE predicate t1.c0<=t0.c0 #525

Closed
mrigger opened this issue Apr 11, 2020 · 4 comments

Comments

@mrigger
Copy link
Contributor

mrigger commented Apr 11, 2020

Consider the following statements:

CREATE TABLE t0(c0 INT);
CREATE TABLE t1(c0 FLOAT);
INSERT INTO t0(c0) VALUES (1), (0);
INSERT INTO t1(c0) VALUES (1);
SELECT t1.c0 FROM t1 JOIN t0 ON t1.c0 IN (t0.c0) WHERE t1.c0<=t0.c0; -- expected: {1.0}, actual: {4.67454452791745e-310}

Unexpectedly, the query yields an incorrect result and fetches a value that is not contained in t1.

I found this based on the latest master commit (c52fc9b).

@Mytherin
Copy link
Collaborator

Good find. Looks to me like uninitialized memory is being read.

@mrigger
Copy link
Contributor Author

mrigger commented Apr 12, 2020

I also checked using debug build, which exits with the following:

SQLite version DuckDB 3f0eb513
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE t0(c0 INT);
sqlite> CREATE TABLE t1(c0 FLOAT);
sqlite> INSERT INTO t0(c0) VALUES (1), (0);
sqlite> INSERT INTO t1(c0) VALUES (1);
sqlite> SELECT t1.c0 FROM t1 JOIN t0 ON t1.c0 IN (t0.c0) WHERE t1.c0<=t0.c0; -- expected: {1.0}, actual: {4.67454452791745e-310}
duckdb_cli: /duckdb/src/execution/expression_executor.cpp:87: void duckdb::ExpressionExecutor::Verify(duckdb::Expression&, duckdb::Vector&, duckdb::idx_t): Assertion `expr.return_type == vector.type' failed.
Aborted

@Mytherin
Copy link
Collaborator

Interesting, looks like a type mismatch somehow. Thanks :) I will have a look!

@Mytherin
Copy link
Collaborator

Fixed in 0e5a3c4, this was a problem caused by the remove unused column optimizer. This optimizer projects out columns that are no longer required after a filter or join. For example, if we would have a query SELECT t1.c1 FROM t1 JOIN t2 ON t1.id=t2.id WHERE t1.c0=10, the filter would be computed and after t1.c0 would be projected out because it is no longer necessary after the filter. This makes the subsequent join faster to compute.

In this case, what happened was this optimizer detected that t0.c0 was no longer required after the filter, and hence it told the filter that it could remove this column. However, the expression was afterwards removed from the filter and pushed into the join itself, leaving the filter with no expressions. A filter with no expressions was previously entirely ignored, which also caused the column removal not to occur, which then resulted in a bug where the column binding was incorrect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants