Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Results when Using BETWEEN AND after CREATE INDEX #7260

Closed
suyZhong opened this issue Jan 5, 2024 · 6 comments · Fixed by dolthub/go-mysql-server#2278
Closed
Assignees
Labels
analyzer bug Something isn't working correctness We don't return the same result as MySQL customer issue

Comments

@suyZhong
Copy link

suyZhong commented Jan 5, 2024

Considering the test case below:

CREATE TABLE t0(c0 INT);
CREATE TABLE t1(c0 INT, c1 INT, PRIMARY KEY(c0));
CREATE INDEX t1i1 ON t1(c1 , c0 ); -- necessary to reproduce the issue
INSERT INTO t1 (c1, c0) VALUES (1, 1);
INSERT INTO t1 (c0) VALUES (0);
INSERT INTO t0 (c0) VALUES (true);

SELECT * FROM t0, t1; -- 2 rows: (1 0 NULL), (1 1 1)
SELECT (t0.c0 BETWEEN t1.c1 AND t1.c0) FROM t0, t1; -- 0, 1
SELECT * FROM t0, t1 WHERE (t0.c0 BETWEEN t1.c1 AND t1.c0);
-- Expected: 1 row (1 1 1)
-- Actual: 0 row

The third SELECT returns an empty result, which is surprising: If the result of second query is 0, 1, the value of the BETWEEN expression should be true for one row, and thus the third query should return at least one row in t0, t1.

This test case works well in MySQL, however not in dolt.

I originally find this by building dolt from source version 2d0a2ed. It could also be reproduced in 1.29.7

@timsehn timsehn added bug Something isn't working analyzer correctness We don't return the same result as MySQL labels Jan 5, 2024
@max-hoffman
Copy link
Contributor

This looks like a range heap join bug @jycor @nicktobey

tmp1> explain SELECT * FROM t0, t1 WHERE (t0.c0 BETWEEN t1.c1 AND t1.c0);
+---------------------------------------------+
| plan                                        |
+---------------------------------------------+
| RangeHeapJoin                               |
|  ├─ ((t0.c0 >= t1.c1) AND (t0.c0 <= t1.c0)) |
|  ├─ Sort(t0.c0 ASC)                         |
|  │   └─ Table                               |
|  │       ├─ name: t0                        |
|  │       └─ columns: [c0]                   |
|  └─ IndexedTableAccess(t1)                  |
|      ├─ index: [t1.c1,t1.c0]                |
|      ├─ filters: [{[NULL, ∞), [NULL, ∞)}]   |
|      └─ columns: [c0 c1]                    |
+---------------------------------------------+

@suyZhong
Copy link
Author

Hi all! I'm wondering if there has been any update or assessment regarding this issue.

@max-hoffman
Copy link
Contributor

@nicktobey would you take a pass at this today?

@nicktobey
Copy link
Contributor

I'll take a look at this today.

@nicktobey nicktobey self-assigned this Jan 17, 2024
@nicktobey
Copy link
Contributor

It looks like there's an issue with how the Range Heap Join algorithm handles NULLS in a BETWEEN expression, when there's an index over the lower bound. The index treats NULLs as being less than non-NULL values, while the comparison function used to sort the heap treats NULLs as being greater than non-NULL values.

This discrepancy usually doesn't matter because comparisons involving NULL can never be true. But this is causing the heap to behave incorrectly and omit rows that should be included.

I have a fix but I'm writing additional tests to confirm its correctness. The exact interplay between the different ways that MySQL handles NULL can be somewhat tricky.

@nicktobey
Copy link
Contributor

Related issue: dolthub/go-mysql-server#1903

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer bug Something isn't working correctness We don't return the same result as MySQL customer issue
Projects
None yet
5 participants