Incorrect results in 0.7.1 and 0.7.2-dev2675 vs 0.4.0 (Issue with IEJoin/CTE's/?) #7278
Comments
SELECT SETSEED(0.8675309); will make the random numbers deterministic.
I'll make that change!
Updated to be deterministic using
In case it helps, here is a more minimal reproduction. (I've been interested in understanding DuckDB's IEJoin implementation, so this seemed like a good chance to dig in a little. I didn't get far in the code yet, but I at least narrowed down the reproduction.)
After removing some extraneous factors, like random(), I observed that the number of elements in the snapshot data and the nested subquery in the CTE are both factors.
Edit, to add a better explanation of the cases: Case 1 is a minimized reproduction of the original issue. Case 2 differs from it only in the sequence length (1000 vs 100). Case 3 is a modification of Case 1 that passes; it differs from Case 1 only by the removal of the nested subquery in the CTE.
Case 1: Fails (returns 0 count) with generate_series(1000)
Case 2: Passes (101 count) with
What happens?
When I run the below query, I receive a null value for the average computation and 0 rows for the count(*) aggregation using 0.7.1 or 0.7.2-dev2675. In 0.4.0, I receive non-zero results for both columns.
The query is an IEJoin that connects a shift calendar with data recorded at specific timestamps. I am using the IEJoin to filter the data and assign a shift number to each record. This query uses random data, but I encountered the problem in a real-world query of similar structure.
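To make the intended semantics concrete, here is a minimal brute-force sketch of the kind of inequality join described above, written in plain Python rather than SQL. It is not the query from this issue: the data, the names (calendar, readings, assign_shifts), and the half-open interval condition start <= ts < end are all assumptions chosen for illustration. A nested-loop reference like this can be handy for sanity-checking the count and average that an optimized join returns.

```python
from datetime import datetime, timedelta


def assign_shifts(calendar, readings):
    """Brute-force inequality join.

    Pairs each reading with every shift whose [start, end) window
    contains the reading's timestamp (assumed join condition).
    """
    matched = []
    for ts, value in readings:
        for shift_no, start, end in calendar:
            if start <= ts < end:
                matched.append((shift_no, ts, value))
    return matched


# Hypothetical data: three 8-hour shifts and one reading per hour for a day.
day = datetime(2023, 1, 1)
calendar = [
    (n, day + timedelta(hours=8 * n), day + timedelta(hours=8 * (n + 1)))
    for n in range(3)
]
readings = [(day + timedelta(hours=h), float(h)) for h in range(24)]

rows = assign_shifts(calendar, readings)
count = len(rows)
# Mirror SQL's avg(): no rows means a NULL (None) average.
average = sum(v for _, _, v in rows) / count if rows else None
print(count, average)  # prints: 24 11.5
```

With no matching rows, this reference returns a count of 0 and an average of None, which is the same shape as the incorrect result reported here (0 rows and a null average).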
Please forgive me on the ugly IN statements... I was just hacking something together and figured that the calendar table was small enough to handle my ugly SQL... ;-)
To Reproduce
I am running this query in the Python client, but it is all encapsulated in a single SQL statement:
Results in 0.7.1 and 0.7.2dev:
Results in 0.4.0:
If I remove the CTEs, the query does return correct results:
OS:
Windows
DuckDB Version:
0.7.1 and 0.7.2-dev2675
DuckDB Client:
Python
Full Name:
Alex Monahan
Affiliation:
Intel and DuckDB Labs
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?