Cross join to unnest array never completes #11820
Comments
Hi @fisher-liquid, thank you for filing the report. I took a look at this, and I think this is expected behavior; it's just that the result size is extremely large. Here are the logical plans for the two queries you mentioned.
Original query:
CTE:
Note that the CTE is actually not the same query, since there is an extra join condition. This is the main reason the CTE finished so fast (more on that later). An equivalent query with a CTE would look like the following:
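The equivalent query itself is not shown above; a minimal sketch of the pattern, assuming a hypothetical table t(id, item_ids) whose item_ids column holds 2000-element arrays, would be:

```sql
-- Hypothetical names; without a join condition back to t, the cross join
-- reproduces the full cross product, so this variant hangs like the original.
WITH items AS (
    SELECT unnest(item_ids) AS item_id
    FROM t
)
SELECT t.id, items.item_id
FROM t
CROSS JOIN items;
```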
The query plan for this query is
This query also hangs and does not finish (or will get killed by the OS).
This means the last cross product will end up producing 200,000,000,000 results. If item_id is just a 32-bit integer, that is already 800 gigabytes of output (2 × 10^11 rows × 4 bytes). Using the CTE with the extra join condition, only the size of the left table is returned. Let me know if this helps. If you still have issues with a use case surrounding this, let me know 👍
Thanks for the detailed write-up! I'll stick to unnesting arrays in CTEs in the future instead of cross joining them.
@Tmonster The query plans should be equivalent, though. They might not be equal, but they should produce equal results; they do in PostgreSQL, at least, where the two queries from the OP produce equal results and only your CTE produces the "big" result. In summary, it seems to me the issue reported by the OP is indeed not a performance issue but, worse, a parsing/compilation issue.
TL;DR: Ah yes, technically this isn't a cross product, sorry for the confusion. My explanation for why isn't great, but it has to do with correlated subqueries and the unnest call. I can spend some more time refining it, but my advice is to stick with CTEs in this case, or the following query:
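The suggested query is likewise not shown above; one alternative that avoids the problem is unnesting directly in the SELECT list, sketched here with the same hypothetical names:

```sql
-- Expands each row's array in place, producing |t| * 2000 rows
-- rather than a cross product.
SELECT id, unnest(item_ids) AS item_id
FROM t;
```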
Ah ok, yeah, technically the two should be equivalent, sorry for the confusion. In the query, the …
The problem with "recommend sticking to CTEs" is that (1) it's hard to know in which situations that recommendation applies, and (2) CTEs come with their own performance problems (e.g., they can't be "run once and use results many times", AFAIK). Without having understood the details in your answer: do you agree there is a bug in the current parser/optimiser, and do you think that'll be fixed? Or do you think DuckDB will define its interpretation of the OP's query as an accepted deviation from the PostgreSQL dialect?
Ok yeah, so this is most likely an issue with our parser. For some reason DuckDB detects a correlated subquery when in reality this is not a correlated subquery. It will be fixed, but not before v1.0.0, I'm afraid. If you decide to stick to CTEs instead of the subquery solution mentioned in (#11820 (comment)), you can actually run the CTE as a materialized CTE (sketched below). This way it is run once and used multiple times. I would leave this issue open though, that way I can remember to come back to it.
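A minimal sketch of the materialized-CTE form, again with the hypothetical table t(id, item_ids):

```sql
-- AS MATERIALIZED computes the CTE once and reuses the result for
-- every reference instead of inlining it each time.
WITH items AS MATERIALIZED (
    SELECT id, unnest(item_ids) AS item_id
    FROM t
)
SELECT
    (SELECT count(*) FROM items)                AS total_items,
    (SELECT count(DISTINCT item_id) FROM items) AS distinct_items;
```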
What happens?
A cross join used to unnest fairly large array fields (2000 items per array) stalls and never completes.
To Reproduce
To reproduce:
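The original reproduction script did not make it into this copy of the issue; a sketch consistent with the description (2000 items per array, with a row count chosen so the arithmetic matches the 200,000,000,000 figure quoted in the comments) might look like:

```sql
-- Hypothetical data: 10,000 rows, each carrying a 2000-element array.
-- An unconditioned cross product over the unnested rows would be
-- 10,000 * (10,000 * 2,000) = 200,000,000,000 rows.
CREATE TABLE t AS
SELECT r AS id, range(2000) AS item_ids
FROM range(10000) AS tbl(r);

-- The cross join to unnest that stalls on 0.10.x:
SELECT t.id, u.item_id
FROM t
CROSS JOIN unnest(t.item_ids) AS u(item_id);
```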
After moving the unnest into a CTE, the query does complete in under a second:
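The CTE version is also not shown; following the discussion above (the fast variant carries an extra join condition back on id), it presumably looked something like:

```sql
-- Unnest inside a CTE, then join back on id; per the report this
-- completes in under a second.
WITH items AS (
    SELECT id, unnest(item_ids) AS item_id
    FROM t
)
SELECT t.id, items.item_id
FROM t
JOIN items USING (id);
```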
OS:
Ubuntu linux x86, and macbook air with m2
DuckDB Version:
0.10.1 and 0.10.2
DuckDB Client:
cli
Full Name:
Fisher Moritzburke
Affiliation:
Liquid Analytics
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?