Incorrect result from aggregating on table joined in both subquery and CTE #4353
Comments
Update. The subquery works in
Nice bug. @max-hoffman has been grinding on CTE issues, so he'll grab it once he frees up a bit.
@snipesjr thank you for the report! I would appreciate any additional debugging context. For example, if the queries are returning incorrect results, sharing how the results differ from what you expected, or how the
Yes, for sure. I will send the output that I am getting from within PyCharm tomorrow (new laptop). Dolt's CLI client returned the correct results, so it might be an issue elsewhere. Nevertheless, I wanted to report it since others connecting with JetBrains products might bump into it.
We love the report! Just want to get it fixed for you :-)
@snipesjr Just to clarify, query plans or the modified query strings might be easier to share than full result sets. If you set
Sharing the first 10 records from the CTE example (will share the rest):
Explain CTE (PyCharm):
Also noticed that the CTE returns a different result each time I run it. Here's the log I get with mysql-connector-java:
Thank you for the additional details @snipesjr! That made it much easier to figure out what's going on here. I think the temporary workaround is to use

```sql
WITH providers AS (
    SELECT pro.npi
         , IFNULL(ext.taxonomy_code, 'UNKNOWN') AS taxonomy_code
    FROM npi_provider pro
    LEFT JOIN npi_provider_extra ext
        ON (pro.npi = ext.npi
            AND ext.primary_taxonomy_switch IN ('Y', 'X'))
)
SELECT p.taxonomy_code
     , COUNT(*) AS provider_count
FROM providers AS p
GROUP BY p.taxonomy_code
ORDER BY provider_count DESC
LIMIT 10;
```
```
+---------------+----------------+
| taxonomy_code | provider_count |
+---------------+----------------+
| UNKNOWN       | 1914495        |
| 183500000X    | 197214         |
| 1041C0700X    | 196532         |
| 225100000X    | 185787         |
| 101YM0800X    | 185663         |
| 390200000X    | 170594         |
| 106S00000X    | 169258         |
| 207Q00000X    | 146182         |
| 363LF0000X    | 139425         |
| 207R00000X    | 136415         |
+---------------+----------------+
```
10 rows in set (370.44 sec)

With the COUNT(*) query, we appear to push the sort too far. So instead of the results being sorted on

```
-- old
Limit(10)
└─ Project
   ├─ columns: [p.taxonomy_code, COUNT(*) as provider_count]
   └─ Sort(COUNT(*) DESC)
      └─ GroupBy

-- new
Limit(10)
└─ TopN(Limit: [10]; provider_count DESC)
   └─ Project
      ├─ columns: [p.taxonomy_code, COUNT(*) as provider_count]
      └─ GroupBy
```

This does not explain why the two queries are equivalent without the Limit, so I need to do more testing.
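The TopN rewrite in the "new" plan is only valid because a bounded top-N selection over the grouped output must produce exactly the same rows as a full sort followed by a limit. A small Python sketch of that equivalence, using toy data rather than anything from go-mysql-server:

```python
import heapq

# Toy grouped output: (taxonomy_code, provider_count) pairs.
groups = [("A", 5), ("B", 12), ("C", 7), ("D", 12), ("E", 1)]

def sort_then_limit(rows, n):
    # The old plan: full Sort(provider_count DESC) under a Limit(n).
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

def top_n(rows, n):
    # The new plan: TopN keeps only the n largest rows via a bounded heap.
    # heapq.nlargest is documented as equivalent to sorted(...)[:n].
    return heapq.nlargest(n, rows, key=lambda r: r[1])

print(sort_then_limit(groups, 3))
print(top_n(groups, 3))
```

Both print `[('B', 12), ('D', 12), ('C', 7)]`; when the two disagree in an engine, the rewrite (or the limit placement) is the bug.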
Thanks for the update! Unfortunately, swapping to
Interestingly, I get the random sort without the limit as well; however, I see that PyCharm is doing a
Looks like we are selecting 501 random rows from the left join. We can manually disable parallelism (

Running a few comparison queries against MySQL, I think the correct behavior is to 1) only apply the default limit to the top-level query, and 2) use a query's LIMIT clause instead of the default when provided.

Alright, two bugs:
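The two intended fixes above (apply the default row limit only to the top-level query, and prefer an explicit LIMIT over the default) can be sketched as a tiny plan-rewrite rule. This is illustrative Python only, not the actual analyzer code; `Node`, `apply_default_limit`, and the 500-row default are all made up for the sketch:

```python
# Hypothetical plan node: a query block with an optional explicit LIMIT
# and nested subquery/CTE blocks as children.
class Node:
    def __init__(self, name, explicit_limit=None, children=()):
        self.name = name
        self.explicit_limit = explicit_limit
        self.children = list(children)
        self.effective_limit = None

def apply_default_limit(node, default_limit, top_level=True):
    # Bug 1 fix: only the top-level query ever receives the default limit,
    # so subqueries and CTE bodies are never silently truncated.
    # Bug 2 fix: an explicit LIMIT always wins over the default.
    if node.explicit_limit is not None:
        node.effective_limit = node.explicit_limit
    elif top_level:
        node.effective_limit = default_limit
    for child in node.children:
        apply_default_limit(child, default_limit, top_level=False)
    return node

sub = Node("left_join_subquery")               # must NOT be truncated
top = Node("select", explicit_limit=10, children=[sub])
apply_default_limit(top, default_limit=500)
print(top.effective_limit, sub.effective_limit)  # prints: 10 None
```

With the buggy behavior, the subquery would get the client's default limit applied, which is exactly what produces a few hundred random rows from the left join before the aggregate runs.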
In the meantime, disabling

Thank you for the follow-up information! Let us know if there are any other bugs we should clean up here.
@snipesjr I think this one should fix the

Let us know if you run into any more trouble.
The
Hi, earlier I tried to do a quick COUNT(*) to spot-check some join logic in a CTE. The output appears to come from a query-planning bug; it seems to be skipping my join condition entirely.
My db is here:
https://www.dolthub.com/repositories/snipesjr/npi_registry
Subquery example (bad):
CTE example (bad):
Non-subquery/CTE (good):
My dolt version:
I tried to test the query in the web UI, but my query timed out.
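For reference, the aggregation being spot-checked boils down to a LEFT JOIN with a filtered ON clause, an IFNULL default, and a grouped COUNT(*). A toy Python model with hypothetical rows (not the real npi tables) shows the expected semantics: every left-side row is counted exactly once, falling into `UNKNOWN` when the join finds no qualifying match:

```python
from collections import Counter

# Hypothetical rows standing in for npi_provider and npi_provider_extra.
providers = [{"npi": 1}, {"npi": 2}, {"npi": 3}]
extras = [
    {"npi": 1, "taxonomy_code": "207R00000X", "primary_taxonomy_switch": "Y"},
    {"npi": 2, "taxonomy_code": "101YM0800X", "primary_taxonomy_switch": "N"},
]

def left_join_counts(providers, extras):
    # LEFT JOIN ... ON (npi matches AND switch IN ('Y', 'X')),
    # then GROUP BY IFNULL(taxonomy_code, 'UNKNOWN') with COUNT(*).
    counts = Counter()
    for pro in providers:
        matches = [
            ext for ext in extras
            if ext["npi"] == pro["npi"]
            and ext["primary_taxonomy_switch"] in ("Y", "X")
        ]
        if matches:
            for ext in matches:
                counts[ext["taxonomy_code"]] += 1
        else:
            counts["UNKNOWN"] += 1  # the IFNULL default for non-matches
    return counts

print(left_join_counts(providers, extras))
```

Here npi 2 has an extras row, but its switch is `N`, so the ON-clause filter rejects it and the provider still lands in `UNKNOWN`; that is the condition the reporter suspected the planner was skipping.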