-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that split queries with non-deterministic Skip/Take don't produce wrong results (by adding deterministic orderings) #26808
Comments
Took a look at this again because of #31657, and we review what we're doing around split queries with non-deterministic ordering that potentially yields wrong results. To recap, NorthwindSplitIncludeQuerySqlServerTest.Include_collection_take_no_order_by generates the following two queries: SELECT TOP(@__p_0) [c].[CustomerID], [c].[Address], [c].[City], [c].[CompanyName], [c].[ContactName], [c].[ContactTitle], [c].[Country], [c].[Fax], [c].[Phone], [c].[PostalCode], [c].[Region]
FROM [Customers] AS [c]
ORDER BY [c].[CustomerID]
SELECT [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate], [t].[CustomerID]
FROM (
SELECT TOP(@__p_0) [c].[CustomerID]
FROM [Customers] AS [c]
) AS [t]
INNER JOIN [Orders] AS [o] ON [t].[CustomerID] = [o].[CustomerID]
ORDER BY [t].[CustomerID] The 1st query orders by CustomerID, but the subquery in the second does not; that means arbitrary rows come out of the subquery, potentially resulting in non-matching rows with the first -> data corruption.
|
Note that it is reasonable, in general, to do |
@ajcvickers absolutely. The point is that the SQL generated above in my previous comment - which is exactly the case of Take without ordering - produces potentially incorrect results. It's a completely legitimate query (unlike Skip), but if you use split query you may get the wrong Posts on your Blogs. |
Although not exactly the same, this might be some of the discussion you remember? dotnet/EntityFramework.Docs#3242 Slightly different in that here an OrderBy is provided, just not a deterministic one. One of the suggestions is to append the stable ordering, but that's rejected. If I've followed it correctly, seems the augment is that adding the stable ordering is only done for correct materialisation when streaming, and never as an attempt to fix a user's non-deterministic query, hence not applying to the sub query. Personally just think it should add the stable ordering. Indeterminate results across multiple executions of the same query (in single mode) is one thing- as pointed out above, could even be intended. But wrong results when running the query once in split mode is another thing! |
Thanks @stevendarby, yeah - that's the discussion I had in mind (neglected to search for it in the docs repo). We also discussed this offline. And yeah, I agree that EF should not return incorrect results for AsSplitQuery with non-deterministic ordering, and that we should consider having the ordering in the subquery to ensure matching results between the multiple split queries. Even if not, we should at the very least be warning about all cases where such data corruption may occur (we seem to only be doing it for Skip for now). Note that it's not enough to just include the same ORDER BY inside the subquery: that's still non-deterministic when the the column isn't unique. In this case we'd need to add ordering on the primary key properties to ensure uniqueness as well. I'll bring this discussion to a design meeting. |
Related Github issue: dotnet/efcore#26808
Design discussion: we indeed think that EF should ensure correct results here by injecting orderings as necessary.
|
Related Github issue: dotnet/efcore#26808
Related Github issue: dotnet/efcore#26808
I created a gist to showcase the problem with Npgsql.EntityFrameworkCore.PostgreSQL: https://gist.github.com/kev-andrews/9cb4cc4954ed9cff177beaaeada59a5a the weird thing is...the exact same Code, which produces the same SQL for Postgres and SQL Server, works fine when using MS SQL Server Local Db and Microsoft.EntityFrameworkCore.SqlServer. Generated SQL for the Split Query: Postgres:
SQL Server:
I also tried the gist with Pomelo MySql, it fails the same way Postgres does when querying/merging the split query includes with the following SQL: MYSQL:
I have tried numerous times to cause the same errors on SQL Server, but it seems to always work...I do not know why. |
@kev-andrews I haven't looked in depth, by assuming |
Yes, I overexaggerated the problem by sorting on that column which has all identical values. Our real world problem was an order by on a "LastName" column, with skip/take pagination and more identical last names than the page size, which caused projected relations to not resolve properly when using split queries, as the implicit ordering by "Id" is only added on the outer query by EF core. I would assume that SQL Server just does something similar to primary key ordering internally so it seems to "just work" there. |
That really, really depends - creating a clustered index on another column would probably cause that to change. At the end of the day, if you don't have explicit orderings which result in a fully unique set (mostly unique like LastName vs. fully non-unique like "AlwaysSameValue" makes no difference), your resultset is non-deterministic and anything may happen. |
…and its occurrences in subqueries for collection includes. Fixes dotnet#26808
…and its occurrences in subqueries for collection includes. Fixes dotnet#26808
The core issue here is worse than this issue originally describes. Entity Framework is actively applying different, mismatched orderings to different parts of the split query. Would any EF Core team members have a chance to check out my PR for this? #34097. Apologies for not asking for advanced permission to open a PR per the contribution guidelines, but this started out as an exploratory exercise for me and I just stumbled upon what seems to me like an elegant, simple solution. All feedback is very welcome! |
…and its occurrences in subqueries for collection includes. Fixes dotnet#26808
…and its occurrences in subqueries for collection includes. Fixes dotnet#26808
When doing split query, if there's any non-determinism in the query that may produce wrong results, since the multiple queries may return different/inconsistent data.
In some cases we can be sure the query is non-deterministic, and can throw (#21202 tracks that). In other cases (dotnet/EntityFramework.Docs#3242) we may not be sure (e.g. depends on the uniqueness of the last column being ordered by), so throwing may be too much (and block legitimate, non-dangerous use). A warning may be more appropriate here (probably needs to be discussed).
The text was updated successfully, but these errors were encountered: