Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust EF behavior based on query buffering vs. streaming #20076

Open
roji opened this issue Feb 26, 2020 · 8 comments
Open

Adjust EF behavior based on query buffering vs. streaming #20076

roji opened this issue Feb 26, 2020 · 8 comments

Comments

@roji
Copy link
Member

roji commented Feb 26, 2020

Queries currently execute in streaming mode (see exception below), i.e. they process rows one-by-one from DbDataReader. If eager loading is being performed, this requires EF to add an ORDER BY clause grouping together all rows of an entity, so that related entities can be loaded and returned to the user. There is at least some evidence that these ORDER BY clauses can have negative perf impact, especially when they are over a more complex subquery.

We can allow the user to specify buffering behavior on a given query (i.e. AsBuffering/WithBuffering operator). For tracking queries projecting out entities with eager loading, this would allow us to remove the ORDER BY clause; the entire resultset is consumed before returning the first result (buffering), and related entities can simply be fixed up with identity resolution.

Another advantage: we currently buffer internally (buffered reader) when using a retrying strategy, to make sure that if the user is streaming results and and an exception occurs, data consistency is guaranteed (2nd retry could return different results). If we know that the user is buffering, we don't need to do that any more.

Finally, as an alternative to EF-specific operators (e.g. AsBuffering), we'd ideally get this information automatically when the user executes ToList or similar. This could be done in two ways:

  • Introduce an IQueryable overload of ToList (see Introduce an IQueryable ToDictionary method (to align with async) and translate it #16730 for an IQueryable overload of ToDictionary). This would inject a ToList node into the query tree, just like IQueryable Single does. The problem is obviously backwards-compat - all older LINQ providers would immediately break, as the node isn't recognized. We're working elsewhere on expression tree versioning, in order to allow newer C# constructs to be used in expression trees - this could be related.
  • ToList internally checks whether the source is an IIListProvider (internal interface), and if so, calls ToList on it. This allows specific collection/enumerable types to implement the (optimized) behavior they want when ToList is called. The IQueryable returned by EF Core could implement IIListProvider, and the implementation would inject the new node side-stepping all backwards-compat issues with older LINQ providers.

Some notes (discussed in design meeting):

  • If we go with EF-specific operators, then this behavior needs to be opt-in via an operator and not the default, because it implies a long wait before the first result is returned (so change in latency, perf profile of the query).
  • This is an optimization that's orthogonal to query splitting (Significant Query Slowdown When Using Multiple Joins Due To Changes In 3.0 #18022). However, it's possible that once a good query splitting shortcut is provided, the perf impact of the ORDER BY clause would no longer be significant enough to do this. On the other hand, query splitting isn't always desirable (data consistency issues, multiple roundtrips), so this ORDER BY elimination might still be relevant. Requires a perf investigation.
  • Note that we already implement a type of buffering internally when retrying execution strategies are used, although that is at the reader level (BufferedDataReader). The idea here is to use the change tracker as the "buffer".
@roji
Copy link
Member Author

roji commented May 22, 2021

@smitpatel @AndriySvyryd added some points above following an offline conversation with @Emill.

@raffaeler

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@raffaeler

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@roji
Copy link
Member Author

roji commented Dec 10, 2022

The ongoing precompiled work for NativeAOT may interact with this: at design-time, when we locate a query terminating with ToList, we can inject an extra buffering/streaming node into the tree; this is effectively like introducing an IQueryable ToList which injects the node (like our ToListAsync), but without the big breaking change.

However, note that this depends on call-site replacement, otherwise at runtime, the query tree that reaches our cache doesn't contain the extra node.

@crega
Copy link

crega commented Oct 20, 2023

Any progress on this front? I am mostly interested in removing ORDER BY when joining multiple entities?

@roji
Copy link
Member Author

roji commented Oct 20, 2023

@crega that sounds like #29171, follow that. I definitely think it's an important, high-value thing but we don't know yet whether it'll make it into 9 (8 is already finished).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants