Simple query with Include() executes 2 SQL queries instead of 1 #9987

isonil · 2017-10-05T22:06:30Z

This query:

var data = await db.Users.Where(x => x.Id == userId)
    .Include(x => x.NotificationsReceived)
    .Select(x => new {
        Reputation = x.Reputation,
        Notification = x.NotificationsReceived.FirstOrDefault() })
    .FirstOrDefaultAsync();

Executes 2 queries which seems unnecessary

Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (197ms) [Parameters=[@__userId_0='?' (Size = 450)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [x].[Reputation], [x].[Id]
FROM [AspNetUsers] AS [x]
WHERE [x].[Id] = @__userId_0
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.Message","time":"2017-10-05T21:51:53.8617663Z","tags":{"ai.operation.parentId":"|e09e27b1-43ce897e3a8f6584.","ai.operation.id":"e09e27b1-43ce897e3a8f6584","ai.operation.name":"GET Home/Index","ai.location.ip":"127.0.0.1","ai.internal.sdkVersion":"aspnet5c:2.1.1","ai.internal.nodeName":"ison","ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"ison"},"data":{"baseType":"MessageData","baseData":{"ver":2,"message":"Executed DbCommand (197ms) [Parameters=[@__userId_0='?' (Size = 450)], CommandType='Text', CommandTimeout='30']\r\nSELECT TOP(1) [x].[Reputation], [x].[Id]\r\nFROM [AspNetUsers] AS [x]\r\nWHERE [x].[Id] = @__userId_0","severityLevel":"Information","properties":{"commandType":"Text","AspNetCoreEnvironment":"Development","parameters":"@__userId_0='?' (Size = 450)","CategoryName":"Microsoft.EntityFrameworkCore.Database.Command","DeveloperMode":"true","commandTimeout":"30","commandText":"SELECT TOP(1) [x].[Reputation], [x].[Id]\r\nFROM [AspNetUsers] AS [x]\r\nWHERE [x].[Id] = @__userId_0","elapsed":"197","{OriginalFormat}":"Executed DbCommand ({elapsed}ms) [Parameters=[{parameters}], CommandType='{commandType}', CommandTimeout='{commandTimeout}']{newLine}{commandText}"}}}}
Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (39ms) [Parameters=[@_outer_Id='?' (Size = 450)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [n].[Id], [n].[DateReceived], [n].[IsRead], [n].[Message], [n].[RecipientId], [n].[TargetUrl]
FROM [Notifications] AS [n]
WHERE @_outer_Id = [n].[RecipientId]
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.Message","time":"2017-10-05T21:51:53.9300546Z","tags":{"ai.operation.parentId":"|e09e27b1-43ce897e3a8f6584.","ai.operation.id":"e09e27b1-43ce897e3a8f6584","ai.operation.name":"GET Home/Index","ai.location.ip":"127.0.0.1","ai.internal.sdkVersion":"aspnet5c:2.1.1","ai.internal.nodeName":"ison","ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"ison"},"data":{"baseType":"MessageData","baseData":{"ver":2,"message":"Executed DbCommand (39ms) [Parameters=[@_outer_Id='?' (Size = 450)], CommandType='Text', CommandTimeout='30']\r\nSELECT TOP(1) [n].[Id], [n].[DateReceived], [n].[IsRead], [n].[Message], [n].[RecipientId], [n].[TargetUrl]\r\nFROM [Notifications] AS [n]\r\nWHERE @_outer_Id = [n].[RecipientId]","severityLevel":"Information","properties":{"commandType":"Text","AspNetCoreEnvironment":"Development","parameters":"@_outer_Id='?' (Size = 450)","CategoryName":"Microsoft.EntityFrameworkCore.Database.Command","DeveloperMode":"true","commandTimeout":"30","commandText":"SELECT TOP(1) [n].[Id], [n].[DateReceived], [n].[IsRead], [n].[Message], [n].[RecipientId], [n].[TargetUrl]\r\nFROM [Notifications] AS [n]\r\nWHERE @_outer_Id = [n].[RecipientId]","elapsed":"39","{OriginalFormat}":"Executed DbCommand ({elapsed}ms) [Parameters=[{parameters}], CommandType='{commandType}', CommandTimeout='{commandTimeout}']{newLine}{commandText}"}}}}

smitpatel · 2017-10-05T22:42:43Z

Include is ignored in this case actually since you are not materializing an entity instance.
Since x.NotificationsReceived is collection navigation with result operator, we evaluate it in separate step. (N+1 scenario). Second query would be run each time (with different value of @_outer_Id) for row in first query results.
We are looking at improving scenario in #9282

If you have a single query SQL for above case then let us know.

isonil · 2017-10-05T22:55:09Z

So what would be the proper syntax to get data from ApplicationUser and from NotificationsReceived in a single query? It doesn't exist? If so then it looks like a design flaw.

Second query would be run each time (with different value of @_outer_Id) for row in first query results.

On a side note: how come this doesn't even give any warning? This behavior looks extremely bad.

smitpatel · 2017-10-05T23:03:04Z

If you can provide single SQL query which would get all data we can look into how to write Linq query to achieve that translation or even improve query pipeline with linq query like above to generate such translation. Beyond that it is all about how & what data is stored. If the data stored does not allow single query comprehension then it may not be possible.

It gives warning during query compilation phase that it would be evaluated on client.

isonil · 2017-10-06T00:20:42Z

I see how this could be tricky. But if the count is constant (e.g. Take(5)), then it's possible I think.

1 notification:

CREATE table users (id int, name text);
CREATE table notifications (id int, userId int, message text);

INSERT users values (0, 'Foo');
INSERT users values (1, 'Bar');
INSERT notifications values (0, 0, 'Hello');
INSERT notifications values (1, 0, 'Bye');
INSERT notifications values (2, 0, 'Bye2');

select (select users.name from users where users.id = 0), (select top 1 notifications.message from notifications where notifications.userId = 0)

output: Foo Hello

X notifications:

CREATE table users (id int, name text);
CREATE table notifications (id int, userId int, message text);

INSERT users values (0, 'Foo');
INSERT users values (1, 'Bar');
INSERT notifications values (0, 0, 'Hello');
INSERT notifications values (1, 0, 'Bye');
INSERT notifications values (2, 0, 'Bye2');

select (select users.name from users where users.id = 0),
  (select top 1 notifications.message from notifications where notifications.userId = 0),
  (select top 1 z.message from (select top 2 * from notifications where notifications.userId = 0 order by notifications.id) z order by z.id desc),
  (select top 1 z.message from (select top 3 * from notifications where notifications.userId = 0 order by notifications.id) z order by z.id desc)

output: Foo Hello Bye Bye2

It just feels very hacky to have to connect to the database so many times only because there's no syntax to get the results in a single query. Connecting to database also takes time, correct? Doesn't it mean that it would be faster to execute everything in a single query (even if database would have to split it into several queries internally anyway)?

smitpatel · 2017-10-06T21:24:33Z

Presently, EFCore make the argument of skip/take as parameter since same SQL can be used with just different Top/Fetch/Offset. Let's break apart the case of FirstOrDefault() & Take.

For Take every different argument, we are generating a different SQL. That is not scalable. For every value of arg in Take we need to recompile whole query to generate desired SQL. Query compilation is also costly step. We need a scalable translation where we can just change parameter value and reuse the previously compiled query. Since with Take we are returning group of Notification in projection, as is the case with any ORM, it has to be flattened out. Potentially doing some cross join/apply to achieve such flattened results. The issue with such join (or flattened SQL) is, your projection columns from first table (users) is repeated (same values) for every related row coming from other table (notification in this case)
e.g. of output of such query

id |  name  | message
0  | 'Foo'  | 'Hello'
0  | 'Foo'  | 'Bye'
0  | 'Foo'  | 'Bye2'

Of course, it is possible to write a single query generating above output and forming correct projection on client side. But the repetition of data on left side causes duplicated data to be fetched from server. Even though there will be single query in this case, there will be significant amount of data transferred from server (which is unnecessary). To combat with such issue, we decided to generate 2 queries (to fetch relevant data from each table separately) and combine them on client. Even though its 2 queries, it avoids duplicated data fetch. There are different cases, when one method would be faster than other but it is not possible to determine which one would be faster without doing database query. Hence on safe side, we produce 2 queries. For queries like above with Take, we are causing N+1 queries but we are trying to condense such behavior into 2 queries which is being tracked in #9282

Coming back to case of FirstOrDefault, it does guarantee that there will be only 1 related row, so we can perhaps do cross apply without worrying about data explosion. I have filed issue #10001 to track it.

Conclusion: When looking at performance, the number of queries run against alone is not a good indicator. You also need to consider the amount of data being fetch back. Also in the context of EF, you need to consider the cost of compiling the query. Not to mention, SqlServer also catches SQL queries but for each new SQL, it also need to generate query execution plan. The best way to approach is to measure perf by running queries.

isonil · 2017-10-06T22:40:42Z

Thank you for the detailed explanation. It definitely helped me understand how difficult it would be to implement such behavior and I agree with everything you said.

However the main problem is still there, but at least I now know that it can't be solved by EntityFramework. It just feels bad that a database, which is supposed to allow fetching data, fails at its only purpose due to lacking syntax and what seems like a bad design flaw.

In theory it should be possible to fetch user info + any number of notifications with:

a single query
a single, constant and compilable query
and without any duplicate data

The only thing which prevents this is syntax.

Anyway, it's just some idle musing. Thanks for your help.

smitpatel closed this as completed Oct 5, 2017

smitpatel added the closed-by-design label Oct 5, 2017

ajcvickers reopened this Oct 16, 2022

ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simple query with Include() executes 2 SQL queries instead of 1 #9987

Simple query with Include() executes 2 SQL queries instead of 1 #9987

isonil commented Oct 5, 2017

smitpatel commented Oct 5, 2017

isonil commented Oct 5, 2017 •

edited

Loading

smitpatel commented Oct 5, 2017

isonil commented Oct 6, 2017

smitpatel commented Oct 6, 2017

isonil commented Oct 6, 2017

Simple query with Include() executes 2 SQL queries instead of 1 #9987

Simple query with Include() executes 2 SQL queries instead of 1 #9987

Comments

isonil commented Oct 5, 2017

smitpatel commented Oct 5, 2017

isonil commented Oct 5, 2017 • edited Loading

smitpatel commented Oct 5, 2017

isonil commented Oct 6, 2017

smitpatel commented Oct 6, 2017

isonil commented Oct 6, 2017

isonil commented Oct 5, 2017 •

edited

Loading