Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Avoid adding unnecessary sort columns in query with projection and optional navigation properties #6861

Closed
jemiller0 opened this issue Oct 26, 2016 · 7 comments
Assignees
Labels
area-perf closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@jemiller0
Copy link

Steps to reproduce

Run a query that does a projection such as the following.

                var q = from b in lc.Bibs
                        select new
                        {
                            b.Id,
                            b.OclcNumber,
                            b.Title,
                            b.Author,
                            b.PublicationDate,
                            Language = b.Language.Name,
                            b.LanguageId,
                            Format = b.Format.Name,
                            b.FormatId,
                            BibStatus = b.BibStatus.Name,
                            b.BibStatusId,
                            b.BibStatusLastWriteTime,
                            b.BibStatusLastWriteUserName,
                            b.IsPublic,
                            b.Level,
                            b.CreationTime,
                            b.CreationUserName,
                            b.LastWriteTime,
                            b.LastWriteUserName
                        };
                q = q.OrderByDescending(b => b.LastWriteTime);
                q = q.Skip(200).Take(100);
                var l = q.ToList();

The following zip file contains the definitions for the POCO classes that are used by the above query.

QueryOrderByBug.zip

The issue

EF Core 1.1.0-preview1-final generates the following SQL. There are a few problems with the generated SQL. The main one is that you can see that it is sorting by FormatId and LanguageId columns in addition to the LastWriteTime column which is the only one that was specified in the LINQ query. The Bibs table used below is a table containing 7.7 million rows. The query takes 12 seconds to execute. If I run the SQL using SQL Management Studio, but, comment out the extra order by columns, the query executes in 0 seconds. It looks like what it's doing is adding sort columns for the foreign key columns that are not required. The project above is also bringing in the BibStatus navigation property. That one is required and there isn't a sort for it.

A second problem is that the query is doing a projection, but, it looks like EF Core is bringing in all columns from the tables that are used. In this particular query, there is a column of type VARCHAR(MAX) that contains a lot of data. So, I want to avoid bringing that in.

Another thing that I noticed is that it appears that it's using an INNER JOIN on BibStatus, which makes sense because it's a required field. It's doing LEFT JOINs on the the other fields which are not required, which also makes sense. However, EF 6 used a LEFT JOIN for all of the JOINs. I want to say LEFT JOIN is faster than INNER JOIN. For this particular query, it doesn't appear to make a difference. The main problem that seems to be slowing everything down is the extra ORDER BY columns on the foreign key fields.

exec sp_executesql N'SELECT [b].[Id], [b].[Author], [b].[BibStatusId], [b].[BibStatusLastWriteTime], [b].[BibStatusLastWriteUserName], [b].[Content], [b].[CreationTime], [b].[CreationUserName], [b].[FormatId], [b].[IsPublic], [b].[LanguageId], [b].[LastWriteTime], [b].[LastWriteUserName], [b].[Level], [b].[OclcNumber], [b].[PublicationDate], [b].[Title], [b].[Type], [b.Format].[Id], [b.Format].[CreationTime], [b.Format].[CreationUserName], [b.Format].[LastWriteTime], [b.Format].[LastWriteUserName], [b.Format].[Name], [b.Language].[Id], [b.Language].[Code], [b.Language].[CreationTime], [b.Language].[CreationUserName], [b.Language].[LastWriteTime], [b.Language].[LastWriteUserName], [b.Language].[Name], [b.BibStatus].[Name]
FROM [Bibs] AS [b]
INNER JOIN [BibStatuses] AS [b.BibStatus] ON [b].[BibStatusId] = [b.BibStatus].[Id]
LEFT JOIN [Formats] AS [b.Format] ON [b].[FormatId] = [b.Format].[Id]
LEFT JOIN [Languages] AS [b.Language] ON [b].[LanguageId] = [b.Language].[Id]
ORDER BY [b].[LastWriteTime] DESC, [b].[FormatId], [b].[LanguageId]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY',N'@__p_0 int,@__p_1 int',@__p_0=200,@__p_1=100

The following is what the SQL looks like for EF 6.1.3.

SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[OclcNumber] AS [OclcNumber], 
    [Extent1].[Title] AS [Title], 
    [Extent1].[Author] AS [Author], 
    [Extent1].[PublicationDate] AS [PublicationDate], 
    [Extent2].[Name] AS [Name], 
    [Extent1].[LanguageId] AS [LanguageId], 
    [Extent3].[Name] AS [Name1], 
    [Extent1].[FormatId] AS [FormatId], 
    [Extent4].[Name] AS [Name2], 
    [Extent1].[BibStatusId] AS [BibStatusId], 
    [Extent1].[BibStatusLastWriteTime] AS [BibStatusLastWriteTime], 
    [Extent1].[BibStatusLastWriteUserName] AS [BibStatusLastWriteUserName], 
    [Extent1].[IsPublic] AS [IsPublic], 
    [Extent1].[Type] AS [Type], 
    [Extent1].[Level] AS [Level], 
    [Extent1].[CreationTime] AS [CreationTime], 
    [Extent1].[CreationUserName] AS [CreationUserName], 
    [Extent1].[LastWriteTime] AS [LastWriteTime], 
    [Extent1].[LastWriteUserName] AS [LastWriteUserName]
    FROM    [dbo].[Bibs] AS [Extent1]
    LEFT OUTER JOIN [dbo].[Languages] AS [Extent2] ON [Extent1].[LanguageId] = [Extent2].[Id]
    LEFT OUTER JOIN [dbo].[Formats] AS [Extent3] ON [Extent1].[FormatId] = [Extent3].[Id]
    LEFT OUTER JOIN [dbo].[BibStatuses] AS [Extent4] ON [Extent1].[BibStatusId] = [Extent4].[Id]
    ORDER BY [Extent1].[LastWriteTime] DESC
    OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY 

The query appears to work correct, just much slower than it should due to the extra sort columns.

Further technical details

EF Core version: 1.1.0-preview1-final
Operating system: Windows 10 Anniversary Update
Visual Studio version: 2015

@rowanmiller rowanmiller added this to the 1.2.0 milestone Oct 26, 2016
@smitpatel
Copy link
Member

We are adding order by columns since the navigation is optional which requires left outer join. We achieve left outer join by rewriting navigation in group join. Since our group join operator is optimized for streaming, we requires the ordering.
Though in this particular case, the navigation is dependent to principal. Which means that there can be only 1 matching record. Hence there is no need of ordering. Results would be correct otherwise too.
We should not add order by when grouping is known to have maximum 1 record.

Assigning to @maumar since it is around the area of navigation rewriting.

@smitpatel smitpatel assigned maumar and unassigned smitpatel Nov 4, 2016
maumar added a commit that referenced this issue Nov 16, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN calls we add ORDER BY calls based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
- we don't need ORDER BY if the join is from dependent to principal, e.g. orders.Select(o => o.Customer.Name) because there will always be at most one customer associated with a given order.
- in case when we fully translate GroupJoin into SQL we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 16, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name
 .
2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 16, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name
 .
2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 16, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name
 .
2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 16, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name
 .
2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 17, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name

2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 17, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name

2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 19, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name

In case of composite keys we need to check that entire key is being used as a join condition. Moreover, if there are additional elements in the join condition (on top of the FK->PK) we don't need to add order by either, because the grouping will be even more restrictive.

2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
maumar added a commit that referenced this issue Nov 21, 2016
…xtra sort columns in ORDER BY for foreign key columns

When creating LEFT OUTER JOIN we add ORDER BY clause(s) based on the key of the outer entity. This is because out GroupJoin method is streaming and we need to have results in order to create the correct groups in a performant way.
However we can there are some optimizations we can do:
1. we don't need ORDER BY if the join is from dependent to principal because there will always be at most one customer associated with a given order.
e.g. orders.Select(o => o.Customer.Name)
which translates to:
from o in Orders
join c in Customers on o.CustomerId equal c.Id into grouping
from c in grouping.DefaultIfEmpty()
select c.Name

In case of composite keys we need to check that entire key is being used as a join condition. Moreover, if there are additional elements in the join condition (on top of the FK->PK) we don't need to add order by either, because the grouping will be even more restrictive.

2. in case when we fully translate LOJ into SQL (which is now very often the case) we also don't need the extra ORDER BYs because we no longer perform streamed grouping on the client.
@maumar
Copy link
Contributor

maumar commented Nov 21, 2016

Fixed in 398dec4

@maumar maumar closed this as completed Nov 21, 2016
@maumar maumar added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug type-enhancement area-perf and removed type-investigation type-bug labels Nov 21, 2016
@jemiller0
Copy link
Author

Thanks. I greatly appreciate it.

@jemiller0
Copy link
Author

I want to test this out. On the main page, it mentions, nightly builds and Myget. I've never used this before. What I'm wondering if I can add Myget to Visual Studio so that I can get the nightly builds off Nuget? Are there instructions somewhere for doing this?

@maumar
Copy link
Contributor

maumar commented Jan 13, 2017

You can use the following myget feed: https://dotnet.myget.org/gallery/aspnetcore-ci-dev

@maumar
Copy link
Contributor

maumar commented Jan 13, 2017

@jemiller0 yeah you should be able to add it directly to Visual Studio, rather than modify the nuget.config

@jemiller0
Copy link
Author

Thanks maumar. That's great that the dev builds are available like that. I just tried it out and got it working. I ran into a versioning issue though since I had the Npgsql provider included in my project. However, I commented that code out for the time being and have it working.

However, now, I'm running into a different problem. .StartsWith() filters are adding an extra CHARINDEX() call. EF 6 didn't do this. I'm assuming there's a reason for it. But, it looks like it's slowing down SQL Server's performance on large tables. I have a table with 7 million rows in it. With the CHARINDEX() call, the query takes 11 seconds to execute. Without it, it takes 1 second. I'm thinking it must be ignoring the index when CHARINDEX() is used.

WHERE [b].[updated_by] LIKE N'jemiller' + N'%' AND (CHARINDEX(N'jemiller', [b].[updated_by]) = 1)

@ajcvickers ajcvickers changed the title Query with projection and navigation properties adds extra sort columns in ORDER BY for foreign key columns Query: Query with projection and navigation properties adds extra sort columns in ORDER BY for foreign key columns May 9, 2017
@divega divega changed the title Query: Query with projection and navigation properties adds extra sort columns in ORDER BY for foreign key columns Query: Avoid adding unnecessary sort columns in query with projection and optional navigation properties May 10, 2017
@ajcvickers ajcvickers modified the milestones: 2.0.0-preview1, 2.0.0 Oct 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-perf closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests

5 participants