Undesirable composite ORDER BY clause when loading included data on a one-to-many relation #19571

ghost · 2020-01-11T23:50:31Z

Entity Framework is generating a composite ORDER BY clause when a query on one entity .Includes() an entity under a one-to-many relation and that related entity is loaded in the SELECT clause. The order by clause includes the primary keys of every single entity in the query as a whole, not just those under that relation. None of this happens if the relation is on a *-to-one relation. I haven't tested many-to-many.

This in combination with the change in EF 3.0 to utilize joins instead of multiple selects can have severe negative performance implications for queries with .Include() calls that worked well enough prior to EF 3.0. Not only does that mean there's extra load on the database server to do these orderings - the sql generated by EF 3.1 for one of our large queries executes in two minutes with the ORDER BY clause and 30 seconds without it - but also the database server isn't able to even start streaming results until it more or less has all the results assembled, leading to client-side timeouts.

I'm not sure if this is a bug or just undesirable behavior. I've found nothing documenting this on pages like https://docs.microsoft.com/en-us/ef/core/querying/related-data, and I've found no clear way to disable it or get around it, absent rewriting these queries or doing something ugly with a command interceptor, but I could have just missed a source of information. I totally understand the pivot to joins implies some things need to be rewritten, but this ORDER BY especially on every entity is excessive and mandates us to manually fine tune data loading where we were previously able to rely on EF for it, even when joins alone are fine.

Solutions in order of preference:

Don't introduce these order by clauses unless asked for in the query
Document this and provide some way to opt out of this behavior on a global or per-query basis.
Document this and don't order by other entitles' keys outside the one-to-many-relation
Document this behavior loudly and provide best practices to work around it

Steps to reproduce

See my gist with code and comments at https://gist.github.com/cwhitelaw/df7d12778966e16e0119a4ffce0a06dc?ts=4

Model:

public class MainEntity
{
	public int Id { get; set; }
	public int SingleRelatedEntityId { get; set; }
	public SingleRelatedEntity SingleRelatedEntity { get; set; }
	public List<ManyRelatedEntity> ManyRelatedEntities { get; set; }
}
public class SingleRelatedEntity
{
	public int Id { get; set; }
}
public class ManyRelatedEntity
{
	public int Id { get; set; }
	public int MainEntityId { get; set; }
	public MainEntity MainEntity { get; set; }
}

Output (the problem manifests in the queries labeled with main entities including to-many relation):

Query: main entities
SELECT [m].[Id], [m].[SingleRelatedEntityId]
FROM [MainEntities] AS [m]

Query: single-related entities
SELECT [s].[Id]
FROM [SingleRelatedEntities] AS [s]

Query: many-related entities
SELECT [m].[Id], [m].[MainEntityId]
FROM [ManyRelatedEntities] AS [m]

Query: main entities including to-one relation
SELECT [m].[Id], [m].[SingleRelatedEntityId], [s].[Id]
FROM [MainEntities] AS [m]
INNER JOIN [SingleRelatedEntities] AS [s] ON [m].[SingleRelatedEntityId] = [s].[Id]

Query: main entities including to-many relation
SELECT [m].[Id], [m].[SingleRelatedEntityId], [m0].[Id], [m0].[MainEntityId]
FROM [MainEntities] AS [m]
LEFT JOIN [ManyRelatedEntities] AS [m0] ON [m].[Id] = [m0].[MainEntityId]
ORDER BY [m].[Id], [m0].[Id]

Query: main entities including to-many relation, order by MainEntity.Id descending
SELECT [m].[Id], [m].[SingleRelatedEntityId], [m0].[Id], [m0].[MainEntityId]
FROM [MainEntities] AS [m]
LEFT JOIN [ManyRelatedEntities] AS [m0] ON [m].[Id] = [m0].[MainEntityId]
ORDER BY [m].[Id] DESC, [m0].[Id]

Query: main entities including both to-one and to-many relation
SELECT [m].[Id], [m].[SingleRelatedEntityId], [s].[Id], [m0].[Id], [m0].[MainEntityId]
FROM [MainEntities] AS [m]
INNER JOIN [SingleRelatedEntities] AS [s] ON [m].[SingleRelatedEntityId] = [s].[Id]
LEFT JOIN [ManyRelatedEntities] AS [m0] ON [m].[Id] = [m0].[MainEntityId]
ORDER BY [m].[Id], [s].[Id], [m0].[Id]

Query: many-related entities through MainEntity relation
SELECT [m0].[Id], [m0].[MainEntityId]
FROM [MainEntities] AS [m]
INNER JOIN [ManyRelatedEntities] AS [m0] ON [m].[Id] = [m0].[MainEntityId]

Query: many-related entities including their MainEntity
SELECT [m].[Id], [m].[MainEntityId], [m0].[Id], [m0].[SingleRelatedEntityId]
FROM [ManyRelatedEntities] AS [m]
INNER JOIN [MainEntities] AS [m0] ON [m].[MainEntityId] = [m0].[Id]

Further technical details

EF Core version: 3.1.0
Database provider: Microsoft.EntityFrameworkCore.SqlServer 3.1.0
Target framework: .NET Core 3.1
Operating system: Windows 10 x64
IDE: Visual Studio 2019

The text was updated successfully, but these errors were encountered:

ajcvickers · 2020-01-13T17:27:53Z

@roji @maumar Any thoughts on this?

roji · 2020-01-13T17:59:19Z

I remember doing some performance tests on ordering by primary keys some time ago (on PostgreSQL), and definitely did not see any difference close to 2 minutes vs. 30 seconds - but we should definitely look into this. I can do a quick perf investigation here.

@CWhitelaw this isn't exactly my domain, but to give some context... The ORDER By clause is added when materializing 1-to-many relations, so that the principal is grouped together while loading the dependent. If we don't have this, rows come at totally random order, and we must track all principals and look them up again on for each row. Having said that, if the ORDER BY necessarily triggers such a perf degradation I definitely agree we should re-examine this.

ghost · 2020-01-13T18:40:55Z

@roji Thanks, it had just occurred to me this morning that this might be the use case: you can read rows until you get a different PK column and know you're done reading that entity and yield it to whatever is consuming the IEnumerable. This is a compelling trade off to make especially for memory usage and I see the value. What I don't understand following that is, using my schema as an example, why does it also need to order by the ManyRelatedEntity id? The ordering on the just the MainEntityId seems like it would get the job done unless the ManyRelatedEntityId in turn were to include other data on a one to many relation, in which case we would in turn just additionally want to order on that Id and not also the id of whatever ID it is referencing.

For more context on the performance, the 2m -> 30s example came up on a query with a lot of recursive includes (7 deep), some being to-many relations and some being to-one, so it's one we are rewriting regardless for EF Core 3. It also has some internal inefficiencies of its own we will simplify but I'm sure there are legitimate cases for stuff like this especially for larger projects.

_dbContext.E1
    .Include(x => x.E2) // Single, never null
        .ThenInclude(x => x.E3s) // Many
            .ThenInclude(x => x.E4s) // Many (more or less a many-many join table, and ~10000s records)
                .ThenInclude(x => x.E5) // Single, never null (consequently, also ~10000s of records), 
                    .ThenInclude(x => x.E6) // Single, never null
                        .ThenInclude(x => x.E7) // Many, up to ~10 rows
    .Include(x => x.E2) // Single (same as previous include)
        .ThenInclude(x => x.E8) // Many, 0-100s of rows
    .FirstOrDefaultAsync(e => e.Id == oneId);

roji · 2020-01-13T18:53:09Z

What I don't understand following that is, using my schema as an example, why does it also need to order by the ManyRelatedEntity id?

That's a good observation and we've raised this internally - it seems like the last dependent entity does not need necessarily need an ORDER BY on its primary key.

However, from the rest of your comment, it seems like the performance degradation you're experiencing could be completely unrelated to the ORDER BY clauses, but rather a case of eager loading being done with a single query. In a nutshell, versions before 3.0 sent a separate query for each related entity, where since 3.0 EF Core always generates a single query - which can cause a so-called "cartesian explosion" if too many joins are present. There are ways to rewrite your query to return to the pre-3.0 multiple query mechanism: please see this comment.

For now I'm going to assume that any perf issue here is related to the single-query issue, rather than to the ORDER BY clause - could you please take a look and confirm?

ghost · 2020-01-13T19:29:12Z

Yes I'm aware of the single query change and what that implies, and we are rewriting some queries to that end. The comment you linked seemed to be doing something effectively the same as my initial steps toward that - that comment uses a FirstOrDefault to force a load while I was just using Load() the Db.Context.Entry.Collection, Db.Context.Entry.Reference, or .Query() queryables starting from either of those two.

However, just to clarify, that 30s -> 2min difference is not an EF 2.x to 3.1 change - it's predicated solely on the the ORDER BY clause on the same query generated by EF Core 3.1, so the joins themselves were only 30 seconds even for such a large query. So I think the ordering is still significant. But now that I see why ordering on the primary key of just one entity is desirable, I could test it against the minimal orderings rather than against none.

Also, there's still separately the issue of this case

				LoadQuery("main entities including both to-one and to-many relation",
					dbContext.MainEntities
						.Include(x => x.SingleRelatedEntity)
						.Include(x => x.ManyRelatedEntities));

which outputs an order by all three primary keys rather than just the two under the relation. So it seems like it could be a really ugly surprise that if you had something like this

dbContext.MainEntities
.Include(x => x.SingleRelatedEntity1)
.Include(x => x.SingleRelatedEntity2)
.Include(x => x.SingleRelatedEntity3)
    .ThenInclude(x => x.SingleRelatedEntity4)
.Include(x => x.SingleRelatedEntity5)

And then you added an additional one-to-many relation in that query somewhere, suddenly your query is ordering by 7 different IDs instead of 0.

One particularly rough aspect of this is when all the ordering forces the DB to assemble all the results before it can spit any back out (at least for SQL server and the case of the 2m query i reported), which really reduces the value that the ordering would provide in allowing EF to yield one entity at a time, and is the main problem when it comes to timeouts.

roji · 2020-01-13T19:47:10Z

However, just to clarify, that 30s -> 2min difference is not an EF 2.x to 3.1 change - it's predicated solely on the the ORDER BY clause on the same query generated by EF Core 3.1, so the joins themselves were only 30 seconds even for such a large query. So I think the ordering is still significant. But now that I see why ordering on the primary key of just one entity is desirable, I could test it against the minimal orderings rather than against none.

So to be sure I understand - can you please confirm you're seeing a 30s -> 2min difference on a single-join query (e.g. the query tagged "main entities including to-many relation" above), predicated solely on the existence or absence of the ORDER BY clause? If so, I'll definitely look into this.

ghost · 2020-01-13T20:13:59Z

That is correct. I grabbed the single statement SQL that EF Core 3.1 spit out for the query in my initial comment with all the .ThenIncludes, and just removed the order-by clause which was ordering by a bunch of primary key IDs.

For the example cases in the gist like "main entities including to-many relation", I only created those to generate SQL output to illustrate the order-by clauses, and I haven't done any load testing with data against that example schema or those queries in particular. I imagine results can vary greatly by the queries themselves and also the size of the tables, number of relations that that satisfy the query in those 1-many Include() calls, the DB provider, etc.

roji · 2020-01-13T21:53:01Z

Thanks, I'll look into this and will get back to you soon if I have any questions.

ghost · 2020-01-13T21:53:55Z

@roji I realized i might have misunderstood what you meant by "single-join query" and wanted to clarify:

This is the query in question where we saw drastic the performance difference.

_dbContext.E1
    .Include(x => x.E2) // Single, never null
        .ThenInclude(x => x.E3s) // Many
            .ThenInclude(x => x.E4s) // Many (more or less a many-many join table, and ~10000s records)
                .ThenInclude(x => x.E5) // Single, never null (consequently, also ~10000s of records), 
                    .ThenInclude(x => x.E6) // Single, never null
                        .ThenInclude(x => x.E7) // Many, up to ~10 rows
    .Include(x => x.E2) // Single (same as previous include)
        .ThenInclude(x => x.E8) // Many, 0-100s of rows
    .FirstOrDefaultAsync(e => e.Id == oneId);

It results in a single query, but with many joins and some subqueries, and an order-by on 8 fields. So it certainly has more than one join if that was the question at hand - sorry if I misunderstood.

I ran it again just to check - without the order-bys, results started coming back coming back within seconds and completing in 33s like before, and with the order-bys it took 2:58s this time and results don't start coming back until nearly the end of that time. The result set is 45862 rows.

I don't yet have any performance numbers for smaller items like the sample queries in the gist.

ghost · 2020-01-13T22:23:39Z

To my great surprise, there's exactly one ID that's causing the bottleneck here. It's E8's ID.

With all 8 order-by fields (ran this once just prior to make sure it's a warm run)
Results start being returned at 1:50
Query completes in 2:04

By simply removing E8's ID from the order by clause (leaving 7 other elements in the order-by critieria)
Results start being returned immediately
Query completes in 0:18

Without any order-by, the query also completes in 18 seconds.

I tried removing other fields from the order by without any significant change, yet this one was drastic.

Using azure data studio if that's relevant. It's an azure SQL database and I don't currently have a dedicated instance/environment for this so the results at different times are pretty variable but repeated runs at any given point in time seem pretty consistent.

This might be a good motivation to only be ordering on the left side of the one-to-many relation rather than both but I don't know how representative this case is. Hope this helps.

roji · 2020-01-13T22:58:38Z

Thanks for the added info, it's important to know that the degradation is (for now) seen on a query with 8 includes/joins. This also means that you may be able to work around the issue here by breaking the query into separate parts as described in #18022.

Out of curiosity, if E8 seems to be the cause of the issue, do you have the same problem when removing the entire first include chain (i.e. E2-E7)?

Also, is there any chance you can provide your (minimal!) model, with an idea of the data required to reproduce this? Also, any chance you could check if this occurs on a regular, on-premise SQL Server instance rather than Azure?

This might be a good motivation to only be ordering on the left side of the one-to-many relation rather than both but I don't know how representative this case is. Hope this helps.

I think it would be good to get to the bottom of this even though not ordering on the dependent side here may mitigate the issue you're specifically seeing.

roji · 2020-01-13T23:01:44Z

Also, if you could post the specific SQL being generated that could be helpful.

ghost · 2020-01-14T22:36:42Z

I haven't been able to get back to this yet, but is what I found when I left off: It doesn't happen with E8.Id alone, and I was able to isolate the behavior down to the order by on E8 being combined with one other particular key in that ThenInclude chain but I don't recall as of now which one.

The SQL had 3 subqueries and looked like this. Seemed like a pretty good query - it knew to grab just one row from the first select.

SELECT <a bunch of fields>
FROM (SELECT TOP 1 <fields> FROM <E1 join E2>)
LEFT JOIN (<subquery on E3 through E7>)
LEFT JOIN <E8>

I ran the explain plan on both and saw there was a Sort operation right at the end that accounted for 100% of the cost, and I can't see it anywhere in the plan if I just remove the E8.Id from that clause (presumably, other changes to the clause would induce the same change but i haven't yet verified). This was on a dev server, while on a production server the explain showed 50% due to the sort and 50% for "Parallelism (Gather Streams)", which i assume is just because the larger instance has at least another core on it. Without the complete order-by clause, most of the cost goes to a clustered index scan on E8, but the query as a whole is quite fast. I'm not extremely familiar with specifics of SQL Server explain plans, but I was surprised in that if that "sort" isn't doing the ordering, then what would be doing it?

What's more perplexing about this is that E8 has 4 entries in this case, and 0 of them are actually associated with the test case in question.

I'll come back when I'm able with a minified model and the actual SQL plus results against a local sql server dev edition installation, unless these details signal any potential answers.

roji · 2020-01-14T22:50:13Z

@CWhitelaw thanks for all the time and effort you're putting into this.

I will try to play around with a mock model tomorrow, to try and reproduce your findings - the first step here is to definitely get minimal repro, and then we can go into analyzing the plan and understanding exactly what effect the ORDER BY has and why it's degrading performance. Please let me know if you manage to do this on your side.

ghost · 2020-01-21T01:18:24Z

I've been pretty slammed but I think I have a minimal enough repro now, minus the data.

Gist: https://gist.github.com/cwhitelaw/7dc6d93a0a1025cd4d3a2569300a46c5?ts=4

You can see through the execution plan that despite having order by clauses everywhere, that sort comes in only when the query gets to including the E4s, while including everything except the E8s is no problem. The last query is effectively the original one I had posted about.

In the minimal case where we just go up to E4s, altering the order by clause to just order by E8 alone is enough to retain the sort operation, but I could have sworn that wasn't the case when i was testing on the real database either. Also notably the sort operation produces a lower estimated cost (~40%), but i'm betting that's the case because there's only enough columns here to support the relations and not actual data itself. It's also possible that the sort operation is a red herring but the correlation is pretty strong so far.

--Query: Baseline - E1 FirstOrDefault(ID=1) and its E8s
SELECT [t].[Id], [t].[E2Id], [e0].[Id], [e0].[E1Id]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id]
    FROM [E1] AS [e]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN [E8] AS [e0] ON [t].[Id] = [e0].[E1Id]
ORDER BY [t].[Id], [e0].[Id]

--Query: Include up to E2
SELECT [t].[Id], [t].[E2Id], [t].[Id0], [e1].[Id], [e1].[E1Id]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id], [e0].[Id] AS [Id0]
    FROM [E1] AS [e]
    INNER JOIN [E2] AS [e0] ON [e].[E2Id] = [e0].[Id]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN [E8] AS [e1] ON [t].[Id] = [e1].[E1Id]
ORDER BY [t].[Id], [t].[Id0], [e1].[Id]

--Query: Include up to E3s
SELECT [t].[Id], [t].[E2Id], [t].[Id0], [e1].[Id], [e1].[E2Id], [e2].[Id], [e2].[E1Id]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id], [e0].[Id] AS [Id0]
    FROM [E1] AS [e]
    INNER JOIN [E2] AS [e0] ON [e].[E2Id] = [e0].[Id]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN [E3] AS [e1] ON [t].[Id0] = [e1].[E2Id]
LEFT JOIN [E8] AS [e2] ON [t].[Id] = [e2].[E1Id]
ORDER BY [t].[Id], [t].[Id0], [e1].[Id], [e2].[Id]

--Query: Include up to E4s
SELECT [t].[Id], [t].[E2Id], [t].[Id0], [t0].[Id], [t0].[E2Id], [t0].[Id0], [t0].[E3Id], [t0].[E5Id], [e3].[Id], [e3].[E1Id]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id], [e0].[Id] AS [Id0]
    FROM [E1] AS [e]
    INNER JOIN [E2] AS [e0] ON [e].[E2Id] = [e0].[Id]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN (
    SELECT [e1].[Id], [e1].[E2Id], [e2].[Id] AS [Id0], [e2].[E3Id], [e2].[E5Id]
    FROM [E3] AS [e1]
    LEFT JOIN [E4] AS [e2] ON [e1].[Id] = [e2].[E3Id]
) AS [t0] ON [t].[Id0] = [t0].[E2Id]
LEFT JOIN [E8] AS [e3] ON [t].[Id] = [e3].[E1Id]
ORDER BY [t].[Id], [t].[Id0], [t0].[Id], [t0].[Id0], [e3].[Id]

--Query: Include everything
SELECT [t].[Id], [t].[E2Id], [t].[Id0], [t1].[Id], [t1].[E2Id], [t1].[Id0], [t1].[E3Id], [t1].[E5Id], [t1].[Id00], [t1].[E6Id], [t1].[Id1], [t1].[Id2], [t1].[E6Id0], [e6].[Id], [e6].[E1Id]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id], [e0].[Id] AS [Id0]
    FROM [E1] AS [e]
    INNER JOIN [E2] AS [e0] ON [e].[E2Id] = [e0].[Id]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN (
    SELECT [e1].[Id], [e1].[E2Id], [t0].[Id] AS [Id0], [t0].[E3Id], [t0].[E5Id], [t0].[Id0] AS [Id00], [t0].[E6Id], [t0].[Id1], [t0].[Id2], [t0].[E6Id0]
    FROM [E3] AS [e1]
    LEFT JOIN (
        SELECT [e2].[Id], [e2].[E3Id], [e2].[E5Id], [e3].[Id] AS [Id0], [e3].[E6Id], [e4].[Id] AS [Id1], [e5].[Id] AS [Id2], [e5].[E6Id] AS [E6Id0]
        FROM [E4] AS [e2]
        INNER JOIN [E5] AS [e3] ON [e2].[E5Id] = [e3].[Id]
        INNER JOIN [E6] AS [e4] ON [e3].[E6Id] = [e4].[Id]
        LEFT JOIN [E7] AS [e5] ON [e4].[Id] = [e5].[E6Id]
    ) AS [t0] ON [e1].[Id] = [t0].[E3Id]
) AS [t1] ON [t].[Id0] = [t1].[E2Id]
LEFT JOIN [E8] AS [e6] ON [t].[Id] = [e6].[E1Id]
ORDER BY [t].[Id], [t].[Id0], [t1].[Id], [t1].[Id0], [t1].[Id00], [t1].[Id1], [t1].[Id2], [e6].[Id]

--Query: Include everything except the E8s
SELECT [t].[Id], [t].[E2Id], [t].[Id0], [t1].[Id], [t1].[E2Id], [t1].[Id0], [t1].[E3Id], [t1].[E5Id], [t1].[Id00], [t1].[E6Id], [t1].[Id1], [t1].[Id2], [t1].[E6Id0]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id], [e0].[Id] AS [Id0]
    FROM [E1] AS [e]
    INNER JOIN [E2] AS [e0] ON [e].[E2Id] = [e0].[Id]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN (
    SELECT [e1].[Id], [e1].[E2Id], [t0].[Id] AS [Id0], [t0].[E3Id], [t0].[E5Id], [t0].[Id0] AS [Id00], [t0].[E6Id], [t0].[Id1], [t0].[Id2], [t0].[E6Id0]
    FROM [E3] AS [e1]
    LEFT JOIN (
        SELECT [e2].[Id], [e2].[E3Id], [e2].[E5Id], [e3].[Id] AS [Id0], [e3].[E6Id], [e4].[Id] AS [Id1], [e5].[Id] AS [Id2], [e5].[E6Id] AS [E6Id0]
        FROM [E4] AS [e2]
        INNER JOIN [E5] AS [e3] ON [e2].[E5Id] = [e3].[Id]
        INNER JOIN [E6] AS [e4] ON [e3].[E6Id] = [e4].[Id]
        LEFT JOIN [E7] AS [e5] ON [e4].[Id] = [e5].[E6Id]
    ) AS [t0] ON [e1].[Id] = [t0].[E3Id]
) AS [t1] ON [t].[Id0] = [t1].[E2Id]
ORDER BY [t].[Id], [t].[Id0], [t1].[Id], [t1].[Id0], [t1].[Id00], [t1].[Id1], [t1].[Id2]

roji · 2020-01-21T13:04:27Z

@CWhitelaw thanks for putting this together, I'll look into it in the next few days and investigate.

roji · 2020-01-23T23:30:41Z

@CWhitelaw I've hacked up some SQL scripts to load data into your tables, to repro the speed difference. I haven't yet had time to play around with actual values, but as you know you're data maybe you can help out and suggest the numbers that would reproduce the issue?

To make sure we're talking about the same thing, the idea is to run the following query:

SELECT [t].[Id], [t].[E2Id], [t].[Id0], [t1].[Id], [t1].[E2Id], [t1].[Id0], [t1].[E3Id], [t1].[E5Id], [t1].[Id00], [t1].[E6Id], [t1].[Id1], [t1].[Id2], [t1].[E6Id0], [e6].[Id], [e6].[E1Id]
FROM (
    SELECT TOP(1) [e].[Id], [e].[E2Id], [e0].[Id] AS [Id0]
    FROM [E1s] AS [e]
    INNER JOIN [E2s] AS [e0] ON [e].[E2Id] = [e0].[Id]
    WHERE [e].[Id] = 1
) AS [t]
LEFT JOIN (
    SELECT [e1].[Id], [e1].[E2Id], [t0].[Id] AS [Id0], [t0].[E3Id], [t0].[E5Id], [t0].[Id0] AS [Id00], [t0].[E6Id], [t0].[Id1], [t0].[Id2], [t0].[E6Id0]
    FROM [E3s] AS [e1]
    LEFT JOIN (
        SELECT [e2].[Id], [e2].[E3Id], [e2].[E5Id], [e3].[Id] AS [Id0], [e3].[E6Id], [e4].[Id] AS [Id1], [e5].[Id] AS [Id2], [e5].[E6Id] AS [E6Id0]
        FROM [E4s] AS [e2]
        INNER JOIN [E5s] AS [e3] ON [e2].[E5Id] = [e3].[Id]
        INNER JOIN [E6s] AS [e4] ON [e3].[E6Id] = [e4].[Id]
        LEFT JOIN [E7s] AS [e5] ON [e4].[Id] = [e5].[E6Id]
    ) AS [t0] ON [e1].[Id] = [t0].[E3Id]
) AS [t1] ON [t].[Id0] = [t1].[E2Id]
LEFT JOIN [E8s] AS [e6] ON [t].[Id] = [e6].[E1Id]
ORDER BY [t].[Id], [t].[Id0], [t1].[Id], [t1].[Id0], [t1].[Id00], [t1].[Id1], [t1].[Id2], [e6].[Id];

... once with the final ORDER BY, and once without it, and see significant perf differences.

SQL to generate data:

DECLARE @e2s INT = 1000; -- e1s and e2s
DECLARE @e3s INT = 20;   -- per e2
DECLARE @e5s INT = 20;   -- per e6
DECLARE @e6s INT = 1000; -- e6s
DECLARE @e7s INT = 10;   -- per e6
DECLARE @e8s INT = 10;  -- per e1
DECLARE @e4s INT = @e2s * @e3s; -- between e3 and e5

-- E1+E2
DECLARE @e2 INT = 1;
WHILE @e2 <= @e2s
BEGIN
    INSERT INTO [E2s] DEFAULT VALUES;
    INSERT INTO [E1s] ([E2Id]) VALUES (@e2);

    -- E3
    DECLARE @e3 INT = 1;
    WHILE @e3 <= @e3s
    BEGIN
        INSERT INTO [E3s] ([E2Id]) VALUES (@e2);
        SET @e3 = @e3 + 1;
    END;
        
    -- E8
    DECLARE @e8 INT = 1;
    WHILE @e8 <= @e8s
    BEGIN
        INSERT INTO [E8s] ([E1Id]) VALUES (@e2);
        SET @e8 = @e8 + 1;
    END;

    SET @e2 = @e2 + 1;
END;

-- E6
DECLARE @e6 INT = 1;
WHILE @e6 <= @e6s
BEGIN
    INSERT INTO [E6s] DEFAULT VALUES;

    -- E7
    DECLARE @e7 INT = 1;
    WHILE @e7 <= @e7s
    BEGIN
        INSERT INTO [E7s] ([E6Id]) VALUES (@e6);
        SET @e7 = @e7 + 1;
    END;

    -- E5
    DECLARE @e5 INT = 1;
    WHILE @e5 <= @e5s
    BEGIN
        INSERT INTO [E5s] ([E6Id]) VALUES (@e6);
        SET @e5 = @e5 + 1;
    END;

    SET @e6 = @e6 + 1;
END;
    
-- E4 (many-to-many between E3 and E5)
DECLARE @e4 INT = 1;
WHILE @e4 <= @e4s
BEGIN
    INSERT INTO [E4s] ([E3Id], [E5Id]) VALUES (@e4, @e4);
    SET @e4 = @e4 + 1;
END;

-- DELETE DATA
DELETE FROM [E8s] WHERE 1=1;
DELETE FROM [E1s] WHERE 1=1;
DELETE FROM [E3s] WHERE 1=1;
DELETE FROM [E2s] WHERE 1=1;
DELETE FROM [E5s] WHERE 1=1;
DELETE FROM [E7s] WHERE 1=1;
DELETE FROM [E6s] WHERE 1=1;
DELETE FROM [E4s] WHERE 1=1;
DBCC CHECKIDENT ('E8s', RESEED, 0);
DBCC CHECKIDENT ('E1s', RESEED, 0);
DBCC CHECKIDENT ('E2s', RESEED, 0);
DBCC CHECKIDENT ('E3s', RESEED, 0);
DBCC CHECKIDENT ('E5s', RESEED, 0);
DBCC CHECKIDENT ('E6s', RESEED, 0);
DBCC CHECKIDENT ('E7s', RESEED, 0);
DBCC CHECKIDENT ('E4s', RESEED, 0);

Thanks again for your help.

ghost · 2020-01-27T16:10:32Z

Here's some rough numbers on table size

e1 300
e2 300
e3 75000
e4 2500000
e5 2500000
e6 1500
e7 5000
e8 5

I think there might be a mismatch in how some of the entities are generated so here's a rough outline how I would think to do it from scratch.

create 300 e1s

create 5 e8s, each associated with any distinct e1 (there are barely any entries in this table, despite its impact on the query)

create 300 e2s, one per e1
create 250 e3s per e2, associated with that e2

create 1500 e6s
create 3 E7s per e6 associated with that e6

for each e3:
  choose an arbitrary e6
  do 30 times:
    create an e5 associated with that e6
    create an e4 associated with that e3 and that e5

That should arrive at approximately those numbers.

ghost · 2020-02-04T21:28:07Z

Checking in - any findings etc. on this? I see this issue got added to a milestone yesterday.

roji · 2020-02-05T10:01:55Z

@CWhitelaw no specific findings, but only because I haven't had time to look into it. It's in the milestone which means we definitely intend to investigate this for 5.0.

@smitpatel will also be interested in this.

smitpatel · 2020-02-07T00:52:02Z

Last ordering term is not necessary for result computation. Though it is hard to keep track & add/remove in code complexity. We can do that if there is compelling evidence that it causes huge perf penalty. Apart from that, nothing new from #18022

Stasinevich · 2020-02-07T06:22:59Z

Hi

I've encountered this issue in the current week. Even on 4 Includes (joins) I got a composite 'order by' clause. The performance penalty sometimes is huge - for instance query with single/double fields in order by is executed within 2 secs, whether with 8+ it takes more than 2 minutes. Also, SQL server grants a lot more memory for such SQL query plans.

I have found an only single workaround for now - split queries

roji · 2020-02-07T14:03:22Z

Apart from that, nothing new from #18022

The new thing for me is that the slowdown in this case comes specifically from the ORDER BY clause - the same query without the clause performs significantly better (2 minutes vs. 30 seconds). This is more specific than general perf degradation caused by cartesian explosion (I do wonder how much of the complaints in #18022 come from the ORDER BY and how much from other factors).

Opened #19828 to specifically track removing the last ORDER BY.

Some general thoughts on this (@smitpatel and @ajcvickers can correct any inaccuracies, am new to this domain):

The ORDER BYs are necessary in order to have streaming behavior (otherwise related entities are located in random places), which is important for large resultsets.
However, it could be claimed that streaming is meaningless for tracking queries, since materialized entities are referenced from the change tracker anyway. Would it make sense to consider dropping the ORDER BYs from tracked queries, and just identity resolution?
For non-tracking queries, streaming makes more sense (as materialized entities aren't referenced by EF), and so ORDER BY seems necessary. In theory we could allow the user to express whether streaming is required or not, and take this into account. This would also require implementing identity resolution for non-streaming, non-tracked queries, making them a bit more similar to tracked queries. Not sure this is worth the effort.
I'm not sure what the exact current behavior is for queries which don't return entities at all - I think I'm seeing ORDER BY although it shouldn't be necessary?

roji · 2020-02-26T19:29:00Z

Opened #20076 to track the removal of ORDER BY for buffered tracked queries. Between that, #19828 (remove last ORDER BY) and #18022 ("easy API" to split queries), I don't think there's anything more to do here - am closing this issue.

@CWhitelaw at this point, the right thing to do is to manually split the query as described in #18022 (and to switch to the "easy API" once it's introduced). If, after splitting the query, you still encounter a scenario where an ORDER BY generates a significant perf degradation, please post a repro for that to #20076 - that would help us evaluate the necessity of doing that.

ghost · 2020-02-26T22:00:29Z

Glad to hear these are under consideration and yes that's the road we're going down at this point. I hope these proposals make it through for EF 5.

ghost added the type-bug label Jan 11, 2020

roji added the area-perf label Jan 13, 2020

ajcvickers added this to the 5.0.0 milestone Feb 3, 2020

ajcvickers assigned roji Feb 3, 2020

roji mentioned this issue Feb 7, 2020

Remove unnecessary *last column* in ORDER BY when joining for collection #19828

Closed

smitpatel removed this from the 5.0.0 milestone Feb 7, 2020

ajcvickers added this to the 5.0.0 milestone Feb 10, 2020

ajcvickers added the needs-design label Feb 10, 2020

roji mentioned this issue Feb 26, 2020

Adjust EF behavior based on query buffering vs. streaming #20076

Open

roji closed this as completed Feb 26, 2020

roji removed this from the 5.0.0 milestone Feb 26, 2020

roji added closed-by-design and removed needs-design type-bug labels Feb 26, 2020

ajcvickers added the customer-reported label Mar 10, 2020

roji mentioned this issue Jul 29, 2020

Significant Query Slowdown When Using Multiple Joins Due To Changes In 3.0 #18022

Closed

roji mentioned this issue Sep 2, 2020

Include() adds unnecessary ORDER BY #22369

Closed

roji mentioned this issue Feb 23, 2021

Query with nested projections result in a expensive ORDER BY #24239

Closed

roji mentioned this issue Aug 26, 2021

Performance issue when using multiple includes and QueryFilters #25731

Closed

roji mentioned this issue Dec 16, 2021

EF Core creates SQL query with ORDER BY at the end massively impacting performance for cases with many included collections #27026

Closed

roji mentioned this issue Apr 4, 2022

Disable automatic 'order by' in query to decrease execution time #27746

Closed

roji mentioned this issue Sep 20, 2022

Investigate single query related entity loading and orderings #29171

Open

ajcvickers reopened this Oct 16, 2022

ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Undesirable composite ORDER BY clause when loading included data on a one-to-many relation #19571

Undesirable composite ORDER BY clause when loading included data on a one-to-many relation #19571

ghost commented Jan 11, 2020 •

edited by ghost

Loading

ajcvickers commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 13, 2020 •

edited by ghost

Loading

roji commented Jan 13, 2020

ghost commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 13, 2020 •

edited by ghost

Loading

ghost commented Jan 13, 2020

roji commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 14, 2020

roji commented Jan 14, 2020

ghost commented Jan 21, 2020

roji commented Jan 21, 2020

roji commented Jan 23, 2020

ghost commented Jan 27, 2020

ghost commented Feb 4, 2020

roji commented Feb 5, 2020

smitpatel commented Feb 7, 2020

Stasinevich commented Feb 7, 2020 •

edited

Loading

roji commented Feb 7, 2020

roji commented Feb 26, 2020

ghost commented Feb 26, 2020

Undesirable composite ORDER BY clause when loading included data on a one-to-many relation #19571

Undesirable composite ORDER BY clause when loading included data on a one-to-many relation #19571

Comments

ghost commented Jan 11, 2020 • edited by ghost Loading

Steps to reproduce

Further technical details

ajcvickers commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 13, 2020 • edited by ghost Loading

roji commented Jan 13, 2020

ghost commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 13, 2020 • edited by ghost Loading

ghost commented Jan 13, 2020

roji commented Jan 13, 2020

roji commented Jan 13, 2020

ghost commented Jan 14, 2020

roji commented Jan 14, 2020

ghost commented Jan 21, 2020

roji commented Jan 21, 2020

roji commented Jan 23, 2020

ghost commented Jan 27, 2020

ghost commented Feb 4, 2020

roji commented Feb 5, 2020

smitpatel commented Feb 7, 2020

Stasinevich commented Feb 7, 2020 • edited Loading

roji commented Feb 7, 2020

roji commented Feb 26, 2020

ghost commented Feb 26, 2020

ghost commented Jan 11, 2020 •

edited by ghost

Loading

ghost commented Jan 13, 2020 •

edited by ghost

Loading

ghost commented Jan 13, 2020 •

edited by ghost

Loading

Stasinevich commented Feb 7, 2020 •

edited

Loading