Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant Query Slowdown When Using Multiple Joins Due To Changes In 3.0 #18022

Closed
dragnilar opened this issue Sep 24, 2019 · 242 comments
Closed
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@dragnilar
Copy link

dragnilar commented Sep 24, 2019

Note:

  • Issue Relational split query mode for collection includes #20892 has been created to track actual work on split queries in EF Core 5.0. The new issue is locked so that it can be subscribed to for updates on the implementation without noise.
  • This older issue has been re-purposed to be discussion about different approaches, etc. It will not be locked because it is important to us that we don’t shut down channels of communication with the community.

Preview6 introduces AsSplitQuery API to load collections using multiple queries. Details of the API has been posted #20892 (comment)


A way to rewrite the query to avoid large result set & mimick previous version behavior is posted https://gist.github.com/smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a


Note: Per @roji's request, I am opening this in response to the comments on #17455

I have a very large query that contains about 57 includes that is being used to copy down a large entity so that it can be modified and cloned.

With EF 2.2.6, this large query ran successfully in about 1-3 seconds (variable). With the changes in 3.0 (all includes create one entire SQL statement with joins), the query takes significantly longer and always times out with the default execution timeout settings.

Steps to reproduce

Use a (IMO nasty) Linq query similar to the one as follows:

dbContext.Projects
	.Include(z => z.Financial)
	.Include(z => z.ProjectProtocol)
	.Include(z => z.ReportCategories)
	.Include(z => z.ReportSubCategories)
	.Include(x => x.SubProjects).ThenInclude(y => y.Address)
	.Include(x => x.SubProjects).ThenInclude(y => y.LegacyAddress)
	.Include(x => x.SubProjects).ThenInclude(z => z.BuildingTypes)
	.Include(x => x.SubProjects).ThenInclude(z => z.Buildings).ThenInclude(b => b.BuildingType)
	.Include(x => x.SubProjects).ThenInclude(z => z.Buildings).ThenInclude(b => b.Site)
	.Include(x => x.SubProjects).ThenInclude(z => z.Sites).ThenInclude(s => s.Address)
	.Include(x => x.SubProjects).ThenInclude(z => z.Participants).ThenInclude(p => p.Address)
	.Include(x => x.SubProjects).ThenInclude(z => z.ExcelFileJson)
	.Include(x => x.SubProjects).ThenInclude(z => z.CompanionAddress)
	.Include(x => x.SubProjects).ThenInclude(z => z.UtilityTypes)
	.Include(x => x.SubProjects).ThenInclude(z => z.InspectedUnits).ThenInclude(i => i.Building)
	.Include(x => x.SubProjects).ThenInclude(z => z.InspectedUnits).ThenInclude(i => i.UnitType)
	.Include(x => x.SubProjects).ThenInclude(z => z.Utilities).ThenInclude(i => i.UtilityType)
	.Include(x => x.SubProjects).ThenInclude(z => z.Units).ThenInclude(i => i.Building)
	.Include(x => x.SubProjects).ThenInclude(z => z.UnitTypes)
	.Include(x => x.SubProjects).ThenInclude(z => z.CommonAreas).ThenInclude(ca => ca.Building)
	.Include(x => x.SubProjects).ThenInclude(z => z.FixtureAreas).ThenInclude(w => w.Fixtures)
	.Include(x => x.SubProjects).ThenInclude(y => y.LineItems).ThenInclude(z => z.EnergyOneItems)
	.Include(x => x.SubProjects).ThenInclude(y => y.LineItems).ThenInclude(z => z.EnergyTwoItems)
	.Include(x => x.SubProjects).ThenInclude(y => y.LineItems)
	.ThenInclude(z => z.TraditionalItem).AsNoTracking().FirstOrDefault(x => x.Id == project.Id);

Results:

With EF Core 2.2.6 - I can see in the output via the SQL Server Profiler that EF is breaking up the LINQ statement into smaller queries. The overall process takes about 1-3 seconds.

With EF Core 3.0 - I can see in the output via the SQL Server Profiler that EF is emitting one massive query with lots and lots of joins. The overall process always times out with the default execution timeout setting.

At this point, I am open to the notion that this query needs to be either re-written or the process needs to be changed for handling the cloning. I would still like to hear if there are any workarounds, findings that this is a bug or other suggestions to avoid having to devote a significant amount of time on rewriting.

Edit
For now we worked around this by splitting the query up manually using EF Plus' "Includes Optimized" method and then looping through the change tracker to set all of the entities as untracked so we can then reset their keys so that the graph can be comitted as a clone (this gave me a flashback to my EF 6 days).

Note: The model changed somewhat between the time this issue was first encountered and now due to user requests and other factors. I should also note that the system is now in production and users are pretty happy with the performance.

            var clone = dbContext.Projects
                .IncludeOptimized(z => z.Financial)
                .IncludeOptimized(z => z.ProjectProtocol)
                .IncludeOptimized(z => z.ReportCategories)
                .IncludeOptimized(z => z.ReportSubCategories)
                .IncludeOptimized(z => z.SubProject)
                .IncludeOptimized(z => z.SubProject.ValidationFlag)
                .IncludeOptimized(z => z.SubProject.ExcelFileJson)
                .IncludeOptimized(z => z.SubProject.EnergyAuditUtilities)
                .IncludeOptimized(z => z.SubProject.EnergyAuditData)
                .IncludeOptimized(z => z.SubProject.EnergyAuditData.EnergyAuditAreas)
                .IncludeOptimized(z => z.SubProject.Address)
                .IncludeOptimized(z => z.SubProject.Utilities)
                .IncludeOptimized(z => z.SubProject.UtilityTypes)
                .IncludeOptimized(z => z.SubProject.Units)
                .IncludeOptimized(z => z.SubProject.Units.Select(y => y.Building))
                .IncludeOptimized(z => z.SubProject.BuildingTypes)
                .IncludeOptimized(z => z.SubProject.Buildings)
                .IncludeOptimized(z => z.SubProject.Buildings.Select(y => y.BuildingType))
                .IncludeOptimized(z => z.SubProject.Buildings.Select(y => y.Site))
                .IncludeOptimized(z => z.SubProject.Sites)
                .IncludeOptimized(z => z.SubProject.Sites.Select(y => y.Address))
                .IncludeOptimized(z => z.SubProject.Participants)
                .IncludeOptimized(z => z.SubProject.Participants.Select(y => y.Address))
                .IncludeOptimized(z => z.SubProject.InspectedUnits)
                .IncludeOptimized(z => z.SubProject.InspectedUnits.Select(y => y.Building))
                .IncludeOptimized(z => z.SubProject.InspectedUnits.Select(y => y.UnitType))
                .IncludeOptimized(z => z.SubProject.UnitTypes)
                .IncludeOptimized(z => z.SubProject.CommonAreas)
                .IncludeOptimized(z => z.SubProject.CommonAreas.Select(y => y.Building))
                .IncludeOptimized(z => z.SubProject.CommonAreas.Select(y => y.Building).Select(z => z.Units))
                .IncludeOptimized(z => z.SubProject.FixtureAreas)
                .IncludeOptimized(z => z.SubProject.FixtureAreas.Select(y => y.Fixtures))
                .IncludeOptimized(x => x.SubProject.LineItems)
                .IncludeOptimized(x => x.SubProject.LineItems.Select(y => y.EnergyOneItem))
                .IncludeOptimized(x => x.SubProject.LineItems.Select(y => y.EnergyTwoTwoItem))
                .IncludeOptimized(x => x.SubProject.LineItems.Select(y => y.TraditionalItem))
                .FirstOrDefault(x => x.Id == project.Id);

            if (clone != null)
            {
                foreach (var entityEntry in dbContext.ChangeTracker.Entries())
                {
                    if (entityEntry.Entity != null)
                    {
                        entityEntry.State = EntityState.Detached;
                    }
                }
                return clone;
            }

My team was struggling with this at first due to the fact that EF was at first wiping out the entities when we detatched them due to an issue on #18982. Using the work around that was posted there allowed for things to work. The overall performance is actually better due to the fact there isn't any client side evaluation. However, I would still prefer if the behavior of Includes Optimized (which is pretty much what EF 2.x did, splitting the query) was something that came out of the box with EF Core. I also do not like it how this entire scenario can no longer be done with a no tracking query (or so it seems), as it was possible to do it before with EF Core 2.x.

Further technical details

EF Core version: 3.0.0
Database provider: Microsoft.EntityFramework.SqlServer
Target framework: .Net Core 3.0
Operating system: Windows 10 x64 / Windows Server 2016
IDE: Visual Studio 2019 Pro (16.3)

@KevinMallinson
Copy link

This also happens with me: #18017

It seems it's happening to quite a few people. I know my query sucks, as do you in OP, but it would be a shame to have to revert an upgrade due this.

@dragnilar
Copy link
Author

@KevinMallinson I'm not surprised, it's a pretty significant change. I was hoping when I first read about it, that there would be some way to opt-out of this new behaviour, but unfortunately, that doesn't seem to be the case.

@divega
Copy link
Contributor

divega commented Sep 24, 2019

Some observations:

  • The main source of redundancy in a query like (and the higher potential for improvement if you decide to refactor the queries in your application) this comes from the traversal of collection navigation properties with high cardinality. So we are not talking about having to split a query into ~57 separate queries, but about judiciously partition it where it matters.
  • As @smitpatel implied on Slow compilation when using many .ThenInclude()'s #17455 (comment), the best case scenario for the multiple query approach taken in 2.2 is that some of the intermediary results will come out empty, in which case not all paths need to be traversed and hence we end up not executing all queries.
  • However, in the worst case scenario for the multiple query approach, there is data on all paths and several of the queries generated will have similar complexity to the one query we generate in 3.0. There is no data redundancy in the results, but every table on each path needs to be joined and predicates applied several times.
  • Although the approach we take in 3.0 is similar to EF6 and is relatively well understood, there might be still things we can do to improve the performance that we haven't investigated. I think there is a small chance that some profiling of the queries and trying different combinations may help get better results.
  • I expect that we will keep our new design to generate a single SQL query from each LINQ query because of all the benefits, but to mitigate cases like this we can consider providing API to explicitly generate and execute all the queries. I can see a spectrum of solutions, ranging from a utility method that takes the original query and does this automatically, to some extension method that simplifies creating a query for a related data based on the original query.

@dragnilar
Copy link
Author

@divega Having an API or extension to use the previous behavior would be good for those who don't have the bandwidth to rewrite large queries like this one. I personally would like to rewrite mine since it's inefficient and cumbersome to read, but as it stands I'll have to remain on 2.2.6 for a while longer. I'm sure others are in the same boat.

As far as refactoring it goes, I seldom use explicit loading, so excuse my unfamiliarity with it, but would I be able to use that in tandem with No Tracking to be able to obtain a copy of the entity that can have its keys changed before saving it again?

@divega
Copy link
Contributor

divega commented Sep 24, 2019

@dragnilar you just made me realize there are two interesting points I didn't think of:

  • Any workaround that relies on explicit loading or executing multiple queries would require fixup behavior that only happens for tracked queries.

  • It is possible that some of the negative performance impact of 3.0 comes from the fact that we no longer perform identity resolution in non-tracked queries, which should result in many more objects being created. There are two things you could measure that may give us a hint here:

    1. Execute the original LINQ query with tracking enabled
    2. Execute the generated SQL directly in SQL Management Studio

@dragnilar
Copy link
Author

dragnilar commented Sep 24, 2019

@divega

Now I remember now that I needed the fixup behaviour, which was why I couldn't use explicit loading in the first place for this particular query.

I tried out what you suggested:

Running the query through EF with tracking enabled has it finish successfully. It takes about ~25 seconds on average.

Running the generated SQL (I obtained it via the profiler) in SSMS takes almost 2 minutes to finish.

The generated SQL is pretty large in this case, but it's not as large as when I have tracking turned off.

@Shane32
Copy link

Shane32 commented Sep 25, 2019

I am very glad of the new behavior. Many times I had written a query expecting it to run in a single call, when it makes multiple roundtrips to the database to retrieve what I know to be very small amounts of data. Now it is in my hands as a developer whether to make one call to SQL for all the joins, or make separate requests to the database to retrieve all the pertinent information. Rewriting a few complex queries seems a small price to pay for the benefits gained in the majority of queries, at least certainly going forwards.

Ideally, we could provide SQL-compilation hints to the query, where it would run those queries separately. Or maybe opt-in behavior for the query as a whole. However, I never found EF 2.2 to produce very efficient queries in this scenario, so I would hope for more optimized behavior.

@dragnilar
Copy link
Author

dragnilar commented Sep 25, 2019

@Shane32 I agree it's better that it adhire to what is expected but at the time it was the seemingly recommended approach based on the documentation provided and what other resources recommended for the cloning scenario. Im fine with rewriting it, but as I said in the OP, I'd rather explore all options first before committing to a rewrite. I'm not sure what others will say, for me it's a pain, but not a super huge impact on my team since my code base uses pretty simple queries for all other operations.

@kakone
Copy link

kakone commented Sep 26, 2019

we can consider providing API to explicitly generate and execute all the queries.

Please, provide an API because this behavior change is very problematic. We need to rewrite a lot of our queries in EF Core 3.
I can understand this change but the problem is that I don't see how I can easily go back to the previous behavior. I can enable tracking but I must write a lot of queries to do a ThenInclude(...).ThenInclude(...).ThenInclude(...).ThenInclude(...).

@erikmf12
Copy link

erikmf12 commented Sep 26, 2019

I would like to chime in and say that we are also having this issue, and is very problematic for us. An .AsMultipleSqlStatements() extension method would suffice for those of us who need faster execution.

@DoughtCom
Copy link

Same here, seeing how the multiple queries returned their own subset of results then were consolidated in the framework that resulted in 100s of results. My query is fairly basic, it's really just: Get me products by brand, their images, their options and those option's values.

Example:
await _context.Products.Include("Images").Include("Options.Values").ToListAsync();

To my surprise, when I opened SQL Inspector when I started getting timeouts, I found out it was returning 150,000+ results when running the command manually.

@erikmf12
Copy link

erikmf12 commented Sep 27, 2019

For anyone wondering how to 'hack' your way around this to get 2.2 speed on 3.0, follow the idea on this comment. Basically, you define an initial query to filter your results to the entity you want, and then create dummy variables with .includes to include the related entities you need.

Important: I have only tested if this returns the correct data, nothing else. If you need to do something more than reading/displaying, make sure you test first.

For example, I went from this:

var equipment = await _context.Equipment.Include(x => x.Group)
				.Include(x => x.SystemInfo).ThenInclude(x => x.SystemUsers)
				.Include(x => x.SystemInfo).ThenInclude(x => x.Frameworks)
				.Include(x => x.SystemInfo).ThenInclude(x => x.VideoCards)
				.Include(x => x.SystemInfo).ThenInclude(x => x.StorageDrives)
				.Include(x => x.SystemInfo).ThenInclude(x => x.Software)
				.Include(x => x.SystemInfo).ThenInclude(x => x.NetworkAdapters)
				.Include(x => x.SystemInfo).ThenInclude(x => x.Printers)
				.Include(x => x.Parts).ThenInclude(x => x.ChildrenParts)
				.Include(x => x.Parts).ThenInclude(x => x.ParentParts)
				.Include(x => x.Parts).ThenInclude(x => x.Vendor)
				.Include(x => x.MaintenanceHours)
				.Include(x => x.Attachments)
				.Include(x => x.Notes)
				.Include(x => x.Status)
				.Include(x => x.Area)
				.Include(x => x.EquipmentType)
				.Include(x => x.Department)
				.Include(x => x.PMaintenance)
				.Include(x => x.Request)
				.FirstOrDefaultAsync(x => x.EquipmentId == id);

to this:

var equip = _context.Equipment.Include(x => x.Group).Where(x => x.EquipmentId == id);
var equipment = await equip.Include(x => x.MaintenanceHours)
				.Include(x => x.Attachments)
				.Include(x => x.Notes)
				.Include(x => x.Status)
				.Include(x => x.Area)
				.Include(x => x.EquipmentType)
				.Include(x => x.Department)
				.Include(x => x.PMaintenance)
				.Include(x => x.Request).FirstOrDefaultAsync();

			var d2 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.SystemUsers).FirstOrDefaultAsync();
			var d3 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.Frameworks).FirstOrDefaultAsync();
			var d4 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.VideoCards).FirstOrDefaultAsync();
			var d5 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.StorageDrives).FirstOrDefaultAsync();
			var d6 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.Software).FirstOrDefaultAsync();
			var d7 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.NetworkAdapters).FirstOrDefaultAsync();
			var d8 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.Printers).FirstOrDefaultAsync();
			var d9 = await equip.Include(x => x.Parts).ThenInclude(x => x.ChildrenParts).FirstOrDefaultAsync();
			var d10 = await equip.Include(x => x.Parts).ThenInclude(x => x.ParentParts).FirstOrDefaultAsync();
			var d11 = await equip.Include(x => x.Parts).ThenInclude(x => x.Vendor).FirstOrDefaultAsync();

You should only need to do this for the entities that need a .ThenInclude(). It's not the best at all, but it allows me to still use .NET Core 3.0 without having to sacrifice load times.

@dragnilar
Copy link
Author

@erikmf12 not sure if this is a full work around. I think we tried something like that and ran into a problem with the entity not having been fixed up since we need it to be non tracked so we can reset the primary keys before recomitting the entity back to the database as a copy of the original. Currently we're considering just writing a manual clone.

@erikmf12
Copy link

@dragnilar True, I have not tested anything besides making sure all the data is included. In my case, I just need to display the data on a detail page.

@DoughtCom
Copy link

Figured I would reference the original change request for this feature on here so it would be tracked back on that request.

#12098

@Shane32
Copy link

Shane32 commented Sep 27, 2019

@dragnilar Correct, my workaround only works with tracked entities. You should be able to write a snippet to untrack ALL entities from a dbcontext, if that works for you. Something like this: (not sure if this is a EF6 or EF Core snippet)

public void DetachAllEntities()
{
    var changedEntriesCopy = this.ChangeTracker.Entries()
        .Where(e => e.State == EntityState.Added ||
                    e.State == EntityState.Modified ||
                    e.State == EntityState.Deleted)
        .ToList();

    foreach (var entry in changedEntriesCopy)
        entry.State = EntityState.Detached;
}

If that works for you, you can retrieve the data with tracking, then detach all tracking, then use your existing code to clone the data.

You could probably also write code to inspect any entity object, and utilize the IModel to recursively iterate through all of its navigation members, creating a list of all related entities, then detach them all. Should be relatively easy to do. With a little extra work, it could reset primary keys and foreign keys, which may be necessary as well.

@dragnilar
Copy link
Author

dragnilar commented Sep 27, 2019

@dragnilar Correct, my workaround only works with tracked entities. You should be able to write a snippet to untrack ALL entities from a dbcontext, if that works for you. Something like this: (not sure if this is a EF6 or EF Core snippet)

public void DetachAllEntities()
{
    var changedEntriesCopy = this.ChangeTracker.Entries()
        .Where(e => e.State == EntityState.Added ||
                    e.State == EntityState.Modified ||
                    e.State == EntityState.Deleted)
        .ToList();

    foreach (var entry in changedEntriesCopy)
        entry.State = EntityState.Detached;
}

If that works for you, you can retrieve the data with tracking, then detach all tracking, then use your existing code to clone the data.

You could probably also write code to inspect any entity object, and utilize the IModel to recursively iterate through all of its navigation members, creating a list of all related entities, then detach them all. Should be relatively easy to do. With a little extra work, it could reset primary keys and foreign keys, which may be necessary as well.

Edit: I was probably being too polite. In actuality, we did try this first when we saw what the problem was, since it was an old trick we all had learned back in our EF6 days. Unfortunately it did not work due to a bug (see other comments I made). When we finally found out about the bug and the recommended work around though, the whole thing worked. Hence, the advice was appreciated though. However your remark about IModel did not mean anything to me. Also, I could not find any significant documentation of it besides some vague statements over at MSDN. 😑 It sounds cool but I didn't want to get wrapped up in chasing something blindly; alas I left it alone.

@smitpatel
Copy link
Member

Triage decision:

  • We decided that single query mode is right decision we have made and we will keep the behavior
  • We will look into ways if we can improve perf of single SQL scenario.
  • Adding a hook which allows EF to generate split queries is infra costly and it has same bugs which made us change the behavior in first place. So we are not looking to add such a hook.
  • Due to being expression tree, and extension method which does processing would be much more expensive (perf & implementation cost both) than user rewriting the same query in split themselves. Linq is easy to write in code but hard to generate dynamically.
  • We will document how to rewrite the query to do split queries similar to how previous version did including stitching in tracking/non-tracking both scenarios.
  • Depending on complexity of above, we may look into provide some API to user where they can pass multiple queries and we would stitch the result. (Perhaps by new-ing up a context behind the hood to run multi-queries.

@dragnilar
Copy link
Author

@smitpatel I think the last point would probably provide the best flexibility, but I'll understand if your team decides against it. In the least, the documentation definitely needs to be updated so others can avoid running into this situation.

@sitepodmatt
Copy link

sitepodmatt commented Oct 1, 2019

I think this is quickly going to become the most frequent issue/question with EF Core 3.x once people to start to transition and have the WTF moment, and indeed whilst my EAV model is mandated by a legacy system and is a questionable design choice, this new restriction affects even the most simple scenarios - the canonical Blog/Tags/Comments/Upvotes. If we are really restricted to realistically 2 or 3 to .Include 1-to-many relations and/or carry the cognitive burden of computing the avg number of related rows to power of number of .Include so we don't stream too much redundant information then I can't see how EF Core is any better than the micro ORMs for the query side.

I understand the argument that splitting the queries can potentially carry consistency risks that aren't clear when not used with the right transaction modes and so wasn't perfect fit, however it was pragmatic and this strive for the right choice seems like a strive for purity rather than what the consumers of EF Core 3.0 desired.

@dragnilar
Copy link
Author

dragnilar commented Oct 1, 2019

@sitepodmatt Indeed, it was a pretty nasty surprise to see things like this with EF Core 3.0. As I implied, I am in a position where I can go either way and see both sides. And also for what it is worth, me and my team did notice certain areas perform better with 3.0 versus 2.6. Unfortunately, with the number of breaking changes (such as this) and the other issues that have cropped up on here over the past couple of days, we have decided to table upgrading to EF 3.0 for the time being.

Also, I think for the sake of backwards compatibility, they should have left in a hook or some type of flag on the context that you could toggle to use multi-query mode vs single query mode. I believe that would have been more pragmatic.

I almost wonder if @roji was being semi-prophetic when he said this on #12098:

Agreed, but we're not talking about a bad query, but rather about what happens with the default, standard EF Core join construct. At the end of the day, if the proposal is that EF Core translate the standard join syntax (Include()) via an SQL construct that results in extremely bad performance for the common scenario, then I think everyone will end up losing.

I doubt "everyone" lost with this design change. But at this stage I think it's too early to say definitively what was the net gain/loss. I am under the impression that most devs are not rushing to upgrade to Core 3.0, so it will take some time.

@Shane32
Copy link

Shane32 commented Oct 1, 2019

Just realize that there are a lot of devs, like me, that felt that EF Core 1.x / 2.x was a big step backwards from EF6 where in EF Core 2.x there is no way to issue certain queries without them being broken up into multiple queries. I can provide samples of seemingly simple queries on EF Core 2.x that would run a subquery individually for EVERY result in the primary table, totally breaking performance! I've had to write and rewrite queries many times to get EF 2.x to produce the desired SQL. For us, EF 3.0 is a pleasant relief, and while it would be nice if there was a 'multi-query mode' for EF 3.0, I'd rather take the 3.0 behavior over the 2.x behavior any day.

Ideally, I'd like to see a parameter added to Include to indicate multi-query behavior (e.g. .Include(x => x.Images, separateQuery: true)), plus an .AsSeparateQuery() method for splitting custom joins, but I understand that this might be a great deal of work. Hopefully an enhancement will make it into EF Core 5 to help with these scenarios.

@smitpatel
Copy link
Member

Since splitting query manually is not possible without tracking entity

It is possible as written in the gist.

@mtnoztrk
Copy link

mtnoztrk commented Sep 3, 2020

Since splitting query manually is not possible without tracking entity

It is possible as written in the gist.

Do you mean the opening post? Because afaik EF Plus needs tracking enabled.

@smitpatel
Copy link
Member

This link https://gist.github.com/smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a

@Tasteful
Copy link
Contributor

In #20892 we are getting the AsSplitQuery method. Will that work when you use own DbParameter and multiples Includes? If we reusing the same SqlParameter in EfCore3.1 in multiple queries an The SqlParameter is already contained by another SqlParameterCollection. exception is throw. This is described in #11370 and is marked as solved with the #12098, but is this case also solved when using the new AsSplitQuer function?

For EfCore 3.1 I'm using this as a workaround to fix this

using System;
using System.Data.Common;
using System.Linq;
using System.Reflection;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore.Storage;
using Microsoft.EntityFrameworkCore.Storage.Internal;

namespace Test.Application.Data.EfCoreOverrides
{
    // register this as IRelationalCommandBuilderFactory in the service provider
    internal class ClonableRelationalCommandBuilderFactory : RelationalCommandBuilderFactory
    {
        public ClonableRelationalCommandBuilderFactory(
            RelationalCommandBuilderDependencies dependencies)
            : base(dependencies)
        {
        }

        public override IRelationalCommandBuilder Create()
        {
            return new ClonableRelationalCommandBuilder(Dependencies);
        }
    }

    [System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "EF1001:Internal EF Core API usage.", Justification = "<Pending>")]
    internal class ClonableRelationalCommandBuilder : RelationalCommandBuilder
    {
        private static readonly FieldInfo _parameterFieldInfo = typeof(RawRelationalParameter)
            .GetField("_parameter", BindingFlags.Instance | BindingFlags.NonPublic);

        public ClonableRelationalCommandBuilder(
            RelationalCommandBuilderDependencies dependencies) : base(dependencies)
        {
        }

        public override IRelationalCommandBuilder AddParameter(IRelationalParameter parameter)
        {
            if (parameter is CompositeRelationalParameter compositParameter)
            {
                var clonedParameters = new CompositeRelationalParameter(compositParameter.InvariantName, compositParameter.RelationalParameters
                    .Select(x =>
                    {
                        if (x is RawRelationalParameter rawParameter)
                        {
                            var dbParameter = (DbParameter)_parameterFieldInfo.GetValue(rawParameter);
                            return new ClonableRawRelationalParameter(x.InvariantName, dbParameter);
                        }
                        return x;
                    })
                    .ToList());

                return base.AddParameter(clonedParameters);
            }

            return base.AddParameter(parameter);
        }
    }

    [System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "EF1001:Internal EF Core API usage.", Justification = "<Pending>")]
    internal class ClonableRawRelationalParameter : RawRelationalParameter
    {
        private static readonly FieldInfo _parentFieldInfo = typeof(SqlParameter)
            .GetField("_parent", BindingFlags.Instance | BindingFlags.NonPublic);

        public ClonableRawRelationalParameter(string invariantName, DbParameter parameter) : base(invariantName, parameter)
        {
        }

        public override void AddDbParameter(DbCommand command, object value)
        {
            if (value is SqlParameter
                && !(_parentFieldInfo?.GetValue(value) is null))
            {
                base.AddDbParameter(command, (DbParameter)((ICloneable)value).Clone());
                return;
            }

            base.AddDbParameter(command, value);
        }
    }
}

@smitpatel

@smitpatel
Copy link
Member

smitpatel commented Sep 10, 2020

@Tasteful - Filed #22483

@Tasteful
Copy link
Contributor

I think the new issue-number is wrong, should be #22483

@smitpatel
Copy link
Member

Yes, my 2 key is faulty.

@Gio2k
Copy link

Gio2k commented Sep 11, 2020

This link https://gist.github.com/smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a

As a side effect of this workaround I have noticed that the List<> properties are now null, where earlier they would be just empty Lists. This means I need to check against null in many other places where i would just do an Any()

@dstj
Copy link

dstj commented Nov 24, 2020

I'm surprised to see so few mentions of the increased memory usage caused by the "single query / no automatic splitting" feature in EF Core 3. I've just hunted down a tenfold memory increase after moving to EF Core 3.1.10 (from 2.2.6) and the cause was that the query was no longer automatically split into 5 simpler queries.

EF Core 2 - ~120 MB used
big asset import - efcore2

EF Core 3 - ~1.3 GB used
big asset import - efcore3

This has a huge impact.

FYI, the EF Core 2 query looked like:

var company = await _dbContext.Companies
				.Include(c => c.Sites)
				.Include(c => c.Categories)
				.Include(c => c.Assets).ThenInclude(a => a.Site)
				.Include(c => c.Assets).ThenInclude(a => a.Tasks)
				.AsTracking()
				.SingleOrDefaultAsync(c => c.Id == companyId);

@ajcvickers
Copy link
Member

@dstj Please open a new issue and attach a small, runnable project or post a small, runnable code listing that reproduces what you are seeing so that we can investigate.

@ajcvickers
Copy link
Member

ajcvickers commented Nov 24, 2020

Closing this discussion issue since 5.0 has now shipped with split-query support for most scenarios. Remaining work is tracked by #21234.

@dragnilar
Copy link
Author

dragnilar commented Nov 24, 2020

@ajcvickers @smitpatel

FYI, we had a chance to test this today with the original query and 5.0 did resolve the performance problem (we had to enable Split Query mode obviously). I am noticing side effects of #22283, but we have our work around with EF Plus for now.

Thanks for addressing this and keeping the 🦄 magic going! 👍

@smitpatel
Copy link
Member

Updated the pending item issue number to #21234

@NivagL
Copy link

NivagL commented Jan 25, 2021

This change is broken logic. I'm now getting multiple parent records loaded because I want to include some child elements. That makes no sense at all. The change seems to be motivated by performance but at the expense of the logic? The fix is to put back many queries negating any performance improvement anyway. This is a terrible change. Giving it a fix name "split-query" just hides the fact that it's actually broken logic. I don't want split-query I want proper parent loading. I'm upgrading to netcore 3.1 rather than 5.

@mguinness
Copy link

It's not clear to me what your problem is, but upgrading from 2.1 to 3.1 probably won't give you the outcome you want. Split query was introduced in 5.0 to alleviate the problem of the query being resolved in a single SQL statement (as in the older EF 6).

@NivagL
Copy link

NivagL commented Jan 26, 2021

"I'm now getting multiple parent records loaded because I want to include some child elements." I use .Include/.ThenInclude which was working very well in 2.1. The problem is the join syntax in 3.1 now loads many parents as it joins to the child tables rather than independently applying any where expression. It's different, incorrect, logic. Previously the join did not exist and child elements were queried independently as they should be for a many-one relationship. Otherwise you get duplicated parent data (performance) and too many records loaded, (bad logic). You shouldn't be using a join to simulate many queries with similar where clauses - it breaks the relational model. My options seem to be to implement my own independent queries on 3.1 or upgrade to 5.0 and use "split-query" (which should be the default anyway).

@mguinness
Copy link

The problem you describe is exactly why split query was introduced, so you would be best served by upgrading to 5.0 instead of 3.1 (unless you have a blocker). Enabling split queries globally allows you to make that the default.

@NivagL
Copy link

NivagL commented Jan 27, 2021

Thanks for the advice. I have made progress with net core 3.1 and EF 5.0 with single queries. It's looking much better now so very much appreciated. One remaining problem, views. We use views in our solution in the same way as table with .Include/.ThenInclude to control loading of child objects to the view. That doesn't appear to work for views now (we used to have views as tables in the content with .ToTable). I have tried .ToTable & ToView in the Context but no luck. Can you confirm if this is another know issue or intended behaviour for some reason? Is there a work around?

@mguinness
Copy link

Glad that you've been able to make progress. Take a look at Keyless Entity Types for views. If you need further help, I'd recommend asking on Stack Overflow as you'll get much better help there than on a unrelated closed issue.

@ajcvickers ajcvickers removed this from the Discussions milestone Nov 16, 2021
@ajcvickers ajcvickers added closed-no-further-action The issue is closed and no further action is planned. and removed area-query labels Nov 16, 2021
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests