Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecuteUpdate/Delete (AKA bulk update, without loading data into memory) #795

Closed
Tracked by #24107
akirayamamoto opened this issue Oct 4, 2014 · 380 comments · Fixed by #28626
Closed
Tracked by #24107

ExecuteUpdate/Delete (AKA bulk update, without loading data into memory) #795

akirayamamoto opened this issue Oct 4, 2014 · 380 comments · Fixed by #28626
Assignees
Labels
area-bulkcud closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@akirayamamoto
Copy link

akirayamamoto commented Oct 4, 2014

EF does not provide a batch update mechanism. A proposal is below.
Context.Customers.Update().Where.( c => c.CustType ==“New”).Set( x => x.CreditLimit=0)

Will you consider this feature?
More details here: https://entityframework.codeplex.com/workitem/52


Design summary: #795 (comment)

@rowanmiller
Copy link
Contributor

rowanmiller commented Oct 8, 2014

This is something we do want to do in EF, but not for the initial RTM of EF Core. Moving to backlog.

@rowanmiller rowanmiller added this to the Backlog milestone Oct 8, 2014
@djechelon
Copy link

Bulk is currently available in EntityFramework.Extended by loresoft (source https://github.com/loresoft/EntityFramework.Extended, Nuget https://www.nuget.org/packages/EntityFramework.Extended/).

Maybe if license allows, you could think about grabbing code from that project

@abatishchev
Copy link

Unfortunately EF.Extended isn't testable at all and author isn't going to address that.

@djechelon
Copy link

I mostly agree. EF.Extended is indeed difficult to test, but not actually "not testable at all", as after several attempts I made my PR run on AppVeyor. I did my own fork and implemented my own methods for MySQL (only because the original project didn't work with it) because the author is not maintaining the project any more.

What about porting that code, and unit-testing it, in the original EF package?
Really, this and the duplicate #3303 are really "must-have" features, we can't have an ORM with the bottleneck of massive operations.

😭 😭

@ilmax
Copy link
Contributor

ilmax commented May 13, 2016

Hi, I'm starting to investigate how complex would be to add support for set-based operation, and I'm actually only investigating Delete operation, I really would like to start a discussion with the team and the community members to start a discussion on how this feature can be designed.

Hoping that this will be achieved by EF Core, I'm posting my initial thoughts/questions on this subjects.

  • Is this the right place or it's better to open a new issue?
  • What set-based method should we expose? Delete, Update, and maybe Merge?
  • What should be done with entities affected by the query that's actually being change tracked (i.e. already loaded from db by the same context instance)? Should this scenario be supported?
  • To my understanding the update pipeline is generating SQL starting from a collection of ColumnModification instances, set-based operations should probably have a more powerful query generation capabilities, so we may need to mix somehow the query pipeline SQL generation capabilities with the update pipeline (or am I missing something?)
  • What api should be exposed?

My initial proposal is the following:

Delete

void Delete<TEntity>(this DbSet<TEntity> query, Expression<Func<TEntity, bool>> condition)

usage: context.Customers.Delete(c => c.HasPendingPayments)

or

void Delete<TEntity>(this IQueryable<TEntity> query)

usage: context.Customers.Where(c => c.HasPendingPayments).Delete()

Update

void Update<TEntity>(this IQueryable<TEntity> query, Expression<Action<TEntity>>)

usage: context.Customers.Where(c => c.HasPendingPayments).Update(c => c.Status = CustomerStatus.Disabled)

or (as suggested above)

void Set<TEntity>(this IQueryable<TEntity> query, Expression<Action<TEntity>>)

usage: context.Customers.Where(c => c.HasPendingPayments).Set(c => c.Status = CustomerStatus.Disabled)

Merge

Any suggestion?

This is just a quick list of initial thoughts, I will update this one with new findings (if any 😄 ) during this initial investigation, but would like to get the discussion started.

@anpete
Copy link
Contributor

anpete commented May 13, 2016

To my understanding the update pipeline is generating SQL starting from a collection of ColumnModification instances, set-based operations should probably have a more powerful query generation capabilities, so we may need to mix somehow the query pipeline SQL generation capabilities with the update pipeline (or am I missing something?)

This makes sense. We likely need to introduce UpdateExpression etc. and then port the existing impl. to use it, too.

@abatishchev
Copy link

Looks good to me.
However, I'd stick with CRUD naming, i.e. Update(...).
Delete can have no overload so would be used on top of Where(...). Delete(predicate) would be a nice addition though.

@djechelon
Copy link

djechelon commented May 16, 2016

Replying to @ilmax

What should be done with entities affected by the query that's actually being change tracked (i.e. already loaded from db by the same context instance)? Should this scenario be supported?

I think the most difficult part is to guess whether some modifications propagate to entities already within the session. This is not currently addressed by EntityFramework.Extended in my knowledge, and is supposed to cause troubles.

Example:

using(DbContext context = new MyDbContext()) {

    Cat molly = context.Cats.Single(c => c.name == "Molly");

    molly.Weight += 1.5; // feed Molly

    // Now feed all Eleanor's cats
    context.Cats.Update(
        c => c.Owner.FirstName == "Eleanor" && c.Owner.LastName == "Abernathy", //Where clause Func<Cat,bool>
        c => c.Weight += 1.5 //mapping function Action<Cat>
    );

    AssertFalse(molly.IsOverweight);
}

The above differs whether Molly is Eleanor's cat or not. If we can assure (only under proper transaction synchronization constraints) that molly is always in sync with the database, one IMO feasible solution could be running the Update both to the SQL datasource and into the Entity cache. The second is terribly easy because no runtime-generated SQL is involved but a sane IList operation on the tracked entities that are stored in-memory.


The above was a simple scenario because EF knows we are working on the cats, so affected entities are of Cat. What if I try to cheat and run a trivial 1-affected-record statement such as

context.Cats.Update(
    c => c.Id = 657,
    c => c.Owner.Sick = true
);

The update is supposed to run on the Cats, but, in practice, modifications apply to Person class, as the generated SQL would be (UPDATE PERSONS..., see also here).

This scenario is really over-complicated, even though easy to run in-memory. I am speaking of it only to illustrate possible misuses of the bulk statement.


Final point of discussion for the update, I want to ask if multi-column updates are supported and how.

I mean

context.Customers.Update(c => c.HasPendingPayments, c => { c.Status = CustomerStatus.Disabled; c.Overdraft = 0; });

UPDATE CUSTOMERS SET STATUS = 'DISABLED', OVERDRAFT = 0
WHERE PENDING_PAYMENTS = 1;

Cheers,
ΕΨΗΕΛΩΝ

@djechelon
Copy link

Another point of discussion is that I like is the overload I have used in my examples: int Update(this IQueryable<TEntity> dataSet, Expression<Func<TEntity,bool>> whereCondition, Action<TEntity> setClause);

so

context.Customers.Update(c => c.HasPendingPayments, c => c.Status = CustomerStatus.Disabled);

Normally databases return the number of affected rows, and it is helpful to the application.

So here are 2 separate proposals:

  1. Update (and thus Delete) should return int/long
  2. Update may accept both Where and Set, even though it would become a direct redirect to context.Entities.Where(xx).Set(yy) in the real code

@ilmax
Copy link
Contributor

ilmax commented May 18, 2016

@djechelon Good points, IMO at first we can reject set-based query if the context is already tracking object of the same type, unless it's proven to be very simple to propagate changes to tracked objects.

About the second example context.Cats.Update( c => c.Id = 657, c => c.Owner.Sick = true );
I propose to not support that case initially and evaluate based on community feedbacks if this is worth supporting or not.

Multi-column update IMO should be supported, I don't think that support it should be much different than what's required for single column ones.

@anpete

We likely need to introduce UpdateExpression etc. and then port the existing impl. to use it, too.

Is this change on your radar? Would you like me to open an issue to track this request?

@gdoron
Copy link

gdoron commented Jul 5, 2016

@ilmax Currently we use Raw SQL for these batch operations e.g. context.Database.ExecuteSqlCommand which ignores the tracked entities.

I wouldn't reject the query because some of the affected rows might already being tracked .
You should always work under the assumption that the tracked entity might have stale data regardless of this feature (a different Context in a different thread/trigger/job changed the entity).

I would love the see this feature and liked the methods signature.

@Seanxwy
Copy link

Seanxwy commented Jul 24, 2016

When to implement this feature?

@geirsagberg
Copy link

geirsagberg commented Sep 15, 2016

Seems like https://github.com/zzzprojects/EntityFramework-Plus provides batch delete etc., and has support for EF Core. Anyone looked into this?

@quanterion
Copy link

Are there any plans to implement this?

@borisdj
Copy link

borisdj commented May 12, 2017

As of today there is extension library EFCore.BulkExtensions with Bulk Operations (Insert, Update, Delete) and Batch (Delete, Update)
Example of usage:

context.BulkInsert(entitiesList);
context.BulkInsertOrUpdate(entitiesList);
context.BulkUpdate(entitiesList);
context.BulkDelete(entitiesList);
context.BulkRead(entitiesList);

context.Items.Where(a => a.ItemId >  500).BatchDelete();
context.Items.Where(a => a.ItemId <= 500).BatchUpdate(a => new Item { Quantity = a.Quantity + 100});

More info here: https://github.com/borisdj/EFCore.BulkExtensions
Can be installed from nuget: https://www.nuget.org/packages/EFCore.BulkExtensions/

@CornedBee
Copy link

I have tried EntityFramework-Plus in my own project, but had to discard it. It dynamically tries to load the SQLServer EF provider at runtime, because it uses some of its functionality to generate SQL for the batch queries. If the actual provider is not SQLServer, it then applies a few manual translations (e.g. changing field quoting style).

However, in our deployment we will not have the SQLServer EF provider available.

@RehanSaeed
Copy link

RehanSaeed commented May 25, 2017

There are three community effors to support bulk operations:

Has anyone had experience with them? I have a large database where I only do inserts of around 50 records at a time and EF Core goes into a death spiral, eating all memory on the machine. I'm trying to evaluate which of these would solve my problem.

@djechelon
Copy link

Dear @RehanSaeed, I have tried successfully the EntityFramework.Extended to perform bulk deletes, but I needed to compile my own version of it.

Your question is unclear. What operation are you performing so that EF core goes into out of memory? Are you massively inserting? Or are you loading all your results into memory at a certain time? Why not discussing on stackoverflow?

@RehanSaeed
Copy link

Just inserting. EntityFramework.Extended does not seem to support bulk inserts.

@djechelon
Copy link

djechelon commented May 25, 2017

No doubt. Bulk inserts can be handled by EF.BulkInserts, which I failed to use.

Eventually I used the SqlBulkLoader "manually"

@aradalvand
Copy link

aradalvand commented Aug 31, 2022

@roji @smitpatel Sorry to tag you guys but I wanted to ask a question before this is released: I noticed that the SetProperty method doesn't accept an actual value (a string literal, for example) as its second argument, only an expression, so you'd have to do:

.SetProperty(p => p.Name, _ => "Shay")

Instead of just:

.SetProperty(p => p.Name, "Shay")

Is there a particular reason behind this?! Why not just allow simple values? 🤔

@zhuangState
Copy link

zhuangState commented Sep 10, 2022

I hope it's simpler
.ExecuteUpdate (p => p.Name ="Shay") and .ExecuteUpdate (p =>new { Name ="Shay" ,Age =21 } )

@aradalvand
Copy link

aradalvand commented Sep 10, 2022

@zhuangState The first syntax isn't a valid C# expression (there are plans to improve expressions so it could become reality one day) and the latter syntax wouldn't be feasible due to the introduction of the required keyword in C#.

@John0King
Copy link

@aradalvand it use anonymous object, there no required keyword for it !, I believe if you give us .ExecuteUpdate(x=>x.Name, x=> mydto.Name) , then we will create our own .ExecuteUpdate(x=> myDto) .

@thongdoan
Copy link

I hope C# will have "Cascade notation" like Dart language, so we can write:

ExecuteUpdate(x => x.Note="789"
                   ..Counter++,
                   ..Name = x.FirstName + " "  + x.LastName
);

@crozone
Copy link

crozone commented Sep 20, 2022

Does the current implementation allow limiting the number of rows affected, eg DELETE FROM blah WHERE condition ORDER BY expression LIMIT 5000 or UPDATE blah SET thingy = 5 WHERE condition ORDER BY expression LIMIT 1?

@roji
Copy link
Member

roji commented Sep 20, 2022

@crozone which database supports the SQLs you posted above? AFAIK most databases do not allow placing LIMIT and ORDER BY clauses on DELETE or UPDATE as you're showing.

However, when a LINQ operator is used that cannot be directly translated to SQL, EF uses a subquery to achieve the same effect. For example, the following LINQ operation:

ctx.Blogs
    .OrderBy(b => b.Id)
    .Take(10)
    .ExecuteUpdate(sp => sp.SetProperty(b => b.Name, "Foo"));

... translates into the following SQL:

UPDATE [b]
SET [b].[Name] = N'Foo'
FROM [Blogs] AS [b]
INNER JOIN (
    SELECT TOP(@__p_0) [b0].[Id], [b0].[Name]
    FROM [Blogs] AS [b0]
    ORDER BY [b0].[Id]
) AS [t] ON [b].[Id] = [t].[Id]

@zhuangState
Copy link

@aradalvand Thank you for your reply

@crozone
Copy link

crozone commented Sep 20, 2022

@roji Hmm perhaps this is only a MySQL/MariaDB feature?

UPDATE

DELETE

Neither Postgres nor MSSQL appear to support it after all. SQLite supports it, but apparently only if compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT ...

Very glad to see that EF will join against a subquery to get the same result regardless, so it doesn't even matter. The ability to .OrderBy(b => b.Id).Take(10).ExecuteUpdate() is exactly what I was hoping for. Cheers.

@roji
Copy link
Member

roji commented Sep 20, 2022

@crozone right, I remember now for both these databases.

Note that this feature has been implemented in such a way that providers may implement such "non-standard" operators if they so wish. That means that the default EF logic will create a subquery, but the MySQL provider can implement support for OrderBy/Take to produce the tighter SQL above.

/cc @lauxjpn

@AraHaan

This comment was marked as off-topic.

@Webreaper

This comment was marked as off-topic.

@Webreaper

This comment was marked as off-topic.

@EliBraid

This comment was marked as off-topic.

@Eli-Black-Work

This comment was marked as resolved.

@roji

This comment was marked as resolved.

@Eli-Black-Work

This comment was marked as resolved.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-bulkcud closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.