
Exploring Scoped vs Transactional (IDbContextFactory) DbContexts #25653

Closed

Mike-E-angelo opened this issue Aug 22, 2021 · 15 comments

Labels: closed-no-further-action (The issue is closed and no further action is planned.), customer-reported

Comments

@Mike-E-angelo

Ask a question

I am currently upgrading my framework to EF Core 6.0. In doing so, I have been taking the time to examine best practices to ensure that I am doing everything properly for my Blazor server-side application.

Currently, all my components and DbContext instances are scoped to the user. I am concerned about the memory utilization this may incur as more and more users adopt my application (🤞), but I have not been able to definitively ascertain whether this is an actual concern yet. I mention this because part of me is wrestling with the notion of avoiding premature optimization in my codebase and solving a problem that does not actually exist yet.

OK, so with that tidbit aside, I started to do some performance analysis around scoped vs. transactional operations, which I share with you below. When I say "transactional," I primarily mean the use of IDbContextFactory (and pooled ones at that, as we'll see), though in my tests I also use it to mean direct activation of a DbContext. Essentially, "transactional" means something that has to be created/disposed during an operation rather than pulled from (scoped) memory.

What's beneficial and elegant about my current design is that all IQueryable<T> instances are defined once per DbContext and then scoped to the user, along with the DbContext that created them. In effect, this caches the query but also adds the memory overhead which is the concern I shared earlier.

Switching everything to be transactional/IDbContextFactory would be very time consuming in my application, particularly all the stored IQueryable<T> queries that I have defined that are subsequently scoped to the user.

However, upon further inspection, I could take a whack at everything that is not a query, that is, anything writable or anything involving DbContext.SaveChangesAsync.

So then the thought struck me, and that leads me to my question (which I will share in a bit, I promise!):

How about using a singleton DbContext for all queries (reads), and using IDbContextFactory for everything else (writes)?

To do this, there are two changes I have identified that I would need to make (a rough sketch follows the list):

  1. Mark all root DbSet<T> queries as AsNoTracking.
  2. Turn off the thread-safety check to ward off the InvalidOperationException that occurs when the context is accessed concurrently.
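
For concreteness, a rough sketch of what those two changes might look like in EF Core 6.0 (the context and entity names here are hypothetical placeholders, and disabling the check only suppresses the exception, it does not make concurrent use safe):

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical context and entity names, for illustration only.
public class Blog { public int Id { get; set; } }

public class ReadOnlyContext : DbContext
{
    public ReadOnlyContext(DbContextOptions<ReadOnlyContext> options) : base(options) { }
    public DbSet<Blog> Blogs { get; set; } = null!;
}

public static class SingletonReadContextSketch
{
    public static ReadOnlyContext Create()
    {
        var options = new DbContextOptionsBuilder<ReadOnlyContext>()
            .UseInMemoryDatabase("demo")
            // 1. Make every query no-tracking by default
            //    (individual roots could also call AsNoTracking()).
            .UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking)
            // 2. Suppress the InvalidOperationException thrown on concurrent use.
            //    This only removes the check; it does NOT make concurrent use safe.
            .EnableThreadSafetyChecks(false)
            .Options;

        return new ReadOnlyContext(options);
    }
}
```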

There may be others, but I wanted to throw the thought out there here to see if there is anything else to consider.

The Question

So the question is: is it considered OK to use a singleton-scoped DbContext in a Blazor server-side application to handle all the queries (reads) of the application, while IDbContextFactory handles all the modifications/operations (writes)?

Follow-up: Is there anything really stopping this from happening from a design perspective? That is, are there limits to the number of queries that one DbContext can process, assuming the thread checking is disabled?

Include your code

The other aspect driving me towards this compromise in my application design is that I did some benchmarking, and what I found with a very basic in-memory DbContext was surprising. You can find the code here:

https://github.com/Mike-E-angelo/Stash/tree/master/EfCore.ScopedVsTransaction

Simply returning an empty DbSet<T> as an array returned the following metrics:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Scoped | 2.489 us | 0.0358 us | 0.0334 us | baseline | | 0.1945 | - | - | 2 KB |
| Pooled | 2.906 us | 0.0244 us | 0.0216 us | 1.17x slower | 0.02x | 0.2022 | - | - | 2 KB |
| Transactional | 22.751 us | 0.3772 us | 0.5163 us | 9.22x slower | 0.27x | 1.7700 | 0.0305 | - | 14 KB |

Here, Scoped refers to caching the IQueryable<T> in memory (à la what happens when scoping to the user), Pooled is using a PooledDbContextFactory, and Transactional is straight-up activating a new DbContext.

Looks like Scoped and Pooled are pretty much even here, and I probably would have continued to move toward a pooled IDbContextFactory model for all of my codebase until I appended a few expressions and saw the following:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Scoped | 18.67 us | 0.174 us | 0.163 us | baseline | | 0.6714 | - | - | 6 KB |
| Pooled | 31.57 us | 0.622 us | 0.950 us | 1.71x slower | 0.07x | 1.2207 | - | - | 10 KB |
| Transactional | 61.67 us | 0.738 us | 0.690 us | 3.30x slower | 0.05x | 2.8076 | - | - | 23 KB |

It would seem that the more expressions are added, the more Scoped wins, and my codebase contains a lot of very complex queries with a lot of expressions. All of which work amazingly, btw. 😁 All of which is due to your amazing work over there.

So, seeing this is what got me going down this path and considering/contemplating a singleton DbContext to handle the reads of my application, while IDbContextFactory handles the writes.

Also, while I am at this: keep in mind that my DbContext above is completely empty, and the simplest of operations generates 2 KB-23 KB of allocations. To me, this seems a tad excessive, and I wanted to check whether this is a known issue and/or whether I am doing something fundamentally wrong in my tests. I am using the In-Memory provider, which I would expect to be pretty lean in such a scenario, but I am pointing this out just in case.

To close, I would really like to express my sincere gratitude for all your efforts out there. EF Core is really great, and the team has been really helpful in attending to my questions/issues. I am a huge fan of this project and all your efforts. What you have made here definitely deserves a unicorn as a mascot, indeed. 🦄

Thank you for any assistance/insight you can provide. 👍

Include stack traces

NA

Include verbose output

NA

Include provider and version information

EF Core version: 6.0.0-rc.1.21416.1
Database provider: Microsoft.EntityFrameworkCore.InMemory
Target framework: net6.0 (RC1)
Operating system: Windows 10
IDE: Visual Studio 2022 Preview 3.1

@ajcvickers (Member)

@Mike-E-angelo DbContext is not thread-safe. You cannot safely perform multiple queries (even no-tracking) concurrently. See Avoiding DbContext threading issues.

@Mike-E-angelo (Author)

Thank you for that link, @ajcvickers. I was aware of that, and that you could turn off the concurrency check, but I wanted to verify/ensure the possibility here.

I guess this is wishful thinking on my part, then. 😅😭

It would be nice to know what sort of issues occur from concurrent queries. At the outset it would seem that read operations would be OK, but considering there can be anywhere up to 23 KB of allocations occurring just to create an empty array, there could be complications.

😁

Wishful thinking aside, are there any plans to make queries/read-only behavior thread-safe?

@ajcvickers (Member)

ajcvickers commented Aug 22, 2021

Turning off the concurrency check only turns off the safety check; it doesn't make it safe. It can provide a percent or two of perf benefit in extremely high-perf scenarios, such as those tested by TechEmpower.

The issues are the same as in most threading cases--undefined behavior that will occasionally cause crashes depending on the timing of the threads.

@roji (Member)

roji commented Aug 22, 2021

To add to @ajcvickers' answer:

From the outset it would seem that read operations would be OK

It's important to understand that a single DbContext uses a single database connection, and those can only execute one query at a time. There's also various state kept internally in DbContext itself which isn't thread-safe.

However, assuming you're using DbContext pooling, there really should not be any reason to want to reuse the same context, regardless of what your query looks like and how many expressions it has. If you're seeing differently, can you please try to put together a minimal code sample that shows it, preferably as a simple console application?

@Mike-E-angelo (Author)

Turning off the concurrency check only turns of the safety check; it doesn't make it safe

OK, I am with you @ajcvickers. Thank you for that additional context, it is valuable to me and my understanding of this topic.

It's important to understand that a single DbContext uses a single database connection

That was the gotcha I was looking for, @roji. 😁 I knew there was a catch somewhere here. Thank you for letting me know. So that means that if I shared one DbContext for the entire application, it would slow to a crawl as more and more users end up using that one connection (assuming there aren't any threading issues).

Alright, this is better understood to me. I appreciate the time and insight provided here. 👍

there really should not be any reason to want to reuse the same context - regardless of what your query looks like and how many expression it has

Correct, and your patience is appreciated here as I articulate my findings. The only re-use of contexts occurring is in how my application is designed now (and which I am trying to move away from): that current design consists of one scoped/shared DbContext per circuit. What I was attempting to show with the benchmarks is that even though a user-scoped DbContext is (as you state) not the recommended way of managing DbContext in a Blazor server-side application, it is still faster and produces far fewer allocations than the recommended alternatives (which do not re-use).

Conceptually, it's tough for me to commit to ditching re-use in favor of IDbContextFactory if what I have now (re-use) is faster and produces fewer allocations. That doesn't mean that I am averse to or disagree with IDbContextFactory. In fact, I want to use it. But currently, it seems slower and produces more overhead than re-using a context as designed now, and I am trying to justify and accept the trade-offs in making the switch.

That stated, I think we're on the same page as far as the approach here. My only remaining issue now is those allocations. Is it a known issue that a request to materialize an empty DbSet to an array will result in 2 KB-14 KB of allocations?

@roji (Member)

roji commented Aug 22, 2021

So that means if I shared 1 DbContext for the entire application it would slow to a crawl as more and more users are basically using that 1 connection (assuming there aren't any threading issues).

It wouldn't slow to a crawl. If at any point the singleton DbContext is used concurrently, you'd get an exception at best and undefined behavior at worst. Technically, if you could somehow be sure that the singleton is never used concurrently, everything would work. Note that, depending on exactly how things work in your application, it may be possible to trigger concurrent usage of the DbContext even without multiple users, e.g. if a single user clicks a button twice, with the 2nd click trying to use the DbContext before the 1st operation has completed (you can disable the UI via a modal dialog for as long as the operation is ongoing to prevent this).

Conceptually, it's tough for me to commit to ditching re-use vs IDbContextFactory if what I have now (re-use) is faster and produces fewer allocations.

As I wrote above, that doesn't correspond to how things should be working. Whether you use a single DbContext instance or multiple pooled ones should make almost no difference; the overhead of getting and returning a pooled DbContext instance is negligible.

I really recommend carefully reading this doc page, which goes over how Blazor Server and EF interact, and shows various strategies for managing your DbContext.
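
For reference, a minimal sketch of the factory-per-operation pattern that page describes (the BlogContext/Blog/BlogReadService names and the In-Memory provider are just placeholders for this example):

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

public class Blog { public int Id { get; set; } }

public class BlogContext : DbContext
{
    public BlogContext(DbContextOptions<BlogContext> options) : base(options) { }
    public DbSet<Blog> Blogs { get; set; } = null!;
}

// Injected into components/services; each operation rents a context and disposes it.
public class BlogReadService
{
    private readonly IDbContextFactory<BlogContext> _factory;

    public BlogReadService(IDbContextFactory<BlogContext> factory) => _factory = factory;

    public async Task<Blog[]> GetBlogsAsync()
    {
        using var context = _factory.CreateDbContext();
        return await context.Blogs.AsNoTracking().ToArrayAsync();
    }
}

public static class Startup
{
    // Called once at application startup (e.g. from Program.cs).
    public static void ConfigureServices(IServiceCollection services)
        => services
            .AddPooledDbContextFactory<BlogContext>(o => o.UseInMemoryDatabase("demo"))
            .AddScoped<BlogReadService>();
}
```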

Is this a known issue that a request to materialize an empty DbSet to an array will result in 2KB-14KB of allocations?

We worked a lot on improving the runtime perf of non-tracking queries for EF Core 6.0 (including reducing memory allocation), so give the latest preview a try; if you're using EF Core 5.0 there's a very good chance you'll see other results (would be good to hear about them too!).

@Mike-E-angelo (Author)

Mike-E-angelo commented Aug 22, 2021

Technically, if you could be somehow sure that the singleton is never used concurrently, everything would work

Right, but if I have 100 users all using that same context, only one connection can be used at a time, and each operation takes 100 ms to complete (rounding up 😁), then it would take 10 seconds for all of those operations to complete, correct? This is what I mean by grinding to a halt. I'm actually thinking of more than 100 users, more like 512 or 1,024. :P If they are all sharing that one connection and that connection was thread-protected, there would be a huge bottleneck, because only one operation can use the connection at a time.

the overhead of getting and returning a pooled DbContext instance is negligible.

So it's not the instantiation/activation/retrieval of the DbContext (which indeed seems very nice), it's the use of the instance where the pain is occurring. I know I ended up writing a bunch up there, but within the jumble there is a link to a very simple project with the benchmarks that demonstrate this:

https://github.com/Mike-E-angelo/Stash/tree/master/EfCore.ScopedVsTransaction

I really recommend carefully reading this doc page, which

I know it doesn't seem like it, but I actually spent the day reading it and other articles before writing in with this question. 😆

so give the latest preview a try

The project above uses 6.0.0-rc.1.21416.1

So to summarize: calling ToArrayAsync on an empty DbSet appears to allocate at least 2 KB. This seems really high by itself, but other benchmarks show the same activity allocating as much as 23 KB, which seems really, REALLY high.

Please let me know if there is any further information you require to help further diagnose this issue and I will assist the best that I can. 👍

@Mike-E-angelo (Author)

Also, @roji, in addition to the allocations, I want to be sure you understand that the other sticking point here is the time spent. Calling ToArrayAsync on a DbContext that is pulled from a pool is 17% slower than calling the same query that has already been defined and stored in memory (as with a Scoped instance).

When you add a few (3) expressions to the mix, a pooled DbContext ToArrayAsync call is 71% slower than its scoped counterpart.

@roji (Member)

roji commented Aug 22, 2021

Right, but if I have 100 users all using that same context, and only 1 connection can be used at a time, and each operation takes 100ms to complete (rounding up grin), it would take 10 seconds for all of those operations to complete, correct?

DbContext doesn't serialize the connections for you. Once again, if you attempt to use it concurrently, it would not take longer - it would throw (or exhibit undefined behavior). Apart from that, yes - if you were to theoretically serialize all usage of a single DbContext instance (via some sort of lock or queue), then indeed things would be slow. But that wouldn't make much sense.

So it's not the instance/activation/retrieval (which indeed seems very nice), it's using of the instance that is where the pain is occurring.

There shouldn't be any difference here - a DbContext that you reuse yourself vs. a DbContext that you get from context pooling is the same DbContext - there's no difference in how it executes things.

I do note that in the Scoped case, you're not executing Set<Subject>() as you do in the other two cases, so you're not quite comparing apples-to-apples (depending on what exactly you're trying to measure). I'm assuming that in your benchmark above, when you added other operators, these were also included in the IQueryable for the Scoped case, so again that's work that's being done in Pooled/Transactional but not in Scoped. If you modify Scoped to perform the same thing, you should then see the pure overhead of DbContext pooling vs. just using a single instance all the time. This likely explains the 17%/71% differences you're seeing - would be good to get confirmation.

Though all that is quite academic. Sharing a singleton DbContext in an inherently concurrent application (e.g. a webapp) simply isn't a viable option.

One last (likely academic) point: if you're using DbContext pooling, and care about every little bit of perf, then you're better off not using Set<T>(), but rather having DbSet properties on your context instead. This way they're set up once when the context is first created.
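
A quick sketch of the difference (Subject is just a placeholder entity here):

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Subject { public int Id { get; set; } }

public class SubjectContext : DbContext
{
    public SubjectContext(DbContextOptions<SubjectContext> options) : base(options) { }

    // Preferred with pooling: initialized once when the context instance is constructed,
    // so the set is ready every time the instance is rented from the pool.
    public DbSet<Subject> Subjects { get; set; } = null!;
}

public static class SetAccessSketch
{
    public static int CountViaProperty(SubjectContext context)
        => context.Subjects.Count();

    public static int CountViaSet(SubjectContext context)
        // Works just as well, but resolves the DbSet on each call.
        => context.Set<Subject>().Count();
}
```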

So to summarize, calling ToArrayAsync on an empty DbSet appears to be creating at least 2KB. This seems really high by itself, but other benchmarks show the same activity will net as high as 23KB, which seems really REALLY high.

I don't remember the absolute per-run allocation numbers I ended up with... 2KB does seem a little high for an empty non-tracking query, but not completely unreasonable. I'd have to look in a memory profiler again.

If you're interested in the optimization work done for EF Core 6.0, here's some info.

@Mike-E-angelo (Author)

But that wouldn't make much sense.

I'm glad we have shared agreement here. 😁

there's no difference in how it executes things.

Correct; the difference is when it is executed. In a scoped session, it is executed once when it is stored in memory, as the DbContext is also stored in memory as a scoped instance. In the other scenarios, it is created and executed each and every time a DbContext is retrieved/disposed.

I do note that in the Scoped case, you're not executing Set<Subject>()

Good catch. Thank you for pointing that out. I have committed an update that makes it all consistent. The results are similar:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Scoped | 19.21 us | 0.379 us | 0.601 us | baseline | | 0.6714 | - | - | 6 KB |
| Pooled | 31.41 us | 0.602 us | 0.534 us | 1.62x slower | 0.06x | 1.2207 | - | - | 10 KB |
| Transactional | 63.28 us | 1.102 us | 0.977 us | 3.26x slower | 0.11x | 2.8076 | - | - | 23 KB |

so again that's work that's being done in Pooled/Transactional but not in Scoped.

As I attempted to describe above, this is the whole point. In a scoped user/circuit session, an IQueryable<T> instance can be defined and compiled when the session/circuit is created and then stored for the entire duration of that scoped session. So, there is a one-time hit of defining and storing that query. After that, it is in memory and does not need to be redefined/recompiled when the DbContext is retrieved (as it, too, is a scoped instance). In fact, by using this method and approach, the DbContext is never used directly for any querying; only the IQueryable<T> instances are accessed when needed at runtime.

Conversely, with transactional/pooled, each query must be created at the time the context is retrieved, as the new context is the dependency for the query. As such, there is more net overhead (in both time and allocations) in this scenario than in one where it is scoped to memory.
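
To make the current design concrete, here is a rough sketch of that scoped arrangement, with hypothetical Subject/SubjectContext/ScopedSubjectQueries names standing in for my real types:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Subject { public int Id { get; set; } }

public class SubjectContext : DbContext
{
    public SubjectContext(DbContextOptions<SubjectContext> options) : base(options) { }
    public DbSet<Subject> Subjects { get; set; } = null!;
}

// Both the context and this query holder are registered as scoped services,
// so they live for the duration of the user's circuit.
public class ScopedSubjectQueries
{
    // Composed once when the circuit's scope is created, then reused for every read.
    public IQueryable<Subject> All { get; }

    public ScopedSubjectQueries(SubjectContext context)
        => All = context.Subjects.OrderBy(s => s.Id);
}
```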

One last (likely academic) point is that if you're using DbContext pooling, and care about every little bit of perf, then you're better off not using Set<T>(),

This is so funny to me as I was doing exactly that until I read this section:
https://devblogs.microsoft.com/dotnet/announcing-entity-framework-core-6-0-preview-5-compiled-models/#dbcontext-instantiation

Which says to use Set<T> rather than named methods to speed up initialization. :)

I'd have to look in a memory profiler again.

Great, thank you for any time/consideration you can provide. 🙏

@Mike-E-angelo (Author)

Mike-E-angelo commented Aug 22, 2021

2KB does seem a little high for an empty non-tracking query

It's all in the details, isn't it? :) This made me realize that the benchmarks I provided were not calling AsNoTracking. So I added it in a new set of benchmarks, and adding this call actually adds even more allocations, about 2 KB, to the pooled/transactional scenarios:

| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Scoped | 20.40 us | 0.406 us | 1.042 us | 20.03 us | baseline | | 0.7019 | - | - | 6 KB |
| Pooled | 35.26 us | 0.514 us | 0.429 us | 35.23 us | 1.70x slower | 0.11x | 1.4648 | - | - | 12 KB |
| Transactional | 69.53 us | 1.369 us | 2.538 us | 68.68 us | 3.38x slower | 0.24x | 2.9297 | - | - | 25 KB |

Note, too, that the times are affected adversely as well.

@roji (Member)

roji commented Aug 22, 2021

As I attempted to describe above, this is the whole point. In a scoped user/circuit session, an IQueryable<T> instance can be defined and compiled when the session/circuit is created and then stored for the entire duration of that scoped session. So, there is a one-time hit of defining and storing that query. After that, it is in memory and does not need to be redefined/recompiled when the DbContext is retrieved (as it, too, is a scoped instance). In fact, by using this method and approach, the DbContext is never used directly for any querying; only the IQueryable<T> instances are accessed when needed at runtime.

You may need to understand better how EF actually handles queries. Composing LINQ operators over a DbSet doesn't compile them or optimize them in any way - it simply constructs an expression tree in memory; this isn't really EF yet, it's just what the LINQ operators do (e.g. Where). In any case, when that expression gets evaluated (via ToArray or similar), EF checks its internal cache to see if the query has been compiled before. If so, it skips most of that work and executes directly; otherwise it needs to go through the heavy process of query compilation. The crucial bit is that the query cache is not tied to a specific DbContext instance.

It's certainly possible that for very optimized queries using the InMemory provider (or even a real database), the process of composing the LINQ query itself starts to show up in benchmarks. If you want to avoid that, use EF Core's compiled query feature - this compiles a (fully composed) query once and gives you back a function which you can invoke multiple times with different DbContext instances.
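
A minimal sketch of how that can be combined with a pooled factory (Blog/BlogContext mirror the benchmark code at the bottom of this comment; the Where filter and minId parameter are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Blog { public int Id { get; set; } public string Name { get; set; } = ""; }

public class BlogContext : DbContext
{
    public BlogContext(DbContextOptions options) : base(options) { }
    public DbSet<Blog> Blogs { get; set; } = null!;
}

public static class CompiledBlogQueries
{
    // Compiled once and stored as a static (singleton) delegate;
    // it can then be invoked with any BlogContext instance.
    private static readonly Func<BlogContext, int, IAsyncEnumerable<Blog>> _blogsFromId =
        EF.CompileAsyncQuery((BlogContext context, int minId) =>
            context.Blogs.AsNoTracking().Where(b => b.Id >= minId));

    public static async Task<List<Blog>> GetBlogsAsync(
        IDbContextFactory<BlogContext> factory, int minId)
    {
        // Rent a context from the pool just for this operation.
        using var context = factory.CreateDbContext();

        var results = new List<Blog>();
        await foreach (var blog in _blogsFromId(context, minId))
            results.Add(blog);
        return results;
    }
}
```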

Finally, I haven't looked at your changes, but here's a quick benchmark of my own which shows different results (code at the bottom):

BenchmarkDotNet=v0.13.0, OS=ubuntu 21.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100-preview.7.21379.14
[Host] : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT
DefaultJob : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT

| Method | Provider | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Same | SqlServer | 399.361 us | 7.7786 us | 8.9578 us | 0.9766 | - | - | 4 KB |
| Pooled | SqlServer | 399.124 us | 7.8558 us | 16.0474 us | 0.9766 | - | - | 4 KB |
| Same | InMemory | 6.927 us | 0.0494 us | 0.0462 us | 0.7401 | - | - | 3 KB |
| Pooled | InMemory | 7.632 us | 0.1446 us | 0.1547 us | 0.7477 | - | - | 3 KB |

The part about SqlServer vs. InMemory is quite crucial: if you're looking at pure percentages, pooling may seem expensive compared to Same. The moment you throw a real database in there, things start to look a bit different. Note that this run used 6.0.0-preview7.

This is so funny to me as I was doing exactly that until I read this section:
https://devblogs.microsoft.com/dotnet/announcing-entity-framework-core-6-0-preview-5-compiled-models/#dbcontext-instantiation

Which says to use Set<T> rather than named methods to speed up initialization. :)

That advice is correct as long as DbContext pooling isn't being used. We could amend it, but we're really into micro-optimization territory here, which matters only to the most high-perf applications.

Benchmark code:

```csharp
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;

BenchmarkRunner.Run<Program>();

[MemoryDiagnoser]
public partial class Program // partial: merges with the Program type generated by the top-level statement above
{
    private BlogContext _reusableContext { get; set; }
    private PooledDbContextFactory<BlogContext> _factory { get; set; }

    [Params(Providers.InMemory, Providers.SqlServer)]
    public Providers Provider { get; set; }

    [GlobalSetup]
    public async Task Setup()
    {
        var options = Provider == Providers.InMemory
            ? new DbContextOptionsBuilder<BlogContext>().UseInMemoryDatabase("foo").Options
            : new DbContextOptionsBuilder<BlogContext>().UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0").Options;
        _reusableContext = new BlogContext(options);
        await _reusableContext.Database.EnsureDeletedAsync();
        await _reusableContext.Database.EnsureCreatedAsync();
        _factory = new PooledDbContextFactory<BlogContext>(options);
    }

    [Benchmark]
    public Blog[] Same()
    {
        return _reusableContext.Blogs.AsNoTracking().ToArray();
    }

    [Benchmark]
    public Blog[] Pooled()
    {
        using var context = _factory.CreateDbContext();
        return context.Blogs.AsNoTracking().ToArray();
    }

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
        public BlogContext(DbContextOptions options) : base(options) {}
    }

    public class Blog
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public enum Providers { SqlServer, InMemory }
}
```

@Mike-E-angelo (Author)

use EF Core's compiled query feature - this compiles a (fully composed) query once, and gives you back a function which you can invoke multiple times with different DbContext instances.

EXCELLENT. That is indeed the missing piece here in my world. Allow me to look into this in addition to your benchmarks, and I will get back to you here when I have a better understanding.

Thank you for taking the time to provide the above valuable information and for the informative discussion, @roji. It is much appreciated. On a weekend no less. 😁

@roji (Member)

roji commented Aug 22, 2021

Sounds good, I'm happy I could help. I'm also always interested if you find odd perf tidbits that could be optimization opportunities.

@Mike-E-angelo (Author)

OK, I really wish I could mark these posts as "answers," as I am so very happy to declare that we have one now.

To start, @roji, your benchmarks above are very much like how I started, with the very first benchmark results posted in my original post here. When no expressions are applied, Scoped and Pooled are very similar.

However, when expressions are applied, there's a huge deviation, which was my concern. BUT NOW I have an answer! Check out scoped vs. compiled query:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Scoped | 20.992 us | 0.4136 us | 0.9585 us | baseline | | 0.7324 | - | - | 6 KB |
| Pooled | 4.810 us | 0.0939 us | 0.1489 us | 4.39x faster | 0.28x | 0.2213 | - | - | 2 KB |

Peep those metrics. 👀 By using a compiled query, I can store it as a singleton (replacing my IQueryable<T> instances) and then run it through a pooled DbContext whenever required. This is exactly what I was looking for, both in design and performance.

I am so happy I wrote in now. But honestly, I hope it's the last time I have to do so. 😆 You all are so amazing and I hate to be a burden. However, with this major upgrade in my world it was worth risking the time and I landed on exactly what I was looking for.

So, thank you once again, this is perfect!

@roji added the closed-no-further-action label Aug 23, 2021
@ajcvickers reopened this Oct 16, 2022
@ajcvickers closed this as not planned Oct 16, 2022