High gen0 collect overhead with (suppressed) finalizer objects #48937
Tagging subscribers to this area: @dotnet/gc

Overview

When allocating many objects with a finalizer present, there is non-negligible overhead on GC gen0 collects, even if the finalizer has been suppressed via GC.SuppressFinalize. The hypothesis is that this is due to the emptying of the finalizer queue.

This is causing our real-time application to hitch on every gen0 collection (around 5-10ms pause time). These gen0 collections only happen every 20-60 seconds.

Use Case

A bit more information on our situation, in case it helps to put things into perspective.

Our team develops a rhythm game/framework which requires sustained very low latency execution. We have recently been tracking user reports of occasional high frame times which align with GC invocation, specifically gen0 collections. Having worked with .NET (Framework / Core) for several decades, I have a general idea of what to expect in terms of gen0 collection performance, and the numbers we are seeing are much higher than expected: in the range of 5-15ms per collection with low (<1MB/s) alloc throughput and near-zero promotion.

One cause turned out to be a texture upload class we have, which rents memory from ArrayPool and returns it on disposal. This class may be constructed every frame for streaming texture data. While we do our best to explicitly dispose after consumption, it has a finalizer implemented as a safety measure, to ensure the memory is returned to the ArrayPool no matter what (I think this is a pretty common practice).

With our findings here, it seems that finalizers should be avoided in such cases, where objects are constructed in abundance. This is our general direction to resolve this issue, for what it's worth.

Reproduction

```csharp
using System;

namespace TestBasicAllocs
{
    public static class Class1
    {
        public static void Main(string[] args)
        {
            bool finalizers = args[0] == "1";

            for (int i = 0; i < 10000000; i++)
            {
                if (finalizers)
                {
                    var thing = new FinalizingThing();
                    GC.SuppressFinalize(thing);
                }
                else
                    new NonFinalizingThing();
            }
        }
    }

    public class NonFinalizingThing
    {
        public NonFinalizingThing()
        {
        }
    }

    public class FinalizingThing
    {
        public FinalizingThing()
        {
        }

        ~FinalizingThing()
        {
        }
    }
}
```

Results

```
dotnet-trace collect -- .\bin\Debug\net5.0\TestBasicAllocs.exe 0
dotnet-trace collect -- .\bin\Debug\net5.0\TestBasicAllocs.exe 1
```

I am writing this issue up without a clear distinction of whether this should be considered a bug, a performance issue, or an accepted (and potentially better-documented) hidden overhead of finalizers. I have read through the .NET memory performance analysis documentation, which does mention that finalizers should be avoided, but also that calling GC.SuppressFinalize should recover all but the allocation overhead. Similar information is present in "official" documentation and user comments, but I have been unable to find anything referencing a gen0 collection-time overhead.

Also worth noting that while memory analysis guides and profilers like JetBrains dotMemory will highlight finalizers that were not suppressed, they cannot provide visibility into, and do not find issue with, large numbers of allocations of objects with finalizers present in general. Maybe in the majority of cases this pause overhead is considered acceptable, but do consider that the above benchmarks are cases where gen0s are happening quite regularly. In our actual usage we have seen pause times as long as 30-50ms due to the same underlying issue, which implies that this overhead is not part of the consideration as to how often to run gen0 collects.

I have tested against net472 and .NET Core 3.1/3.2/5.0, and this is not a regression.
|
The typical pattern that we use in the core libraries is to just let the garbage collector take care of cleaning up after rare error cases. I do not think we have any place in the core libraries where we have a finalizer just to return memory to the array pool.
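To make that concrete, a minimal sketch of the pattern described (the `PooledBuffer` type is hypothetical, not from any core library): return to the pool on `Dispose` only, with no finalizer, so a missed dispose on a rare error path costs the pool one array rather than adding per-object GC work:

```csharp
using System;
using System.Buffers;

// Hypothetical sketch: no finalizer. If Dispose is missed on a rare error
// path, the rented array is collected like any other array and the pool
// simply allocates a fresh one later - a throughput cost, not a leak.
public sealed class PooledBuffer : IDisposable
{
    private byte[] _array;
    private readonly int _length;

    public PooledBuffer(int length)
    {
        _array = ArrayPool<byte>.Shared.Rent(length); // may return a larger array
        _length = length;
    }

    public Span<byte> Span => _array.AsSpan(0, _length);

    public void Dispose()
    {
        byte[] array = _array;
        if (array != null)
        {
            _array = null;
            ArrayPool<byte>.Shared.Return(array);
        }
    }
}
``` |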
@jkotas That is what I could see in my investigation (the lack of a finalizer). We have a few other cases where we do use finalizers for non-… Is it safe to say that your advice here is that finalizers should be avoided at all costs, even when GC.SuppressFinalize is being called? |
The pooled arrays are just regular arrays. Once they become unreachable, they will get collected just like regular arrays. Of course, you do not want to depend on the pooled arrays getting collected all the time, since that would effectively disable pooling. It is fine to depend on the GC to collect a pooled array in rare circumstances. Finalizers make sense on types that hold unmanaged resources if you would like to make these types robust against incorrect use. For public types, the .NET design guidelines recommend using SafeHandles for unmanaged resource lifetime management. SafeHandles take care of finalization if the resource is not disposed properly, but also protect against race conditions between use and disposal.
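For illustration, a minimal `SafeHandle` sketch; the native functions and library name here are hypothetical:

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Sketch only: NativeCreate/NativeDestroy and "mynativelib" are hypothetical.
internal sealed class MyResourceHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    private MyResourceHandle() : base(ownsHandle: true) { }

    [DllImport("mynativelib")]
    internal static extern MyResourceHandle NativeCreate();

    [DllImport("mynativelib")]
    private static extern void NativeDestroy(IntPtr handle);

    // Invoked by critical finalization if Dispose was missed; the SafeHandle
    // ref-counting also prevents use-after-free races during disposal.
    protected override bool ReleaseHandle()
    {
        NativeDestroy(handle);
        return true;
    }
}
``` |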
Having worked with wincrypt and other unmanaged APIs, my 2 cents is that you should only use finalizers when it's absolutely necessary. For example, when an array is not returned to the pool there is no risk of the application blowing up, only a risk of a slowdown, and that could be detected by some kind of script runner collecting GC/fps/other info in your build system. But when unmanaged resources are not released it can lead to memory leaks, dangling handles, and unexpected failures. Those are much harder to detect and debug afterwards. And on top of that, I understand gamedev imposes very harsh requirements on latency and execution speed. I would recommend using some kind of ref struct wrappers and a few self-made analyzers to detect their proper usage.
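A sketch of the ref struct wrapper idea (hypothetical type): being stack-only, it cannot be boxed or stored in a class field, which narrows the places an analyzer has to inspect for a missing `Dispose`:

```csharp
using System;
using System.Buffers;

// Sketch: a stack-only wrapper around a rented array.
public ref struct RentedBuffer
{
    private byte[] _array;

    public RentedBuffer(int minimumLength)
        => _array = ArrayPool<byte>.Shared.Rent(minimumLength);

    public Span<byte> Span => _array;

    public void Dispose()
    {
        if (_array != null)
        {
            ArrayPool<byte>.Shared.Return(_array);
            _array = null;
        }
    }
}

// Usage (pattern-based dispose works on ref structs since C# 8):
//   using var buffer = new RentedBuffer(1024);
``` |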
Hi, I'm working on the same project referenced in the OP. One case that we've recently discovered is WeakReference. This was being done for graphing runtime work diagnostics, and thus we'd like to keep this a real-time/per-frame process. The biggest problem we're having is that there's very little visibility on this, besides the raw number of objects allocated. We can get close by using ETW events and inspecting the types to see if they have finalizers, but it's by no means an exhaustive solution considering … For example, if PerfView provided a count of finalizable objects at GC time, it would help track these issues down. If … |
hi @peppy, thanks for your report. a comment -
so that's not exactly what I said in the doc 😃 this is what I said in the Finalizers section -
it just says GC will not need to promote it. it doesn't say GC would not need to scan it, and this text is about scanning -
having said that, I completely agree we should make this more diagnosable. I did talk about the internals of finalization in this blog post, where the method names related to scanning are mentioned - you could look those up before we provide an easier solution. we can provide this time via events like we already do in the mark phase for other types of scanning (stack, handle table, etc). would that be sufficient for you?

hi @smoogipoo, right now we do log the type of the object that's being finalized in the FinalizeObject event if you have the type keyword enabled on informational level. can you please try that and see if the type for those objects shows up for you? of course for your specific case, you will not see these events for WeakReference, because for WeakReference we don't actually run their finalizers - the GC just takes a shortcut and frees the handle right away. what I mentioned above was we could fire an event that tells us how much time we spent in the finalization-related scanning. that's another way. we can also include how many objects we scanned.
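For anyone wanting to try this with dotnet-trace, something like the following should enable the GC and type keywords at informational level; the keyword values (GC = 0x1, Type = 0x80000, GCHeapAndTypeNames = 0x1000000 on Microsoft-Windows-DotNETRuntime) are my reading of TraceEvent's ClrTraceEventParser and worth double-checking:

```
dotnet-trace collect --providers Microsoft-Windows-DotNETRuntime:0x1080001:4 -- .\bin\Debug\net5.0\TestBasicAllocs.exe 1
``` |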
@Maoni0 Thanks for your reply; glad you found this issue (and thanks for your guide, it is a great read). Also, apparently I am blind, since I missed that specific part about scanning. I think the structure of the document may have played a part in that – the "Finalizer" section header is a weird bullet point and is also smaller than the sub-headers following it, making them look like a new section. |
@peppy I agree - I really struggled with the formatting, since markdown offers pretty primitive options. I tried to make the line that says Finalizer more prominent by adding some kind of symbol, but then it wouldn't let me make it a link sigh :) maybe I'll get one of the folks on my team who's better at this stuff to help me with formatting this better. |
@Maoni0 Thanks for letting me know about the FinalizeObject event.
This would definitely help, as we're still trying to determine whether all remaining overhead is related to finalizers or not - they're just the biggest clue we have right now. Part of it is that we don't have full coverage due to WeakReference not being output as you mentioned, which is unfortunate given one of the core parts of the project is data bindings via WeakReference. Having time spent in finalization scanning would tell us if we're being led astray by this. |
@smoogipoo, @ivdiazsa and @cshung will be working on adding this info to the GC events. |
@Maoni0 Just coming back to this issue, I've tried quite a few things to get an idea of what's going wrong. One thing that I found was your blog post which mentioned using PerfView: https://devblogs.microsoft.com/dotnet/you-should-never-see-this-callstack-in-production/
The most I can drill down to is … I suspect this may be because we're getting 97% broken stacks, with the tool suggesting to either: …
However, PerfView does give GC stats when run as above on just the game framework itself (…). Last time I tried it, … Am I missing something with PerfView? Is there an issue that I can report somewhere? |
what are you trying to do exactly? if you want to see the CPU samples while a GC is happening, broken stacks shouldn't matter, since you just care about the native part of the stacks. if perfview doesn't show the native stacks, maybe @brianrob could take a look, but he's OOF this week. also, if possible, it would help if you could share the trace. |
The primary question I'm trying to answer is why there's a large spike in Gen1 time in very similar application states. If I can answer this, then I can start digging into the GC myself and check for reasons why that path may be getting hotter: ... or why it's even hot in the first place (see below). I can't repro these results in an isolated context no matter how many Gen0/1 allocations I do, so I don't think it's related to the small 10MB/s being allocated. Additionally, I've had a suspicion for some time that it's due to our extensive use of …
This is all theoretical at the moment because I don't know where to look without the native stacks.
Right. So then the issue is just that I'm not getting GC native stacks. Here are the traces I used for this post: PerfViewData.zip |
puzzling... I'm looking at your PerfViewData.etl.zip and I can see the callstacks just fine - this is showing the stacks for GC#158, a gen1 GC that took 24.5ms - in general your gen1 GCs are taking longer because they simply survived a lot more. it's interesting that find_first_object is taking a significant amount of time. another puzzling thing with the trace is that I don't see a FileVersion event for the coreclr.dll you are using in the osu! process - could you please let me know which version you are using? the trace shows that you are using this version of coreclr.dll in one of the dotnet processes (pid: 7776) -
of course you could be using a different version in the osu! process. we did do some improvement in 5.0 that would help with the perf of find_first_object, and we could do more, but the biggest reason is, as I pointed out above, that the GC just needed to do more work. to make progress on this it would be very useful to step back a bit and help me understand what your perf goals are - do you care mostly about the GCs that are >10ms (which really means your gen1 GCs), or do you care only about the long gen1 GCs? also, another question: would it be possible to run your workload on the current 6.0 build? that would allow us to do experiments a lot more easily if we need to. |
Taking a look from a high level:
So I guess you could say we care about any single GC operation that runs longer than 4ms, optimally. Not specifically gen1, but any blocking GC calls.
Yes, we can run on 6.0 releases without issue. |
@Maoni0 It looks like the reason I wasn't seeing the GC stacks is an issue on my side - PerfView automatically populates the "GroupPats" textbox which filters them out. I'm now able to see the GC stacks after clearing that textbox :)
Looks to match the event you have there:
I've done some more profiling. As requested, I've done this also for net6.0. This time I have two versions:
net5.zip

All captures were taken with roughly the same reproduction steps, so they should be comparable. There doesn't seem to be much of a difference between net5 and net6 as far as I'm seeing, but I'll defer to you on that. I've written a small test program which allocates ~10MB/sec with varying numbers of finalizable objects and posted the results here: https://gist.github.com/smoogipoo/4c44af65bbf6fb4cbaea3fae29bef504 |
it looks like you have concurrent GC disabled, is that the case? |
I thought that concurrent GC was turned on by default (that's what this doc says: https://docs.microsoft.com/en-us/dotnet/core/run-time-config/garbage-collector). Is that not the case anymore? We don't turn it off; the only setting we change is … I've forced …

Edit: As for the latency mode, we're currently discussing it, because it looks like … for during important sections.
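For readers following the latency-mode tangent, a sketch of the two standard mechanisms that could be meant here (the budget size is a placeholder, and this is illustration rather than what the project actually ships):

```csharp
using System;
using System.Runtime;

static class LatencyModes
{
    static void CriticalSection()
    {
        // Hint: bias the GC against blocking collections where possible.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

        // Stronger: forbid collections entirely while a critical section runs,
        // provided it allocates less than the requested budget (placeholder size).
        if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
        {
            try
            {
                // ... latency-critical per-frame work ...
            }
            finally
            {
                // Note: throws if the region already ended (e.g. budget exceeded).
                GC.EndNoGCRegion();
            }
        }
    }
}
``` |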
it is on by default, but somehow it's turned off for you. the gen0 budget is supposed to be 6MB max for workstation GC, but yours is 32MB. do you have any GC configs set? can you take a look under the debugger to see whether wks::gc_heap::gc_can_use_concurrent is true or false? |
my bad - I'm doing this: …
I mistakenly thought I was capping this the other way (max to min). usually folks run client apps on machines with small cache sizes, so that means your gen0_min_size is larger than 6MB. one thing you could do is to limit the gen0 max budget with the GCGen0MaxBudget config.
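For anyone else trying this, a sketch of setting the cap via the environment-variable mechanism. The variable name follows the GCGen0MaxBudget config discussed later in the thread and is worth verifying against the runtime sources; note that COMPlus values are parsed as hex, so 600000 here means 0x600000 = 6MB:

```
:: Windows
set COMPlus_GCGen0MaxBudget=600000

# Linux/macOS
export COMPlus_GCGen0MaxBudget=600000
``` |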
@Maoni0: Sorry for the wait - it looks much better with a 6MB max Gen0. Here's a log (GDrive link because GitHub seems to not like the 28MB zip): https://drive.google.com/file/d/1CcGcbHCD4a4Rx727InxzFnHbXkmTiBY1/view?usp=sharing

I did some frame time analysis with different budgets: …

I'm not sure if this is related, but I noticed that up to 32MB of Gen0 is allocated from the very start of the application, during which only some EFCore population + loading and JIT compilation is taking place (checked via dotMemory/dotTrace): … |
that's great to hear! I do have a couple of other observations -
|
I've attempted to do the gen aware analysis, but I've run into a few problems. I also tested with the blog post's code and here's what I found:
This is quite hard to answer because the application has many stages. It's a game, so the time that any individual user keeps it open is variable, but generally not longer than 48hrs. Stutters in menus and overlays are not desirable but understandable. For example, when loading online data to display in an overlay, it's understandable that the GC would collect/promote more. Regardless, here are a few GCCollectOnly traces I captured:
Not sure what I should be looking at here to move forward. |
@cshung could you please help @smoogipoo? |
@smoogipoo I think I figured out what is going on with the generational aware analysis. It looks like the view is guarded against the … And the … For the short term, we should be able to work around that limitation. A cleaner approach is to build your own PerfView with that check removed. If building is too much work, we might be able to fake PerfView out by adding … Longer term, … |
@cshung, no I don't think that these views need to be under this check. Feel free to post a PR to remove the check. |
First of all, thanks everyone for your continued interest and support in this issue. I have compiled PerfView myself and looked into the gen-aware analysis. This is going to be a long one - I have some promising and instructive results, but I'm still not 100% sure what the path forward is.

(note: In all of the following, the Gen1 with the large Pause Time is the one that was gen-aware analysed)

Data

Traces: https://drive.google.com/file/d/1mJUfpnmxswdejP5HaYpsPPnpRDITyVQ9/view?usp=sharing
Relevant osu-framework branch: https://github.com/smoogipoo/osu-framework/tree/custom-vbo-array-pool

Initial gen-aware analysis

First I did a gen-aware analysis on our master/mainline branches:
(data: …)

This makes sense; the top two entries are related to …

Perhaps instructive: note the difference in magnitudes of the …

Replacing ImageSharp pools with unmanaged memory

Initial replacement of the ImageSharp pools with .NET pools didn't lead to a noteworthy improvement, so I took it a step further and converted this particular code to use unmanaged memory that's completely hidden away from the GC (apart from add/remove memory pressure, which my experience tells me is good practice when using …).
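As a sketch of what that conversion looks like (names and sizes are illustrative, not the actual osu-framework code):

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch: backing storage the GC never scans. AddMemoryPressure tells the
// GC that this small managed object holds unmanaged memory, so collection
// scheduling can account for it; RemoveMemoryPressure undoes that on free.
public sealed class UnmanagedVertexBuffer : IDisposable
{
    private IntPtr _memory;
    private readonly int _size;

    public UnmanagedVertexBuffer(int size)
    {
        _size = size;
        _memory = Marshal.AllocHGlobal(size);
        GC.AddMemoryPressure(size);
    }

    public void Dispose()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
            GC.RemoveMemoryPressure(_size);
        }
    }
}
```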
(data: …)

Nothing much seems to have changed here. We're promoting much less now (less than half of …). I also tested with not applying memory pressure, which didn't lead to any Gen1 difference other than induced Gen2s not occurring (which seem to be costly here) (data: …).

After reaching this dead end, I decided to look into the root …

Disabling PoolableSkinnableSample

It's not quite easy for us to "fix" PoolableSkinnableSample …
(data: …)

This looks promising. The Gen1 times are getting very close to acceptable levels (keep in mind we're aiming for 4ms). And combining this change with the unmanaged arrays from above:
(data: …)

There are still a few spikes - the above traces show Gen1s taking >8ms later in the snapshots, but this is definitely better.

Conclusion

This is where my analysis stops; anything more from here is extremely challenging and requires significant re-architecting of our code to test. But the main takeaway from the above is that, if I'm understanding the data correctly, the count of objects promoted (…) … Am I correct in thinking that …?

I'll definitely report back on our progress and with further questions and details - I'm not entirely sure how to fix even … |
These … I suspect somewhere in the code we have a delegate (which could look like a lambda) being set to a static field; the delegate itself wraps an instance method, leading the system to also root the object hosting the method as its closure. This is a typical reason why memory is leaked.
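A minimal sketch (hypothetical names) of the rooting pattern being described: an instance method subscribed to a static/long-lived event keeps the whole instance reachable until it is unsubscribed:

```csharp
using System;

public static class GlobalEvents
{
    // Long-lived root: anything subscribed here stays reachable.
    public static event Action SourceChanged;

    public static void Raise() => SourceChanged?.Invoke();
}

public class Subscriber
{
    private readonly byte[] _payload = new byte[1024 * 1024];

    public Subscriber()
        // The delegate captures `this`, so the static event now roots
        // this instance (and its payload) until it is unsubscribed.
        => GlobalEvents.SourceChanged += OnSourceChanged;

    private void OnSourceChanged() { /* ... */ }
}
```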
With your comment, I suspect the whole pool is leaked into gen2 through the above mechanism. If you could take a crash dump, using … We could just bet on a random dump, but if we wanted to increase the odds of finding it, here is the perfect time to capture that dump. At this point, the gen-aware analysis reaches the point where it captured the spiking GC. If we have a leaking delegate, the delegate is set right before this GC:

runtime/src/coreclr/vm/gcenv.ee.cpp, lines 1644 to 1662 in 27baae9
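One standard way to capture and inspect such a dump (pid, dump path, type name and addresses are placeholders):

```
dotnet-dump collect -p <pid>
dotnet-dump analyze <dump-file>
> dumpheap -type PoolableSkinnableSample -stat
> gcroot <object-address>
```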
|
You're right, this is just an event handler local to a parenting class. These …
The event handler is this one: https://github.com/ppy/osu/blob/2b4649a3ea2a919f8869a2e66854448b05780d01/osu.Game/Skinning/SkinProvidingContainer.cs#L22

Here's a simplified form of what's going on:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            var samplePool = new DrawablePool<PoolableSkinnableSample>(0);
            var skinProvidingContainer = new SkinProvidingContainer();

            // We have a recursive data structure of Drawable objects.
            var hierarchy = new Drawable
            {
                Children =
                {
                    new Drawable
                    {
                        Children =
                        {
                            //... This continues on nesting with different Drawable types for a while, and at some point:
                            skinProvidingContainer,
                            samplePool,
                        }
                    }
                }
            };

            // Some time passes...
            // Note: At this point you can assume that everything above here is guaranteed to be in Gen2.

            // Objects are retrieved from the pool and added to the hierarchy.
            skinProvidingContainer.Children.Add(samplePool.Get());
            skinProvidingContainer.Children.Add(samplePool.Get());

            // Within a few seconds afterwards, they're removed from the hierarchy and returned to the pool.
            // Note: event subscriptions aren't removed here. This is intentional.
            // We have methods on get/return where we could move the subscriptions/unsubscriptions to, however since there are very few objects in total we think it's better to
            // forego this and any additional handling required as a result, such as what if the event was called before it was re-bound to, just for simplicity.
            var children = skinProvidingContainer.Children.OfType<PoolableSkinnableSample>().ToArray();
            skinProvidingContainer.Children.Clear();
            foreach (var c in children)
                samplePool.Return(c);

            // Note that there are two ways which the samples could have leaked into Gen2 here:
            // 1. Via the event handler subscription.
            // 2. Via them being returned to the pool.

            // Immediately, or within a few seconds afterwards, they're retrieved from the pool and added back to the hierarchy.
            skinProvidingContainer.Children.Add(samplePool.Get());
            skinProvidingContainer.Children.Add(samplePool.Get());

            // The above process repeats many times...

            // And much, much later on everything clears up and all events unbound/etc.
            // Note: This never occurs in the traces I provided. It's intentional as I'm not really interested this far out (only within the 3 minute windows provided).
            hierarchy.Dispose();
        }
    }

    public class Drawable : IDisposable
    {
        public readonly List<Drawable> Children = new List<Drawable>();

        public void Dispose()
        {
            GC.SuppressFinalize(this);
            Dispose(true);

            foreach (var child in Children)
                child.Dispose();
            Children.Clear();
        }

        protected virtual void Dispose(bool isDisposing)
        {
        }
    }

    public class DrawablePool<T> : Drawable
        where T : Drawable, new()
    {
        private readonly Stack<T> pool = new Stack<T>();

        public DrawablePool(int initialSize)
        {
            for (int i = 0; i < initialSize; i++)
                pool.Push(new T());
        }

        public T Get()
        {
            if (pool.TryPop(out var obj))
                return obj;

            return new T();
        }

        public void Return(T obj) => pool.Push(obj);

        protected override void Dispose(bool isDisposing)
        {
            base.Dispose(isDisposing);

            foreach (var pooledChild in pool)
                pooledChild.Dispose();
        }
    }

    public class SkinProvidingContainer : Drawable
    {
        public event Action SourceChanged;

        // ... Other logic in this class
    }

    public class PoolableSkinnableSample : Drawable
    {
        private SkinProvidingContainer parentSkinProvidingContainer;

        // Assume this is automatically called sometime after ctor() and definitely before Dispose(), via our dependency-injection mechanism.
        private void load(SkinProvidingContainer parentSkinProvidingContainer)
        {
            this.parentSkinProvidingContainer = parentSkinProvidingContainer;
            parentSkinProvidingContainer.SourceChanged += sourceChanged;
        }

        private void sourceChanged()
        {
            // Do something.
        }

        protected override void Dispose(bool isDisposing)
        {
            base.Dispose(isDisposing);

            if (parentSkinProvidingContainer != null)
                parentSkinProvidingContainer.SourceChanged -= sourceChanged;
        }
    }
}
```

I don't know how to improve this process because there are two leaks to Gen2: the event handler subscription, and the samples being returned to (and retained by) the gen2 pool.
Options I'm seeing right now include:
Realistically I don't think we can resolve this at our end with local optimisations without considerable effort, or without resorting to the …
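For what it's worth, a sketch of the get/return-hook option mentioned in the simplified code's comments, reusing those demo types: subscribing on get and unsubscribing on return means a pooled sample is no longer rooted by the gen2 container's event (the pool would call these hooks from Get/Return):

```csharp
// Sketch against the simplified demo types above.
public class PooledSample : Drawable
{
    private SkinProvidingContainer parent;

    // Called by the pool when the sample is handed out.
    public void OnGet(SkinProvidingContainer parent)
    {
        this.parent = parent;
        parent.SourceChanged += sourceChanged;
    }

    // Called by the pool when the sample is returned, severing the
    // event-handler root from the gen2 SkinProvidingContainer.
    public void OnReturn()
    {
        parent.SourceChanged -= sourceChanged;
        parent = null;
    }

    private void sourceChanged() { /* ... */ }
}
``` |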
@Maoni0 spotted that in the …
GC 34 is an obvious outlier; it is true that we are having about 3 times more objects and 2 times more bytes than the previous GC, but that doesn't explain why we are pausing for 30x the amount of time. The rest look more normal, although we see there can be an occasional spike in … If we see cases like id = 34 happen consistently, we might want to drill deeper to see what happens there. Maybe we should consider taking CPU samples to figure out what happened exactly during that 200ms. |
That pause (id = 34) is so long only because it's the one which the gen-aware analysis triggered on. |
@smoogipoo what is your current goal? are you aiming to make the other gen1 GCs with higher-than-usual pause times shorter, or just all gen1 GCs shorter? right now you have at least 3 gen0 GCs for 1 gen1 GC. you could experiment with making the gen1 budget smaller (we don't expose this as a config right now), but I think you are already building the runtime, right? so you could experiment with this. I don't generally advise folks to mess with this, but if you are in a very controlled environment you can experiment (essentially the LowLatency mode is setting gen0/gen1 budgets very small, but they are too small for you). in … |
The goal is for GC pauses to not take longer than 4ms, regardless of collected generation. The lower we can go without completely crippling the GC, the better. I've focused a lot on Gen1 because those collections are the outliers, but the Gen0 times we're seeing are also quite significant. Ideally we'd like to have a completely time-bounded GC. For example, Unity recently released an "incremental GC" where you can set a time slice for each increment, however I understand that this is a huge feat and (probably?) not in the scope of .NET in the short or long-term future. So instead, I want to understand why GC pauses are taking so long in the first place, and whether it's something we can improve on from our side:
As you've said and we've discovered previously, it does seem like the bulk of our issues come down to gen0/gen1 budgets. The problem is that this isn't something we can set at runtime, and as this is a user-facing game, we can't tell each and every user to adjust their GCGen0MaxBudget. Our only path forward would be to create a custom loader for our application, which feels a bit ugly. The Java G1 GC has a soft pause-time target (-XX:MaxGCPauseMillis) …
I'm not currently building the runtime, but I'll look into it. |
Unity's GC is non-generational and non-compacting; doing incremental on such a GC isn't that much work. our GC is both generational and compacting, which makes doing incremental a much, much harder job (a mark-and-sweep-only GC wouldn't be able to handle the kinds of workloads we get on .NET). having said that, it doesn't mean we don't have a goal to do it. you are correct of course in saying that specifying a max budget is not as flexible as specifying a pause limit. our regions work in .NET 6 is to build a foundation for future work like providing a soft target pause time. G1 uses regions. to understand the cost of a GC, I recommend reading this and this section in mem-doc.
is there a reason why you can't use configs to specify these? (right now they are not specifiable with runtimeconfig, but we can make it so you can specify them that way. it's a trivial amount of work. that's the standard way you add configs to .NET Core apps)
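For reference, the shape such a config would take in a runtimeconfig.template.json; `System.GC.Concurrent` is a real knob today, while the budget property name shown here is hypothetical pending that change:

```json
{
  "configProperties": {
    "System.GC.Concurrent": true,
    "System.GC.Gen0MaxBudget": 6291456
  }
}
``` |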
Great to hear progress is being made in this area! Using runtimeconfig sounds like a good solution. |
@smoogipoo let me know if you hit problems building coreclr yourself. feel free to send me email if you need help (my email alias at work is maonis). |
Just to confirm one last thing that's been on our minds - if we have 2 million objects in gen2, could this in any way affect the performance of promotions from gen0/gen1 to gen2 (via the leaking seen above), just due to the count of objects already in gen2 alone? |
Here's more data as requested. These measurements are a little different from the above in terms of the state of the game and pressure on the GC, but the same results are exhibited. The first and last 5 seconds of the traces can be mostly discarded as load times.

Base: base.nettrace.zip

All of the above are with … Base with …
Lastly, I also gave regions a shot, and you probably don't want to hear about it yet since it's still WIP, but I'll document my findings anyway:
|
This is definitely interesting, I would love to be able to reproduce these failures and get them fixed. |
I finally figured out a potential reason why it doesn't work for you. On Linux, the environment variables are case-sensitive; the right casing is … Although my blog post is Windows-centric, I updated the environment variable part so that if someone follows along and wants to experiment on Linux, they will copy the right casing there. |
Recently, I've noticed that we're getting different performance characteristics on Linux and Windows on identical hardware (AMD Ryzen 3950x). I've run tests similar to the above in an even more isolated environment to attempt to reproduce the results. The most telling (to me) was the difference in frame times (how many milliseconds a game's render loop takes):
As above, the … Here are the traces: … Collected with … Is this a potential issue with the GC tuning too aggressively on Linux?
I took another look into this since I've been following the recent GC-related PRs, and I can still repro it. Here's a script to reproduce with: https://gist.github.com/smoogipoo/1b87b3b518095199db0993133550abbe

It should crash within 30 seconds - I've had it both segfault and throw exceptions in multiple areas. |