Provide API to understand how many allocations have happened #17891

terrajobst · 2016-07-19T21:35:16Z

As discussed in https://github.com/dotnet/coreclr/issues/6275 we would like an API to expose how many bytes were allocated since the beginning of a thread's life time.

What about an API like this:

namespace System
{
    public static class GC
    {
        public long GetAllocatedBytesForCurrentThread();
    }
}

The usage could like this:

[Fact]
public void EnsureAllocations()
{
    var allocationsBefore = GC.GetAllocatedBytesForCurrentThread();

    var str = "{'Items':[" + 
        string.Join(",", Enumerable.Range(0, 3000).Select(i => "{}")) + "]}";
    var jObject = JObject.Parse(str);

    var allocationsAfter = Thread.GetAllocatedBytesForCurrentThread();
    var allocations = allocationsAfter - allocationsBefore;

    const long MaximumBytesAllowed = 200 * 1024;
    Assert.LowerOrEqual(allocations, MaximumBytesAllowed);
}

Questions:

Can/should this method track bytes per native OS thread or for the logical call context?
Does this track memory across async boundaries? Should it?

@Maoni0

Updates

Based on @vancem's concerns we've decided not to add GetAllocatedBytesForThread(int managedThreadId).

The text was updated successfully, but these errors were encountered:

Maoni0 · 2016-07-19T21:51:12Z

Thanks @terrajobst.

This tracks allocations on the GC heap for the managed thread and does not track memory when the execution moves to another thread.

For each managed thread, it maintains an allocation context - allocations from this thread will be managed by this context and the number will be reported based on certain values in this context.

terrajobst · 2016-07-19T21:59:47Z

When you say "thread" I think you mean whatever the given runtime decides to use to implement the System.Thread type. Which could be fibers but are usually threads.

But what happens for tasks and async? Some information, such as synchronization context, is passed via the call context and thus flows across thread boundaries so that it doesn't matter if the given async operation completes on the same thread or a different thread. I assume allocation information wouldn't flow?

Maoni0 · 2016-07-19T22:25:29Z

When you say "thread" I think you mean whatever the given runtime decides to use to implement the System.Thread type.

Yes.

But what happens for tasks and async? Some information, such as synchronization context, is passed via the call context. I assume this information wouldn't flow?

The allocated bytes is an attribute on a thread - it has no concept of tasks/async. If you want to know how many bytes a method that involves some async stuff allocates, you can inspect the synchronization context yourself to follow how the execution switches threads and get the allocated bytes for thread it switches to and add them together.

ayende · 2016-07-19T22:41:34Z

I suggest that the name would be: GetTotalAllocatedBytes(), instead of GetCurrentAllocatedBytes(), because it represent the sum total of memory allocated by this thread (as I understand it), not the current allocated (but not GCed) memory.

Maoni0 · 2016-07-19T23:04:43Z

I don't have strong opinions on the name. I'll leave that up to @terrajobst.

This represents the allocated bytes, not survived bytes. Allocated is not taking GC into consideration while survived is. So if you do

ulong currentAlloc = Thread.NewGetAllocAPI();
MyClass o = new MyClass(); // this allocates 100 bytes.
// do stuff with o but no other allocations
ulong allocated = Thread.NewGetAllocAPI() - currentAlloc;

allocated will be 100 regardless of whether a GC happened inbetween and possibly cleaned up o.

svick · 2016-07-19T23:10:13Z

Should this method be on Thread? There is already a somewhat similar method GetTotalMemory(bool) on the GC class. Maybe the two methods should stay together? In that case, the name could be something like GC.GetCurrentThreadTotalAllocatedBytes() (though that is a fairly long name).

ayende · 2016-07-20T05:16:40Z

What about an overload? GC.GetTotalMemory(Thread thread) ?

ayende · 2016-07-20T05:17:22Z

Being able to ask that on another thread is interesting and give us the chance to do some interesting metrics (what is the allocation rate per thread, so we can make decisions on that).

mellinoe · 2016-07-20T17:33:01Z

I'm not sure we can / want to have System.GC depend on the higher-level System.Threading.Thread concept. UWP currently doesn't actually support Thread (which we should fix), but even still I'm not sure it's the best dependency to bring in.

terrajobst · 2016-07-21T00:49:42Z

I agree with @mellinoe, having that on the GC type doesn't seem to be the right place -- especially because as @Maoni0 outlined that this API wouldn't take GCs into account. So I think Thread is more suitable.

However, it seems @ayende raised good points on naming and versatility, so what about:

namespace System.Threading
{
    partial class Thread
    {
        public ulong GetTotalAllocatedBytes();
    }
}

Maoni0 · 2016-07-21T21:06:26Z

Unless you've stashed the Thread object somewhere it will be pretty expensive to get to the object via CurrentThread. And in many cases we do not expose the Thread object. So I would provide both a static method for getting the current thread's alloc bytes and an instance method for people who have the Thread object stashed somewhere.

clrjunkie · 2016-07-22T19:49:22Z

Any specific reason why this should be implemented as a Method instead of a Property?

Does this take in to account native allocations as well? If not, I think having “Total” in the name can be confusing as well.

I suggest keeping it short:

Thread.AllocatedBytes

Or

Thread.ManagedHeapAllocatedBytes (in case native memory is planned to be tracked separately in the future)

I think in this context the past tense clearly conveys the notion of something being accumulated.

Maoni0 · 2016-07-25T18:53:10Z

this does not include native allocations.

vancem · 2016-07-25T19:22:38Z

Some comments.

I don't believe that this API should be on System.Thread. Thread really is a very FUNDAMENTAL concept, so you don't want it's property-space polluted with things like this (there are plenty of other metrics that you might also want to track). Conceptually performance metrics are LESS fundamental (you can live without them (as we have been doing)).

I don't think it is bad to put this in the GC class because GC does not really mean GC so much as 'managed memory'. However you certainly could make the argument that since these APIs are not fundamental, they should go in their own class (e.g. GCHeapMetrics).

On the question of whether this tracks native allocations: NO. For several reasons

It is not usually intersting
It is hard to implement (there is no one choke point to instrument).
It is confusing (How can you explain the number to people (how do they know it is correct?). With GC memory, the allocation events give a precise breakdown that should 'add up'.

Should it track over async: BOTH

Basically we need two APIs (or a flag on one API). As Manoi indicates the easiest implementation ignores Task hops, but that makes using this for interesting scenarios like allocations per service request (which almost certainly will use async), problematic. The async-aware version is built on top of the other by using AsyncLocals to 'flow' the lower level information (which does have a cost, so you don't want to do it if don't want it)

That fact that there is overhead actually argues for a different API. Instead of this being 'always on' we really should have a object that 'holds' the current value. Thus we should really have GCAllocationTracker, and if it is alive then tracking happens, and if none are alive then you can avoid the overhead of tracking. This generalizes nicely to other performance tracking (which may also have overhead). For example, in the future you might ask it to remember the types being allocoated, or their stack traces).

Anyway the API would look like

class GCAllocationTracker() {
GCAllocationTracker(bool includeSpawnedTasks);

 long AllocatedBytes { get }

}

ayende · 2016-07-25T20:23:53Z

Given just a GCHeapMetrics or MemoryMetrics (I agree with your
reasoning that this should be separate class) and a single thread specific
property, users can implement the async version externally.

I would rather focus on the stuff that require cooperation from the runtime
and isn't possible without it, then do the rest as a separate issue to do
the async stuff once you can just write the managed code only and send a PR

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Mon, Jul 25, 2016 at 10:23 PM, Vance Morrison notifications@github.com
wrote:

Some comments.

I don't believe that this API should be on System.Thread. Thread really is
a very FUNDAMENTAL concept, so you don't want it's property-space polluted
with things like this (there are plenty of other metrics that you might
also want to track). Conceptually performance metrics are LESS fundamental
(you can live without them (as we have been doing)).

I don't think it is bad to put this in the GC class because GC does not
really mean GC so much as 'managed memory'. However you certainly could
make the argument that since these APIs are not fundamental, they should go
in their own class (e.g. GCHeapMetrics).

On the question of whether this tracks native allocations: NO. For several
reasons

It is not usually intersting

It is hard to implement (there is no one choke point to instrument).

It is confusing (How can you explain the number to people (how do they
know it is correct?). With GC memory, the allocation events give a precise
breakdown that should 'add up'.

Should it track over async: BOTH

Basically we need two APIs (or a flag on one API). As Manoi indicates the
easiest implementation ignores Task hops, but that makes using this for
interesting scenarios like allocations per service request (which almost
certainly will use async), problematic. The async-aware version is built on
top of the other by using AsyncLocals to 'flow' the lower level information
(which does have a cost, so you don't want to do it if don't want it)

That fact that there is overhead actually argues for a different API.
Instead of this being 'always on' we really should have a object that
'holds' the current value. Thus we should really have GCAllocationTracker,
and if it is alive then tracking happens, and if none are alive then you
can avoid the overhead of tracking. This generalizes nicely to other
performance tracking (which may also have overhead). For example, in the
future you might ask it to remember the types being allocoated, or their
stack traces).

Anyway the API would look like

class GCAllocationTracker() {
GCAllocationTracker(bool includeSpawnedTasks);

long AllocatedBytes { get }

}

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/corefx/issues/10157#issuecomment-235056703,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAHIs3WlzLK1m9JbOj0cQHXbsE6MjUUQks5qZQ0vgaJpZM4JQMec
.

clrjunkie · 2016-07-26T20:56:56Z

My understanding is that the api aims to answer the question of how much was allocated during an execution path, and since there is no class modeling the allocation part of the managed heap (e.g only ‘new’ keyword), having the word ‘Thread’ in the api is as close you can get in terms of expressiveness. Personally, I don’t consider having that property on the Thread class not sensible but if you insist on separating metrics I would suggest:

class ThreadMetrics() { ThreadMetrics(); long AllocatedBytes { get } }

Q: Can this be a struct?

I’m not sure what are the exact requirements for the ‘Task’ version, might be good idea to list those.

As for tracking native memory, I wouldn’t say it’s not usually interesting, since anyone working with bitmaps may want to know what’s behind the managed proxy object, but indeed this look like a secondary concern. I agree it would be hard to implement since there’s no one choke point to instrument, otherwise the property value would provide a complete picture.

clrjunkie · 2016-07-26T21:49:21Z

@Maoni0 Can you please explain how this information gets updated after doing 'new' and what exactly is a thread 'alloc_contex' ?

Maoni0 · 2016-07-27T00:39:22Z

@clrjunkie this is described in the Design of Allocator section in GC BotR. GC gives out allocation contexts and then JIT will usually sub allocate in this alloc context. When JIT runs out an alloc context it will goto GC and get a new one.

clrjunkie · 2016-07-27T10:06:13Z

@Maoni0 Thanks.

Can you please add diagram to the document (hyphens and pipes will suffice) to help visualize how heap segments are divided into object/empirical segments and how multiple heaps come into play. I mean given the following text, it’s unclear to me if each “heap segment” is partitioned into individual “object segments” or if one “object segment” spans multiple “heap segments”. Furthermore, I don't understand how ephemeral segments relate to small object heap segments (did you mean “objects of ephemeral generation”?)

Physical representation of the managed heap:

The heap segments are partitioned into small and large object segments, given the distinction of small and large objects. On each heap the heap segments are chained together. There is at least one small object segment and one large segment.

There’s always only one ephemeral segment in each small object heap, which is where gen0 and gen1 live.

In addition to the ephemeral segment, there can be zero, one or more additional segments, which will be gen2 segments since they only contain gen2 objects

Currently, only steps involved in the collection process are outlined under “Functional Behavior”. Can you please detail the steps involved for memory allocation as well?
My understanding is that the motivation behind “Allocation Contexts” is to allocate memory in chunks to reduce allocation latency. Will the AllocatedBytes reflect the sum of all Allocation Contexts or actual "committed object memory"? I mean if it’s the sum of all “Allocation Contexts” that implies that in the worst case the error margin for reporting actual user allocated bytes would be the sum of all “Allocation Context” size (e.g quantum) minus object overhead bytes + 1 byte minimum. If so, I think it’s important to document so the AllocatedBytes figure doesn’t look erroneous.
I don’t quite understand the relationship between “generations” and “allocation contexts”

On a single-processor (meaning 1 logical processor) machine, a single context is used, which is the generation 0 allocation context.

Maoni0 · 2016-07-27T18:47:28Z

Those are good suggestions for documentation and we will add them to our backlog. For the API, we will document them appropriately on MSDN.

clrjunkie · 2016-07-27T20:28:12Z

"we will add them to our backlog. For the API, we will document them appropriately on MSDN"

@terrajobst this isn't serious..

terrajobst · 2016-07-29T18:54:13Z

We've reviewed the API today with @Maoni0, @vancem, @KrzysztofCwalina, and @mellinoe and concluded the following:

We want to make sure this API is low-level, and can be used everywhere. Thus, we don't want to depend on the Thread class.
We want, however, to be able to the allocated bytes for the current thread as well as any thread.

namespace System
{
    public static class GC
    {
        public long GetAllocatedBytesForCurrentThread();
        public long GetAllocatedBytesForThread(int managedThreadId);
    }
}

We should also have a more high-level API that allows tracking, as @vancem explained above. But this should be a separate issue. I've filed dotnet/corefx#10431.

Also, this API will not track bytes across information contexts. This should be part of a higher level API.

ayende · 2016-07-29T19:51:41Z

Are there going to be cost associated with querying allocation on a different thread, vs. checking on the local one?

Maoni0 · 2016-07-29T21:24:06Z

as far as I can see, based on a cursory reading of code in mscorlib/vm, given a managed thread ID, you'll need to enumerate through managed threads till you find one with that ID. we could sort the managed thread IDs to make the look up faster but you still gotta do the look up without a Thread object. @ericeil would know if there's a better way...

ayende · 2016-07-29T21:32:05Z

But calling it on the local thread is fast? The reason I'm asking is that if there is going to be a linear search, that is going to hurt.
Alternatively, what about an object that represent tracking on a particular thread, so we can cache that? (Or maybe cache the thread object itself on calls to the static method?)

ayende · 2016-07-29T21:32:36Z

Just to note, I have some scenarios where I have greater than 2000 threads running on a system.

clrjunkie · 2016-07-29T21:55:10Z

public long GetAllocatedBytesForThread(int managedThreadId);

In my opinion, this overload makes no sense.

When would be the "right time" to call this overload (asynchronously) on the managedThreadId to reason about the thread's allocations?

What would the result of such method call or any successive call even mean when the result is based on arbitrary execution points of the target thread, especially when allocated bytes is cumulative?

ayende · 2016-07-29T22:00:09Z

@clrjunkie Assume that I have a bunch of threads doing work. I can use this in a watchdog thread to monitor them and see that they aren't allocating too much over time, and if they do, I can stop them.

that is a pretty rare scenario, and would probably be handled better by the thread itself, though.

clrjunkie · 2016-07-29T22:20:48Z

@ayende
would probably be handled better by the thread itself, period.

vancem · 2016-08-01T19:17:38Z

Immo @terrajobst, I don't think we should add the API

   public long GetAllocatedBytesForThread(int managedThreadId);

At this time. It is not clear to me what the scenario is other than simply accumulating work for all threads. We should either simply drop it, and if we want, we can add this API

    public long GetAllocatedBytesForAllThreads();

However I am conflicted with this one because we really don't want to synchronize the threads it so it is not really accurate (I suppose we can document that).

I am tempted at this point to only add GetAllocatedBytesForCurrentThread() until we have a clearer idea of the scenario for any of the other candicates.

ayende · 2016-08-01T19:56:39Z

This one already exists:

public long GetAllocatedBytesForAllThreads();

https://msdn.microsoft.com/en-us/library/system.appdomain.monitoringtotalallocatedmemorysize(v=vs.110).aspx

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Mon, Aug 1, 2016 at 10:18 PM, Vance Morrison notifications@github.com
wrote:

Immo @terrajobst https://github.com/terrajobst, I don't think we should
add the API

public long GetAllocatedBytesForThread(int managedThreadId);
public long GetAllocatedBytesForCurrentThread();
public long GetAllocatedBytesForAllThreads();
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/corefx/issues/10157#issuecomment-236678904,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAHIszoON1FEggCujmhTfjXIQXu8mkQgks5qbkaBgaJpZM4JQMec
.

vancem · 2016-08-01T20:13:01Z

The issue is that AppDomain.MonitoringTotalAllocatedMemorySize does not exist in .NET Core. In general AppDomains are concept we don't wish to promote going forward, and frankly having the monitoring APIs split like this is not good.

But I am OK with simply not creating a GC.GetAllocatedBytesForAllThreads right now.

terrajobst · 2016-08-02T03:20:15Z

@vancem neither adding GetAllocatedBytesForThread(int managedThreadId) nor GetAllocatedBytesForAllThreads() seems reasonable to me. I've updated the proposal.

Maoni0 · 2016-08-02T05:26:49Z

@vancem, you don't need to synchronize for getting allocated for all threads. This would be done the same way we do AD allocated bytes - we don't synchronize; we just get each heap's allocated bytes. Unless you completely stopped every thread in the process except the current one, it will not be totally accurate anyway and the level of accuracy we are providing is sufficient. However, I would not create this because no one has asked to create it and I don't see a strong usage scenario for it.

@terrajobst, I am fine with not providing the API with managedThreadID. As I mentioned in the API review, I am not thrilled with the user providing such an ID (instead of the Thread object) anyway. While I can see a reasonable usage scenario for getting another thread's allocated bytes as @ayende mentioned, you could do it on the thread you are monitoring (may not be as elegant but then again neither is the API with the managed thread ID).

clrjunkie · 2016-08-02T13:31:48Z

Having the allocation monitoring code execute within the same thread as the code being monitored is not only elegant (and quite innovative) but more importantly: it provides a simple programing model for implementing the corrective action (no need for any signaling protocol between the watchdog thread and the worker thread). You can also log allocations per code sections, deterministically, and reason about those at a logical level as opposed to logging an arbitrary byte count after the fact. It’s also simpler to implement and more portable because you don’t need to figure out how to get the list of threads and manage them or deal with tracking a threadpool that shrinks/grows.

clrjunkie · 2016-08-02T14:18:56Z

Thinking about this some more, if the API would prove to show a consistent use-case, it may be appropriate to discuss implementing support at the language level as well:

try(MAX_ALLOCATED_BYTES,` MAX_EXECUTION_TIME_MILLI)
{
}
catch(MaxAllocatedBytesException e)
{
    // Corrective Action
    return;
}
catch(MaxExecutionTimeException e)
{
    // Corrective Action
    return;
}

Where both constraints are checked at the end of the try block.

Obviously there need to be multiple ‘overloads’ for ‘try’ including one without arguments as exist today.

edit:

Possibly having the constraints named, so the order in which they are specified will determine what would be thrown first in case both thresholds are reached.

try(MaxAllocatedBytes=X,` MaxExecutionTimeMilli=Y)

ayende · 2016-08-02T15:20:03Z

@clrjunkie While I would absolutely love that, you are tying allocations to memory usage, and that isn't the case.
The problem is that something like:

for(var i =0; i < 1000 * 1000 ; i++)
{
   Console.WriteLine(i.ToString());
}

Here you are allocating a lot of memory, but your memory utilization is going to be pretty low because the GC can efficiently collect it.

clrjunkie · 2016-08-02T15:39:05Z

@ayende
My suggestion is syntactic sugar for what may turn out to be a common usage pattern of the proposed api.

As always, you need to pick the right tool for the problem your trying to solve (or monitor :).

So considering your example, the interesting problem is:

for(var i =0; i < 1000 * 1000 ; i++)
{
   try(MaxAllocatedBytes=1000)
   {
    tasks[i].execute(); 
   }
   catch(MaxAllocatedBytesException e)
   {
      // Boom! executed a task that allocated 1MB
      DoSomethingWith(tasks[i], e.AllocatedBytes);
   }
 }

clrjunkie · 2016-08-02T15:55:23Z

@ayende
Also keep in mind that this API definitely does not substitute a profiler, but it seems useful when processing user submitted tasks that can vary in memory usage (e.g dynamic queries) , that is assuming of course you can establish a baseline for what’s considered valid and preferably divided the problem such you can anticipate an approaching abnormal situation before it occurs.

tenor · 2016-08-02T16:16:58Z

Somewhat related to https://github.com/dotnet/coreclr/issues/5563

GSPP · 2016-08-23T11:31:26Z

The API name AllocatedBytes (and similar) could suggest that these are the currently allocated bytes. I think TotalAllocatedBytes is a better name.

joshfree assigned Maoni0 Aug 3, 2016

jkotas closed this as completed Oct 9, 2016

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.0.0 milestone Jan 31, 2020

terrajobst mentioned this issue Jan 31, 2020

Add diagnostic tracker API #17961

Closed

NickCraver mentioned this issue Jan 31, 2020

API Proposal: Expose global allocations akin to AppDomain.MonitoringTotalAllocatedMemorySize #26602

Closed

ghost locked as resolved and limited conversation to collaborators Dec 30, 2020

Provide API to understand how many allocations have happened #17891

Provide API to understand how many allocations have happened #17891

Comments

terrajobst commented Jul 19, 2016

Updates

Maoni0 commented Jul 19, 2016

terrajobst commented Jul 19, 2016 • edited Loading

Maoni0 commented Jul 19, 2016

ayende commented Jul 19, 2016

Maoni0 commented Jul 19, 2016 • edited Loading

svick commented Jul 19, 2016

ayende commented Jul 20, 2016

ayende commented Jul 20, 2016

mellinoe commented Jul 20, 2016

terrajobst commented Jul 21, 2016

Maoni0 commented Jul 21, 2016

clrjunkie commented Jul 22, 2016

Maoni0 commented Jul 25, 2016

vancem commented Jul 25, 2016

ayende commented Jul 25, 2016

clrjunkie commented Jul 26, 2016

clrjunkie commented Jul 26, 2016

Maoni0 commented Jul 27, 2016

clrjunkie commented Jul 27, 2016

Maoni0 commented Jul 27, 2016

clrjunkie commented Jul 27, 2016

terrajobst commented Jul 29, 2016 • edited Loading

ayende commented Jul 29, 2016

Maoni0 commented Jul 29, 2016

ayende commented Jul 29, 2016

ayende commented Jul 29, 2016

clrjunkie commented Jul 29, 2016

ayende commented Jul 29, 2016

clrjunkie commented Jul 29, 2016

vancem commented Aug 1, 2016 • edited Loading

ayende commented Aug 1, 2016

vancem commented Aug 1, 2016

terrajobst commented Aug 2, 2016

Maoni0 commented Aug 2, 2016

clrjunkie commented Aug 2, 2016

clrjunkie commented Aug 2, 2016 • edited Loading

ayende commented Aug 2, 2016

clrjunkie commented Aug 2, 2016

clrjunkie commented Aug 2, 2016 • edited Loading

tenor commented Aug 2, 2016

GSPP commented Aug 23, 2016

terrajobst commented Jul 19, 2016 •

edited

Loading

Maoni0 commented Jul 19, 2016 •

edited

Loading

terrajobst commented Jul 29, 2016 •

edited

Loading

vancem commented Aug 1, 2016 •

edited

Loading

clrjunkie commented Aug 2, 2016 •

edited

Loading

clrjunkie commented Aug 2, 2016 •

edited

Loading