Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide API to understand how many allocations have happened #17891

Closed
terrajobst opened this issue Jul 19, 2016 · 41 comments
Closed

Provide API to understand how many allocations have happened #17891

terrajobst opened this issue Jul 19, 2016 · 41 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Threading
Milestone

Comments

@terrajobst
Copy link
Member

As discussed in https://github.com/dotnet/coreclr/issues/6275 we would like an API to expose how many bytes were allocated since the beginning of a thread's life time.

What about an API like this:

namespace System
{
    public static class GC
    {
        public long GetAllocatedBytesForCurrentThread();
    }
}

The usage could like this:

[Fact]
public void EnsureAllocations()
{
    var allocationsBefore = GC.GetAllocatedBytesForCurrentThread();

    var str = "{'Items':[" + 
        string.Join(",", Enumerable.Range(0, 3000).Select(i => "{}")) + "]}";
    var jObject = JObject.Parse(str);

    var allocationsAfter = Thread.GetAllocatedBytesForCurrentThread();
    var allocations = allocationsAfter - allocationsBefore;

    const long MaximumBytesAllowed = 200 * 1024;
    Assert.LowerOrEqual(allocations, MaximumBytesAllowed);
}

Questions:

  • Can/should this method track bytes per native OS thread or for the logical call context?
  • Does this track memory across async boundaries? Should it?

@Maoni0

Updates

  • Based on @vancem's concerns we've decided not to add GetAllocatedBytesForThread(int managedThreadId).
@Maoni0
Copy link
Member

Maoni0 commented Jul 19, 2016

Thanks @terrajobst.

This tracks allocations on the GC heap for the managed thread and does not track memory when the execution moves to another thread.

For each managed thread, it maintains an allocation context - allocations from this thread will be managed by this context and the number will be reported based on certain values in this context.

@terrajobst
Copy link
Member Author

terrajobst commented Jul 19, 2016

When you say "thread" I think you mean whatever the given runtime decides to use to implement the System.Thread type. Which could be fibers but are usually threads.

But what happens for tasks and async? Some information, such as synchronization context, is passed via the call context and thus flows across thread boundaries so that it doesn't matter if the given async operation completes on the same thread or a different thread. I assume allocation information wouldn't flow?

@Maoni0
Copy link
Member

Maoni0 commented Jul 19, 2016

When you say "thread" I think you mean whatever the given runtime decides to use to implement the System.Thread type.

Yes.

But what happens for tasks and async? Some information, such as synchronization context, is passed via the call context. I assume this information wouldn't flow?

The allocated bytes is an attribute on a thread - it has no concept of tasks/async. If you want to know how many bytes a method that involves some async stuff allocates, you can inspect the synchronization context yourself to follow how the execution switches threads and get the allocated bytes for thread it switches to and add them together.

@ayende
Copy link
Contributor

ayende commented Jul 19, 2016

I suggest that the name would be: GetTotalAllocatedBytes(), instead of GetCurrentAllocatedBytes(), because it represent the sum total of memory allocated by this thread (as I understand it), not the current allocated (but not GCed) memory.

@Maoni0
Copy link
Member

Maoni0 commented Jul 19, 2016

I don't have strong opinions on the name. I'll leave that up to @terrajobst.

This represents the allocated bytes, not survived bytes. Allocated is not taking GC into consideration while survived is. So if you do

ulong currentAlloc = Thread.NewGetAllocAPI();
MyClass o = new MyClass(); // this allocates 100 bytes.
// do stuff with o but no other allocations
ulong allocated = Thread.NewGetAllocAPI() - currentAlloc;

allocated will be 100 regardless of whether a GC happened inbetween and possibly cleaned up o.

@svick
Copy link
Contributor

svick commented Jul 19, 2016

Should this method be on Thread? There is already a somewhat similar method GetTotalMemory(bool) on the GC class. Maybe the two methods should stay together? In that case, the name could be something like GC.GetCurrentThreadTotalAllocatedBytes() (though that is a fairly long name).

@ayende
Copy link
Contributor

ayende commented Jul 20, 2016

What about an overload? GC.GetTotalMemory(Thread thread) ?

@ayende
Copy link
Contributor

ayende commented Jul 20, 2016

Being able to ask that on another thread is interesting and give us the chance to do some interesting metrics (what is the allocation rate per thread, so we can make decisions on that).

@mellinoe
Copy link
Contributor

I'm not sure we can / want to have System.GC depend on the higher-level System.Threading.Thread concept. UWP currently doesn't actually support Thread (which we should fix), but even still I'm not sure it's the best dependency to bring in.

@terrajobst
Copy link
Member Author

I agree with @mellinoe, having that on the GC type doesn't seem to be the right place -- especially because as @Maoni0 outlined that this API wouldn't take GCs into account. So I think Thread is more suitable.

However, it seems @ayende raised good points on naming and versatility, so what about:

namespace System.Threading
{
    partial class Thread
    {
        public ulong GetTotalAllocatedBytes();
    }
}

@Maoni0
Copy link
Member

Maoni0 commented Jul 21, 2016

Unless you've stashed the Thread object somewhere it will be pretty expensive to get to the object via CurrentThread. And in many cases we do not expose the Thread object. So I would provide both a static method for getting the current thread's alloc bytes and an instance method for people who have the Thread object stashed somewhere.

@clrjunkie
Copy link

Any specific reason why this should be implemented as a Method instead of a Property?

Does this take in to account native allocations as well? If not, I think having “Total” in the name can be confusing as well.

I suggest keeping it short:

Thread.AllocatedBytes

Or

Thread.ManagedHeapAllocatedBytes (in case native memory is planned to be tracked separately in the future)

I think in this context the past tense clearly conveys the notion of something being accumulated.

@Maoni0
Copy link
Member

Maoni0 commented Jul 25, 2016

this does not include native allocations.

@vancem
Copy link
Contributor

vancem commented Jul 25, 2016

Some comments.

I don't believe that this API should be on System.Thread. Thread really is a very FUNDAMENTAL concept, so you don't want it's property-space polluted with things like this (there are plenty of other metrics that you might also want to track). Conceptually performance metrics are LESS fundamental (you can live without them (as we have been doing)).

I don't think it is bad to put this in the GC class because GC does not really mean GC so much as 'managed memory'. However you certainly could make the argument that since these APIs are not fundamental, they should go in their own class (e.g. GCHeapMetrics).

On the question of whether this tracks native allocations: NO. For several reasons

  1. It is not usually intersting
  2. It is hard to implement (there is no one choke point to instrument).
  3. It is confusing (How can you explain the number to people (how do they know it is correct?). With GC memory, the allocation events give a precise breakdown that should 'add up'.

Should it track over async: BOTH

Basically we need two APIs (or a flag on one API). As Manoi indicates the easiest implementation ignores Task hops, but that makes using this for interesting scenarios like allocations per service request (which almost certainly will use async), problematic. The async-aware version is built on top of the other by using AsyncLocals to 'flow' the lower level information (which does have a cost, so you don't want to do it if don't want it)

That fact that there is overhead actually argues for a different API. Instead of this being 'always on' we really should have a object that 'holds' the current value. Thus we should really have GCAllocationTracker, and if it is alive then tracking happens, and if none are alive then you can avoid the overhead of tracking. This generalizes nicely to other performance tracking (which may also have overhead). For example, in the future you might ask it to remember the types being allocoated, or their stack traces).

Anyway the API would look like

class GCAllocationTracker() {
GCAllocationTracker(bool includeSpawnedTasks);

 long AllocatedBytes { get } 

}

@ayende
Copy link
Contributor

ayende commented Jul 25, 2016

Given just a GCHeapMetrics or MemoryMetrics (I agree with your
reasoning that this should be separate class) and a single thread specific
property, users can implement the async version externally.

I would rather focus on the stuff that require cooperation from the runtime
and isn't possible without it, then do the rest as a separate issue to do
the async stuff once you can just write the managed code only and send a PR

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Mon, Jul 25, 2016 at 10:23 PM, Vance Morrison notifications@github.com
wrote:

Some comments.

I don't believe that this API should be on System.Thread. Thread really is
a very FUNDAMENTAL concept, so you don't want it's property-space polluted
with things like this (there are plenty of other metrics that you might
also want to track). Conceptually performance metrics are LESS fundamental
(you can live without them (as we have been doing)).

I don't think it is bad to put this in the GC class because GC does not
really mean GC so much as 'managed memory'. However you certainly could
make the argument that since these APIs are not fundamental, they should go
in their own class (e.g. GCHeapMetrics).

On the question of whether this tracks native allocations: NO. For several
reasons

  1. It is not usually intersting
  2. It is hard to implement (there is no one choke point to instrument).
  3. It is confusing (How can you explain the number to people (how do they
    know it is correct?). With GC memory, the allocation events give a precise
    breakdown that should 'add up'.

Should it track over async: BOTH

Basically we need two APIs (or a flag on one API). As Manoi indicates the
easiest implementation ignores Task hops, but that makes using this for
interesting scenarios like allocations per service request (which almost
certainly will use async), problematic. The async-aware version is built on
top of the other by using AsyncLocals to 'flow' the lower level information
(which does have a cost, so you don't want to do it if don't want it)

That fact that there is overhead actually argues for a different API.
Instead of this being 'always on' we really should have a object that
'holds' the current value. Thus we should really have GCAllocationTracker,
and if it is alive then tracking happens, and if none are alive then you
can avoid the overhead of tracking. This generalizes nicely to other
performance tracking (which may also have overhead). For example, in the
future you might ask it to remember the types being allocoated, or their
stack traces).

Anyway the API would look like

class GCAllocationTracker() {
GCAllocationTracker(bool includeSpawnedTasks);

long AllocatedBytes { get }

}


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/corefx/issues/10157#issuecomment-235056703,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAHIs3WlzLK1m9JbOj0cQHXbsE6MjUUQks5qZQ0vgaJpZM4JQMec
.

@clrjunkie
Copy link

My understanding is that the api aims to answer the question of how much was allocated during an execution path, and since there is no class modeling the allocation part of the managed heap (e.g only ‘new’ keyword), having the word ‘Thread’ in the api is as close you can get in terms of expressiveness. Personally, I don’t consider having that property on the Thread class not sensible but if you insist on separating metrics I would suggest:

class ThreadMetrics() { ThreadMetrics(); long AllocatedBytes { get } }

Q: Can this be a struct?

I’m not sure what are the exact requirements for the ‘Task’ version, might be good idea to list those.

As for tracking native memory, I wouldn’t say it’s not usually interesting, since anyone working with bitmaps may want to know what’s behind the managed proxy object, but indeed this look like a secondary concern. I agree it would be hard to implement since there’s no one choke point to instrument, otherwise the property value would provide a complete picture.

@clrjunkie
Copy link

@Maoni0 Can you please explain how this information gets updated after doing 'new' and what exactly is a thread 'alloc_contex' ?

@Maoni0
Copy link
Member

Maoni0 commented Jul 27, 2016

@clrjunkie this is described in the Design of Allocator section in GC BotR. GC gives out allocation contexts and then JIT will usually sub allocate in this alloc context. When JIT runs out an alloc context it will goto GC and get a new one.

@clrjunkie
Copy link

@Maoni0 Thanks.

  • Can you please add diagram to the document (hyphens and pipes will suffice) to help visualize how heap segments are divided into object/empirical segments and how multiple heaps come into play. I mean given the following text, it’s unclear to me if each “heap segment” is partitioned into individual “object segments” or if one “object segment” spans multiple “heap segments”. Furthermore, I don't understand how ephemeral segments relate to small object heap segments (did you mean “objects of ephemeral generation”?)

Physical representation of the managed heap:

The heap segments are partitioned into small and large object segments, given the distinction of small and large objects. On each heap the heap segments are chained together. There is at least one small object segment and one large segment.

There’s always only one ephemeral segment in each small object heap, which is where gen0 and gen1 live.

In addition to the ephemeral segment, there can be zero, one or more additional segments, which will be gen2 segments since they only contain gen2 objects

  • Currently, only steps involved in the collection process are outlined under “Functional Behavior”. Can you please detail the steps involved for memory allocation as well?
  • My understanding is that the motivation behind “Allocation Contexts” is to allocate memory in chunks to reduce allocation latency. Will the AllocatedBytes reflect the sum of all Allocation Contexts or actual "committed object memory"? I mean if it’s the sum of all “Allocation Contexts” that implies that in the worst case the error margin for reporting actual user allocated bytes would be the sum of all “Allocation Context” size (e.g quantum) minus object overhead bytes + 1 byte minimum. If so, I think it’s important to document so the AllocatedBytes figure doesn’t look erroneous.
  • I don’t quite understand the relationship between “generations” and “allocation contexts”

On a single-processor (meaning 1 logical processor) machine, a single context is used, which is the generation 0 allocation context.

@Maoni0
Copy link
Member

Maoni0 commented Jul 27, 2016

Those are good suggestions for documentation and we will add them to our backlog. For the API, we will document them appropriately on MSDN.

@clrjunkie
Copy link

"we will add them to our backlog. For the API, we will document them appropriately on MSDN"

@terrajobst this isn't serious..

@terrajobst
Copy link
Member Author

terrajobst commented Jul 29, 2016

We've reviewed the API today with @Maoni0, @vancem, @KrzysztofCwalina, and @mellinoe and concluded the following:

  • We want to make sure this API is low-level, and can be used everywhere. Thus, we don't want to depend on the Thread class.
  • We want, however, to be able to the allocated bytes for the current thread as well as any thread.
namespace System
{
    public static class GC
    {
        public long GetAllocatedBytesForCurrentThread();
        public long GetAllocatedBytesForThread(int managedThreadId);
    }
}

We should also have a more high-level API that allows tracking, as @vancem explained above. But this should be a separate issue. I've filed dotnet/corefx#10431.

Also, this API will not track bytes across information contexts. This should be part of a higher level API.

@ayende
Copy link
Contributor

ayende commented Jul 29, 2016

Are there going to be cost associated with querying allocation on a different thread, vs. checking on the local one?

@Maoni0
Copy link
Member

Maoni0 commented Jul 29, 2016

as far as I can see, based on a cursory reading of code in mscorlib/vm, given a managed thread ID, you'll need to enumerate through managed threads till you find one with that ID. we could sort the managed thread IDs to make the look up faster but you still gotta do the look up without a Thread object. @ericeil would know if there's a better way...

@ayende
Copy link
Contributor

ayende commented Jul 29, 2016

But calling it on the local thread is fast? The reason I'm asking is that if there is going to be a linear search, that is going to hurt.
Alternatively, what about an object that represent tracking on a particular thread, so we can cache that? (Or maybe cache the thread object itself on calls to the static method?)

@ayende
Copy link
Contributor

ayende commented Jul 29, 2016

Just to note, I have some scenarios where I have greater than 2000 threads running on a system.

@clrjunkie
Copy link

public long GetAllocatedBytesForThread(int managedThreadId);

In my opinion, this overload makes no sense.

When would be the "right time" to call this overload (asynchronously) on the managedThreadId to reason about the thread's allocations?

What would the result of such method call or any successive call even mean when the result is based on arbitrary execution points of the target thread, especially when allocated bytes is cumulative?

@ayende
Copy link
Contributor

ayende commented Jul 29, 2016

@clrjunkie Assume that I have a bunch of threads doing work. I can use this in a watchdog thread to monitor them and see that they aren't allocating too much over time, and if they do, I can stop them.

that is a pretty rare scenario, and would probably be handled better by the thread itself, though.

@clrjunkie
Copy link

@ayende
would probably be handled better by the thread itself, period.

@vancem
Copy link
Contributor

vancem commented Aug 1, 2016

Immo @terrajobst, I don't think we should add the API

   public long GetAllocatedBytesForThread(int managedThreadId);

At this time. It is not clear to me what the scenario is other than simply accumulating work for all threads. We should either simply drop it, and if we want, we can add this API

    public long GetAllocatedBytesForAllThreads();

However I am conflicted with this one because we really don't want to synchronize the threads it so it is not really accurate (I suppose we can document that).

I am tempted at this point to only add GetAllocatedBytesForCurrentThread() until we have a clearer idea of the scenario for any of the other candicates.

@ayende
Copy link
Contributor

ayende commented Aug 1, 2016

This one already exists:

public long GetAllocatedBytesForAllThreads();

https://msdn.microsoft.com/en-us/library/system.appdomain.monitoringtotalallocatedmemorysize(v=vs.110).aspx

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Mon, Aug 1, 2016 at 10:18 PM, Vance Morrison notifications@github.com
wrote:

Immo @terrajobst https://github.com/terrajobst, I don't think we should
add the API

public long GetAllocatedBytesForThread(int managedThreadId);

public long GetAllocatedBytesForCurrentThread();
public long GetAllocatedBytesForAllThreads();


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/corefx/issues/10157#issuecomment-236678904,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAHIszoON1FEggCujmhTfjXIQXu8mkQgks5qbkaBgaJpZM4JQMec
.

@vancem
Copy link
Contributor

vancem commented Aug 1, 2016

The issue is that AppDomain.MonitoringTotalAllocatedMemorySize does not exist in .NET Core. In general AppDomains are concept we don't wish to promote going forward, and frankly having the monitoring APIs split like this is not good.

But I am OK with simply not creating a GC.GetAllocatedBytesForAllThreads right now.

@terrajobst
Copy link
Member Author

@vancem neither adding GetAllocatedBytesForThread(int managedThreadId) nor GetAllocatedBytesForAllThreads() seems reasonable to me. I've updated the proposal.

@Maoni0
Copy link
Member

Maoni0 commented Aug 2, 2016

@vancem, you don't need to synchronize for getting allocated for all threads. This would be done the same way we do AD allocated bytes - we don't synchronize; we just get each heap's allocated bytes. Unless you completely stopped every thread in the process except the current one, it will not be totally accurate anyway and the level of accuracy we are providing is sufficient. However, I would not create this because no one has asked to create it and I don't see a strong usage scenario for it.

@terrajobst, I am fine with not providing the API with managedThreadID. As I mentioned in the API review, I am not thrilled with the user providing such an ID (instead of the Thread object) anyway. While I can see a reasonable usage scenario for getting another thread's allocated bytes as @ayende mentioned, you could do it on the thread you are monitoring (may not be as elegant but then again neither is the API with the managed thread ID).

@clrjunkie
Copy link

Having the allocation monitoring code execute within the same thread as the code being monitored is not only elegant (and quite innovative) but more importantly: it provides a simple programing model for implementing the corrective action (no need for any signaling protocol between the watchdog thread and the worker thread). You can also log allocations per code sections, deterministically, and reason about those at a logical level as opposed to logging an arbitrary byte count after the fact. It’s also simpler to implement and more portable because you don’t need to figure out how to get the list of threads and manage them or deal with tracking a threadpool that shrinks/grows.

@clrjunkie
Copy link

clrjunkie commented Aug 2, 2016

Thinking about this some more, if the API would prove to show a consistent use-case, it may be appropriate to discuss implementing support at the language level as well:

try(MAX_ALLOCATED_BYTES,` MAX_EXECUTION_TIME_MILLI)
{
}
catch(MaxAllocatedBytesException e)
{
    // Corrective Action
    return;
}
catch(MaxExecutionTimeException e)
{
    // Corrective Action
    return;
}

Where both constraints are checked at the end of the try block.

Obviously there need to be multiple ‘overloads’ for ‘try’ including one without arguments as exist today.

edit:

Possibly having the constraints named, so the order in which they are specified will determine what would be thrown first in case both thresholds are reached.

try(MaxAllocatedBytes=X,` MaxExecutionTimeMilli=Y)

@ayende
Copy link
Contributor

ayende commented Aug 2, 2016

@clrjunkie While I would absolutely love that, you are tying allocations to memory usage, and that isn't the case.
The problem is that something like:

for(var i =0; i < 1000 * 1000 ; i++)
{
   Console.WriteLine(i.ToString());
}

Here you are allocating a lot of memory, but your memory utilization is going to be pretty low because the GC can efficiently collect it.

@clrjunkie
Copy link

@ayende
My suggestion is syntactic sugar for what may turn out to be a common usage pattern of the proposed api.

As always, you need to pick the right tool for the problem your trying to solve (or monitor :).

So considering your example, the interesting problem is:

for(var i =0; i < 1000 * 1000 ; i++)
{
   try(MaxAllocatedBytes=1000)
   {
    tasks[i].execute(); 
   }
   catch(MaxAllocatedBytesException e)
   {
      // Boom! executed a task that allocated 1MB
      DoSomethingWith(tasks[i], e.AllocatedBytes);
   }
 }

@clrjunkie
Copy link

clrjunkie commented Aug 2, 2016

@ayende
Also keep in mind that this API definitely does not substitute a profiler, but it seems useful when processing user submitted tasks that can vary in memory usage (e.g dynamic queries) , that is assuming of course you can establish a baseline for what’s considered valid and preferably divided the problem such you can anticipate an approaching abnormal situation before it occurs.

@tenor
Copy link

tenor commented Aug 2, 2016

Somewhat related to https://github.com/dotnet/coreclr/issues/5563

@GSPP
Copy link

GSPP commented Aug 23, 2016

The API name AllocatedBytes (and similar) could suggest that these are the currently allocated bytes. I think TotalAllocatedBytes is a better name.

@jkotas jkotas closed this as completed Oct 9, 2016
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Threading
Projects
None yet
Development

No branches or pull requests