New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling: How to get GC Metrics in-process #4382

Open
discostu105 opened this Issue Apr 16, 2016 · 15 comments

Comments

Projects
None yet
9 participants
@discostu105
Contributor

discostu105 commented Apr 16, 2016

Let's say, you have a profiler which is loaded inside of the process via Profiling API. Let's say you want to capture the impact the GC has on your process (just the CLR in the process your profiler is loaded in). Stats that would be interesting are: Gen0,1,2,LOH collection count and duration. Which native API would I use to properly get such statistics periodically with low overhead?

Some thoughts:

  • In full framework, this would be very possible via CLR Memory Performance counter metrics ("% Time in GC", "gen X collections", ect). However, perfcounters have been removed for CoreCLR.
  • Yes, there are callbacks from the Profiling API (ICorProfilerCallback::RuntimeSuspendStarted) that indicate "suspensions". However, these callbacks don't give much details on the reason of suspension (e.g. no info about which generation is being collected). Also, I am unsure about background/concurrent GC. I guess those won't trigger runtime suspensions (in all cases).
  • Then there is ETW. At first, it does not give you nice statistics, but events instead. So, you can capture events such as GCStart, GCEnd and you have to calculate the timings you want yourself. So far so good, but it seems that efficiently capturing ETW events from within the process (via native API) is not an easy thing. One way seem to be "private sessions", but apparently, they only work via log-files that need to be parsed [0]. That seems much too complicated and too much overhead (file-IO) for just getting some GC-stats.
  • In ETW, there are also so called "real time sessions", which seem to be more what I want (no file IO, just in-memory buffers). However, there's a global limit of how many such sessions can be active, and also it requires admin-privileges to create them. Not optimal either.
  • Browsing through the sources, I've found COR_GC_STATS to be an interesting datastructure. It contains pretty much what I am looking. It's available via CorGCHost::GetStats. Is there any way for a profiler to get to a CorGCHost instance (if the CLR is not self-hosted)?

[0] https://social.msdn.microsoft.com/Forums/en-US/2664a9cf-172a-45c3-81cc-5a4b93b3be7d/etw-private-session-without-logfile?forum=clr

@discostu105

This comment has been minimized.

Show comment
Hide comment
@discostu105

discostu105 Apr 16, 2016

Contributor

Btw: I'm not sure if it's better to ask such questions on StackOverflow or on Github. Please let me if I should use SO instead.

Contributor

discostu105 commented Apr 16, 2016

Btw: I'm not sure if it's better to ask such questions on StackOverflow or on Github. Please let me if I should use SO instead.

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas
Member

jkotas commented Apr 19, 2016

cc @Maoni0

@Maoni0

This comment has been minimized.

Show comment
Hide comment
@Maoni0

Maoni0 Apr 19, 2016

Member

If you look deeper in some of the categories you've mentioned, there's more info. perf counters is an exception as we made a decision to not invest more in perf counters when we did background GC in 4.0. So you get whatever support it offered before (which means not much support for background GC) when it's enabled. COR_GC_STATS works off of the perf counter info so is also not available if perf counter is not enabled.

For both profiling and ETW, you can certainly get more info -

For profiling, ICorProfilerCallback2::GarbageCollectionStarted/Finished tells you exactly when a GC starts and finishes and with a bunch of other info - the generation it collects, generation bounds and etc.

For ETW, it's a low level mechanism so by design it does not give you statistics (unless of course you are firing stats info in your events). That's why we have tools that build on top of this mechanism to give you stats. Take a look at TraceEvent and PerfView which let you capture/interpret the events (in managed code) and look at stats.

Let me know if these still do not solve your problems.

Member

Maoni0 commented Apr 19, 2016

If you look deeper in some of the categories you've mentioned, there's more info. perf counters is an exception as we made a decision to not invest more in perf counters when we did background GC in 4.0. So you get whatever support it offered before (which means not much support for background GC) when it's enabled. COR_GC_STATS works off of the perf counter info so is also not available if perf counter is not enabled.

For both profiling and ETW, you can certainly get more info -

For profiling, ICorProfilerCallback2::GarbageCollectionStarted/Finished tells you exactly when a GC starts and finishes and with a bunch of other info - the generation it collects, generation bounds and etc.

For ETW, it's a low level mechanism so by design it does not give you statistics (unless of course you are firing stats info in your events). That's why we have tools that build on top of this mechanism to give you stats. Take a look at TraceEvent and PerfView which let you capture/interpret the events (in managed code) and look at stats.

Let me know if these still do not solve your problems.

@discostu105

This comment has been minimized.

Show comment
Hide comment
@discostu105

discostu105 Apr 19, 2016

Contributor

Although ICorProfilerCallback2::GarbageCollectionStarted/Finished would provide the information I'm looking for, I know for a fact that it causes too much overhead. It needs to have a certain profiler capability enabled (COR_PRF_MONITOR_GC). Just by enabling it, Concurrent GC is disabled for example. Since I'm looking for a low-overhead solution, this is not a (good) option for me.

Too bad, COR_GC_STATS does not fit, it looked so good :). Thanks for your info by the way. Also, I did not know that perfcounters don't provide good values when background GC is active, so thanks for that as well.

TraceEvent looks like a nice managed library for certain ETW event sources (also CLR event's I am looking for). Maybe we can extract what we need from there. Does it use private logger sessions? Does it apply to all other limitations of ETW (mentioned in my initial post)?

We did some fair research on ETW, and there is an in-memory mode (real-time mode), where events can be processed from memory and do not need to go to a file (which is great). But unfortunately such real-time mode session can only be made system-wide and there is a limitation of 64 sessions per system (what if I want to monitor more than 64 .net processes?). Unfortunately, that mode is not combineable with private logger sessions which on the other hand does not require special permissions, and where you can make isolated sessions for each process, and there is no 64 session limit. I'm starting to get the feeling that ETW was not exactly made for my requirements :).

Contributor

discostu105 commented Apr 19, 2016

Although ICorProfilerCallback2::GarbageCollectionStarted/Finished would provide the information I'm looking for, I know for a fact that it causes too much overhead. It needs to have a certain profiler capability enabled (COR_PRF_MONITOR_GC). Just by enabling it, Concurrent GC is disabled for example. Since I'm looking for a low-overhead solution, this is not a (good) option for me.

Too bad, COR_GC_STATS does not fit, it looked so good :). Thanks for your info by the way. Also, I did not know that perfcounters don't provide good values when background GC is active, so thanks for that as well.

TraceEvent looks like a nice managed library for certain ETW event sources (also CLR event's I am looking for). Maybe we can extract what we need from there. Does it use private logger sessions? Does it apply to all other limitations of ETW (mentioned in my initial post)?

We did some fair research on ETW, and there is an in-memory mode (real-time mode), where events can be processed from memory and do not need to go to a file (which is great). But unfortunately such real-time mode session can only be made system-wide and there is a limitation of 64 sessions per system (what if I want to monitor more than 64 .net processes?). Unfortunately, that mode is not combineable with private logger sessions which on the other hand does not require special permissions, and where you can make isolated sessions for each process, and there is no 64 session limit. I'm starting to get the feeling that ETW was not exactly made for my requirements :).

@vancem

This comment has been minimized.

Show comment
Hide comment
@vancem

vancem Apr 19, 2016

Member

@brianrob

We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc.

However this does not help you today. As you have discovered, ETW does not really work well for 'self-monitoring' scenarios.

Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc.

Both of these options requires new code to be written in coreclr, but that is what pull requests are about. If you want to pursue either of these there should be some discussion about the details of the design first.

Member

vancem commented Apr 19, 2016

@brianrob

We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc.

However this does not help you today. As you have discovered, ETW does not really work well for 'self-monitoring' scenarios.

Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc.

Both of these options requires new code to be written in coreclr, but that is what pull requests are about. If you want to pursue either of these there should be some discussion about the details of the design first.

@Maoni0

This comment has been minimized.

Show comment
Hide comment
@Maoni0

Maoni0 Apr 19, 2016

Member

I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it.

Member

Maoni0 commented Apr 19, 2016

I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it.

@discostu105

This comment has been minimized.

Show comment
Hide comment
@discostu105

discostu105 Apr 20, 2016

Contributor

All in all, it seems like there are not perfect options which will help me short-term. I guess we'll need to go with ETW for now (with it's limitations). Looking forward, the ideas you mentioned are definitively interesting to explore for future versions.

@vancem

We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc.

That certainly sounds interesting. My profiler has native+managed parts, so I could also consume stuff managed. Though, I have a slight preference for consuming such an API natively, because pure native threads are not subject to GC suspensions themself, so they would be (a little bit) more deterministic.

Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc.

Sounds cool. In order to control overhead, some up-front filtering mechanism would be needed I guess. Given such a fine-grained filtering mechanism, this approach would be very flexible (and thus useful for my specific use-case). Also (not knowing the internals of ETW) my naive mind asks: where do the buffers live and how large will they be? Would this still go through Windows Kernel?

@Maoni0

I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it.

We've actually used those detailed infos that the Profiling API provides (root references, moved references, ...), but for a very different use-case. In that use-case we would only enable COR_PRF_MONITOR_GC for a short period of time, then disable it again (which actually works since some .NET version - thanks for that :)).

But the use-case I am aiming for right now, is to know only the very basic metrics (gen x collection count, duration and as well size of all heaps). I just want to report these metrics periodically (every X seconds), that's it. So, some API which just gives me a struct with those values would be sufficient.

One question by the way: If any of those features discussed here would be implemented for CoreCLR, would it flow back to Full CLR and be released with some future version? Is it basically the same code with different ifdef definitions or is there a very separate repository for it?

Contributor

discostu105 commented Apr 20, 2016

All in all, it seems like there are not perfect options which will help me short-term. I guess we'll need to go with ETW for now (with it's limitations). Looking forward, the ideas you mentioned are definitively interesting to explore for future versions.

@vancem

We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc.

That certainly sounds interesting. My profiler has native+managed parts, so I could also consume stuff managed. Though, I have a slight preference for consuming such an API natively, because pure native threads are not subject to GC suspensions themself, so they would be (a little bit) more deterministic.

Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc.

Sounds cool. In order to control overhead, some up-front filtering mechanism would be needed I guess. Given such a fine-grained filtering mechanism, this approach would be very flexible (and thus useful for my specific use-case). Also (not knowing the internals of ETW) my naive mind asks: where do the buffers live and how large will they be? Would this still go through Windows Kernel?

@Maoni0

I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it.

We've actually used those detailed infos that the Profiling API provides (root references, moved references, ...), but for a very different use-case. In that use-case we would only enable COR_PRF_MONITOR_GC for a short period of time, then disable it again (which actually works since some .NET version - thanks for that :)).

But the use-case I am aiming for right now, is to know only the very basic metrics (gen x collection count, duration and as well size of all heaps). I just want to report these metrics periodically (every X seconds), that's it. So, some API which just gives me a struct with those values would be sufficient.

One question by the way: If any of those features discussed here would be implemented for CoreCLR, would it flow back to Full CLR and be released with some future version? Is it basically the same code with different ifdef definitions or is there a very separate repository for it?

@Maoni0

This comment has been minimized.

Show comment
Hide comment
@Maoni0

Maoni0 Apr 21, 2016

Member

Whether a feature flows back to full clr is determined case by case - it's all based on how much value it can bring to full clr customers and how much work it takes to flow back.

Member

Maoni0 commented Apr 21, 2016

Whether a feature flows back to full clr is determined case by case - it's all based on how much value it can bring to full clr customers and how much work it takes to flow back.

@mleenhardt

This comment has been minimized.

Show comment
Hide comment
@mleenhardt

mleenhardt Apr 7, 2018

I wanted to bump this and ask whether anything had changed on this topic? Specifically, is there any way to get GC stats from within the process? My environment is .NET Core running on Linux.

I'm aware of GC.CollectionCount which does provide some info. I was however interested in collecting metrics on the amount of time and percentage of total CPU time spent in the GC. These seem to be really important data points that you'd want to be able to monitor over time when working on performance sensitive applications and I was surprised to not have this available.

My current plan is to have a side care process running a profiling session and reporting various metrics (using perf and LTTng) about the monitored process. Is there an easier way that I missed to go about doing this? Being used to simply querying Performance counters on Windows, I really wish there was.

mleenhardt commented Apr 7, 2018

I wanted to bump this and ask whether anything had changed on this topic? Specifically, is there any way to get GC stats from within the process? My environment is .NET Core running on Linux.

I'm aware of GC.CollectionCount which does provide some info. I was however interested in collecting metrics on the amount of time and percentage of total CPU time spent in the GC. These seem to be really important data points that you'd want to be able to monitor over time when working on performance sensitive applications and I was surprised to not have this available.

My current plan is to have a side care process running a profiling session and reporting various metrics (using perf and LTTng) about the monitored process. Is there an easier way that I missed to go about doing this? Being used to simply querying Performance counters on Windows, I really wish there was.

@brianrob

This comment has been minimized.

Show comment
Hide comment
@brianrob

brianrob Apr 9, 2018

Member

@lmeenhardt, using perf and/or LTTng is a reasonable way to go about this, especially if you want very detailed information. This has been done before by others as well. There are APIs to get the collection count as well as the amount of memory that has been allocated on the current thread, but if you want data per-GC then LTTng is likely your best bet. Note that if you don't care about collecting CPU and/or blocked time stacks then you just need LTTng. It can give you all of the runtime events as well as kernel events without stacks.

Also, as @vancem noted above, we are moving towards making all of the runtime events, including GC events, available in-process through EventListeners. There has been much work done on the internals of the runtime and logging infrastructure to make this possible, but we are not there yet. You can read some about the plans at https://github.com/dotnet/designs/blob/master/accepted/cross-platform-performance-monitoring.md.

Member

brianrob commented Apr 9, 2018

@lmeenhardt, using perf and/or LTTng is a reasonable way to go about this, especially if you want very detailed information. This has been done before by others as well. There are APIs to get the collection count as well as the amount of memory that has been allocated on the current thread, but if you want data per-GC then LTTng is likely your best bet. Note that if you don't care about collecting CPU and/or blocked time stacks then you just need LTTng. It can give you all of the runtime events as well as kernel events without stacks.

Also, as @vancem noted above, we are moving towards making all of the runtime events, including GC events, available in-process through EventListeners. There has been much work done on the internals of the runtime and logging infrastructure to make this possible, but we are not there yet. You can read some about the plans at https://github.com/dotnet/designs/blob/master/accepted/cross-platform-performance-monitoring.md.

@cesna

This comment has been minimized.

Show comment
Hide comment
@cesna

cesna Jun 19, 2018

@mleenhardt did your plan worked? (side care process with LTTng) Or maybe you found an easier solution? I am facing same challenge now. Looking for ways to get GC "perf counters" on Linux.

cesna commented Jun 19, 2018

@mleenhardt did your plan worked? (side care process with LTTng) Or maybe you found an easier solution? I am facing same challenge now. Looking for ways to get GC "perf counters" on Linux.

@mleenhardt

This comment has been minimized.

Show comment
Hide comment
@mleenhardt

mleenhardt Jun 19, 2018

Haven't gotten around working on this yet but unless there is now something better, that's still the plan.

mleenhardt commented Jun 19, 2018

Haven't gotten around working on this yet but unless there is now something better, that's still the plan.

@brianrob

This comment has been minimized.

Show comment
Hide comment
@brianrob

brianrob Jun 20, 2018

Member

The work I described above around exposing runtime data through EventListener is in-progress and so hopefully by the time you're ready it will be as well. I don't have an ETA yet though.

Member

brianrob commented Jun 20, 2018

The work I described above around exposing runtime data through EventListener is in-progress and so hopefully by the time you're ready it will be as well. I don't have an ETA yet though.

@ocoster-ff

This comment has been minimized.

Show comment
Hide comment
@ocoster-ff

ocoster-ff Jul 24, 2018

I have a need for a similar feature as @mleenhardt - I will need to collect different GC stats in-proc in a cross platform manner (so a side-car LTTng process won't quite cut it...).

ocoster-ff commented Jul 24, 2018

I have a need for a similar feature as @mleenhardt - I will need to collect different GC stats in-proc in a cross platform manner (so a side-car LTTng process won't quite cut it...).

@Maoni0

This comment has been minimized.

Show comment
Hide comment
@Maoni0

Maoni0 Jul 26, 2018

Member

can you tell me how you would use the CPU time spent in GC if it was available? obviously, some of this CPU time was consumed while the user threads were running while the rest wasn't. if you can tell me the usage scenario it would be helpful to understand if the CPU time spent in GC is the right thing to provide.

Member

Maoni0 commented Jul 26, 2018

can you tell me how you would use the CPU time spent in GC if it was available? obviously, some of this CPU time was consumed while the user threads were running while the rest wasn't. if you can tell me the usage scenario it would be helpful to understand if the CPU time spent in GC is the right thing to provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment