Add MetricsEventSource #54333
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
@tarekgh @dotnet/dotnet-diag @cijothomas @reyang @victlu @wiktork @jander-msft @shirhatti
The feature is still a work in progress, but I wanted to let others see it in its current state while I am refining it. Our out-of-process tools like dotnet-counters and dotnet-monitor need access to the metrics produced by the new Meter APIs without requiring the app to take any dependency on a separate OpenTelemetry library. The System.Diagnostics.Metrics EventSource is a new source designed to let those tools access this data. The EventSource includes high-performance in-proc pre-aggregation capable of observing millions of instrument invocations/sec/thread with low CPU overhead. This change does not create any new BCL API surface; the aggregated data is exposed solely by subscribing to the EventSource, for example using ETW, EventPipe, LTTng, or EventListener. Anyone wanting in-process APIs to consume the data can either use MeterListener for unaggregated data or a library such as OpenTelemetry for pre-aggregated data.
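For in-process consumers, the description points to MeterListener for unaggregated data. A minimal sketch of that path follows, assuming .NET 6 or later (where System.Diagnostics.Metrics ships in the box); the meter name "Example.Shop" and instrument name "orders-placed" are hypothetical. Each Add() call reaches the callback individually, with no pre-aggregation, which is the gap the EventSource's in-proc aggregation fills for out-of-process tools.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class InProcessMetricsDemo
{
    // Records every measurement from the hypothetical "Example.Shop" meter.
    public static List<(string Instrument, int Value)> Run()
    {
        var received = new List<(string Instrument, int Value)>();

        using var meter = new Meter("Example.Shop");
        Counter<int> orders = meter.CreateCounter<int>("orders-placed");

        using var listener = new MeterListener();
        // Called for each instrument that exists at Start() and for each one created later.
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter.Name == "Example.Shop")
            {
                l.EnableMeasurementEvents(instrument);
            }
        };
        // Invoked inline on the thread that calls Add(); tags and state are ignored here.
        listener.SetMeasurementEventCallback<int>((instrument, measurement, tags, state) =>
            received.Add((instrument.Name, measurement)));
        listener.Start();

        orders.Add(2);
        orders.Add(3);
        return received;
    }
}
```

Calling InProcessMetricsDemo.Run() yields two entries, one per Add() call, rather than a single summed value.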
Thanks for the nice review @gfoidl! I think almost all of it has been applied, and a few parts were rendered moot by other changes.
f5945e0 to 87c8fca
- Made some adjustments to the events
- Added a bunch of tests
- Fixed all the bugs I found with those tests
- Misc refactoring
- PR feedback
87c8fca to f871425
I made most of the changes I wanted and rebased on main, so it is probably more reasonable to review now.
// This explicitly uses a Thread and not a Task so that metrics still work
// even when an app is experiencing thread-pool starvation. Although we
// can't make in-proc metrics robust to everything, this is a common enough
// problem in .NET apps that it feels worthwhile to take the precaution.
CC @stephentoub just in case he has any comment.
How many of these AggregationManager instances will there be in a process, and thus how many of these threads?
At any given time there should be at most one instance of AggregationManager and one thread. MetricsEventSource disposes the old one (which joins the thread) before creating a new one.
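The dedicated-thread rationale and the one-instance-at-a-time answer can be illustrated together. The sketch below is a hypothetical PeriodicCollector, not the PR's AggregationManager: it runs its loop on a background Thread rather than a thread-pool Task, and Dispose() signals the loop and joins the thread, so disposing the old instance before creating a new one leaves at most one collection thread alive.

```csharp
using System;
using System.Threading;

// Hypothetical periodic collector; names and structure are illustrative only.
public sealed class PeriodicCollector : IDisposable
{
    private readonly ManualResetEventSlim _stop = new ManualResetEventSlim(false);
    private readonly TimeSpan _interval;
    private readonly Action _collect;
    private readonly Thread _thread;

    public PeriodicCollector(TimeSpan interval, Action collect)
    {
        _interval = interval;
        _collect = collect;
        // A dedicated thread keeps ticking even if the thread pool is starved.
        _thread = new Thread(Run) { IsBackground = true, Name = "PeriodicCollector" };
        _thread.Start();
    }

    public bool IsRunning => _thread.IsAlive;

    private void Run()
    {
        // Wait() returns true once Dispose() sets the event; otherwise it times
        // out after one interval and we run a collection.
        while (!_stop.Wait(_interval))
        {
            _collect();
        }
    }

    public void Dispose()
    {
        _stop.Set();
        _thread.Join(); // block until the loop has exited
        _stop.Dispose();
    }
}
```

A replacement instance would be created only after the previous one's Dispose() returns, mirroring the "dispose the old one, then create a new one" order described in the answer.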
    Value1 = value1;
}

public override int GetHashCode() => Value1?.GetHashCode() ?? 0;
Is Value1 ever written outside of the ctor? If no, it should be readonly. If yes, does any code depend on this GetHashCode being stable, e.g. are these ever put into a dictionary?
Is Value1 ever written outside of the ctor? Yes
If yes, does any code depend on this GetHashCode being stable? Yes
The path that modifies these after construction is here: https://github.com/dotnet/runtime/pull/54333/files#diff-37b757b2c75dad499405642004b936213328f52d7becdec52b3cde0d7948a54bR411-R431
I think the key invariant is that the values are never changed after inserting into the dictionary.
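The invariant described here (a key may be mutated only while it is not in the dictionary) is a common way to reuse one scratch key for lookups without allocating. The sketch below is hypothetical; MutableKey and KeyedCache are illustrative names, not the PR's types, and the cache is not thread-safe as written.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key whose hash depends on a mutable field, like the reviewed snippet.
public sealed class MutableKey : IEquatable<MutableKey>
{
    public object? Value1;

    public MutableKey(object? value1)
    {
        Value1 = value1;
    }

    public override int GetHashCode() => Value1?.GetHashCode() ?? 0;
    public bool Equals(MutableKey? other) => other is not null && object.Equals(Value1, other.Value1);
    public override bool Equals(object? obj) => Equals(obj as MutableKey);
}

// Hypothetical cache: one scratch key is mutated for every lookup, while the key
// actually stored in the map is a fresh copy that is never mutated afterwards.
public sealed class KeyedCache<TValue>
{
    private readonly Dictionary<MutableKey, TValue> _map = new Dictionary<MutableKey, TValue>();
    private readonly MutableKey _scratch = new MutableKey(null);

    public int Count => _map.Count;

    public TValue GetOrAdd(object? value1, Func<TValue> create)
    {
        _scratch.Value1 = value1; // safe: _scratch itself is never inserted into _map
        if (_map.TryGetValue(_scratch, out var existing))
        {
            return existing;
        }
        TValue created = create();
        _map.Add(new MutableKey(value1), created); // stored key keeps a stable hash
        return created;
    }
}
```

If the stored key were mutated after insertion, its bucket would no longer match its hash and later lookups could silently miss it; copying on insert is what keeps the invariant.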
if (genericDefType == typeof(Counter<>))
{
    return () => new RateSumAggregator();
}
else if (genericDefType == typeof(ObservableCounter<>))
{
    return () => new RateAggregator();
}
else if (genericDefType == typeof(ObservableGauge<>))
{
    return () => new LastValue();
}
else if (genericDefType == typeof(Histogram<>))
{
    return () => new ExponentialHistogramAggregator(DefaultHistogramConfig);
}
else
{
    return null;
}
nit: switch statement/expression on genericDefType?
I don't think that works? case typeof(typename) gives an error that the value isn't constant. Type matching on the instrument would, I think, require knowing the exact type, but I only know a partial (open generic) type.
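A case label of the form case typeof(Counter<>): is indeed rejected because typeof is not a constant expression, but a discard pattern with a when guard accepts arbitrary comparisons. A sketch of that alternative, assuming C# 8 or later; the returned strings are hypothetical labels standing in for the aggregator factories in the if/else chain above.

```csharp
#nullable enable
using System;
using System.Diagnostics.Metrics;

public static class AggregatorSelector
{
    // Returns a hypothetical aggregator label, or null for unsupported instruments.
    public static string? Select(Instrument instrument)
    {
        Type type = instrument.GetType();
        if (!type.IsGenericType)
        {
            return null;
        }
        Type genericDefType = type.GetGenericTypeDefinition();

        // Each arm matches anything ("_") but only fires when its guard is true.
        return genericDefType switch
        {
            _ when genericDefType == typeof(Counter<>) => "RateSum",
            _ when genericDefType == typeof(ObservableCounter<>) => "Rate",
            _ when genericDefType == typeof(ObservableGauge<>) => "LastValue",
            _ when genericDefType == typeof(Histogram<>) => "Histogram",
            _ => null,
        };
    }
}
```

Functionally this is the same chain of equality checks; the switch form mainly buys a single expression and a compact fallback arm, so whether it reads better is a matter of taste.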
else if (stateUnion is MultiSizeLabelNameDictionary<TAggregator> aggsMultiSize)
{
    aggsMultiSize.Collect(visitFunc);
}
Seconding this! I think the switch matches better with the sum-type nature of the state variable. But, again, it's personal preference.
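A type-pattern switch over the state object, as suggested, might look like the following. SingleAggregator and ListOfAggregators are hypothetical stand-ins for the different shapes the state variable can take; they are not the PR's types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical shapes of a "one of N" state object.
public sealed class SingleAggregator { public int Value; }
public sealed class ListOfAggregators { public List<int> Values = new List<int>(); }

public static class StateVisitor
{
    // Visits every aggregator value held by the state and returns how many were visited.
    public static int Collect(object? stateUnion, Action<int> visit)
    {
        int visited = 0;
        switch (stateUnion)
        {
            case null:
                break; // nothing recorded yet
            case SingleAggregator single:
                visit(single.Value);
                visited++;
                break;
            case ListOfAggregators many:
                foreach (int v in many.Values)
                {
                    visit(v);
                    visited++;
                }
                break;
            default:
                throw new InvalidOperationException($"Unexpected state type {stateUnion.GetType()}");
        }
        return visited;
    }
}
```

The explicit default arm is the main payoff of the switch form: a state shape nobody handled fails loudly instead of being skipped by a fall-through if/else chain.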
lock (this)
{
    LastValueStatistics stats = new LastValueStatistics(_lastValue);
    _lastValue = null;
    return stats;
}
Does Update need this lock too? If it isn't locking as well, I'm not sure we need this lock here. We could swap it for an interlocked exchange or something.
Yeah, you could do this one with an Interlocked.Exchange instead of the lock. I left it as a lock because it was simple and I don't anticipate this code being on the hot path. The hot paths are InstrumentState.Update(), AggregatorStore.GetAggregator(), RateSumAggregator.Update() and Histogram.Update(). Everything else is unlikely to go above a few hundred invocations/sec.
> Everything else is unlikely to go above a few hundred invocations/sec.

Which methods might be invoked several hundred times a second? Ones that might allocate?
> Which methods might be invoked several hundred times a second? Ones that might allocate?

Yeah, if someone requested to collect metrics once per second and enabled a few hundred metrics, then the InstrumentState.Collect() path would be running a few hundred times per second. InstrumentState.Collect() invokes AggregatorStore.Collect(), which in turn invokes Aggregator.Collect(). There are a few allocations down that path.
None of that happens automatically: an engineer would need to run a diagnostic tool to turn it on, specify that they want collections every second, and specify a bunch of metrics to collect.
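The Interlocked.Exchange alternative mentioned in this thread can be sketched as below. It is a hypothetical lock-free variant, not the PR's LastValue aggregator, and it assumes NaN as a "no value recorded" sentinel, so a genuine NaN measurement would be reported as no value.

```csharp
using System;
using System.Threading;

// Hypothetical lock-free last-value holder.
public sealed class LockFreeLastValue
{
    // Assumption: NaN means "nothing recorded since the last Collect".
    private double _lastValue = double.NaN;

    public void Update(double value) => Interlocked.Exchange(ref _lastValue, value);

    public double? Collect()
    {
        // Read and reset atomically: an Update racing with this call lands
        // either in this collection or in the next one.
        double value = Interlocked.Exchange(ref _lastValue, double.NaN);
        return double.IsNaN(value) ? (double?)null : value;
    }
}
```

Given the author's point that Collect runs at most a few hundred times per second and only when a tool asks for it, the simpler lock is a reasonable choice; the exchange mainly matters if Update and Collect ever contend heavily.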
/// ad-hoc monitoring for the new Instrument APIs. This source only supports one listener
/// at a time. Each new listener will overwrite the configuration about which metrics
Rather than having multiple sessions overwrite the configuration in .NET 6, could we keep track of a union of the requested metrics?
All sessions would observe additions to the metric collection and increases in the frequency.
e.g.,
Session A enables Metric1;Metric2 @ 10s intervals.
Session B enables Metric1;Metric3 @ 5s intervals.
then both sessions would see Metric1;Metric2;Metric3 @ 5s intervals.
Session B disables.
Session A continues to see Metric1;Metric2;Metric3 @ 5s intervals.
Then in .NET 7 we could attempt to make changes unique to a session, which would require more bookkeeping, as your comment says.
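The proposed union semantics can be sketched as a configuration that only grows: enabling a session adds its metrics and can only shorten the interval, while disabling a session changes neither. SharedMetricsConfig and its members are hypothetical names for illustration, not part of the PR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shared configuration implementing the "union, never shrink" proposal.
public sealed class SharedMetricsConfig
{
    private readonly HashSet<string> _metrics = new HashSet<string>(StringComparer.Ordinal);

    public TimeSpan? Interval { get; private set; }
    public int ActiveSessions { get; private set; }
    public IReadOnlyCollection<string> Metrics => _metrics;

    public bool IsEnabled(string metric) => _metrics.Contains(metric);

    public void EnableSession(IEnumerable<string> metrics, TimeSpan interval)
    {
        _metrics.UnionWith(metrics);                       // additions are seen by every session
        if (Interval is null || interval < Interval.Value) // faster requests win
        {
            Interval = interval;
        }
        ActiveSessions++;
    }

    public void DisableSession()
    {
        // Deliberately leaves the metric set and interval unchanged.
        if (ActiveSessions > 0)
        {
            ActiveSessions--;
        }
    }
}
```

Replaying the Session A / Session B example: after both enable, Metrics holds Metric1, Metric2 and Metric3 with a 5 s interval, and after B disables both are unchanged.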
- bugfix to use invariant culture
- support time series and histogram limits
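The two items in this commit can be illustrated with a hypothetical tracker: a cap on the number of distinct time series, and numeric formatting with CultureInfo.InvariantCulture so that a tool parsing the payload never sees "2,5" on a machine whose culture uses a comma decimal separator. The class name and the maxTimeSeries parameter are assumptions, not the PR's actual configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical tracker showing a series cap plus culture-independent formatting.
public sealed class LimitedSeriesTracker
{
    private readonly Dictionary<string, double> _series = new Dictionary<string, double>(StringComparer.Ordinal);
    private readonly int _maxTimeSeries; // hypothetical limit

    public LimitedSeriesTracker(int maxTimeSeries) => _maxTimeSeries = maxTimeSeries;

    // Existing series are always updated; a new series is dropped once the cap is hit.
    public bool Record(string seriesKey, double value)
    {
        if (!_series.ContainsKey(seriesKey) && _series.Count >= _maxTimeSeries)
        {
            return false;
        }
        _series[seriesKey] = value;
        return true;
    }

    // Always "2.5", regardless of the current thread's culture.
    public string Format(string seriesKey) =>
        _series[seriesKey].ToString(CultureInfo.InvariantCulture);
}
```

Capping distinct series bounds memory when tag values are unbounded (for example, per-request IDs), at the cost of silently dropping the overflow.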
Thanks for all the review feedback, everyone! I think everything has been addressed at this point. I left a few threads unresolved where someone asked a question and I answered it, so that folks can see the answers. If there is any remaining feedback please let me know; otherwise I plan to hit the merge button tomorrow. Also, as a heads-up, I have at least one more change planned for a separate PR to improve how the EventSource handles exceptions thrown from user-provided callbacks.
This bit of code treats two consecutive fields of an object as a span of the two fields (similarly for related types). I suspect this is treading on dangerous territory and may lead to incorrect optimizations by the JIT, especially if later on there is code that intermingles references from the
Lines 32 to 37 in 5705c98
Lines 22 to 30 in 5705c98
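One way to sidestep the layout concern is to access each field explicitly instead of reinterpreting adjacent fields as one span (for example, something like MemoryMarshal.CreateSpan(ref Value1, 2)). The hypothetical two-element sequence below shows that style; it is not necessarily how the PR resolved the comment, and ObjectPair is an illustrative name.

```csharp
#nullable enable
using System;

// Hypothetical two-element sequence with explicit per-field access.
public sealed class ObjectPair : IEquatable<ObjectPair>
{
    public object? Value1;
    public object? Value2;

    public ObjectPair(object? value1, object? value2)
    {
        Value1 = value1;
        Value2 = value2;
    }

    public int Length => 2;

    // Indexer instead of a span view: no assumption about field adjacency.
    public object? this[int index] => index switch
    {
        0 => Value1,
        1 => Value2,
        _ => throw new ArgumentOutOfRangeException(nameof(index)),
    };

    public bool Equals(ObjectPair? other) =>
        other is not null && object.Equals(Value1, other.Value1) && object.Equals(Value2, other.Value2);

    public override bool Equals(object? obj) => Equals(obj as ObjectPair);

    public override int GetHashCode() => HashCode.Combine(Value1, Value2);
}
```

The trade-off is losing a uniform Span-based code path across sequence sizes in exchange for code the JIT can reason about field by field.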
Todo list: