This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Commit 50dd834

[Local GC] Introduce standalone eventing design document (#15570)
* Typing
* First draft
* Update the doc based on feedback
* Next iteration based on feedback
* Iteration feedback
1 parent 66bd34e commit 50dd834

Lines changed: 237 additions & 0 deletions
# Standalone GC Eventing Design

Author: Sean Gillespie (@swgillespie) - 2017

This document aims to provide a specification for how a standalone GC fires
events that can be collected by trace collectors. Such a feature is highly desirable
for a standalone GC, since events are the primary means of reasoning about
GC performance. Since a standalone GC is not permitted to link against the rest of the runtime, all
communication between the runtime and the GC must pass through dynamically-dispatched interfaces.

## Definitions

* An **event** is some unit of information that the runtime can issue if requested. In general,
  this is used for lightweight tracing. Managed code can issue events using
  `System.Diagnostics.Tracing.EventSource`. Native code (i.e. the runtime) issues events by calling macros
  that delegate to autogenerated code, which in turn interfaces with the
  underlying event implementation. Events are only issued if they are turned on; the mechanism by which
  events are turned on is not in the scope of this document.
* The **payload** of an event is some amount of data that is delivered with the event itself. Its size
  may be variable. Most events that are fired by the runtime have a schema (predefined layout), but it
  is not a requirement.

## Goals

The goal of this document is to describe a system that allows for the efficient firing of
performance events by a standalone GC. This system must have three properties in order to be
acceptable:

1. It must be *efficient* to query whether or not a particular event is turned on. It is not acceptable
   to perform an indirection (i.e. cross the GC/EE interface boundary) in order to get this information.
2. The cost of firing an event by a standalone GC should be comparable to the cost of firing an event
   without using a standalone GC.
3. A standalone GC must be able to add new events without having to recompile the EE.

## Querying Whether Events Are Enabled

It is not acceptable to perform an indirection when querying whether or not a requested event is enabled. Therefore,
it follows that the GC must maintain some state about what events are currently enabled. Events are enabled through
*keywords* and *levels* on a particular provider; a particular event is enabled if the provider to which the event
belongs has the event's keyword and level enabled.

The GC fires events from two providers: the "main" provider, `Microsoft-Windows-DotNETRuntime`, and the "private"
provider, `Microsoft-Windows-DotNETRuntimePrivate`. The GC must track the enabled keyword and level status of each
provider separately. To accomplish this, the GC will contain a class with this signature:

```c++
enum GCEventProvider
{
    GCEventProvider_Default = 0,
    GCEventProvider_Private = 1
};

class GCEventStatus
{
public:
    // Returns true if the given keyword and level are enabled for the given provider,
    // false otherwise.
    static bool IsEnabled(GCEventProvider provider, int keyword, int level);

    // Enables events with the given keyword and level on the given provider.
    static bool Enable(GCEventProvider provider, int keyword, int level);

    // Disables events with the given keyword and level on the given provider.
    static bool Disable(GCEventProvider provider, int keyword, int level);
};
```

The GC will use `GCEventStatus::IsEnabled` to query whether or not a particular event is enabled. Whenever the EE observes
a change in what keywords or levels are enabled for a particular provider, it must inform the GC of the change so that
the GC can update `GCEventStatus` using `Enable` and `Disable`. The exact mechanism by which the EE observes a change in the
event state is described further below (see "Getting Informed of Changes to Event State").

To let the EE deliver these notifications, the following additions are made to the `IGCHeap` API surface area:

```c++
class IGCHeap
{
public:
    // Enables or disables events with the given keyword and level on the default provider.
    virtual void ControlEvents(bool enable, int keyword, int level) = 0;

    // Enables or disables events with the given keyword and level on the private provider.
    virtual void ControlPrivateEvents(bool enable, int keyword, int level) = 0;
};
```

The currently enabled keywords and levels are encoded as bit vectors so that querying whether an event is enabled
is efficient:

```c++
uint32_t enabledLevels[2];
uint32_t enabledKeywords[2];

bool GCEventStatus::IsEnabled(GCEventProvider provider, int keyword, int level)
{
    size_t index = static_cast<size_t>(provider);
    return (enabledLevels[index] & level) && (enabledKeywords[index] & keyword);
}
```

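To make the bit-vector scheme concrete, the following is a minimal, self-contained sketch of how `Enable` and `Disable` might sit alongside `IsEnabled`. It assumes that keywords and levels are each passed as single-bit masks, which this document implies but does not state outright. `ControlEventsSketch` is a hypothetical stand-in for a GC-side `IGCHeap::ControlEvents` implementation, and the meaning of the `bool` results of `Enable` and `Disable` is not specified here, so the sketch simply returns `true`.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

enum GCEventProvider
{
    GCEventProvider_Default = 0,
    GCEventProvider_Private = 1
};

// Assumption: each keyword and each level is a single bit, so the state of a
// provider fits in one 32-bit mask for keywords and one for levels.
class GCEventStatus
{
public:
    static bool IsEnabled(GCEventProvider provider, int keyword, int level)
    {
        size_t index = Index(provider);
        return (enabledLevels[index] & static_cast<uint32_t>(level)) != 0
            && (enabledKeywords[index] & static_cast<uint32_t>(keyword)) != 0;
    }

    static bool Enable(GCEventProvider provider, int keyword, int level)
    {
        size_t index = Index(provider);
        enabledLevels[index] |= static_cast<uint32_t>(level);
        enabledKeywords[index] |= static_cast<uint32_t>(keyword);
        return true; // return value semantics are an assumption
    }

    static bool Disable(GCEventProvider provider, int keyword, int level)
    {
        size_t index = Index(provider);
        enabledLevels[index] &= ~static_cast<uint32_t>(level);
        enabledKeywords[index] &= ~static_cast<uint32_t>(keyword);
        return true; // return value semantics are an assumption
    }

private:
    static size_t Index(GCEventProvider provider)
    {
        size_t index = static_cast<size_t>(provider);
        assert(index < 2);
        return index;
    }

    static uint32_t enabledLevels[2];
    static uint32_t enabledKeywords[2];
};

uint32_t GCEventStatus::enabledLevels[2] = {0, 0};
uint32_t GCEventStatus::enabledKeywords[2] = {0, 0};

// Hypothetical GC-side body for IGCHeap::ControlEvents (default provider).
void ControlEventsSketch(bool enable, int keyword, int level)
{
    if (enable)
    {
        GCEventStatus::Enable(GCEventProvider_Default, keyword, level);
    }
    else
    {
        GCEventStatus::Disable(GCEventProvider_Default, keyword, level);
    }
}
```

In this sketch, clearing a level bit in `Disable` also affects every other keyword on the same provider, and the masks are not protected against concurrent updates. A real implementation would have to decide both points; this document does not specify them.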
## Firing Events

In order to fire an event, the GC will need to communicate with the EE in some way. The EE is ultimately responsible for routing the event to any appropriate subsystems (ETW, LTTNG, EventPipe), and the GC has no knowledge of what the EE does with the events it is given.

Events are divided into two categories: **known** events and **dynamic** (or **custom**) events. Known events are known
to the EE and correspond one-to-one to individual event types fired by the underlying platform loggers. Dynamic events are
events not known to the EE; their use is described below in the "Defining and Firing Dynamic Events" section.

All events are fired through the `IGCToCLREventSink` interface, which is accessed through the `IGCToCLR` interface given
to the GC on startup:

```c++
class IGCToCLREventSink
{
    // One callback per known event, plus FireDynamicEvent; both are described below.
};

class IGCToCLR
{
public:
    virtual IGCToCLREventSink* EventSink() = 0;
};
```

Every known event is fired through its own dedicated callback on `IGCToCLREventSink`. For example, the `GCEnd` event
is fired through a callback like this:

```c++
class GCToCLREventSink : public IGCToCLREventSink { /* ... */ };

void GCToCLREventSink::FireGCEnd(uint32_t count, uint16_t depth)
{
    // ...
}
```

`GCToCLREventSink::FireGCEnd` is responsible for constructing and dispatching the event to platform loggers if the event
is enabled. The principal advantage of having one callback per known event is that known events can reach into EE internals
and add data to events that the GC otherwise would not be aware of. Concrete examples of this would be:
* The addition of `ClrInstanceId` to the payload of many events fired by the GC
* Getting human-readable type names for objects allocated on the heap
* Correlating `GCStart` events with collections induced via ETW

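As an illustration of the first example, the sketch below shows an EE-side `FireGCEnd` attaching a `ClrInstanceId` that the GC never sees. Everything other than the `IGCToCLREventSink`, `GCToCLREventSink`, and `FireGCEnd` names is a hypothetical stand-in: `GetClrInstanceIdSketch`, the `LoggedGCEnd` record that plays the role of a platform logger, and the `ReportGCEnd` call site are assumptions made only for this example.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an EE-internal value that the GC cannot see.
static uint16_t GetClrInstanceIdSketch() { return 7; }

// Hypothetical record of the last GCEnd event handed to a platform logger.
struct LoggedGCEnd
{
    uint32_t count;
    uint16_t depth;
    uint16_t clrInstanceId;
    bool fired;
};
static LoggedGCEnd g_lastGCEnd = {0, 0, 0, false};

class IGCToCLREventSink
{
public:
    virtual ~IGCToCLREventSink() {}
    virtual void FireGCEnd(uint32_t count, uint16_t depth) = 0;
};

// EE side: the callback enriches the event before dispatching it.
class GCToCLREventSink : public IGCToCLREventSink
{
public:
    void FireGCEnd(uint32_t count, uint16_t depth) override
    {
        g_lastGCEnd = { count, depth, GetClrInstanceIdSketch(), true };
    }
};

// GC side: check the locally cached enabled state, then cross the interface once.
void ReportGCEnd(IGCToCLREventSink* sink, bool eventEnabled, uint32_t count, uint16_t depth)
{
    assert(sink != nullptr);
    if (eventEnabled)
    {
        sink->FireGCEnd(count, depth);
    }
}
```

The GC passes only the data it owns (`count` and `depth`), and the interface boundary is crossed only when the event is actually on.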
## Defining and Firing Dynamic Events

It is useful for a standalone GC to be able to fire events that the EE was not previously aware of. For example, it is useful for a GC developer to add some event-based instrumentation to the GC, especially when testing new features and ensuring that they work as expected. Furthermore, it is desirable for GCs that are shipped from this repository
(the "CLR GC") to interoperate seamlessly with future versions of the .NET Core EE, which implies that it should be possible
for the GC within this repository to add new events without having to recompile the runtime.

While it is possible for some eventing implementations to receive events that are created at runtime, not all eventing implementations (particularly LTTNG) are flexible enough for this. In order to accommodate new events, another method is added to `IGCToCLREventSink`:

```c++
void IGCToCLREventSink::FireDynamicEvent(
    /* IN */ const char* eventName,
    /* IN */ void* payload,
    /* IN */ size_t payloadSize
);
```

A runtime implements this callback by having a "catch-all" GC event whose schema is an arbitrary sequence of bytes. Tools can parse the (deliberately unspecified) binary format provided by GC events that use this mechanism in order to recover the data within the payload.

Dynamic events will be fired by the GC whenever developers want to add a new event but don't want to force
users to get a new version of the runtime in order to utilize the new event.

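A rough sketch of the EE side of this idea follows. The `CatchAllGCEvent` struct and the `g_loggedDynamicEvents` vector are hypothetical stand-ins for the "catch-all" event and the platform logger; only the `FireDynamicEvent` parameter list is taken from this document, and the function is written as a free function named `FireDynamicEventSketch` to keep the example self-contained.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical catch-all event: a name plus an opaque sequence of bytes.
struct CatchAllGCEvent
{
    std::string name;
    std::vector<uint8_t> bytes;
};

// Hypothetical stand-in for the platform logger.
static std::vector<CatchAllGCEvent> g_loggedDynamicEvents;

// The EE does not interpret the payload; it copies the bytes into the
// catch-all event and leaves decoding to tools.
void FireDynamicEventSketch(const char* eventName, void* payload, size_t payloadSize)
{
    assert(eventName != nullptr);
    CatchAllGCEvent event;
    event.name = eventName;
    const uint8_t* begin = static_cast<const uint8_t*>(payload);
    event.bytes.assign(begin, begin + payloadSize);
    g_loggedDynamicEvents.push_back(event);
}
```

Because the EE treats the payload as opaque, a newer GC can introduce an event with any layout without the EE being rebuilt, which is the property the third goal asks for.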
## Getting Informed of Changes to Event State

There are three mechanisms by which CoreCLR is able to log events: EventPipe, ETW, and LTTNG. When it comes to changing the state of events, EventPipe and ETW both allow users to attach callbacks that are invoked whenever events are enabled or disabled. For these two mechanisms, it is sufficient to call `IGCHeap::ControlEvents` or `IGCHeap::ControlPrivateEvents` from within such a callback in order to inform the GC of changes to tracing state.

LTTNG does not have such a mechanism. In order to observe changes in the eventing state, LTTNG must be polled periodically.
Other eventing components in CoreCLR must already poll LTTNG for changes in the eventing state, so this design can utilize
that same poll to inform the GC of changes.

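The two notification styles can be sketched side by side. This is only an illustration under stated assumptions: `ControlEventsSketch` stands in for the GC's `IGCHeap::ControlEvents`, `OnProviderStateChanged` stands in for whatever callback EventPipe or ETW would invoke, and `TracingSnapshot` with `PollTracingState` is a hypothetical model of polling. None of these names or signatures come from the real logging APIs.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical model of the GC's cached tracing state.
struct GCTracingStateSketch
{
    uint32_t keywords = 0;
    uint32_t levels = 0;
    int notifications = 0;
};
static GCTracingStateSketch g_gcState;

// Stand-in for IGCHeap::ControlEvents on the GC side.
void ControlEventsSketch(bool enable, int keyword, int level)
{
    assert(keyword >= 0 && level >= 0);
    uint32_t keywordBits = static_cast<uint32_t>(keyword);
    uint32_t levelBits = static_cast<uint32_t>(level);
    if (enable)
    {
        g_gcState.keywords |= keywordBits;
        g_gcState.levels |= levelBits;
    }
    else
    {
        g_gcState.keywords &= ~keywordBits;
        g_gcState.levels &= ~levelBits;
    }
    g_gcState.notifications++;
}

// Push model (EventPipe, ETW): the logger calls back on every change,
// and the EE forwards the change straight to the GC.
void OnProviderStateChanged(bool enabled, int keyword, int level)
{
    ControlEventsSketch(enabled, keyword, level);
}

// Poll model (LTTNG): compare the current state with the last observed
// state and forward only the difference.
struct TracingSnapshot
{
    bool enabled;
    int keyword;
    int level;
};

void PollTracingState(const TracingSnapshot& current, TracingSnapshot& lastSeen)
{
    bool changed = current.enabled != lastSeen.enabled
        || current.keyword != lastSeen.keyword
        || current.level != lastSeen.level;
    if (!changed)
    {
        return;
    }
    if (lastSeen.enabled)
    {
        ControlEventsSketch(false, lastSeen.keyword, lastSeen.level);
    }
    if (current.enabled)
    {
        ControlEventsSketch(true, current.keyword, current.level);
    }
    lastSeen = current;
}
```

Either way the GC ends up with an up-to-date local copy of the enabled state, so queries never need to cross the GC/EE boundary.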
## Implementation Concerns

An implementation of this spec must take care not to perturb the existing GC code base too much. The GC fires events through
the use of macros generated by the ETW message compiler and carefully mocked by code generation for the other platform
logging implementations. The eventing scheme in this document will need to provide implementations for all eventing
macros used by the GC ([1]).

The headers that define these macros are often auto-generated. We will need to take care to re-use existing code generators if possible; after all,
we do want it to be easy to add new events. It will likely be difficult to balance auto-generated code with the need for
subtle custom modifications to event dispatch code.

Tools (specifically PerfView) will want to be enlightened about dynamic events fired by the runtime. The extent to which
we want to make this experience nice is up to us, but we will most likely want to ship PerfView support for any dynamic event
that we end up shipping with the CLR GC.

[1]: https://github.com/dotnet/coreclr/blob/cab0db6345a7941f75d991281bcc0079d28ba182/src/gc/env/etmdummy.h#L5-L57

### Concrete Example: Porting a single known event to IGCToCLREventSink

The following steps illustrate what needs to be done to bring a single event over to `IGCToCLREventSink`.

Two things are needed by the GC in order to fire an event: a way to determine if the event is on, and a way to fire
the event. Events can be fired even if they aren't enabled (and occasionally are); the platform loggers will ignore the
event if it is not enabled. However, the GC generally avoids doing expensive eventing-related operations if an event is
not on.

The GC generally uses the `ETW_EVENT_ENABLED` macro to query whether an event is on. `ETW_EVENT_ENABLED` will be
implemented in terms of `GCEventStatus::IsEnabled` above, so you will need to define appropriate macros in order for your
event to work here. This will likely mean that you will need to define a macro that turns the name of your event into a
pair of a level and a keyword, which will determine whether or not your event is enabled.

To fire the event, you can add your event's callback to `IGCToCLREventSink`:

```c++
class IGCToCLREventSink
{
public:
    virtual void FireYourEvent(
        /* your event arguments here... */
    ) = 0;
};
```

You can then define the `FireYourEvent` macro in `src/gc/env/etmdummy.h` to point to your new method on `IGCToCLREventSink`.
Your implementation of `FireYourEvent` in the EE will need to calculate any EE-specific data (e.g. `ClrInstanceId`) and
then forward the event arguments onto the platform logger, which can be done with the `Fire` macros that the EE has access
to (that we implemented for the GC).

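The macro wiring described in this section might look roughly like the sketch below. It is an assumption-heavy illustration: the single-argument form of `ETW_EVENT_ENABLED`, the `YourEvent_KEYWORD` and `YourEvent_LEVEL` naming scheme, the bit values, and the `IsEnabledSketch` and `EventSinkSketch` stand-ins are all hypothetical, and the real macros may take different arguments.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for GCEventStatus state and GCEventStatus::IsEnabled.
static uint32_t g_enabledKeywords = 0;
static uint32_t g_enabledLevels = 0;

static bool IsEnabledSketch(int keyword, int level)
{
    assert(keyword >= 0 && level >= 0);
    return (g_enabledLevels & static_cast<uint32_t>(level)) != 0
        && (g_enabledKeywords & static_cast<uint32_t>(keyword)) != 0;
}

// Hypothetical stand-in for the sink object obtained from the EE.
// It is declared before the FireYourEvent macro so that the method
// declaration itself is not macro-expanded.
struct EventSinkSketch
{
    int yourEventCalls = 0;
    uint32_t lastArg = 0;

    void FireYourEvent(uint32_t arg)
    {
        yourEventCalls++;
        lastArg = arg;
    }
};
static EventSinkSketch g_sink;

// Each event name maps to a keyword/level pair through name-derived macros.
// The values here are placeholders, not real keyword or level assignments.
#define YourEvent_KEYWORD 0x1
#define YourEvent_LEVEL   0x4

// Turns an event name into an IsEnabled query on its keyword and level.
#define ETW_EVENT_ENABLED(name) IsEnabledSketch(name##_KEYWORD, name##_LEVEL)

// Points the existing firing macro at the new sink method.
#define FireYourEvent(arg) g_sink.FireYourEvent(arg)

// A GC call site stays the same shape as before the port.
void ReportYourEvent(uint32_t value)
{
    if (ETW_EVENT_ENABLED(YourEvent))
    {
        FireYourEvent(value);
    }
}
```

The point of this arrangement is that existing GC call sites keep using the same macro names, which limits how much the GC code base has to change.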
### Concrete Example: Adding a dynamic event

The following steps illustrate what needs to be done to add a new dynamic event.

There are two things that need to be written: the `ETW_EVENT_ENABLED` support for your new event and the macro responsible
for firing your event. `ETW_EVENT_ENABLED` can be implemented in the same manner as a known event, by introducing a macro
for your event name that expands to your event's level and keyword.

Firing of a dynamic event ultimately must call `FireDynamicEvent` with an event name and a serialized payload. Therefore,
it is the responsibility of the GC to format an event's payload into a binary format. It is ultimately up to you (1) to write
a function that serializes your event's arguments into a buffer and sends the buffer to `FireDynamicEvent`. This code in
turn can be wired up to the GC codebase by defining a new `FireMyCustomEvent` macro whose arguments are forwarded onto
the event serialization function.

1. C++ has the ability to auto-generate large swaths of this code. The implementation of this spec will provide a series of
   composable helper functions that automate the serialization of arguments so that they do not have to be written by the
   developer adding a new event.

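One way such helpers could look is sketched below using variadic templates. This is not the helper set that the implementation will provide; it is a hypothetical illustration. `SerializeArgs`, `FireDynamicEventWithArgs`, and `FireDynamicEventSketch` (a stand-in for `IGCToCLREventSink::FireDynamicEvent`) are assumed names, and the layout chosen here (arguments packed back to back in host byte order) is just one possible choice for the deliberately unspecified binary format.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for IGCToCLREventSink::FireDynamicEvent that
// records what it was given.
static std::string g_lastName;
static std::vector<uint8_t> g_lastPayload;

void FireDynamicEventSketch(const char* eventName, void* payload, size_t payloadSize)
{
    assert(eventName != nullptr);
    g_lastName = eventName;
    const uint8_t* bytes = static_cast<const uint8_t*>(payload);
    g_lastPayload.assign(bytes, bytes + payloadSize);
}

// Base case: no arguments left to serialize.
inline void SerializeArgs(std::vector<uint8_t>&) {}

// Appends the raw bytes of the first argument, then recurses on the rest.
template <typename T, typename... Rest>
void SerializeArgs(std::vector<uint8_t>& buffer, const T& first, const Rest&... rest)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable arguments can be copied byte-wise");
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&first);
    buffer.insert(buffer.end(), bytes, bytes + sizeof(T));
    SerializeArgs(buffer, rest...);
}

// Serializes all arguments into one buffer and fires the dynamic event.
template <typename... Args>
void FireDynamicEventWithArgs(const char* eventName, const Args&... args)
{
    std::vector<uint8_t> buffer;
    SerializeArgs(buffer, args...);
    FireDynamicEventSketch(eventName, buffer.data(), buffer.size());
}

// The firing macro forwards its arguments to the serialization function.
#define FireMyCustomEvent(...) FireDynamicEventWithArgs("MyCustomEvent", __VA_ARGS__)
```

With helpers like these, a call such as `FireMyCustomEvent(heapIndex, generation)` would need no hand-written serialization code, provided that the tool reading the event knows the argument types and their order.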
