[PROF-6558] Show profiler overhead in flamegraph for CPU Profiling 2.0 #2607

ivoanjo · 2023-02-08T17:16:14Z

What does this PR do?:

This PR changes the cpu-time/wall-time sampling in the CPU Profiling 2.0 profiler to expose the profiler overhead as part of the captured data.

Specifically, during sampling, the profiler now measures how much cpu-time/wall-time is spent by the profiler itself, and adds that to the profiler as a separate stack (that shows up as belonging to dd-trace-rb).

Motivation:

In the old Ruby profiler, the profiler always worked from a background thread. Because we also sampled that background thread, the flamegraph included the overhead of the profiler -- it was the overhead of the background thread.

With CPU Profiling 2.0, sampling is done in whatever thread is holding the Global VM Lock, which means that this part of the profiler overhead was not being exposed. With this change, this impact on code is now accounted for.

Additional Notes:

I was torn between a few options, and decided to use the stack of the CpuAndWallTimeWorker thread to represent the overhead of the profiler.

Thus, when this stack shows up in another thread, it represents the overhead of the profiler on that thread.

I could have gone with a placeholder (e.g. just a frame that says "Profiler Sampling" or something similar), but such placeholders break the "Only My Code" feature.

Using a stack from a different thread than the one we're supposedly "sampling" adds a bit of complexity; but we can always undo this and go with the placeholder in this becomes problematic.

How to test the change?:

Change includes code coverage.

I got tired of the extra boilerplate from using hashes and introduced two small structs to contain the parsed profile data.

This will enable it to be reused when we want to add the profiler overhead as a stack.

**What does this PR do?**: This PR changes the cpu-time/wall-time sampling in the CPU Profiling 2.0 profiler to expose the profiler overhead as part of the captured data. Specifically, during sampling, the profiler now measures how much cpu-time/wall-time is spent by the profiler itself, and adds that to the profiler as a separate stack (that shows up as belonging to dd-trace-rb). **Motivation**: In the old Ruby profiler, the profiler always worked from a background thread. Because we also sampled that background thread, the flamegraph included the overhead of the profiler -- it was the overhead of the background thread. With CPU Profiling 2.0, sampling is done in whatever thread is holding the Global VM Lock, which means that this part of the profiler overhead was not being exposed. With this change, this impact on code is now accounted for. **Additional Notes**: I was torn between a few options, and decided to use the stack of the `CpuAndWallTimeWorker` thread to represent the overhead of the profiler. Thus, when this stack shows up in another thread, it represents the overhead of the profiler on that thread. I could have gone with a placeholder (e.g. just a frame that says "Profiler Sampling" or something similar), but such placeholders break the "Only My Code" feature. Using a stack from a different thread than the one we're supposedly "sampling" adds a bit of complexity; but we can always undo this and go with the placeholder in this becomes problematic. **How to test the change?**: Change includes code coverage.

Ooops! I missed updating this spec since it doesn't run on all Rubies.

Because the new feature relies on adding extra samples to the flamegraph, the ```ruby expect(stats.fetch(:signal_handler_enqueued_sample)).to be >= sample_count ``` assertion was failing.

This will be useful when we want to process the profile and either focus on or disregard these samples.

The profiler overhead frames show up non-deterministically (because they show up in whatever thread has the Global VM Lock), so they can cause a lot of the specs to become flaky. Excluding them should be enough to avoid the issue.

marcotc · 2023-02-13T23:43:16Z

decided to use the stack of the CpuAndWallTimeWorker thread to represent the overhead of the profiler.

This seems ok. I agree that there's no straightforward lightweight solution here.
Any precise solution I can think of involves adding an extra dimension to all samples.
I think the chosen solution is a good compromise.

ivoanjo · 2023-02-14T11:30:58Z

Thanks Marco for the review! :)

Merging away!

ivoanjo added 8 commits February 8, 2023 12:26

Introduce Sample and Frame structs in ProfileHelpers

45b0da4

I got tired of the extra boilerplate from using hashes and introduced two small structs to contain the parsed profile data.

Extract metrics update during sample into a separate function

9f76677

This will enable it to be reused when we want to add the profiler overhead as a stack.

Silence Rubocop warning

2880f59

Fix broken spec after previous refactoring

0816723

Ooops! I missed updating this spec since it doesn't run on all Rubies.

Fix flaky spec caused by the new "show profiler overhead" feature

89e72e1

Because the new feature relies on adding extra samples to the flamegraph, the ```ruby expect(stats.fetch(:signal_handler_enqueued_sample)).to be >= sample_count ``` assertion was failing.

Add label to samples representing the profiler overhead

061304b

This will be useful when we want to process the profile and either focus on or disregard these samples.

ivoanjo marked this pull request as ready for review February 8, 2023 17:16

ivoanjo requested a review from a team February 8, 2023 17:16

github-actions bot added the profiling Involves Datadog profiling label Feb 8, 2023

marcotc approved these changes Feb 13, 2023

View reviewed changes

Merge branch 'master' into ivoanjo/prof-6558-expose-profiler-overhead-v2

f295062

ivoanjo merged commit cf7cadf into master Feb 14, 2023

ivoanjo deleted the ivoanjo/prof-6558-expose-profiler-overhead-v2 branch February 14, 2023 11:31

github-actions bot added this to the 1.10.0 milestone Feb 14, 2023

TonyCTHsu mentioned this pull request Mar 6, 2023

Bump to version 1.10.0 #2667

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PROF-6558] Show profiler overhead in flamegraph for CPU Profiling 2.0 #2607

[PROF-6558] Show profiler overhead in flamegraph for CPU Profiling 2.0 #2607

ivoanjo commented Feb 8, 2023

marcotc commented Feb 13, 2023

ivoanjo commented Feb 14, 2023

[PROF-6558] Show profiler overhead in flamegraph for CPU Profiling 2.0 #2607

[PROF-6558] Show profiler overhead in flamegraph for CPU Profiling 2.0 #2607

Conversation

ivoanjo commented Feb 8, 2023

marcotc commented Feb 13, 2023

ivoanjo commented Feb 14, 2023