
[PROF-7307] Avoid triggering allocation sampling during sampling #2690

Merged
merged 3 commits into master from ivoanjo/prof-7307-allocation-sampling on Mar 16, 2023

Conversation

@ivoanjo ivoanjo (Member) commented Mar 13, 2023

What does this PR do?:

In rare cases (such as when throwing exceptions), the profiler may need to allocate Ruby objects during sampling.

This object allocation will trigger the `object_allocation_tracepoint`, but we don't want to try to sample in the middle of sampling.

To avoid any issues altogether, I've introduced a `during_sample` flag that can be used to detect this situation and avoid sampling in the middle of sampling.

Note that the modified functions all run while holding the global VM lock, which is why this flag does not need any special handling for concurrency.

I've also chosen to keep a counter of how often this happens; it may come in handy when debugging any issues around this behavior, and should be cheap enough to maintain.
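
For illustration, here's a minimal sketch of the pattern described above, in C to match the profiler's native extension. The struct, field, and function names (`sampler_state`, `record_sample`, `on_object_allocation`) are placeholders made up for the example, not the exact identifiers used in the codebase:

```c
#include <stdbool.h>

// Sketch of the during_sample guard. Everything below is assumed to run while
// holding the Ruby global VM lock, so a plain bool is enough; no atomics or
// extra locking are needed.
typedef struct {
  bool during_sample;                       // true while a sample is in progress
  unsigned long allocations_during_sample;  // times we skipped re-entrant sampling
} sampler_state;

static void record_sample(sampler_state *state) {
  (void) state;
  // ... walk threads and record stack traces; this may allocate Ruby objects,
  // for example when an exception gets raised ...
}

static void sample(sampler_state *state) {
  state->during_sample = true;
  record_sample(state);
  state->during_sample = false;
}

// Allocation tracepoint handler: if the allocation happened while a sample was
// already being taken, count it and return early instead of recursing into
// another sample.
static void on_object_allocation(sampler_state *state) {
  if (state->during_sample) {
    state->allocations_during_sample++;
    return;
  }
  // TODO: Sampling goes here
}
```

(In the actual change the counter lives nested inside a `stats` struct, as can be seen in the diff excerpt quoted further down in this conversation.)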

Motivation:

This PR is one more piece on the path to providing allocation profiling. The key missing part is the `// TODO: Sampling goes here`, which is what will actually trigger recording of stack traces when objects get allocated.

In the spirit of small, incremental changes, I'm opening a PR with just this, and will later follow up with the rest of the implementation.

Additional Notes:

I also slightly moved around the declarations of fields inside `struct cpu_and_wall_time_worker_state` and in the constructor. This change is in a separate commit, so I recommend reviewing this PR commit-by-commit, as that makes the changes much easier to follow.

How to test the change?:

This change is a bit hard to test, since it affects only the C-level internal state of the profiler. I can't think of a great way of testing this that doesn't involve adding a lot of complexity to the profiler C code just for the purpose of enabling testing.

Instead, I plan to test this via integration testing, e.g. by making sure that allocation profiling can run at the same time as other kinds of profiling.

This new grouping hopefully makes a bit more sense than the somewhat
arbitrary order we were using before.
@ivoanjo ivoanjo requested a review from a team March 13, 2023 17:40
@github-actions github-actions bot added the profiling Involves Datadog profiling label Mar 13, 2023
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-7307-allocation-sampling branch from 16e3dde to 7317ee4 Compare March 13, 2023 17:41
// In a few cases, we may actually be allocating an object as part of profiler sampling. We don't want to recursively
// sample, so we just return early
if (state->during_sample) {
  state->stats.allocations_during_sample++;
  return;
}
Member

Are we tracking it "just in case"?

Member Author

This is a good question. No, that wasn't my intention; I vaguely intended to use it for validation, but hadn't thought very far after that.

In fact, I initially implemented this on my working branch before I did the change in #2644, so I've been sitting on this for quite some time.

TL;DR: now that #2644 is in, rather than relying on vague intentions, I can actually add specs asserting that common operations in the profiler don't cause allocations.

So, let me revisit this PR and add a few tests where I actually assert on this value. That seems like a nice low-hanging win that can already go in with this change.

Member Author

Added a spec that makes use of this in 1d16535

@codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #2690 (1d16535) into master (70f59e6) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2690   +/-   ##
=======================================
  Coverage   98.08%   98.09%           
=======================================
  Files        1168     1168           
  Lines       64221    64230    +9     
  Branches     2857     2858    +1     
=======================================
+ Hits        62994    63004   +10     
+ Misses       1227     1226    -1     
Impacted Files Coverage Δ
...filing/collectors/cpu_and_wall_time_worker_spec.rb 95.56% <100.00%> (+0.11%) ⬆️

... and 3 files with indirect coverage changes


As discussed during PR review, let's use this counter for correctness
validation.
@ivoanjo ivoanjo merged commit a05d2eb into master Mar 16, 2023
@ivoanjo ivoanjo deleted the ivoanjo/prof-7307-allocation-sampling branch March 16, 2023 12:23
@github-actions github-actions bot added this to the 1.11.0 milestone Mar 16, 2023
@lloeki lloeki modified the milestones: 1.11.0, 1.11.0.beta1 Apr 14, 2023