-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-7307] Avoid triggering allocation sampling during sampling #2690
Conversation
This new grouping hopefully makes a bit more sense than the somewhat arbitrary order we were using before.
In rare cases (such as when throwing exceptions), the profiler may need to allocate Ruby objects during sampling. This object allocation will trigger the `object_allocation_tracepoint` but we don't want to try to sample in the middle of sampling. To avoid any issues altogether, I've introduced a `during_sample` flag that can be used to detect this and avoid trying to do sampling during sampling. Note that the functions modified all run while holding on to the global VM lock, which is why this flag does not need any special concerns re: concurrency. I've also chosen to keep a counter for how often this happens that may come in handy when debugging any issues around this (and should be cheap enough to keep).
16e3dde
to
7317ee4
Compare
// In a few cases, we may actually be allocating an object as part of profiler sampling. We don't want to recursively | ||
// sample, so we just return early | ||
if (state->during_sample) { | ||
state->stats.allocations_during_sample++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we tracking it "just in case"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. No, that wasn't my intention; I vaguely intended to use it for validation, but hadn't thought very far after that.
In fact, I initially implemented this on my working branch before I did the change in #2644, so I've been sitting on this for quite some time.
TL;DR, now that #2644 is in, I think rather than vague intentions, I can actually add specs asserting that common operations in the profiler actually don't cause allocations.
So, let me revisit this PR and add a few tests where I actually assert on this value. That seems like a nice low-hanging win that can already go in with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a spec that makes use of this in 1d16535
Codecov Report
@@ Coverage Diff @@
## master #2690 +/- ##
=======================================
Coverage 98.08% 98.09%
=======================================
Files 1168 1168
Lines 64221 64230 +9
Branches 2857 2858 +1
=======================================
+ Hits 62994 63004 +10
+ Misses 1227 1226 -1
... and 3 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
As discussed during PR review, let's use this counter for correctness validation.
What does this PR do?:
In rare cases (such as when throwing exceptions), the profiler may need to allocate Ruby objects during sampling.
This object allocation will trigger the
object_allocation_tracepoint
but we don't want to try to sample in the middle of sampling.To avoid any issues altogether, I've introduced a
during_sample
flag that can be used to detect this and avoid trying to do sampling during sampling.Note that the functions modified all run while holding on to the global VM lock, which is why this flag does not need any special concerns re: concurrency.
I've also chosen to keep a counter for how often this happens that may come in handy when debugging any issues around this (and should be cheap enough to keep).
Motivation:
This PR is one more piece on the path to providing allocation profiling. The key missing part is the "// TODO: Sampling goes here", which is what will actually trigger recording of stack traces when objects get allocated.
In the spirit of small, incremental changes, I'm opening a PR with just this, and will later follow up with the rest of the implementation.
Additional Notes:
I also slightly moved around the declarations of fields inside
struct cpu_and_wall_time_worker_state
and in the constructor. I've made this change in a separate commit, so I recommend reviewing this PR commit-by-commit as that makes a lot more sense.How to test the change?:
This change is a bit hard to test, since it affects only the C-level internal state of the profiler. I can't think of a great way of testing this that doesn't involve adding a lot of complexity to the profiler C code just for the purpose of enabling testing.
Instead, I plan to test this using integration testing, e.g. when making sure that the allocation profiling can be run at the same time as other kinds of profiling.