-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-9370] Enable Garbage Collection profiling by default (when profiling is enabled) #3558
[PROF-9370] Enable Garbage Collection profiling by default (when profiling is enabled) #3558
Conversation
**What does this PR do?** This PR turns the Garbage Collection profiling feature on by default. Specifically it: 1. Introduces a new `c.profiling.advanced.gc_enabled` setting, which defaults to `true` (can also be toggled using the `DD_PROFILING_GC_ENABLED` env var) 2. Deprecated the previous `c.profiling.advanced.force_enable_gc_profiling` setting. This setting only served to turn the feature on, not turn it off, so it no longer makes sense to exist. 3. Does not turn on GC profiling on Rubies affected by https://bugs.ruby-lang.org/issues/18464 . Specifically, GC profiling can now only be turned on in Ruby versions 2.x, 3.1.4+, 3.2.3+ and 3.3+ . This avoids impacting known-buggy Rubies. **Motivation:** We've done through validation that this feature is working nicely and with low overhead, and it works great with the timeline view. **Additional Notes:** Regarding the impact of Ruby issue 18464, I explicitly chose to no longer allow GC profiling to be enabled on affected Rubies. I did this so as to simplify the implementation of the settings. Because we wanted to enable this feature by default, having a way to still enable it on these Rubies would mean either having another option (e.g. keep `force_enable_gc_profiling` around?), somehow introducing a tri-state value for the setting (like we do for the `no_signals_workaround_enabled`), or fiddling with precedences to distinguish the default value from a user-set one. In the end, I decided to simplify and not do any of the above. We can still change our minds later and introduce it based on user feedback and needs, but unless a user really needs to be on 3.0.x for which there is no bug fix release, we really really really really instead recomend moving to the latest patch release of the 3.1.x or 3.2.x series. **How to test the change?** The change includes test coverage. You can also profile a simple script and observe that GC profiling is included by default, without any extra option having to be set.
'with this option as profiles will be incomplete.' | ||
) | ||
end | ||
return false unless settings.profiling.advanced.gc_enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For allocation we moved it out of advanced. Should we apply the same reasoning to gc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking behind putting that one out of advanced and leaving this one in advanced, is that I'm not sure exactly when (if?) we're going to enable allocation profiling by default.
Thus, it's a setting that I expect many users will want to toggle until we change that.
Conversely, for GC profiling, given we are enabling it by default I don't think anyone will need or should to touch this.
I'd say its existence is mostly a support mechanism, where if we find any issue that we suspect may be related to the feature, we have the means to disable it.
(This is why I left timeline_enabled
under advanced as well).
LMK your thoughts, if you still think this distinction is too subtle, I don't particularly mind moving it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I think it makes sense and just wanted to understand the pattern you were going with here so I could follow it in the future :)
"Current Ruby version (#{RUBY_VERSION}) has a VM bug where enabling GC profiling would cause "\ | ||
'crashes (https://bugs.ruby-lang.org/issues/18464). GC profiling has been disabled.' | ||
) | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this too aggressive since it only manifests when using Ractors (which bring other problems in those versions as well?).
Allocation suffers from the same issue and does not enforce false for this bug. We should either enforce there as well or we could have a separate flag governing disablement of both features related to ractor bugs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll concede it's a bit aggressive, and I shared my reasoning in the Additional notes for the commit.
TBH I think this feature is less important than allocation/heap, and so while we could put them on the same bucket, I... somewhat think the others deserve the extra attention, while this one doesn't quite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair fair 😄 And as you mention, we can revisit this at any point in the future anyway.
Reminder to self: Remove separate configurations we have for GC profiling testing once this gets merged in |
Merging this in! Thanks for the feedback (in this PR and in the review doc!) |
**What does this PR do?** This PR removes the benchmarking configuration for testing GC profiling. This is because in #3558 GC profiling was enabled by default and so the `only-profiling` configuration now includes GC profiling (and thus we don't need a separate one). **Motivation:** Remove benchmarking configurations that no longer make sense to have. **Additional Notes:** N/A **How to test the change?** Validate this configuration is no longer tested.
What does this PR do?
This PR turns the Garbage Collection profiling feature on by default (when profiling is enabled).
Specifically it:
Introduces a new
c.profiling.advanced.gc_enabled
setting, which defaults totrue
(can also be toggled using theDD_PROFILING_GC_ENABLED
env var)Deprecates the previous
c.profiling.advanced.force_enable_gc_profiling
setting. This setting only served to turn the feature on, not turn it off, so it no longer makes sense to exist.Does not turn on GC profiling on Rubies affected by https://bugs.ruby-lang.org/issues/18464 . Specifically, GC profiling can now only be turned on in Ruby versions 2.x, 3.1.4+, 3.2.3+ and 3.3+ . This avoids impacting known-buggy Rubies.
Motivation:
We've done through validation that this feature is working nicely and with low overhead, and it works great with the timeline view.
Additional Notes:
Regarding the impact of Ruby issue 18464, I explicitly chose to no longer allow GC profiling to be enabled on affected Rubies.
I did this so as to simplify the implementation of the settings.
Because we wanted to enable this feature by default, having a way to still enable it on these Rubies would mean either having another option (e.g. keep
force_enable_gc_profiling
around?), somehow introducing a tri-state value for the setting (like we do for theno_signals_workaround_enabled
), or fiddling with precedences to distinguish the default value from a user-set one.In the end, I decided to simplify and not do any of the above. We can still change our minds later and introduce it based on user feedback and needs, but unless a user really needs to be on 3.0.x for which there is no bug fix release, we really really really really instead recomend moving to the latest patch release of the 3.1.x or 3.2.x series.
How to test the change?
The change includes test coverage. You can also profile a simple script and observe that GC profiling is included by default, without any extra option having to be set.
For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance
.