Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-9370] Enable Garbage Collection profiling by default (when profiling is enabled) #3558

Merged
merged 1 commit into from
Apr 3, 2024

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Mar 27, 2024

What does this PR do?

This PR turns the Garbage Collection profiling feature on by default (when profiling is enabled).

Specifically it:

  1. Introduces a new c.profiling.advanced.gc_enabled setting, which defaults to true (can also be toggled using the DD_PROFILING_GC_ENABLED env var)

  2. Deprecates the previous c.profiling.advanced.force_enable_gc_profiling setting. This setting only served to turn the feature on, not turn it off, so it no longer makes sense to exist.

  3. Does not turn on GC profiling on Rubies affected by https://bugs.ruby-lang.org/issues/18464 . Specifically, GC profiling can now only be turned on in Ruby versions 2.x, 3.1.4+, 3.2.3+ and 3.3+ . This avoids impacting known-buggy Rubies.

Motivation:

We've done through validation that this feature is working nicely and with low overhead, and it works great with the timeline view.

Additional Notes:

Regarding the impact of Ruby issue 18464, I explicitly chose to no longer allow GC profiling to be enabled on affected Rubies.

I did this so as to simplify the implementation of the settings.

Because we wanted to enable this feature by default, having a way to still enable it on these Rubies would mean either having another option (e.g. keep force_enable_gc_profiling around?), somehow introducing a tri-state value for the setting (like we do for the no_signals_workaround_enabled), or fiddling with precedences to distinguish the default value from a user-set one.

In the end, I decided to simplify and not do any of the above. We can still change our minds later and introduce it based on user feedback and needs, but unless a user really needs to be on 3.0.x for which there is no bug fix release, we really really really really instead recomend moving to the latest patch release of the 3.1.x or 3.2.x series.

How to test the change?

The change includes test coverage. You can also profile a simple script and observe that GC profiling is included by default, without any extra option having to be set.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

**What does this PR do?**

This PR turns the Garbage Collection profiling feature on by default.

Specifically it:

1. Introduces a new `c.profiling.advanced.gc_enabled` setting, which
   defaults to `true` (can also be toggled using the
   `DD_PROFILING_GC_ENABLED` env var)

2. Deprecated the previous
   `c.profiling.advanced.force_enable_gc_profiling` setting. This
   setting only served to turn the feature on, not turn it off, so it
   no longer makes sense to exist.

3. Does not turn on GC profiling on Rubies affected by
   https://bugs.ruby-lang.org/issues/18464 . Specifically, GC profiling
   can now only be turned on in Ruby versions 2.x, 3.1.4+, 3.2.3+ and
   3.3+ . This avoids impacting known-buggy Rubies.

**Motivation:**

We've done through validation that this feature is working nicely and
with low overhead, and it works great with the timeline view.

**Additional Notes:**

Regarding the impact of Ruby issue 18464, I explicitly chose to no
longer allow GC profiling to be enabled on affected Rubies.

I did this so as to simplify the implementation of the settings.

Because we wanted to enable this feature by default, having a way
to still enable it on these Rubies would mean either having another
option (e.g. keep `force_enable_gc_profiling` around?), somehow
introducing a tri-state value for the setting (like we do for the
`no_signals_workaround_enabled`), or fiddling with precedences to
distinguish the default value from a user-set one.

In the end, I decided to simplify and not do any of the above. We
can still change our minds later and introduce it based on user
feedback and needs, but unless a user really needs to be on
3.0.x for which there is no bug fix release, we really really
really really instead recomend moving to the latest patch release
of the 3.1.x or 3.2.x series.

**How to test the change?**

The change includes test coverage. You can also profile a simple
script and observe that GC profiling is included by default, without
any extra option having to be set.
@ivoanjo ivoanjo requested review from a team as code owners March 27, 2024 16:20
@github-actions github-actions bot added core Involves Datadog core libraries profiling Involves Datadog profiling labels Mar 27, 2024
'with this option as profiles will be incomplete.'
)
end
return false unless settings.profiling.advanced.gc_enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For allocation we moved it out of advanced. Should we apply the same reasoning to gc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking behind putting that one out of advanced and leaving this one in advanced, is that I'm not sure exactly when (if?) we're going to enable allocation profiling by default.

Thus, it's a setting that I expect many users will want to toggle until we change that.


Conversely, for GC profiling, given we are enabling it by default I don't think anyone will need or should to touch this.

I'd say its existence is mostly a support mechanism, where if we find any issue that we suspect may be related to the feature, we have the means to disable it.

(This is why I left timeline_enabled under advanced as well).

LMK your thoughts, if you still think this distinction is too subtle, I don't particularly mind moving it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I think it makes sense and just wanted to understand the pattern you were going with here so I could follow it in the future :)

"Current Ruby version (#{RUBY_VERSION}) has a VM bug where enabling GC profiling would cause "\
'crashes (https://bugs.ruby-lang.org/issues/18464). GC profiling has been disabled.'
)
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this too aggressive since it only manifests when using Ractors (which bring other problems in those versions as well?).

Allocation suffers from the same issue and does not enforce false for this bug. We should either enforce there as well or we could have a separate flag governing disablement of both features related to ractor bugs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll concede it's a bit aggressive, and I shared my reasoning in the Additional notes for the commit.

TBH I think this feature is less important than allocation/heap, and so while we could put them on the same bucket, I... somewhat think the others deserve the extra attention, while this one doesn't quite?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair fair 😄 And as you mention, we can revisit this at any point in the future anyway.

@ivoanjo
Copy link
Member Author

ivoanjo commented Apr 2, 2024

Reminder to self: Remove separate configurations we have for GC profiling testing once this gets merged in

@ivoanjo
Copy link
Member Author

ivoanjo commented Apr 3, 2024

Merging this in! Thanks for the feedback (in this PR and in the review doc!)

@ivoanjo ivoanjo merged commit ab04fca into master Apr 3, 2024
211 checks passed
@ivoanjo ivoanjo deleted the ivoanjo/prof-9370-enable-gc-profiling-by-default branch April 3, 2024 08:35
@github-actions github-actions bot added this to the 1.22.0 milestone Apr 3, 2024
@ivoanjo ivoanjo changed the title [PROF-9370] Enable Garbage Collection profiling by default [PROF-9370] Enable Garbage Collection profiling by default (when profiling is enabled) Apr 3, 2024
ivoanjo added a commit that referenced this pull request Apr 3, 2024
**What does this PR do?**

This PR removes the benchmarking configuration for testing GC
profiling.

This is because in #3558
GC profiling was enabled by default and so the `only-profiling`
configuration now includes GC profiling (and thus we don't need
a separate one).

**Motivation:**

Remove benchmarking configurations that no longer make sense to have.

**Additional Notes:**

N/A

**How to test the change?**

Validate this configuration is no longer tested.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants