Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[prof-heap] Fix issues stemming from rb_gc_force_recycle #3366

Merged
merged 6 commits into from
Jan 9, 2024

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Jan 5, 2024

What does this PR do?

When trying our new heap profiler in real applications running in Ruby 2.7 we noticed the profiler quickly shuts down due to detecting duplicate sampling of objects with the same id. This is not supposed to happen since after Ruby 2.7 the object id system was reworked to ensure uniqueness and we rely on this as part of our object liveness tracking.

Turns out Rubies previous to 3.1 support a feature called "force recycling" where an object slot can be manually given back to the heap outside of a GC cycle by calling rb_gc_force_recycle. However, the implementation of this feature is buggy and fails to clean-up objects ids or finalizers associated with objects that undergo this recycling (in a way that will actually slowly leak map entries over time). This feature also brought enough problems to the management of garbage collection code that it was eventually removed from Ruby 3.1 and later and replaced with a no-op (https://bugs.ruby-lang.org/issues/18290).

Therefore, in Rubies < 3.1, if an object we were tracking were recycled and a new object uses its slot, this new object would "inherit" the id of the original and, from the perspective of our profiler, we'd be blind to this behaviour and continue reporting the original object as being alive.

Fortunately, as figured out by @ivoanjo in #3360 (comment), this implementation is buggy in a way that allows us to detect this recycling actually happened. When an object id is requested for an object, that object has the FL_SEEN_OBJ_ID flag forcibly set. This is supposed to be an invariant as evidenced by the assert in https://github.com/ruby/ruby/blob/4a8d7246d15b2054eacb20f8ab3d29d39a3e7856/gc.c#L4050. However, when a new object uses a force recycled slot, it will inherit the object id from the object that was previously using that slow but will NOT be marked with the FL_SEEN_OBJ_ID flag. Consequently, when we're checking the liveness of a heap tracked object, if we're able to successfully map an id to a reference and then check the flags of that reference, if we don't see FL_SEEN_OBJ_ID then we can assume the object we were tracking was implicitly freed and clean its record.

We have to be careful though. Since allocations that re-use recycled slots still trigger allocation tracepoints, it is possible that we decide to start tracking an object that was allocated on such a recycled slot. In this event, that object will already be missing the FL_SEEN_OBJ_ID flag when we start tracking it. If we did nothing, next time we checked liveness using the workaround described in the previous paragraph we'd immediately discard it as dead even though it might very well be alive. Hence, when we detect a missing FL_SEEN_OBJ_ID flag on sampling, we need to force it back in, thus "fixing" the bug in the runtime. This also had the side-effect of reducing the number of map entries leaked in the obj_to_id and id_to_obj objspace tables.

Motivation:
Handle problems introduced by gc_force_recycle in Rubies < 3.1.

Additional Notes:

How to test the change?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@github-actions github-actions bot added the profiling Involves Datadog profiling label Jan 5, 2024
@AlexJF AlexJF mentioned this pull request Jan 5, 2024
2 tasks
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-force-recycle-fix branch from cb01363 to 2d89d6d Compare January 5, 2024 19:22
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really happy to see this working! Left a few notes :)

ext/ddtrace_profiling_native_extension/extconf.rb Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
Base automatically changed from alexjf/heap-profiler-misc-fixes to master January 8, 2024 11:41
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-force-recycle-fix branch 3 times, most recently from a4583e0 to f80cedf Compare January 8, 2024 17:25
[prof-heap] Remove internal include
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-force-recycle-fix branch from f80cedf to 304c77e Compare January 8, 2024 17:40
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cfeff93) 98.24% compared to head (5e3af5a) 98.24%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #3366    +/-   ##
========================================
  Coverage   98.24%   98.24%            
========================================
  Files        1254     1254            
  Lines       73218    73350   +132     
  Branches     3430     3437     +7     
========================================
+ Hits        71931    72064   +133     
+ Misses       1287     1286     -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AlexJF AlexJF marked this pull request as ready for review January 8, 2024 18:31
@AlexJF AlexJF requested a review from a team as a code owner January 8, 2024 18:31
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Really happy that this seems to have gone from a really annoying issue to a small roadbump :)

lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
@AlexJF AlexJF merged commit ffcf68c into master Jan 9, 2024
218 checks passed
@AlexJF AlexJF deleted the alexjf/heap-profiler-force-recycle-fix branch January 9, 2024 12:46
@github-actions github-actions bot added this to the 1.19.0 milestone Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants