[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340

ivoanjo · 2021-02-02T12:23:01Z

When a Thread is started with either start or fork (they are aliases), then our CThread#initialize patch never gets run and thus these threads are not properly instrumented.

This API is used in Ruby's own built-in timeout gem (see https://github.com/ruby/timeout/blob/14d421252b54027108da183783a7a26d0cf5aac1/lib/timeout.rb#L87) although I'm not entirely sure how common it is in the community.

The behavior of skipping #initialize is by design, but it seems a rather weird API; here's what the docs say:

Basically the same as ::new. However, if class Thread is subclassed,
then calling start in that subclass will not invoke the subclass's
initialize method.

(https://rubyapi.org/3.0/o/thread#method-c-start)

To fix this, we introduce a new module, which wraps these two operations to provide the expected behavior of installing the CPU instrumentation.

NOTE:
Due to the way the missing cpu time instrumentation warning works, Thread.start/fork will still trigger it, even if Threads are
instrumented correctly.

Since #1310 is still under review, for now I'm leaving it as a FIXME (and marking this PR as a draft), but this should be fixed before merging.

(I can't think of a 100% accurate way of fixing it, so possibly the fix will be along the lines of "track how long the thread
has lived, and if it was just created, don't warn yet -- wait a few seconds before considering it as missing instrumentation").

Hopefully at some point #1333 will become a better solution to all this messing around.

ivoanjo · 2021-02-03T11:31:49Z

Note: CI is broken on Ruby 3.0 because of the issue fixed in #1345 . I'll rebase this PR after that one gets merged to add ruby2_keywords to start, which fixes the breaking test.

marcotc · 2021-02-03T16:22:12Z

lib/ddtrace/profiling/collectors/stack.rb

@@ -187,6 +187,9 @@ def build_backtrace_location(_id, base_label, lineno, path)

        private

+        # FIXME: Threads started with `start`/`fork` can wrongly trigger this warning, because unlike with Thread#new
+        # we can't distinguish between missing instrumentation and "thread was just started and hasn't had time to
+        # run update_native_ids"


Ugh, over the lifetime of an application this will likely get triggered a few times.

Would it make sense to give a thread one opportunity to fulfill @warned_about_missing_cpu_time_instrumentation, something like setting a sentinel value in the thread object that tracks "we've given you one chance to run update_native_ids, but next time I check, I'll warn if you didn't". But this suggestion has the side-effect of missing the warning on short-lived threads.

What do you think?

Agreed. It's a shame that it ends up making the detection miss a few cases, but hopefully we're mostly done with the whack-a-mole game so it's not going to come up often.

I'll wait until #1310 gets merged to add this change, as there's already quite a bit of branch dependencies around, and after that I'll unmark as draft.

marcotc

Looks very good!
I left a comment for discussion, but I don't think it blocks this specific PR.

When a `Thread` is started with either `start` or `fork` (they are aliases), then our `CThread#initialize` patch never gets run and thus these threads are not properly instrumented. This API is used in Ruby's own built-in `timeout` gem (see <https://github.com/ruby/timeout/blob/14d421252b54027108da183783a7a26d0cf5aac1/lib/timeout.rb#L87>) although I'm not entirely sure how common it is in the community. The behavior of skipping `#initialize` is by design, but it seems a rather weird API; here's what the docs say: > Basically the same as ::new. However, if class Thread is subclassed, > then calling start in that subclass will not invoke the subclass's > initialize method. (<https://rubyapi.org/3.0/o/thread#method-c-start>) To fix this, we introduce a new module, which wraps these two operations to provide the expected behavior of installing the CPU instrumentation. **NOTE**: Due to the way the missing cpu time instrumentation warning works, Thread.start/fork will still trigger it, even if Threads are instrumented correctly. Since #1310 is still under review, for now I'm leaving it as a `FIXME`, but this should be fixed before merging. (I can't think of a 100% accurate way of fixing it, so possibly the fix will be along the lines of "track how long the thread has lived, and if it was just created, don't warn yet -- wait a few seconds before considering it as missing instrumentation"). Hopefully at some point #1333 will become a better solution to all this messing around.

The fix here is exactly the same as the one for Thread#initialize: use the `ruby2_keywords` as described in <https://eregon.me/blog/2021/02/13/correct-delegation-in-ruby-2-27-3.html>.

In its current state, we discovered that the missing instrumentation warning could be triggered when threads were started with `Thread.start`/`Thread.fork`, even when our instrumentation was available, due to a race between the start of a thread and our sampling. We had some ideas to fix it -- something along the lines of "tag the thread with a timestamp, and don't warn until a reasonable amount of time has passed, so that the thread will have time to run the initialization". But when working on another task, I realized that threads created via C extensions (such as ethon, that relies on FFI) also skip our instrumentation. For these, we do not yet have any solution (but there's a couple of ideas on how we could go about gathering the thread id for these threads). Thus, because we can't really tell if a thread was started by a C extension, it no longer makes sense to make this warning a user-visible message that asks customers to file bug reports to us, because in those cases we won't be able to do anything about it. And finally, if the warning is not going to be user-visible anymore, I don't think it's worth the extra effort/complexity to solve the possible race when using `Thread.start`/`Thread.fork`. In the end, this warning becomes a "hint" we'll keep around for development/debugging.

ivoanjo · 2021-03-01T11:50:04Z

Update on this one:

I've rebased this branch on top of the current feature/profiling state
I've integrated the Ruby 3 fixes
I've hid the instrumentation missing message, rather than fixing it. See the commit message in fa8c193 for the rationale behind my decision.

Thus, it's ready for review again! @marcotc or @delner can I bother one of you for a second look? :)

marcotc

[PROF-2710] Fix missing instrumentation when using Thread.start/fork

ivoanjo requested a review from delner February 2, 2021 12:23

ivoanjo force-pushed the ivoanjo/warn-on-missing-cpu-time-instrumentation branch from 02d3805 to 8ca9c84 Compare February 2, 2021 12:55

ivoanjo force-pushed the fix/missing-thread-instrumentation branch from 988bf34 to 7aaf243 Compare February 2, 2021 13:00

marcotc reviewed Feb 3, 2021

View reviewed changes

ivoanjo force-pushed the ivoanjo/warn-on-missing-cpu-time-instrumentation branch 2 times, most recently from ed2e124 to c614edc Compare February 11, 2021 11:13

Base automatically changed from ivoanjo/warn-on-missing-cpu-time-instrumentation to feature/profiling February 11, 2021 11:55

ivoanjo added 3 commits March 1, 2021 10:49

Fix Ruby 3 support for Thread.start monkey patch

f8d0f19

The fix here is exactly the same as the one for Thread#initialize: use the `ruby2_keywords` as described in <https://eregon.me/blog/2021/02/13/correct-delegation-in-ruby-2-27-3.html>.

ivoanjo force-pushed the fix/missing-thread-instrumentation branch from 7aaf243 to fa8c193 Compare March 1, 2021 11:45

ivoanjo marked this pull request as ready for review March 1, 2021 11:46

ivoanjo requested a review from a team March 1, 2021 11:46

marcotc approved these changes Mar 3, 2021

View reviewed changes

ivoanjo merged commit ff10623 into feature/profiling Mar 8, 2021

ivoanjo deleted the fix/missing-thread-instrumentation branch March 8, 2021 09:13

ivoanjo added a commit that referenced this pull request Mar 11, 2021

Merge pull request #1340 from DataDog/fix/missing-thread-instrumentation

bb96846

[PROF-2710] Fix missing instrumentation when using Thread.start/fork

ivoanjo added a commit that referenced this pull request Mar 16, 2021

Merge pull request #1340 from DataDog/fix/missing-thread-instrumentation

57e14a8

[PROF-2710] Fix missing instrumentation when using Thread.start/fork

ivoanjo added a commit that referenced this pull request Mar 29, 2021

Merge pull request #1340 from DataDog/fix/missing-thread-instrumentation

1c78d34

[PROF-2710] Fix missing instrumentation when using Thread.start/fork

ivoanjo added a commit that referenced this pull request Apr 12, 2021

Merge pull request #1340 from DataDog/fix/missing-thread-instrumentation

2e48e43

[PROF-2710] Fix missing instrumentation when using Thread.start/fork

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340

[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340

ivoanjo commented Feb 2, 2021

ivoanjo commented Feb 3, 2021

marcotc Feb 3, 2021

ivoanjo Feb 4, 2021

marcotc left a comment

ivoanjo commented Mar 1, 2021

marcotc left a comment

[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340

[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340

Conversation

ivoanjo commented Feb 2, 2021

ivoanjo commented Feb 3, 2021

marcotc Feb 3, 2021

Choose a reason for hiding this comment

ivoanjo Feb 4, 2021

Choose a reason for hiding this comment

marcotc left a comment

Choose a reason for hiding this comment

ivoanjo commented Mar 1, 2021

marcotc left a comment

Choose a reason for hiding this comment