Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340

Merged
merged 3 commits into from Mar 8, 2021

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Feb 2, 2021

When a Thread is started with either start or fork (they are aliases), then our CThread#initialize patch never gets run and thus these threads are not properly instrumented.

This API is used in Ruby's own built-in timeout gem (see https://github.com/ruby/timeout/blob/14d421252b54027108da183783a7a26d0cf5aac1/lib/timeout.rb#L87) although I'm not entirely sure how common it is in the community.

The behavior of skipping #initialize is by design, but it seems a rather weird API; here's what the docs say:

Basically the same as ::new. However, if class Thread is subclassed,
then calling start in that subclass will not invoke the subclass's
initialize method.

(https://rubyapi.org/3.0/o/thread#method-c-start)

To fix this, we introduce a new module, which wraps these two operations to provide the expected behavior of installing the CPU instrumentation.

NOTE:
Due to the way the missing cpu time instrumentation warning works, Thread.start/fork will still trigger it, even if Threads are
instrumented correctly.

Since #1310 is still under review, for now I'm leaving it as a FIXME (and marking this PR as a draft), but this should be fixed before merging.

(I can't think of a 100% accurate way of fixing it, so possibly the fix will be along the lines of "track how long the thread
has lived, and if it was just created, don't warn yet -- wait a few seconds before considering it as missing instrumentation").

Hopefully at some point #1333 will become a better solution to all this messing around.

@ivoanjo ivoanjo requested a review from delner February 2, 2021 12:23
@ivoanjo ivoanjo force-pushed the ivoanjo/warn-on-missing-cpu-time-instrumentation branch from 02d3805 to 8ca9c84 Compare February 2, 2021 12:55
@ivoanjo ivoanjo force-pushed the fix/missing-thread-instrumentation branch from 988bf34 to 7aaf243 Compare February 2, 2021 13:00
@ivoanjo
Copy link
Member Author

ivoanjo commented Feb 3, 2021

Note: CI is broken on Ruby 3.0 because of the issue fixed in #1345 . I'll rebase this PR after that one gets merged to add ruby2_keywords to start, which fixes the breaking test.

@@ -187,6 +187,9 @@ def build_backtrace_location(_id, base_label, lineno, path)

private

# FIXME: Threads started with `start`/`fork` can wrongly trigger this warning, because unlike with Thread#new
# we can't distinguish between missing instrumentation and "thread was just started and hasn't had time to
# run update_native_ids"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, over the lifetime of an application this will likely get triggered a few times.

Would it make sense to give a thread one opportunity to fulfill @warned_about_missing_cpu_time_instrumentation, something like setting a sentinel value in the thread object that tracks "we've given you one chance to run update_native_ids, but next time I check, I'll warn if you didn't". But this suggestion has the side-effect of missing the warning on short-lived threads.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It's a shame that it ends up making the detection miss a few cases, but hopefully we're mostly done with the whack-a-mole game so it's not going to come up often.

I'll wait until #1310 gets merged to add this change, as there's already quite a bit of branch dependencies around, and after that I'll unmark as draft.

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good!
I left a comment for discussion, but I don't think it blocks this specific PR.

@ivoanjo ivoanjo force-pushed the ivoanjo/warn-on-missing-cpu-time-instrumentation branch 2 times, most recently from ed2e124 to c614edc Compare February 11, 2021 11:13
Base automatically changed from ivoanjo/warn-on-missing-cpu-time-instrumentation to feature/profiling February 11, 2021 11:55
When a `Thread` is started with either `start` or `fork` (they are
aliases), then our `CThread#initialize` patch never gets run and thus
these threads are not properly instrumented.

This API is used in Ruby's own built-in `timeout` gem (see
<https://github.com/ruby/timeout/blob/14d421252b54027108da183783a7a26d0cf5aac1/lib/timeout.rb#L87>)
although I'm not entirely sure how common it is in the community.

The behavior of skipping `#initialize` is by design, but it seems a
rather weird API; here's what the docs say:

> Basically the same as ::new. However, if class Thread is subclassed,
> then calling start in that subclass will not invoke the subclass's
> initialize method.

(<https://rubyapi.org/3.0/o/thread#method-c-start>)

To fix this, we introduce a new module, which wraps these two
operations to provide the expected behavior of installing the CPU
instrumentation.

**NOTE**:
Due to the way the missing cpu time instrumentation warning works,
Thread.start/fork will still trigger it, even if Threads are
instrumented correctly.

Since #1310 is still
under review, for now I'm leaving it as a `FIXME`, but this
should be fixed before merging.

(I can't think of a 100% accurate way of fixing it, so possibly
the fix will be along the lines of "track how long the thread
has lived, and if it was just created, don't warn yet -- wait
a few seconds before considering it as missing instrumentation").

Hopefully at some point
#1333
will become a better solution to all this messing around.
The fix here is exactly the same as the one for Thread#initialize:
use the `ruby2_keywords` as described in
<https://eregon.me/blog/2021/02/13/correct-delegation-in-ruby-2-27-3.html>.
In its current state, we discovered that the missing instrumentation
warning could be triggered when threads were started with
`Thread.start`/`Thread.fork`, even when our instrumentation was
available, due to a race between the start of a thread and our
sampling.

We had some ideas to fix it -- something along the lines of "tag the
thread with a timestamp, and don't warn until a reasonable amount of
time has passed, so that the thread will have time to run the
initialization".

But when working on another task, I realized that threads created via
C extensions (such as ethon, that relies on FFI) also skip our
instrumentation. For these, we do not yet have any solution (but
there's a couple of ideas on how we could go about gathering the
thread id for these threads).

Thus, because we can't really tell if a thread was started by a C
extension, it no longer makes sense to make this warning a
user-visible message that asks customers to file bug reports to us,
because in those cases we won't be able to do anything about it.

And finally, if the warning is not going to be user-visible anymore,
I don't think it's worth the extra effort/complexity to solve the
possible race when using `Thread.start`/`Thread.fork`.

In the end, this warning becomes a "hint" we'll keep around for
development/debugging.
@ivoanjo ivoanjo force-pushed the fix/missing-thread-instrumentation branch from 7aaf243 to fa8c193 Compare March 1, 2021 11:45
@ivoanjo ivoanjo marked this pull request as ready for review March 1, 2021 11:46
@ivoanjo ivoanjo requested a review from a team March 1, 2021 11:46
@ivoanjo
Copy link
Member Author

ivoanjo commented Mar 1, 2021

Update on this one:

  • I've rebased this branch on top of the current feature/profiling state
  • I've integrated the Ruby 3 fixes
  • I've hid the instrumentation missing message, rather than fixing it. See the commit message in fa8c193 for the rationale behind my decision.

Thus, it's ready for review again! @marcotc or @delner can I bother one of you for a second look? :)

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@ivoanjo ivoanjo merged commit ff10623 into feature/profiling Mar 8, 2021
@ivoanjo ivoanjo deleted the fix/missing-thread-instrumentation branch March 8, 2021 09:13
ivoanjo added a commit that referenced this pull request Mar 11, 2021
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
ivoanjo added a commit that referenced this pull request Mar 16, 2021
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
ivoanjo added a commit that referenced this pull request Mar 29, 2021
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
ivoanjo added a commit that referenced this pull request Apr 12, 2021
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants