New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-2710] Fix missing instrumentation when using Thread.start/fork #1340
Conversation
02d3805
to
8ca9c84
Compare
988bf34
to
7aaf243
Compare
Note: CI is broken on Ruby 3.0 because of the issue fixed in #1345 . I'll rebase this PR after that one gets merged to add |
@@ -187,6 +187,9 @@ def build_backtrace_location(_id, base_label, lineno, path) | |||
|
|||
private | |||
|
|||
# FIXME: Threads started with `start`/`fork` can wrongly trigger this warning, because unlike with Thread#new | |||
# we can't distinguish between missing instrumentation and "thread was just started and hasn't had time to | |||
# run update_native_ids" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, over the lifetime of an application this will likely get triggered a few times.
Would it make sense to give a thread one opportunity to fulfill @warned_about_missing_cpu_time_instrumentation
, something like setting a sentinel value in the thread object that tracks "we've given you one chance to run update_native_ids, but next time I check, I'll warn if you didn't". But this suggestion has the side-effect of missing the warning on short-lived threads.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. It's a shame that it ends up making the detection miss a few cases, but hopefully we're mostly done with the whack-a-mole game so it's not going to come up often.
I'll wait until #1310 gets merged to add this change, as there's already quite a bit of branch dependencies around, and after that I'll unmark as draft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good!
I left a comment for discussion, but I don't think it blocks this specific PR.
ed2e124
to
c614edc
Compare
When a `Thread` is started with either `start` or `fork` (they are aliases), then our `CThread#initialize` patch never gets run and thus these threads are not properly instrumented. This API is used in Ruby's own built-in `timeout` gem (see <https://github.com/ruby/timeout/blob/14d421252b54027108da183783a7a26d0cf5aac1/lib/timeout.rb#L87>) although I'm not entirely sure how common it is in the community. The behavior of skipping `#initialize` is by design, but it seems a rather weird API; here's what the docs say: > Basically the same as ::new. However, if class Thread is subclassed, > then calling start in that subclass will not invoke the subclass's > initialize method. (<https://rubyapi.org/3.0/o/thread#method-c-start>) To fix this, we introduce a new module, which wraps these two operations to provide the expected behavior of installing the CPU instrumentation. **NOTE**: Due to the way the missing cpu time instrumentation warning works, Thread.start/fork will still trigger it, even if Threads are instrumented correctly. Since #1310 is still under review, for now I'm leaving it as a `FIXME`, but this should be fixed before merging. (I can't think of a 100% accurate way of fixing it, so possibly the fix will be along the lines of "track how long the thread has lived, and if it was just created, don't warn yet -- wait a few seconds before considering it as missing instrumentation"). Hopefully at some point #1333 will become a better solution to all this messing around.
The fix here is exactly the same as the one for Thread#initialize: use the `ruby2_keywords` as described in <https://eregon.me/blog/2021/02/13/correct-delegation-in-ruby-2-27-3.html>.
In its current state, we discovered that the missing instrumentation warning could be triggered when threads were started with `Thread.start`/`Thread.fork`, even when our instrumentation was available, due to a race between the start of a thread and our sampling. We had some ideas to fix it -- something along the lines of "tag the thread with a timestamp, and don't warn until a reasonable amount of time has passed, so that the thread will have time to run the initialization". But when working on another task, I realized that threads created via C extensions (such as ethon, that relies on FFI) also skip our instrumentation. For these, we do not yet have any solution (but there's a couple of ideas on how we could go about gathering the thread id for these threads). Thus, because we can't really tell if a thread was started by a C extension, it no longer makes sense to make this warning a user-visible message that asks customers to file bug reports to us, because in those cases we won't be able to do anything about it. And finally, if the warning is not going to be user-visible anymore, I don't think it's worth the extra effort/complexity to solve the possible race when using `Thread.start`/`Thread.fork`. In the end, this warning becomes a "hint" we'll keep around for development/debugging.
7aaf243
to
fa8c193
Compare
Update on this one:
Thus, it's ready for review again! @marcotc or @delner can I bother one of you for a second look? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
[PROF-2710] Fix missing instrumentation when using Thread.start/fork
When a
Thread
is started with eitherstart
orfork
(they are aliases), then ourCThread#initialize
patch never gets run and thus these threads are not properly instrumented.This API is used in Ruby's own built-in
timeout
gem (see https://github.com/ruby/timeout/blob/14d421252b54027108da183783a7a26d0cf5aac1/lib/timeout.rb#L87) although I'm not entirely sure how common it is in the community.The behavior of skipping
#initialize
is by design, but it seems a rather weird API; here's what the docs say:(https://rubyapi.org/3.0/o/thread#method-c-start)
To fix this, we introduce a new module, which wraps these two operations to provide the expected behavior of installing the CPU instrumentation.
NOTE:
Due to the way the missing cpu time instrumentation warning works, Thread.start/fork will still trigger it, even if Threads are
instrumented correctly.
Since #1310 is still under review, for now I'm leaving it as a
FIXME
(and marking this PR as a draft), but this should be fixed before merging.(I can't think of a 100% accurate way of fixing it, so possibly the fix will be along the lines of "track how long the thread
has lived, and if it was just created, don't warn yet -- wait a few seconds before considering it as missing instrumentation").
Hopefully at some point #1333 will become a better solution to all this messing around.