You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
tl;dr TracePoint logger is too slow to compute caller filepaths
Rotoscope is implemented in C using the Ruby TracePoint API. By default, the :call and :return Ruby trace events allow you to retrieve the filepath and line number (via rb_tracearg_lineno and rb_tracearg_path in C). While this is quite fast, it has the unfortunate side effect of pointing to the callee of the method (i.e. where the method is received), rather than the caller.
The path of the caller is highly important to maintain an accurate "blacklist/whitelist" option so we don't log every single method invocation inside of our framework or hot paths that we don't care to know the internals of. Introducing a blacklist dramatically improves the performance abilities of tracing method invocations.
Solution Attempts
In an attempt to remain performant and accurate, I examined a rudimentary implementation where I installed a :line TracePoint (performant, hah) and kept the last read line in memory. This does technically work but it isn't easily possible to use this to determine the return of a method, since it will only know the deepest level of the stack, not when a frame pop occurs.
At the suggestion of @dylanahsmith, I tried another variation where I invoked Kernel.caller_locations to compute the backtrace of the current trace, and extract the filepath/lineno. This does produce 100% correct results, but is incredibly slow in comparison to the other attempts.
Using the tracearg methods, I'm able to complete a Shopify/shopify CI run in ~10 minutes with Rotoscope wrapping all tests. Using caller_locations and bumping up the CI timeouts lets most of the containers finish in 2.5 hours. Boo. This is the version currently in the master branch.
I've compiled a table below where you can see the retreived caller results of the different attempts:
We can shortcut the complexity of Kernel.caller_locations on C-invocations and instead use the default TracePoint methods, since the top-level stack won't be able to point to the method definition, and it falls back to the caller instead.
Next Steps
Examining how rb_tracearg_filepath is implemented, we can see rb_vm_get_ruby_level_next_cfp is invoked. Looking at the definition of the method, the logic is relatively simple: increment the frame pointer until a Ruby frame is found. In our case, we want to find a Ruby frame, and return the frame after that one. While the method is not static (a miracle?!), it is unfortunately declared inside of internal.h and uses a significant amount of internal data structures that are prohibitively expensive to copy the definitions of as a workaround (e.g. rb_control_frame_t has a field for rb_iseq_t, which itself holds a field for rb_iseq_constant_body, and that struct looks like this.
At this point, it's not particularly obvious to me how I can get around this performance problem without forking Ruby to implement a method that returns the second Ruby VM frame, but I'd like to avoid that if possible.
I've added a comment to the already-open issue in the Ruby bugtracker that detailed this problem, but haven't received any traction on it. The original issue was opened over four years ago.
Even though rb_vm_get_ruby_level_next_cfp isn't static, it still isn't an export symbol
nm -gU `which ruby`
and the lack of static just means that the function is accessible from other files in the ruby source code.
However, that symbol is still available for debugging purposes. Debug information might be the least fragile way to hack around this limitation. We just need to make sure debugging information isn't stripped in CI, which is the only place we need this for.
Problem
tl;dr TracePoint logger is too slow to compute caller filepaths
Rotoscope is implemented in C using the Ruby TracePoint API. By default, the
:call
and:return
Ruby trace events allow you to retrieve the filepath and line number (via rb_tracearg_lineno and rb_tracearg_path in C). While this is quite fast, it has the unfortunate side effect of pointing to the callee of the method (i.e. where the method is received), rather than the caller.The path of the caller is highly important to maintain an accurate "blacklist/whitelist" option so we don't log every single method invocation inside of our framework or hot paths that we don't care to know the internals of. Introducing a blacklist dramatically improves the performance abilities of tracing method invocations.
Solution Attempts
In an attempt to remain performant and accurate, I examined a rudimentary implementation where I installed a
:line
TracePoint (performant, hah) and kept the last read line in memory. This does technically work but it isn't easily possible to use this to determine the return of a method, since it will only know the deepest level of the stack, not when a frame pop occurs.At the suggestion of @dylanahsmith, I tried another variation where I invoked
Kernel.caller_locations
to compute the backtrace of the current trace, and extract the filepath/lineno. This does produce 100% correct results, but is incredibly slow in comparison to the other attempts.Using the tracearg methods, I'm able to complete a Shopify/shopify CI run in ~10 minutes with Rotoscope wrapping all tests. Using
caller_locations
and bumping up the CI timeouts lets most of the containers finish in 2.5 hours. Boo. This is the version currently in themaster
branch.I've compiled a table below where you can see the retreived caller results of the different attempts:
We can shortcut the complexity of
Kernel.caller_locations
on C-invocations and instead use the default TracePoint methods, since the top-level stack won't be able to point to the method definition, and it falls back to the caller instead.Next Steps
Examining how
rb_tracearg_filepath
is implemented, we can seerb_vm_get_ruby_level_next_cfp
is invoked. Looking at the definition of the method, the logic is relatively simple: increment the frame pointer until a Ruby frame is found. In our case, we want to find a Ruby frame, and return the frame after that one. While the method is notstatic
(a miracle?!), it is unfortunately declared inside ofinternal.h
and uses a significant amount of internal data structures that are prohibitively expensive to copy the definitions of as a workaround (e.g.rb_control_frame_t
has a field forrb_iseq_t
, which itself holds a field forrb_iseq_constant_body
, and that struct looks like this.At this point, it's not particularly obvious to me how I can get around this performance problem without forking Ruby to implement a method that returns the second Ruby VM frame, but I'd like to avoid that if possible.
I've added a comment to the already-open issue in the Ruby bugtracker that detailed this problem, but haven't received any traction on it. The original issue was opened over four years ago.
/cc @dylanahsmith @airhorns @nick-mcdonald @stephenminded
The text was updated successfully, but these errors were encountered: