-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cause to error stack trace #1472
Conversation
a6a1279
to
360db9b
Compare
b2e14a4
to
5a3a3e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code change itself looks reasonable, but the non-trivial performance change is a bit worrying. Exceptions should be exceptional, but it wouldn't be the first time I saw services that were chugging along with many, many exceptions, and thus a slowdown of 3 orders of magnitude can have a non-trivial impact. I think it's worth waiting for a second option on this, /cc @delner ?
For what it's worth, on my machine I get 10.513k (± 8.2%) i/s on Ruby 2.5 for our handcrafted implementation. And I took a stab at making it a bit faster (diff below) and got it to 31.659k (± 4.0%) i/s. So those 0.33k i/s you reported seem to be quite slow, let me know what you get with:
diff --git a/lib/ddtrace/error.rb b/lib/ddtrace/error.rb
index 896a7ce3..9dbefb91 100644
--- a/lib/ddtrace/error.rb
+++ b/lib/ddtrace/error.rb
@@ -30,23 +30,26 @@ module Datadog
backtrace.join("\n") if backtrace
end
elsif Gem::Version.new(RUBY_VERSION) < Gem::Version.new('2.6.0')
+ puts "USING FULL IMPL"
+
# Backports Ruby >= 2.6 output to older versions.
# This exposes the 'cause' chain in the stack trace,
# allowing for complete visibility of the error stack.
def full_backtrace(ex)
- backtrace = []
+ backtrace = String.new
backtrace_for(ex, backtrace)
# Avoid circular causes
- causes = Set.new
- causes.add(ex)
+ causes = Hash.new
+ causes[ex] = true
- while (cause = ex.cause) && !causes.include?(cause)
+ while (cause = ex.cause) && !causes.key?(cause)
backtrace_for(cause, backtrace)
- causes.add(cause)
+ causes[cause] = true
end
- backtrace.join("\n")
+
+ backtrace
end
# Outputs the following format for exceptions:
@@ -61,23 +64,21 @@ module Datadog
trace = ex.backtrace
return unless trace
- error_line, *caller_lines = trace
-
- if error_line
+ if trace[0]
# Add Exception information to error line
- error_line = "#{error_line}: #{ex.message} (#{ex.class})"
- backtrace << error_line
+ backtrace << "#{trace[0]}: #{ex.message} (#{ex.class})"
end
- if caller_lines
+ if trace[1]
# Ident stack trace for caller lines, to separate
# them from the main error lines.
- caller_lines = caller_lines.map do |line|
- " from #{line}"
+ trace[1..-1].each do |line|
+ backtrace << "\n from "
+ backtrace << line
end
-
- backtrace.concat(caller_lines)
end
+
+ backtrace << "\n"
end
lib/ddtrace/error.rb
Outdated
def backtrace_for(ex, backtrace) | ||
trace = ex.backtrace | ||
return unless trace | ||
|
||
error_line, *caller_lines = trace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The names backtrace and trace here are slightly confusing at first glance. Perhaps accumulated_backtrace
instead of just backtrace
to make it clearer what it is?
We talked briefly with @marcotc about the performance, and we thought the stack trace with causes is really adding value, but the cost from the microbenchmark is worrying indeed, so we thought that could be a configurable. TBH, I 'm not fond of inflating the inventory of configurables too much. But that's a contingency, and even an option needs a default, so we're back to square one and have to decide which behaviour to have on the older rubies. For now I probably lack some vision of how to translate this benchmark result into a real impact within an app. Maybe we could measure that and gather the impact relative to a whole app? Without that hard-ish data I'd go with preserving the old behaviour and perf profile, and giving the user an opportunity to opt in the new, richer behaviour by a) upgrading Ruby or b) activating the configurable (at a then known cost/benefit for him). In terms of overall experience this also has the benefit of not having a opt-out configurable that would stay if the user forgets to remove it upon them ultimately upgrading Ruby. |
Thank you guys for pressing the issue! Here are the new numbers:
The new implementation runs are around 24% of the original speed, but provides full error context. Interestingly, the native implementation is 2x slower than our Ruby-based one 😮, so I extended the reach of our custom implementation to all Ruby versions we support. I'll take a look into the CRuby implementation some time in the future and see if there are any gains to be had upstream. IMO, I'm happy with these numbers, given the large visibility increase on error for the performance impact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that this version is good enough™️ for on always. Also pushing our implementation of full_message
upstream seems a nice contribution: faster, and easier to maintain :) :)
wrapper_error_message = /error_spec.rb:\d+:in.*wrapper': wrapper layer \(RuntimeError\)/ | ||
caller_stack = /from.*error_spec.rb:\d+:in `call'/ | ||
root_error_message = /error_spec.rb:\d+:in.*root': root cause \(RuntimeError\)/ | ||
wrapper_stack = /from.*error_spec.rb:\d+:in `wrapper'/ | ||
|
||
expect(error.backtrace) | ||
.to match(/ | ||
#{wrapper_error_message}.* | ||
#{caller_stack}.* | ||
#{root_error_message}.* | ||
#{wrapper_stack}.* | ||
#{caller_stack}.* | ||
/mx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MInor: There's nothing wrong with this current version, but TBH I'd favor being a bit more strict/hardcoded here. If we were to move the example code to a separate file, we could couple on the exact lines and filename, and expect here on a much more exact string.
I never like too fancy assertions ahaha :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking the free pass this time: I think the assertion is not the clearest, but neither it is unreadable. Assertion errors also were legible from my experience developing this feature.
I agree that a separate file would allow for fixed line numbers, but it creates context switch between test files. Here we don't have that context switch, but we do have less legible test assertions. ⚖️
Fixes #1265
This PR changes the format of Ruby exceptions reported in
ddtrace
spans to match the latest default language format.ddtrace
will now include all exception causes and format the output to exactly match the Ruby native format, introduced in Ruby 2.6.The stack trace order was reversed between Ruby 2.5 and 2.7, but
ddtrace
will follow the current Ruby 3.0 format, using natural ordering on all Ruby versions supported.For recent Ruby versions (>= 2.6), the output is retrieved from a single VM call (
Exception#full_message
), thus incurring a much lower cost compared to the backported versions for Ruby < 2.6.Ruby 2.0 does not support
Exception#cause
, thus this change does not affect the output for that version.Benchmarks
A benchmark is committed as part of this PR (RSpec test named
Datadog::Error.build_from with an exception with a cause benchmark
) and here are the results for different versions of Ruby, each taking a different code path in this PR:39.933k i/s39.933k i/s*92.231k i/s0.33k i/s95.634k i/s8.034k i/sNew amended numbers, after performance improvement work:
Exception#full_message
Ruby 2.0 stays relatively unchanged from before: it performs the same work in both implementations.
Ruby 2.6 to 3.0 (and beyond) invokesRuby 2.6 to 3.0 have moved to our Ruby-based implementation, due to better performance, explained below.Exception#full_message
, which generates the full stack trace natively. This method is roughly 11 times slower than before, but provides all the information required for stack trace analysis.Ruby 2.1 to 2.5 uses a Ruby-based implementation to generate the same output as
Exception#full_message
. This allows for a consistent format between all versions of Ruby (except 2.0), at the expense of performance. This method is roughly 4-4.5 times slower than before.Given this method should only happen on exceptional code paths, having a richer output would be preferable to a more performance, but less informative dataset.