Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cause to error stack trace #1472

Merged
merged 6 commits into from
May 10, 2021
Merged

Add cause to error stack trace #1472

merged 6 commits into from
May 10, 2021

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Apr 20, 2021

Fixes #1265

This PR changes the format of Ruby exceptions reported in ddtrace spans to match the latest default language format.

ddtrace will now include all exception causes and format the output to exactly match the Ruby native format, introduced in Ruby 2.6.

def root
  raise 'root cause'
end

def wrapper
  root
rescue
  raise 'wrapper layer'
end

wrapper

# /tmp/test.rb:8:in `rescue in wrapper': wrapper layer (RuntimeError)
# 	from /tmp/test.rb:5:in `wrapper'
# 	from /tmp/test.rb:11:in `<main>'
# /tmp/test.rb:2:in `root': root cause (RuntimeError)
# 	from /tmp/test.rb:6:in `wrapper'
# 	from /tmp/test.rb:11:in `<main>'

The stack trace order was reversed between Ruby 2.5 and 2.7, but ddtrace will follow the current Ruby 3.0 format, using natural ordering on all Ruby versions supported.

For recent Ruby versions (>= 2.6), the output is retrieved from a single VM call (Exception#full_message), thus incurring a much lower cost compared to the backported versions for Ruby < 2.6.

Ruby 2.0 does not support Exception#cause, thus this change does not affect the output for that version.

Benchmarks

A benchmark is committed as part of this PR (RSpec test named Datadog::Error.build_from with an exception with a cause benchmark) and here are the results for different versions of Ruby, each taking a different code path in this PR:

Ruby version Top-level error only (before) Full error stack (after) Implementation
2.0 39.933k i/s 39.933k i/s* Incomplete: only top-level error
2.5 92.231k i/s 0.33k i/s Our implementation to generate a full error stack report
2.6 95.634k i/s 8.034k i/s VM method call to retrieve a full error stack report

New amended numbers, after performance improvement work:

Ruby version Top-level error only (before) Full error stack (after) Implementation
2.0 61.005k i/s 51.075k i/s* Incomplete: only top-level error
2.5 93.955k i/s 22.729k i/s Our implementation to generate a full error stack report
2.6 97.763k i/s 21.709k i/s Our implementation to generate a full error stack report
2.6 97.763k i/s 10.360k i/s Exception#full_message

Ruby 2.0 stays relatively unchanged from before: it performs the same work in both implementations.

Ruby 2.6 to 3.0 (and beyond) invokes Exception#full_message, which generates the full stack trace natively. This method is roughly 11 times slower than before, but provides all the information required for stack trace analysis. Ruby 2.6 to 3.0 have moved to our Ruby-based implementation, due to better performance, explained below.

Ruby 2.1 to 2.5 uses a Ruby-based implementation to generate the same output as Exception#full_message. This allows for a consistent format between all versions of Ruby (except 2.0), at the expense of performance. This method is roughly 4-4.5 times slower than before.

Given this method should only happen on exceptional code paths, having a richer output would be preferable to a more performance, but less informative dataset.

@marcotc marcotc requested a review from a team April 20, 2021 22:31
@marcotc marcotc self-assigned this Apr 20, 2021
@marcotc marcotc force-pushed the error-stack branch 2 times, most recently from a6a1279 to 360db9b Compare April 20, 2021 22:38
@marcotc marcotc added this to In progress in Active work Apr 20, 2021
@marcotc marcotc marked this pull request as draft April 20, 2021 23:09
@marcotc marcotc removed the request for review from a team April 20, 2021 23:09
@delner delner added this to Long Term Priority in Planning Apr 21, 2021
@marcotc marcotc force-pushed the error-stack branch 2 times, most recently from b2e14a4 to 5a3a3e6 Compare April 21, 2021 19:35
@marcotc marcotc added the feature-request A request for a new feature or change to an existing one label Apr 30, 2021
@marcotc marcotc marked this pull request as ready for review April 30, 2021 18:08
@marcotc marcotc moved this from In progress to In review in Active work Apr 30, 2021
@marcotc marcotc requested review from lloeki and a team May 3, 2021 16:48
@delner delner removed this from Long Term Priority in Planning May 3, 2021
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code change itself looks reasonable, but the non-trivial performance change is a bit worrying. Exceptions should be exceptional, but it wouldn't be the first time I saw services that were chugging along with many, many exceptions, and thus a slowdown of 3 orders of magnitude can have a non-trivial impact. I think it's worth waiting for a second option on this, /cc @delner ?

For what it's worth, on my machine I get 10.513k (± 8.2%) i/s on Ruby 2.5 for our handcrafted implementation. And I took a stab at making it a bit faster (diff below) and got it to 31.659k (± 4.0%) i/s. So those 0.33k i/s you reported seem to be quite slow, let me know what you get with:

diff --git a/lib/ddtrace/error.rb b/lib/ddtrace/error.rb
index 896a7ce3..9dbefb91 100644
--- a/lib/ddtrace/error.rb
+++ b/lib/ddtrace/error.rb
@@ -30,23 +30,26 @@ module Datadog
           backtrace.join("\n") if backtrace
         end
       elsif Gem::Version.new(RUBY_VERSION) < Gem::Version.new('2.6.0')
+        puts "USING FULL IMPL"
+
         # Backports Ruby >= 2.6 output to older versions.
         # This exposes the 'cause' chain in the stack trace,
         # allowing for complete visibility of the error stack.
         def full_backtrace(ex)
-          backtrace = []
+          backtrace = String.new
           backtrace_for(ex, backtrace)

           # Avoid circular causes
-          causes = Set.new
-          causes.add(ex)
+          causes = Hash.new
+          causes[ex] = true

-          while (cause = ex.cause) && !causes.include?(cause)
+          while (cause = ex.cause) && !causes.key?(cause)
             backtrace_for(cause, backtrace)
-            causes.add(cause)
+            causes[cause] = true
           end

-          backtrace.join("\n")
+
+          backtrace
         end

         # Outputs the following format for exceptions:
@@ -61,23 +64,21 @@ module Datadog
           trace = ex.backtrace
           return unless trace

-          error_line, *caller_lines = trace
-
-          if error_line
+          if trace[0]
             # Add Exception information to error line
-            error_line = "#{error_line}: #{ex.message} (#{ex.class})"
-            backtrace << error_line
+            backtrace << "#{trace[0]}: #{ex.message} (#{ex.class})"
           end

-          if caller_lines
+          if trace[1]
             # Ident stack trace for caller lines, to separate
             # them from the main error lines.
-            caller_lines = caller_lines.map do |line|
-              "        from #{line}"
+            trace[1..-1].each do |line|
+              backtrace << "\n from "
+              backtrace << line
             end
-
-            backtrace.concat(caller_lines)
           end
+
+          backtrace << "\n"
         end

Comment on lines 60 to 64
def backtrace_for(ex, backtrace)
trace = ex.backtrace
return unless trace

error_line, *caller_lines = trace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The names backtrace and trace here are slightly confusing at first glance. Perhaps accumulated_backtrace instead of just backtrace to make it clearer what it is?

@ivoanjo ivoanjo requested a review from delner May 5, 2021 10:19
@lloeki
Copy link
Contributor

lloeki commented May 5, 2021

We talked briefly with @marcotc about the performance, and we thought the stack trace with causes is really adding value, but the cost from the microbenchmark is worrying indeed, so we thought that could be a configurable.

TBH, I 'm not fond of inflating the inventory of configurables too much. But that's a contingency, and even an option needs a default, so we're back to square one and have to decide which behaviour to have on the older rubies.
The microbenchmark highlights a performance difference but doesn't quite help to really decide.

For now I probably lack some vision of how to translate this benchmark result into a real impact within an app. Maybe we could measure that and gather the impact relative to a whole app?

Without that hard-ish data I'd go with preserving the old behaviour and perf profile, and giving the user an opportunity to opt in the new, richer behaviour by a) upgrading Ruby or b) activating the configurable (at a then known cost/benefit for him). In terms of overall experience this also has the benefit of not having a opt-out configurable that would stay if the user forgets to remove it upon them ultimately upgrading Ruby.

@marcotc
Copy link
Member Author

marcotc commented May 7, 2021

Thank you guys for pressing the issue!
I follow @ivoanjo's suggestions, and added a few more minor optimizations. I believe that we now have an implementation that has reasonable performance impact, and we (hopefully 🤞) don't have to make hard decisions on how to safely roll this out anymore.

Here are the new numbers:

Ruby version Top-level error only (before) Full error stack (after) Implementation
2.0 61.005k i/s 51.075k i/s* Incomplete: only top-level error
2.5 93.955k i/s 22.729k i/s Our implementation to generate a full error stack report
2.6 97.763k i/s 21.709k i/s Our implementation to generate a full error stack report
2.6 97.763k i/s 10.360k i/s Exception#full_message

The new implementation runs are around 24% of the original speed, but provides full error context.

Interestingly, the native implementation is 2x slower than our Ruby-based one 😮, so I extended the reach of our custom implementation to all Ruby versions we support.

I'll take a look into the CRuby implementation some time in the future and see if there are any gains to be had upstream.

IMO, I'm happy with these numbers, given the large visibility increase on error for the performance impact.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this version is good enough™️ for on always. Also pushing our implementation of full_message upstream seems a nice contribution: faster, and easier to maintain :) :)

Comment on lines +77 to +89
wrapper_error_message = /error_spec.rb:\d+:in.*wrapper': wrapper layer \(RuntimeError\)/
caller_stack = /from.*error_spec.rb:\d+:in `call'/
root_error_message = /error_spec.rb:\d+:in.*root': root cause \(RuntimeError\)/
wrapper_stack = /from.*error_spec.rb:\d+:in `wrapper'/

expect(error.backtrace)
.to match(/
#{wrapper_error_message}.*
#{caller_stack}.*
#{root_error_message}.*
#{wrapper_stack}.*
#{caller_stack}.*
/mx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MInor: There's nothing wrong with this current version, but TBH I'd favor being a bit more strict/hardcoded here. If we were to move the example code to a separate file, we could couple on the exact lines and filename, and expect here on a much more exact string.

I never like too fancy assertions ahaha :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking the free pass this time: I think the assertion is not the clearest, but neither it is unreadable. Assertion errors also were legible from my experience developing this feature.

I agree that a separate file would allow for fixed line numbers, but it creates context switch between test files. Here we don't have that context switch, but we do have less legible test assertions. ⚖️

@marcotc marcotc merged commit bb36490 into master May 10, 2021
Active work automation moved this from In review to Merged & awaiting release May 10, 2021
@marcotc marcotc deleted the error-stack branch May 10, 2021 18:13
@github-actions github-actions bot added this to the 0.49.0 milestone May 10, 2021
@ivoanjo ivoanjo moved this from Merged & awaiting release to Released in Active work Aug 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A request for a new feature or change to an existing one
Projects
Active work
  
Released
Development

Successfully merging this pull request may close these issues.

Error cause is not available in traces
3 participants