Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code extraction performance improvements #604

Merged
merged 20 commits into from Jul 23, 2020

Conversation

imjoehaines
Copy link
Member

@imjoehaines imjoehaines commented Jul 15, 2020

Goal

This PR improves the performance of extracting code from files in a stacktrace

The performance improvements are especially relevant when a stacktrace has the same file multiple times (e.g. when chaining private methods) and when the files are long

Exactly how much better this is is impossible to say without looking at a specific stacktrace — it can be more than a 3x speedup if it mostly consists of the same few files and they are very long, but every stacktrace is different 🌈. I didn't observe any noticable slow down when a stackrace contained exclusively different files, but the new code does involve a bit more work in the worst case scenario

Changeset

Added

  • Code extraction is now handled by a CodeExtractor class. This exists to keep track of which files in the stacktrace we need to get code from (we will sometimes remove files from the processed stacktrace) and so that it can do a single pass per file, extracting all the regions of code that it needs in one go
  • The Stacktraceclass is now a module as it was always basically one method and was immediately converted to an array when it was called
  • A bunch of new tests and fixtures for code extraction and some untested parts of the stacktrace processing

Linked issues

Related to #481

Review

For the submitter, initial self-review:

  • Commented on code changes inline explain the reasoning behind the approach
  • Reviewed the test cases added for completeness and possible points for discussion
  • A changelog entry was added for the goal of this pull request
  • Check the scope of the changeset - is everything in the diff required for the pull request?
  • This pull request is ready for:
    • Initial review of the intended approach, not yet feature complete
    • Structural review of the classes, functions, and properties modified
    • Final review

For the pull request reviewer(s), this changeset has been reviewed for:

  • Consistency across platforms for structures or concepts added or modified
  • Consistency between the changeset and the goal stated above
  • Internal consistency with the rest of the library - is there any overlap between existing interfaces and any which have been added?
  • Usage friction - is the proposed change in usage cumbersome or complicated?
  • Performance and complexity - are there any cases of unexpected O(n^3) when iterating, recursing, flat mapping, etc?
  • Concurrency concerns - if components are accessed asynchronously, what issues will arise
  • Thoroughness of added tests and any missing edge cases
  • Idiomatic use of the language

This also uses 'to_a' to ensure the order of code is correct. Hashes are
compared without regard for order because usually the order doesn't
matter. I don't think the API cares about the order because we explicitly
number the lines, but it's useful to validate our code is doing what is
expected
We now open each file in a stacktrace once to save opening the same file
multiple times if it is in the stacktrace multiple times. It's quite
common to have the same file in multiple stacktrace lines, e.g. when
a private method raises. The performance gain is especially noticeable
when a a really long file is repeated in the stacktrace as we have to
iterate through each line to find the ones we're interested in

e.g. a trace may look like this:

'/some/file line 5 in "foo"'
'/some/file line 10 in "bar"'
'/some/file line 15 in "baz"'

Previously we would:
1. Open the file for frame 1
2. Read each line up to line 11
3. Trim lines down to lines 2-8
4. Open the file for frame 2
5. Read each line up to line 14
6. Trim lines down to lines 7-13
7. Open the file for frame 3
8. Read each line up to line 19
9. Trim lines down to lines 12-18

Now we:
1. Open the file once
2. Read from line 1 until line 19
3. Add the lines each frame cares about:
    a. Lines 1-11 for frame 1
    b. Lines 3-14 for frame 2
    c. Lines 8-19 for frame 3
3. Trim lines down:
    a. To lines 2-8 for frame 1
    b. To lines 7-13 for frame 2
    c. To lines 12-18 for frame 3

Trimming is necessary because we need to record more lines than we
actually want to keep, so that we can always record 7 lines of context,
even when the frame starts at line 1 or the last line in the file

The new way is still quite inefficient and requires iterating over the
stack frames multiple times, however this passes all tests and can be
improved after profiling it against the old method
There's no Stacktrace.new anymore, so we get one less frame
These will end up being converted into unreadable paths, e.g.
"spec/fixtures/crashes/file1.rb" is readable from the root of
bugsnag-ruby but will be stripped to "file1.rb" by the project_root,
which is unreadable as it's relative to a different directory

This test ensures we read files based on the expanded file path,
rather than the final path we send to the Bugsnag API
This exposed a bug in the library, which may have never been reported
because it was never hit or because it's quite subtle (the stacktrace
would be truncated, but that may not be obvious)

Previously we were using 'return' which returns from the method,
not the 'map' so we'd remove the entire stacktrace if one of the
files ended up empty

It's possible this condition is never hit and therefore no one has
encountered the bug to report it, but it's easy enough to fix and
now there's a test to ensure it does the right thing
Most other notifiers set 'code' to null/nil in this case and there's
not really a reason not to. If we omit it entirely we _could_ break
someone's code if they assume the key 'code' always exists
:s

I think this used to make sense in one permutation, but:

11 / 2 == 5.5
ceil(5.5) == 6
6 + 1 == 7

So we were previously using MAXIMUM_LINES_TO_KEEP in a very
obsfucated way...
This is largely covered by 'stacktrace_spec', but these tests can be a
bit more specific
These were required when Stacktrace was a class and needed to be
'new'-ed, but now it returns an array already
@imjoehaines imjoehaines marked this pull request as ready for review July 15, 2020 14:50
lib/bugsnag/stacktrace.rb Outdated Show resolved Hide resolved
This is no longer needed because, other than misconfiguration, there's
no way for the filename to be empty here

it was implemented in a479b6e, which also let users add their own filters
for the stacktrace file name. This would have been necessary then, but seems
like it's not anymore — the only way to trigger this in a real backtrace is
by setting the project root to a filename, which isn't a realistic use-case

See #604 (review)
@imjoehaines imjoehaines merged commit d3c6d2b into next Jul 23, 2020
@imjoehaines imjoehaines deleted the code-extraction-performance-improvements branch July 23, 2020 15:58
@imjoehaines imjoehaines mentioned this pull request Jul 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants