Performance: memomize LinesClassifier::no_cov_line #662

BMorearty · 2018-03-09T03:34:08Z

In my measurements the majority of the time in SimpleCov was being
spent in this one-line function.

In my tests on a large project at Airbnb, this change makes SimpleCov run
from 2.5x to 3.75x faster when processing results.

(From 154.2 seconds down to 60.6 seconds in one test.
From 16.6 seconds down to 4.4 seconds in another.)

Admittedly, memoizing it does change its behavior but only in the edge
case where someone runs SimpleCov twice in a single test run, changing
the +nocov_token+ between the two runs. This is such an odd edge case I
decided not to worry about it.

In my measurements the majority of the time in SimpleCov was being spent in this one-line function. In my tests on a large project, this change makes SimpleCov run from 2.5x to 3.75x faster. Admittedly, memoizing it does change its behavior but only in the edge case where someone runs SimpleCov twice in a single test run, changing the +nocov_token+ between the two runs. This is such an odd edge case I decided not to worry about it.

PragTob

Hi there,

thanks for the contribution!

Could you share a benchmark or parts of it for this? I guess you can't share the airbnb code base ;P

I'm just curious (plus like to keep benchmarks around and repeatable) it seems odd for such a little line to cause so much overhead, but as it's evaluated a lot and a new regex is instantiated every time I guess it makes sense...

I agree, the edge case is negligible / people shouldn't change there nocov tokens. I'd even debate the ability to change this tokens as not really necessary imo (and then this could even be in a constant...). But that's nod a change for now.

Thanks a bunch!

BMorearty · 2018-03-09T15:40:15Z

Haha I'd be happy to share the airbnb code base with you if you come work here! 😛

But hopefully this will help: our single largest Ruby project has tens of thousands of files and millions of lines of code. Our CI runs it with a test splitter that splits it into 720 parallel runs on 24 machines. (!)

If we use track_files "**/*.rb", each worker calls no_cov_line 3,607,312 times.

If we don't use track_files, each worker calls no_cov_line 456,353 times. (I'm removing the call to track_files and will run that in a parallel step instead so each build worker doesn't have to do all that redundant work.)

Either way, all those constructions of RegExp objects take a heck of a long time:

Before memoizing:

[4] pry(main)> Benchmark.realtime { 3_607_312.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 50.070164845
[5] pry(main)> Benchmark.realtime { 456_353.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 6.306209134

After memoizing:

[4] pry(main)> Benchmark.realtime { 3_607_312.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 0.446957834
[5] pry(main)> Benchmark.realtime { 456_353.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 0.052228564

BMorearty · 2018-03-09T15:41:19Z

P.S. Those benchmarks were in Ruby 2.1.8

BMorearty · 2018-03-09T20:03:35Z

Please hold off on merging this until I resolve an issue I'm getting in our CI, although I can't repro it locally:

.../vendor/bundle/ruby/2.1.0/gems/simplecov-0.15.1.abnb/lib/simplecov/lines_classifier.rb:23:in `=~': invalid byte sequence in UTF-8 (ArgumentError)

BMorearty · 2018-03-09T21:47:02Z

The error is happening because if line =~ self.class.no_cov_line is executing on this string from the builder gem: https://github.com/jimweirich/builder/blob/master/test/performance.rb#L17

The string is "text = "This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\n" * 10000\n"

I can verify this in pry:

[1] pry(main)> str = "text = \"This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\\n\" * 10000\n"
=> "text = \"This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\\n\" * 10000\n"
[2] pry(main)> str =~ /SimpleCov::LinesClassifier.no_cov_line/
ArgumentError: invalid byte sequence in UTF-8

What I'm curious about is why, on SimpleCov head, this file is being processed at all. It seems like a bug. I was previously using 0.14.1 and it did not process this file because our filters are:

    add_filter [
      "/config/",
      "/db/",
      "/script/",
      "/spec/",
      "/test/",
      "/vendor/",
    ]

and the filename is /tmp/d20180309-12273-5g2awg/monorail-solano-ghe/vendor/bundle/ruby/2.1.0/gems/builder-2.1.2/test/performance.rb. Since this filename includes the string /vendor/, simplecov should reject it before calling SimpleCov::LinesClassifier#classify.

I am familiar with the change in 0.15.1 to add support for regexp filters but it does not seem like that would affect it.

Is this a bug that was introduced after 0.14.1? Is it calling SimpleCov::LinesClassifier#classify on files that it shouldn't?

BMorearty · 2018-03-10T00:31:25Z

Update: feel free to merge this. The issue I'm seeing is unrelated to this change. I get the same behavior when I upgrade from 0.14.1 to 0.15.1 without this change. Haven't yet figured out why.

PragTob · 2018-03-10T09:28:39Z

@BMorearty thanks for the input. It does seem like we might have had some form of regression regarding filters but I'm unsure and honestly I don't have the most time to spend on simplecov but I might be able to investigate and repro. Similarly, maybe I get to wip up a quick benchmark for our line classification.

PragTob · 2018-03-10T10:37:26Z

Ok wipped up a quick benchmark and it looks good:

# this is master
tobi@speedy ~/github/simplecov $ ruby benchmarks/result.rb 
Warming up --------------------------------------
generating a simplecov result
                         7.000  i/100ms
Calculating -------------------------------------
generating a simplecov result
                         72.193  (± 2.8%) i/s -    364.000  in   5.048140s
tobi@speedy ~/github/simplecov $ git checkout -
M	Gemfile
Switched to branch 'BMorearty-performance-improvement'
tobi@speedy ~/github/simplecov $ ruby benchmarks/result.rb 
Warming up --------------------------------------
generating a simplecov result
                        26.000  i/100ms
Calculating -------------------------------------
generating a simplecov result
                        270.671  (± 1.8%) i/s -      1.378k in   5.092701s

PragTob

Thanks again 🎉

here, have some bunnies for your good contributions!

BMorearty force-pushed the performance-improvement branch from 2d31d4b to 93b82b2 Compare March 9, 2018 03:35

PragTob reviewed Mar 9, 2018

View reviewed changes

PragTob mentioned this pull request Mar 10, 2018

Benchmark result generations statistics #665

Merged

PragTob approved these changes Mar 10, 2018

View reviewed changes

PragTob merged commit 6250313 into simplecov-ruby:master Mar 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance: memomize LinesClassifier::no_cov_line #662

Performance: memomize LinesClassifier::no_cov_line #662

BMorearty commented Mar 9, 2018

PragTob left a comment

BMorearty commented Mar 9, 2018

BMorearty commented Mar 9, 2018

BMorearty commented Mar 9, 2018

BMorearty commented Mar 9, 2018

BMorearty commented Mar 10, 2018 •

edited

Loading

PragTob commented Mar 10, 2018

PragTob commented Mar 10, 2018

PragTob left a comment

Performance: memomize LinesClassifier::no_cov_line #662

Performance: memomize LinesClassifier::no_cov_line #662

Conversation

BMorearty commented Mar 9, 2018

PragTob left a comment

Choose a reason for hiding this comment

BMorearty commented Mar 9, 2018

BMorearty commented Mar 9, 2018

BMorearty commented Mar 9, 2018

BMorearty commented Mar 9, 2018

BMorearty commented Mar 10, 2018 • edited Loading

PragTob commented Mar 10, 2018

PragTob commented Mar 10, 2018

PragTob left a comment

Choose a reason for hiding this comment

BMorearty commented Mar 10, 2018 •

edited

Loading