
Running cops in parallel #117

Closed
bbatsov opened this issue May 4, 2013 · 24 comments

@bbatsov
Collaborator

bbatsov commented May 4, 2013

As we add more and more checks, performance naturally regresses a bit. Luckily, the things we're doing are pretty parallelizable - we can simply split the cops into 3 groups, run them in 3 threads, and merge the results at the end. Even though threading is not exactly Ruby's strong suit, I still think we might be able to get some benefit out of this.

JRuby is inching closer to supporting Ripper (https://gist.github.com/enebo/4548938), and there the performance boost should be even bigger.

This issue is here mostly for discussion purposes. If anyone has the time to run a few simple experiments with basic parallelization and share the results that'd be great.
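As a starting point for the experiments requested above, here is a minimal sketch of the split-into-thread-groups idea. It is purely illustrative: the cops are plain callables, not RuboCop's real Cop API, and `run_cops_in_parallel` is a hypothetical helper.

```ruby
# Hypothetical sketch: split the cops into groups, run each group in its
# own thread, and merge the per-thread offence lists at the end.
def run_cops_in_parallel(cops, source, n_groups: 3)
  slice_size = (cops.size / n_groups.to_f).ceil
  threads = cops.each_slice(slice_size).map do |group|
    Thread.new { group.flat_map { |cop| cop.call(source) } }
  end
  threads.flat_map(&:value) # merge the results from all threads
end
```

Note that on MRI the GIL limits CPU-bound gains from threads, which is consistent with the benchmark results reported below.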

@bolandrm
Contributor

bolandrm commented May 4, 2013

I'm going to run some benchmarks!

@bolandrm
Contributor

bolandrm commented May 4, 2013

Taking a look, I think the time it takes to run the cops is trivial compared to the I/O time of reading each file. Performance was actually worse when I tried to parallelize the cops (this may be different with JRuby).

However, parallelizing the file reads, I was able to check the rubocop source 8 seconds faster by switching 2 lines of code (using the Parallel gem):

```ruby
# ...

require 'parallel'

# ...

# target_files(args).each do |file|
Parallel.each(target_files(args)) do |file|
  # ...
```

[screenshot: benchmark output, 2013-05-04]

For some reason it's not incrementing the files-checked count when it runs in parallel.

With a little more tinkering, I may be able to set it up so that the -p flag would use the parallel method. This would allow the brave people to use it while not causing problems for anyone else.

@bbatsov
Collaborator Author

bbatsov commented May 4, 2013

This sounds very promising. I support your suggestion about the -p flag. 

Cheers,
Bozhidar


@bbatsov
Collaborator Author

bbatsov commented May 5, 2013

Btw, @bolandrm, you might also take a look at Celluloid. I haven't used it, but I've heard it's a pretty good gem if you're looking for a higher-level threading API.

@jurriaan
Contributor

jurriaan commented May 6, 2013

@bolandrm If you use Parallel, you should use Parallel.map and return the report at the end of the block.
That way you can loop over Report#entries when processing is finished and get the correct number of offences :)
Don't try to change non-local variables in the block - that causes trouble.

What's the speedup if you use processes instead of threads?
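The pattern described above - return each file's report from the block instead of mutating shared counters, then aggregate in the main thread - can be sketched like this. Plain threads stand in for Parallel.map here, and `check_file`/the report hash are illustrative stand-ins, not RuboCop's real API.

```ruby
# Each block returns its file's report; the totals are computed from the
# collected reports afterwards, so no shared state is mutated in parallel.
def parallel_reports(files)
  files.map { |file| Thread.new { yield(file) } }.map(&:value)
end

reports = parallel_reports(%w[foo.rb bar.rb]) do |file|
  { file: file, offences: [] } # a real block would run the cops here
end
total_offences = reports.sum { |r| r[:offences].size }
```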

@jurriaan
Contributor

jurriaan commented May 6, 2013

@bolandrm See jurriaan/rubocop@de02724b75d1 for how to fix incrementing files and offences

@bolandrm
Contributor

bolandrm commented May 6, 2013

@jurriaan Good call, thanks.

I was thinking that Parallel used processes by default (if possible on the particular system). Their documentation isn't that great. I was going to check out Celluloid.

@jurriaan
Contributor

jurriaan commented May 6, 2013

Celluloid looks interesting, never used it though. Looking forward to your results!

@whitequark

You really do not need Celluloid here... Parallel is more than enough. I would suggest simply marshalling the offences, collecting them with Parallel#map, and then displaying them in the main process.
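A minimal sketch of that suggestion, using stdlib fork + Marshal in place of the Parallel gem: each file is inspected in a child process, the offences are marshalled back over a pipe, and the parent collects them for display. `offences_via_fork` and its block are hypothetical; fork is unavailable on Windows, and a real implementation would run the children concurrently rather than one at a time.

```ruby
# Inspect each file in a child process and marshal the offences back to
# the parent over a pipe.
def offences_via_fork(files)
  files.map do |file|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield(file))) # child: serialize offences
      writer.close
      exit!(0)
    end
    writer.close
    offences = Marshal.load(reader.read) # parent: deserialize
    reader.close
    Process.wait(pid)
    offences
  end
end
```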

@jurriaan
Contributor

@bolandrm Are you working on this right now? I could look into it if you want.

@bbatsov
Collaborator Author

bbatsov commented May 15, 2013

@jurriaan If you can cook something up soon it would be great. I would have done that myself by now, but I'm currently tied up porting cops to Parser.

@bolandrm
Contributor

@jurriaan please feel free to take this over. I was planning on finishing it eventually but I'm really busy at the moment.


@bbatsov
Collaborator Author

bbatsov commented Jul 3, 2013

I'll be closing this one, since @edzhelyov and I have an idea that would render the need for running cops in parallel obsolete.

@bbatsov bbatsov closed this as completed Jul 3, 2013
@jurriaan
Contributor

jurriaan commented Jul 3, 2013

Fine, I didn't have the time to implement it, and the code was changing too much for me. What's the new idea? :)

@bbatsov
Collaborator Author

bbatsov commented Jul 3, 2013

We'll traverse the AST only once, in a dispatch-like class that propagates the parser events to the interested cops (all cops that are implemented as parser processors). Since most of the execution time is currently spent traversing the AST over and over in each cop, that approach should speed things up significantly. There are a few cops for which this is not applicable, but overall there should be a huge speed improvement.
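A rough sketch of that dispatch idea: cops subscribe to the node types they care about, and a dispatcher walks the AST once, handing each node only to the interested cops. `Node` and `Dispatcher` here are simplified stand-ins, not Parser's or RuboCop's real classes.

```ruby
# Simplified AST node; Parser's real nodes carry type, children and
# source location.
Node = Struct.new(:type, :children)

class Dispatcher
  def initialize
    @handlers = Hash.new { |h, k| h[k] = [] }
  end

  # A cop registers interest in a node type.
  def subscribe(type, &handler)
    @handlers[type] << handler
  end

  # Single depth-first traversal: each node is dispatched only to the
  # cops subscribed to its type.
  def walk(node)
    return unless node.is_a?(Node)
    @handlers[node.type].each { |h| h.call(node) }
    node.children.each { |child| walk(child) }
  end
end
```

With N cops, this replaces N full traversals with one traversal plus a hash lookup per node.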

@jurriaan
Contributor

jurriaan commented Jul 3, 2013

Sounds great! 👍

@yujinakayama
Collaborator

@bbatsov 👍 I was also thinking that the behavior of “traversing the AST over and over in each cop” wastes too much CPU. Though, in exchange for that waste, we could implement each cop without considering side effects on other cops.

Actually I was about to refactor the inspection and parsing logic of CLI, but probably I should leave them for now?

@bbatsov
Collaborator Author

bbatsov commented Jul 3, 2013

The inspection and parsing logic should definitely be moved out of the CLI anyway. You should probably discuss your refactoring plans with @edzhelyov. Maybe it would be best if you wrote an email about them to the mailing list.

@tibbon

tibbon commented Jul 13, 2015

Any chance of reopening this? One of my projects has ~2300 files. With our configuration, it currently takes around 2 minutes to run RuboCop on my 2.6 GHz Intel Core i7 MacBook Pro, and using only one of 8 cores seems suboptimal.

@jcoglan

jcoglan commented Oct 8, 2015

Is there any interest in revisiting this? Our codebase has around 1,600 files to check, and writing a quick script to split the file list into several groups and check each group in a separate process roughly halves the time it takes to check our codebase (1m30s down to 45s).
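The quick script described above can be sketched roughly as follows. Everything here is illustrative - in particular the `["true"]` worker command, which in the real script would be `rubocop` invoked with each group of paths.

```ruby
# Split the file list into N groups, spawn one worker process per group,
# and wait for all of them; returns true only if every worker succeeded.
def check_in_groups(files, n_groups, worker_cmd)
  slice_size = (files.size / n_groups.to_f).ceil
  pids = files.each_slice(slice_size).map do |group|
    Process.spawn(*worker_cmd, *group)
  end
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end
```

Forked processes sidestep the GIL entirely, which is why this approach halves the wall-clock time where threads did not help.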

@jonas054
Collaborator

jonas054 commented Oct 8, 2015

@jcoglan What about the caching mechanism introduced in v0.34.2? Doesn't that solve your problem?

@jcoglan

jcoglan commented Oct 12, 2015

@jonas054 Yes, I've tried that now and it makes a huge difference. Thanks :)

@pt-stripe

pt-stripe commented Dec 15, 2016

Any chance of revisiting this decision?

```
$ rubocop --version
0.44.1
$ rubocop -L | time xargs -P`sysctl -n hw.ncpu` -n250 rubocop -a
...
      221.84 real      1473.85 user        21.34 sys
$ rubocop -L | time xargs -P`sysctl -n hw.ncpu` -n250 rubocop -a
...
      216.42 real      1469.30 user        20.55 sys
$ time rubocop -a
...
real    11m57.188s
user    11m43.502s
sys     0m11.112s
```

On our codebase it takes only 3.5 minutes to run 8x in parallel, but 12 minutes to run serially. I'm sure with built-in support for parallelizing it could get even better.

@bbatsov
Collaborator Author

bbatsov commented Dec 15, 2016

> On our codebase it only takes 3.5 minutes to run 8x in parallel but it takes 12 minutes to do it serially. I'm sure with inbuilt support for parallelizing it can get even better.

Adding built-in support for parallel execution requires a lot of changes. If this weren't the case, we would have done it by now. :-) Frankly, I doubt we'll ever get there - I'm certainly not eager to work on this myself.

@josh mentioned this issue Dec 16, 2016
jonas054 added a commit to jonas054/rubocop that referenced this issue Apr 12, 2017
This change tries to solve the tricky parallel execution problem
by spawning off a number of processes/threads to do file inspection,
sharing the work between them, without collecting any output.
When all processes are finished, the original process runs the
full inspection again, taking advantage of result caching.
bbatsov pushed a commit that referenced this issue Apr 12, 2017