Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize RBI parsing to speed up the check-shims command #1056

Merged
merged 3 commits into from
Jul 14, 2022

Conversation

Morriar
Copy link
Collaborator

@Morriar Morriar commented Jul 12, 2022

Motivation

Running tapioca check-shims can take up to 7min in Core (~13 when running on CI). The culprit is parsing the RBI files.

This PR proposes to speed up the process by parallelizing RBI parsing.

Implementation

I reused Executor that we already use to parallelize RBI generation.

Tests

No behavior change.

I tested the speed-up on Tapioca itself:

Before:

$ time bundle exec tapioca check-shims

Loading Sorbet payload...  Done (5.36s)
Loading shim RBIs from sorbet/rbi/shims...  Done (0.01s)
Loading gem RBIs from sorbet/rbi/gems...  Done (17.93s)
Looking for duplicates...  Done (0.02s)

real	0m24.291s
user	0m23.251s
sys	0m0.564s

After:

$ time bundle exec tapioca check-shims -w 10
Loading Sorbet payload...  Done (1.51s)
Loading shim RBIs from sorbet/rbi/shims...  Done (0.21s)
Loading gem RBIs from sorbet/rbi/gems...  Done (7.16s)
Looking for duplicates...  Done (0.03s)

real	0m9.561s
user	0m31.153s
sys	0m3.086s

So a > 50% improvement here.

I also tested on Core:

Before:

$ time bundle exec tapioca check-shims
Loading Sorbet payload...  Done (10.05s)
Loading shim RBIs from sorbet/rbi/shims...  Done (1.32s)
Loading gem RBIs from sorbet/rbi/gems...  Done (165.56s)
Loading dsl RBIs from sorbet/rbi/dsl...  Done (227.36s)
Loading annotation RBIs from sorbet/rbi/annotations...  Done (0.08s)
Looking for duplicates...  Done (0.5s)

real	6m52.147s
user	6m48.205s
sys	0m3.855s

After:

$ time bundle exec tapioca check-shims -w 10
Loading Sorbet payload...  Done (6.49s)
Loading shim RBIs from sorbet/rbi/shims...  Done (3.36s)
Loading gem RBIs from sorbet/rbi/gems...  Done (105.36s)
Loading dsl RBIs from sorbet/rbi/dsl...  Done (78.48s)
Loading annotation RBIs from sorbet/rbi/annotations...  Done (3.02s)
Looking for duplicates...  Done (0.44s)

real	3m24.315s
user	9m19.503s
sys	1m1.437s

So a > 50% improvement here too.

@Morriar Morriar requested a review from a team July 12, 2022 17:56
@Morriar Morriar self-assigned this Jul 12, 2022
@Morriar Morriar added the enhancement New feature or request label Jul 12, 2022
Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
index_rbis(index, "shim", @shim_rbi_dir, number_of_workers: @number_of_workers)
index_rbis(index, "gem", @gem_rbi_dir, number_of_workers: @number_of_workers)
index_rbis(index, "dsl", @dsl_rbi_dir, number_of_workers: @number_of_workers)
index_rbis(index, "annotation", @annotations_rbi_dir, number_of_workers: @number_of_workers)

duplicates = duplicated_nodes_from_index(index, shim_rbi_dir: @shim_rbi_dir, todo_rbi_file: @todo_rbi_file)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering for a future improvement, could the duplicates for each file be stored in a cache similar to what RuboCop does?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe.

A fast and elegant way would be to ask Sorbet through a custom LSP request.

@@ -250,5 +248,14 @@ def update_gem_rbis_strictnesses(errors, gem_dir)
def gem_name_from_rbi_path(path)
T.must(File.basename(path, ".rbi").split("@").first)
end

sig { params(block: T.proc.void).returns(Numeric) }
def with_timer(&block)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use Benchmark instead? I think it does exactly the same. Also, do we actually need to measure this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use Benchmark instead?

Good idea!

Also, do we actually need to measure this?

I was actually thinking to extend the concept to other commands so we can easily see where Tapioca is slow or slower.

lib/tapioca/helpers/rbi_files_helper.rb Show resolved Hide resolved
lib/tapioca/helpers/rbi_files_helper.rb Outdated Show resolved Hide resolved
Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
@Morriar Morriar merged commit c4e01fc into main Jul 14, 2022
@Morriar Morriar deleted the at-check-shims-parallel branch July 14, 2022 17:00
@shopify-shipit shopify-shipit bot temporarily deployed to production July 14, 2022 18:56 Inactive
@shopify-shipit shopify-shipit bot temporarily deployed to 0-9-stable August 22, 2022 21:36 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants