Add `--jobs` (parallel git fetch) #539

m-o-e · 2022-01-06T11:28:14Z

Resolves #371
Resolves #239

This PR parallelizes git fetching which gives me a ~40% speed-up
for shards install on average.

$ git clone https://github.com/amberframework/amber.git
$ cd amber

$ time shards --jobs=1
18.579s total

$ time shards --jobs=8
10.143s total

Only the fetching is parallelized, post-install still runs in serial.
I've experimented with parallelizing that too but the speed gains
were negligible and I'm not sure if it's always safe.
It's only implemented for git so far (no hg)
No test-coverage, yet!
Although parallelism is enabled during test runs, the existing specs don't seem to actually exercise it.

src/config.cr

src/resolvers/git.cr

straight-shoota · 2022-01-09T21:58:26Z

It would be great to have specs for parallel fetch.

Co-authored-by: Johannes Müller <straightshoota@gmail.com>

m-o-e · 2022-01-09T23:44:42Z

It would be great to have specs for parallel fetch.

Agree.

Now that it's enabled in spec it should get plenty coverage.
Do you think that is sufficient or should there be an isolated test?
(not sure how to create a meaningful one tbh, ideas welcome 🤔)

src/molinillo_solver.cr

m-o-e · 2022-02-06T19:07:14Z

After using this PR-version for a while I realized it would only parallelize
when a shard.lock exists, but not for the initial run after cloning a repo
without that file.

This is now fixed.

(had missed it in my previous testing because I'd muck
with shard.yml, delete lib, but never the shard.lock...)

src/molinillo_solver.cr

Sija · 2022-02-06T20:28:18Z

docs/shards.adoc

@@ -74,6 +74,7 @@ doesn't generate a conflict, thus generating a new _shard.lock_ file.
 --production:: same as _--frozen_ and _--without-development_
 --skip-postinstall:: Does not run postinstall of dependencies.
 --skip-executables:: Does not install executables.
+--parallel-fetch:: Number of git fetches to perform in parallel (default: 8).


IIUC this works for every resolver (implementing update_local_cache), not just git?
If so, I'd rename it to parallel-jobs or sth like that.

I thought fetch would make the most sense for all resolvers where this applies
since it's about the data-fetching and nothing else - each VCS might call that step
something different but ultimately it's a "fetch".

It's only implemented for git atm, that's why I mentioned that one in the description.
(would replace "git" with "VCS" later, when other impls are added)

We could change the arg-name but to me parallel-jobs feels less to the point than parallel-fetch? 🤔

It's only implemented for git atm, that's why I mentioned that one in the description.

I don't follow. You're calling Resolver#update_local_cache for every resolver regardless, so how come you state it's implemented only for git?

Heck... you are right.

I got confused because my first iteration was git-specific,
I totally forgot that it now indeed applies to all of them. 🙈

Ouch, that means this PR probably breaks hg.
The hg-side will need a change similar to this.

Sigh. This makes things more complicated.

I can port the change from git.cr to hg.cr - but I haven't wrapped my head
around the spec-suite enough to see how to make a test to validate it.

It looks like the current integration test only uses git URLs (no hg_url() in there).
That may be the reason why it didn't catch the breakage - or maybe
it just doesn't exercise the parallelism at all.

(I thought it does because it uses multiple deps in some places -
but the fact that it didn't break for hg... confuses me 🤔)

Hm. The best I could offer for now would be to actually exclude hg from
parallelism (like I thought it already was), by doing something like:

begin - dep.resolver.update_local_cache + dep.resolver.update_local_cache if dep.resolver.is_a? GitResolver ch.send(nil) active -= 1 rescue ex : Exception ch.send(ex) end end

But I feel like if the spec's didn't catch the hg issue then they
might also not sufficiently cover the git-side. So even though it works
splendidly for me under real usage, spec-coverage should probably
be addressed before merging this.

If someone more familiar with the spec-suite and/or hg
has an idea on how to best do that, please don't hesitate. 😬

Agree, but that will need test-coverage first.
For git I have some confidence that it works fine since I've been using it daily for a while, but
I've never encountered a hg-shard in the wild, so don't even have an URL to test that side manually.

Not sure when I'll have time to continue on this, so excluding Hg for now.

That's totally fine. Concurrent fetches are just an optimization without functional differences. There's no need to have this implemented for all resolvers immediately.
Wording should be more generic, though.

Resolving this convo as I believe all points are addressed.

Parallelism is restricted to only apply to GitResolver (because hg not implemented, yet)

Flag has been renamed to --jobs to keep it generic (for consistency with bundler
and for when we add hg in the future)

Help-text still mentions git explicitly - but I believe that makes sense as to not
confuse users into thinking it would apply to all resolvers.

Please re-open or reply on main thread if any of these need further discussion.

I'd prefer to keep the wording more general, but mention that it only works for git currently. For example:

--jobs:: Number of repository downloads to perform in parallel (default: 8). Currently only functions for git resolver.

Ok, updated in 670cfb7

Dropped "function" and "resolver" to make it shorter.

src/resolvers/resolver.cr

carlhoerberg · 2022-02-23T14:33:01Z

I purpose that you rename the argument to -j/--jobs, like make, bundler and #545 uses.

m-o-e · 2022-02-25T10:33:33Z

Good idea @carlhoerberg!

Renamed in bac393b

Fwiw, I left the help-screen as Number of git fetches to perform in parallel (default: 8)
to keep it clear what it actually does. If anyone would prefer a different text just let me know. :)

I'm torn on whether -j should also be added, left it out here for now.
(I guess for consistency with make/bundler we could add it - but otoh I don't think
anyone would blindly rely on -j to be present, and it may stand out a bit weirdly
as most of the other flags don't have a short version)

beta-ziliani · 2022-03-21T13:50:06Z

src/molinillo_solver.cr

+          begin
+            dep.resolver.update_local_cache if dep.resolver.is_a? GitResolver
+            ch.send(nil)
+            active -= 1


I'm afraid that this is not safe from races. Probably OK if shards is not compiled with preview_mt, but I wonder if we shouldn't use a Mutex here to be on the safe side. @bcardiff WDYT?

Hmmm. Thanks for catching this!
Maybe Atomic can help here (assuming it is MT-safe?).

Ah yes! Atomic is what we need here. Do you want to do the change? Let me know otherwise and we'll take care.

I think this is fine. Shards is currently not intended to be built with multithreading. And I wouldn't even see a good reason to even go for that.
With any luck we should eventually get a tool in stdlib for abstracing such simple conccurrency workflows (cf. crystal-lang/crystal#6468). There should really be no need to implement this from scratch every time you want to run a couple tasks concurrently.

Well, the change seems easy enough, so I've added it here.
Better safe than sorry.

And in fact, another fix right here.
I think this latter one could actually have struck even without MT.

beta-ziliani · 2022-03-21T20:44:52Z

src/molinillo_solver.cr

          rescue ex : Exception
            ch.send(ex)
+          ensure


Nice catch!

m-o-e added 2 commits January 6, 2022 12:15

Add --parallel-fetch

3dc739b

Update man-page

f2d7a0f

straight-shoota added the enhancement label Jan 9, 2022

straight-shoota reviewed Jan 9, 2022

View reviewed changes

src/config.cr Outdated Show resolved Hide resolved

src/resolvers/git.cr Outdated Show resolved Hide resolved

This was referenced Jan 9, 2022

Non-blocking dependencies fetching #371

Closed

Run git fetches and post installs in parallel #239

Closed

m-o-e and others added 2 commits January 9, 2022 23:48

Remove superfluous conditional

c49fa7b

Co-authored-by: Johannes Müller <straightshoota@gmail.com>

Always default parallelism to 8

1fcd9b9

Sija reviewed Jan 10, 2022

View reviewed changes

src/molinillo_solver.cr Show resolved Hide resolved

Parallelize initial fetch (when no lock exists)

5c3bdb1

Sija reviewed Feb 6, 2022

View reviewed changes

src/molinillo_solver.cr Outdated Show resolved Hide resolved

DRY

f3a3069

Sija reviewed Feb 6, 2022

View reviewed changes

Restrict parallelism to GitResolver

886fdbd

straight-shoota reviewed Feb 7, 2022

View reviewed changes

src/resolvers/resolver.cr Outdated Show resolved Hide resolved

Change def update_local_cache from abstract to empty

7f45434

Blacksmoke16 mentioned this pull request Feb 23, 2022

Allow building multiple build targets in parallel #545

Closed

Rename --parallel-fetch to --jobs

bac393b

Make help-text wording more general

670cfb7

straight-shoota approved these changes Mar 5, 2022

View reviewed changes

straight-shoota added 2 commits March 20, 2022 16:51

Merge branch 'master' into add-parallel-fetch

2393c4b

Remove private restriction on FossilResolver#update_local_cache

cad41bf

beta-ziliani reviewed Mar 21, 2022

View reviewed changes

m-o-e added 2 commits March 21, 2022 18:09

Fix potential race condition (MT)

aa65be1

Fix potential hang when Exception is raised

60c7788

m-o-e changed the title ~~Add --parallel-fetch~~ Add --jobs (parallel git fetch) Mar 21, 2022

beta-ziliani approved these changes Mar 21, 2022

View reviewed changes

src/molinillo_solver.cr

rescue ex : Exception

ch.send(ex)

ensure

Copy link

Member

beta-ziliani Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

straight-shoota approved these changes Mar 22, 2022

View reviewed changes

straight-shoota added this to the v0.17.0 milestone Mar 22, 2022

straight-shoota merged commit 82c30a3 into crystal-lang:master Mar 22, 2022

m-o-e deleted the add-parallel-fetch branch March 22, 2022 21:08

straight-shoota added kind:feature and removed enhancement labels Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `--jobs` (parallel git fetch) #539

Add `--jobs` (parallel git fetch) #539

m-o-e commented Jan 6, 2022 •

edited

Loading

straight-shoota commented Jan 9, 2022

m-o-e commented Jan 9, 2022

m-o-e commented Feb 6, 2022

Sija Feb 6, 2022

m-o-e Feb 6, 2022 •

edited

Loading

Sija Feb 6, 2022

m-o-e Feb 6, 2022

m-o-e Feb 6, 2022

m-o-e Feb 7, 2022

straight-shoota Feb 7, 2022 •

edited

Loading

m-o-e Feb 25, 2022

straight-shoota Feb 26, 2022

m-o-e Mar 5, 2022

carlhoerberg commented Feb 23, 2022 •

edited

Loading

m-o-e commented Feb 25, 2022 •

edited

Loading

beta-ziliani Mar 21, 2022

m-o-e Mar 21, 2022

beta-ziliani Mar 21, 2022

straight-shoota Mar 21, 2022 •

edited

Loading

m-o-e Mar 21, 2022

m-o-e Mar 21, 2022 •

edited

Loading

beta-ziliani Mar 21, 2022

Add --jobs (parallel git fetch) #539

Add --jobs (parallel git fetch) #539

Conversation

m-o-e commented Jan 6, 2022 • edited Loading

straight-shoota commented Jan 9, 2022

m-o-e commented Jan 9, 2022

m-o-e commented Feb 6, 2022

Choose a reason for hiding this comment

m-o-e Feb 6, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota Feb 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carlhoerberg commented Feb 23, 2022 • edited Loading

m-o-e commented Feb 25, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota Mar 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e Mar 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Add `--jobs` (parallel git fetch) #539

Add `--jobs` (parallel git fetch) #539

m-o-e commented Jan 6, 2022 •

edited

Loading

m-o-e Feb 6, 2022 •

edited

Loading

straight-shoota Feb 7, 2022 •

edited

Loading

carlhoerberg commented Feb 23, 2022 •

edited

Loading

m-o-e commented Feb 25, 2022 •

edited

Loading

straight-shoota Mar 21, 2022 •

edited

Loading

m-o-e Mar 21, 2022 •

edited

Loading