-
-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --jobs
(parallel git fetch)
#539
Add --jobs
(parallel git fetch)
#539
Conversation
It would be great to have specs for parallel fetch. |
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
Agree. Now that it's enabled in spec it should get plenty coverage. |
After using this PR-version for a while I realized it would only parallelize This is now fixed. (had missed it in my previous testing because I'd muck |
docs/shards.adoc
Outdated
@@ -74,6 +74,7 @@ doesn't generate a conflict, thus generating a new _shard.lock_ file. | |||
--production:: same as _--frozen_ and _--without-development_ | |||
--skip-postinstall:: Does not run postinstall of dependencies. | |||
--skip-executables:: Does not install executables. | |||
--parallel-fetch:: Number of git fetches to perform in parallel (default: 8). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this works for every resolver (implementing update_local_cache
), not just git?
If so, I'd rename it to parallel-jobs
or sth like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought fetch
would make the most sense for all resolvers where this applies
since it's about the data-fetching and nothing else - each VCS might call that step
something different but ultimately it's a "fetch".
It's only implemented for git atm, that's why I mentioned that one in the description.
(would replace "git" with "VCS" later, when other impls are added)
We could change the arg-name but to me parallel-jobs
feels less to the point than parallel-fetch
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only implemented for git atm, that's why I mentioned that one in the description.
I don't follow. You're calling Resolver#update_local_cache
for every resolver regardless, so how come you state it's implemented only for git?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heck... you are right.
I got confused because my first iteration was git-specific,
I totally forgot that it now indeed applies to all of them. 🙈
Ouch, that means this PR probably breaks hg.
The hg-side will need a change similar to this.
Sigh. This makes things more complicated.
I can port the change from git.cr to hg.cr - but I haven't wrapped my head
around the spec-suite enough to see how to make a test to validate it.
It looks like the current integration test only uses git URLs (no hg_url()
in there).
That may be the reason why it didn't catch the breakage - or maybe
it just doesn't exercise the parallelism at all.
(I thought it does because it uses multiple deps in some places -
but the fact that it didn't break for hg... confuses me 🤔)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. The best I could offer for now would be to actually exclude hg from
parallelism (like I thought it already was), by doing something like:
begin
- dep.resolver.update_local_cache
+ dep.resolver.update_local_cache if dep.resolver.is_a? GitResolver
ch.send(nil)
active -= 1
rescue ex : Exception
ch.send(ex)
end
end
But I feel like if the spec's didn't catch the hg issue then they
might also not sufficiently cover the git-side. So even though it works
splendidly for me under real usage, spec-coverage should probably
be addressed before merging this.
If someone more familiar with the spec-suite and/or hg
has an idea on how to best do that, please don't hesitate. 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but that will need test-coverage first.
For git I have some confidence that it works fine since I've been using it daily for a while, but
I've never encountered a hg-shard in the wild, so don't even have an URL to test that side manually.
Not sure when I'll have time to continue on this, so excluding Hg for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's totally fine. Concurrent fetches are just an optimization without functional differences. There's no need to have this implemented for all resolvers immediately.
Wording should be more generic, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving this convo as I believe all points are addressed.
- Parallelism is restricted to only apply to GitResolver (because hg not implemented, yet)
- Flag has been renamed to
--jobs
to keep it generic (for consistency withbundler
and for when we add hg in the future) - Help-text still mentions git explicitly - but I believe that makes sense as to not
confuse users into thinking it would apply to all resolvers.
Please re-open or reply on main thread if any of these need further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep the wording more general, but mention that it only works for git currently. For example:
--jobs:: Number of repository downloads to perform in parallel (default: 8). Currently only functions for git resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, updated in 670cfb7
Dropped "function" and "resolver" to make it shorter.
Good idea @carlhoerberg! Renamed in bac393b Fwiw, I left the help-screen as I'm torn on whether |
src/molinillo_solver.cr
Outdated
begin | ||
dep.resolver.update_local_cache if dep.resolver.is_a? GitResolver | ||
ch.send(nil) | ||
active -= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that this is not safe from races. Probably OK if shards
is not compiled with preview_mt
, but I wonder if we shouldn't use a Mutex
here to be on the safe side. @bcardiff WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. Thanks for catching this!
Maybe Atomic can help here (assuming it is MT-safe?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes! Atomic is what we need here. Do you want to do the change? Let me know otherwise and we'll take care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine. Shards is currently not intended to be built with multithreading. And I wouldn't even see a good reason to even go for that.
With any luck we should eventually get a tool in stdlib for abstracing such simple conccurrency workflows (cf. crystal-lang/crystal#6468). There should really be no need to implement this from scratch every time you want to run a couple tasks concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the change seems easy enough, so I've added it here.
Better safe than sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in fact, another fix right here.
I think this latter one could actually have struck even without MT.
rescue ex : Exception | ||
ch.send(ex) | ||
ensure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
Resolves #371
Resolves #239
This PR parallelizes git fetching which gives me a ~40% speed-up
for
shards install
on average.Only the fetching is parallelized, post-install still runs in serial.
I've experimented with parallelizing that too but the speed gains
were negligible and I'm not sure if it's always safe.
It's only implemented for git so far (no hg)
No test-coverage, yet!
Although parallelism is enabled during test runs, the existing specs don't seem to actually exercise it.