Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --jobs (parallel git fetch) #539

Merged
merged 14 commits into from
Mar 22, 2022
3 changes: 2 additions & 1 deletion docs/shards.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ Exit status:
*init*::
Initializes a default _shard.yml_ in the current folder.

*install* [--frozen] [--without-development] [--production] [--skip-postinstall] [--skip-executables]::
*install* [--frozen] [--without-development] [--production] [--skip-postinstall] [--skip-executables] [--parallel-fetch=N]::
Resolves and installs dependencies into the _lib_ folder. If not already
present, generates a _shard.lock_ file from resolved dependencies, locking
version numbers or Git commits.
Expand All @@ -74,6 +74,7 @@ doesn't generate a conflict, thus generating a new _shard.lock_ file.
--production:: same as _--frozen_ and _--without-development_
--skip-postinstall:: Does not run postinstall of dependencies.
--skip-executables:: Does not install executables.
--parallel-fetch:: Number of git fetches to perform in parallel (default: 8).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this works for every resolver (implementing update_local_cache), not just git?
If so, I'd rename it to parallel-jobs or sth like that.

Copy link
Contributor Author

@m-o-e m-o-e Feb 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought fetch would make the most sense for all resolvers where this applies
since it's about the data-fetching and nothing else - each VCS might call that step
something different but ultimately it's a "fetch".

It's only implemented for git atm, that's why I mentioned that one in the description.
(would replace "git" with "VCS" later, when other impls are added)

We could change the arg-name but to me parallel-jobs feels less to the point than parallel-fetch? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only implemented for git atm, that's why I mentioned that one in the description.

I don't follow. You're calling Resolver#update_local_cache for every resolver regardless, so how come you state it's implemented only for git?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heck... you are right.

I got confused because my first iteration was git-specific,
I totally forgot that it now indeed applies to all of them. 🙈

Ouch, that means this PR probably breaks hg.
The hg-side will need a change similar to this.

Sigh. This makes things more complicated.

I can port the change from git.cr to hg.cr - but I haven't wrapped my head
around the spec-suite enough to see how to make a test to validate it.

It looks like the current integration test only uses git URLs (no hg_url() in there).
That may be the reason why it didn't catch the breakage - or maybe
it just doesn't exercise the parallelism at all.

(I thought it does because it uses multiple deps in some places -
but the fact that it didn't break for hg... confuses me 🤔)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. The best I could offer for now would be to actually exclude hg from
parallelism (like I thought it already was), by doing something like:

          begin
-           dep.resolver.update_local_cache
+           dep.resolver.update_local_cache if dep.resolver.is_a? GitResolver
            ch.send(nil)
            active -= 1
          rescue ex : Exception
            ch.send(ex)
          end
        end

But I feel like if the spec's didn't catch the hg issue then they
might also not sufficiently cover the git-side. So even though it works
splendidly for me under real usage, spec-coverage should probably
be addressed before merging this.

If someone more familiar with the spec-suite and/or hg
has an idea on how to best do that, please don't hesitate. 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but that will need test-coverage first.
For git I have some confidence that it works fine since I've been using it daily for a while, but
I've never encountered a hg-shard in the wild, so don't even have an URL to test that side manually.

Not sure when I'll have time to continue on this, so excluding Hg for now.

Copy link
Member

@straight-shoota straight-shoota Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's totally fine. Concurrent fetches are just an optimization without functional differences. There's no need to have this implemented for all resolvers immediately.
Wording should be more generic, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving this convo as I believe all points are addressed.

  • Parallelism is restricted to only apply to GitResolver (because hg not implemented, yet)
  • Flag has been renamed to --jobs to keep it generic (for consistency with bundler
    and for when we add hg in the future)
  • Help-text still mentions git explicitly - but I believe that makes sense as to not
    confuse users into thinking it would apply to all resolvers.

Please re-open or reply on main thread if any of these need further discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep the wording more general, but mention that it only works for git currently. For example:

--jobs:: Number of repository downloads to perform in parallel (default: 8). Currently only functions for git resolver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, updated in 670cfb7

Dropped "function" and "resolver" to make it shorter.

--

*list* [--tree]::
Expand Down
1 change: 1 addition & 0 deletions src/cli.cr
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ module Shards
self.skip_executables = true
end
opts.on("--local", "Don't update remote repositories, use the local cache only.") { self.local = true }
opts.on("--parallel-fetch=N", "Number of git fetches to perform in parallel (default: 8)") { |n| self.parallel_fetch = n.to_i }
# TODO: remove in the future
opts.on("--ignore-crystal-version", "Has no effect. Kept for compatibility, to be removed in the future.") { }
opts.on("-v", "--verbose", "Increase the log verbosity, printing all debug statements.") { self.set_debug_log_level }
Expand Down
3 changes: 3 additions & 0 deletions src/config.cr
Original file line number Diff line number Diff line change
Expand Up @@ -107,4 +107,7 @@ module Shards
class_property? local = false
class_property? skip_postinstall = false
class_property? skip_executables = false

# Defaults to 1 (disabled) when running in `crystal spec`
class_property parallel_fetch : Int32 = {{ @top_level.has_constant?("Spec") ? 1 : 8 }}
m-o-e marked this conversation as resolved.
Show resolved Hide resolved
end
31 changes: 31 additions & 0 deletions src/molinillo_solver.cr
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,38 @@ module Shards
end
end

private def prefetch_local_caches(lock_index, deps)
count = 0
active = 0
ch = Channel(Exception?).new(deps.size + 1)
deps.each do |dep|
next unless lock = lock_index[dep.name]?
next unless dep.matches?(lock.version)
count += 1
active += 1
while active > Shards.parallel_fetch
sleep 0.1
end
spawn do
begin
dep.resolver.update_local_cache
ch.send(nil)
active -= 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that this is not safe from races. Probably OK if shards is not compiled with preview_mt, but I wonder if we shouldn't use a Mutex here to be on the safe side. @bcardiff WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Thanks for catching this!
Maybe Atomic can help here (assuming it is MT-safe?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! Atomic is what we need here. Do you want to do the change? Let me know otherwise and we'll take care.

Copy link
Member

@straight-shoota straight-shoota Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. Shards is currently not intended to be built with multithreading. And I wouldn't even see a good reason to even go for that.
With any luck we should eventually get a tool in stdlib for abstracing such simple conccurrency workflows (cf. crystal-lang/crystal#6468). There should really be no need to implement this from scratch every time you want to run a couple tasks concurrently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the change seems easy enough, so I've added it here.
Better safe than sorry.

Copy link
Contributor Author

@m-o-e m-o-e Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in fact, another fix right here.
I think this latter one could actually have struck even without MT.

rescue ex : Exception
ch.send(ex)
end
end
end

count.times do
obj = ch.receive
raise obj if obj.is_a? Exception
m-o-e marked this conversation as resolved.
Show resolved Hide resolved
end
end

private def add_lock(base, lock_index, deps : Array(Dependency))
prefetch_local_caches(lock_index, deps) if Shards.parallel_fetch > 1

deps.each do |dep|
if lock = lock_index[dep.name]?
next unless dep.matches?(lock.version)
Expand Down
3 changes: 3 additions & 0 deletions src/resolvers/crystal.cr
Original file line number Diff line number Diff line change
Expand Up @@ -21,5 +21,8 @@ module Shards
def report_version(version : Version) : String
version.value
end

def update_local_cache
end
end
end
18 changes: 11 additions & 7 deletions src/resolvers/git.cr
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,7 @@ module Shards
end
end

private def update_local_cache
def update_local_cache
if cloned_repository? && origin_changed?
delete_repository
@updated_cache = false
Expand Down Expand Up @@ -317,7 +317,7 @@ module Shards
# This configuration can be overriden by defining the environment
# variable `GIT_ASKPASS`.
git_retry(err: "Failed to clone #{git_url}") do
run_in_current_folder "git clone -c core.askPass=true --mirror --quiet -- #{Process.quote(git_url)} #{Process.quote(local_path)}"
run_in_folder "git clone -c core.askPass=true --mirror --quiet -- #{Process.quote(git_url)} #{Process.quote(local_path)}"
end
end

Expand Down Expand Up @@ -411,12 +411,12 @@ module Shards
dependency_name = File.basename(path, ".git")
raise Error.new("Missing repository cache for #{dependency_name.inspect}. Please run without --local to fetch it.")
end
Dir.cd(path) do
run_in_current_folder(command, capture)
end
run_in_folder(command, path, capture)
end

private def run_in_current_folder(command, capture = false)
# Chdir to a folder and run command.
# Runs in current folder if `path` is nil.
private def run_in_folder(command, path : String? = nil, capture = false)
unless GitResolver.has_git_command?
raise Error.new("Error missing git command line tool. Please install Git first!")
end
Expand All @@ -425,7 +425,11 @@ module Shards

output = capture ? IO::Memory.new : Process::Redirect::Close
error = IO::Memory.new
status = Process.run(command, shell: true, output: output, error: error)
if path
status = Process.run(command, shell: true, output: output, error: error, chdir: path)
else
status = Process.run(command, shell: true, output: output, error: error)
end
m-o-e marked this conversation as resolved.
Show resolved Hide resolved

if status.success?
output.to_s if capture
Expand Down
2 changes: 1 addition & 1 deletion src/resolvers/hg.cr
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ module Shards
end
end

private def update_local_cache
def update_local_cache
if cloned_repository? && origin_changed?
delete_repository
@updated_cache = false
Expand Down
3 changes: 3 additions & 0 deletions src/resolvers/path.cr
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,9 @@ module Shards
"#{version.value} at #{source}"
end

def update_local_cache
end

register_resolver "path", PathResolver
end
end
1 change: 1 addition & 0 deletions src/resolvers/resolver.cr
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ module Shards
abstract def read_spec(version : Version) : String?
abstract def install_sources(version : Version, install_dir : String)
abstract def report_version(version : Version) : String
abstract def update_local_cache
m-o-e marked this conversation as resolved.
Show resolved Hide resolved

def parse_requirement(params : Hash(String, String)) : Requirement
if version = params["version"]?
Expand Down