Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default scheduler for eachobsparallel #80

Merged
merged 2 commits into from
Apr 28, 2022

Conversation

lorenzoh
Copy link
Contributor

@lorenzoh lorenzoh commented Apr 27, 2022

This changes the default executor of eachobsparallel to use TaskPoolEx(basesize=1, background=true). I did some more testing and found that this peforms on-par with DataLoaders.jl.

I used the following (CPU- and IO-heavy) benchmark:

using FastAI

data, blocks = loaddataset("imagenette2-320", (Image, Label))

for obs in tqdm(MLUtils.eachobsparallel(data, executor = TaskPoolEx(basesize=1, background=true)))

for obs in tqdm(DataLoaders.eachobsparallel(data, executor = TaskPoolEx(basesize=1, background=true)))

Both finish in 4.5s, reaching 3000samples/s throughput on my machine with Julia 1.7.2, Threads.nthreads() == 12 (CPU: AMD Ryzen Threadripper 1920X 12-Core). Also tested on 1.8.0-beta3 with similar results.

This also fixes the issue with the previous default executor (ThreadedEx) that was so fast that it froze the primary thread and even other processes like the SSH server (as experienced by me and @darsnack).

Happy to say this was the last blocker before deprecating DataLoaders.jl and moving FastAI.jl to depend on MLUtils.jl (see FluxML/FastAI.jl#196)

@CarloLucibello
Copy link
Member

what's going on with CI never finishing?

@lorenzoh
Copy link
Contributor Author

Is that just for this PR?

If CI is running single-threaded, then starting a task pool with background only-tasks will never yield a valid task if only one task is available.

src/parallel.jl Outdated Show resolved Hide resolved
To get something that runs even on single-threaded Julia setups
@codecov-commenter
Copy link

Codecov Report

Merging #80 (331a807) into main (8ad403b) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main      #80   +/-   ##
=======================================
  Coverage   89.89%   89.89%           
=======================================
  Files          14       14           
  Lines         475      475           
=======================================
  Hits          427      427           
  Misses         48       48           
Impacted Files Coverage Δ
src/parallel.jl 94.91% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ad403b...331a807. Read the comment docs.

@lorenzoh lorenzoh merged commit d11cb69 into main Apr 28, 2022
@lorenzoh
Copy link
Contributor Author

Can we tag a new patch release for this?

@CarloLucibello
Copy link
Member

Doing it right now.

@CarloLucibello
Copy link
Member

Just a heads up: since eachobsparallel is not exported nor documented, it should be considered experimental/internal for the time being so any change to it should not be considered breaking. So if FastAI.jl depends on it should be a bit careful with versioning. This is also to say that we should consolidate as soon as possible a public interface (e.g. a parallel option in DataLoader, in eachobs, or anything else).

@CarloLucibello CarloLucibello deleted the lo/change-default-scheduler branch June 28, 2022 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants