-
Notifications
You must be signed in to change notification settings - Fork 19
disable_polyester_threads #86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Benchmarks with and without it below:
```
using Base.Threads
using BenchmarkTools
using Revise
using Polyester
\##
function inner(x,y,j)
for i ∈ axes(x,1)
y[i,j] = sin(x[i,j])
end
end
function inner_polyester(x,y,j)
@Batch for i ∈ axes(x,1)
y[i,j] = sin(x[i,j])
end
end
function inner_thread(x,y,j)
@threads for i ∈ axes(x,1)
y[i,j] = sin(x[i,j])
end
end
function sequential_sequential(x,y)
for j ∈ axes(x,2)
inner(x,y,j)
end
end
function sequential_polyester(x,y)
for j ∈ axes(x,2)
inner_polyester(x,y,j)
end
end
function sequential_thread(x,y)
for j ∈ axes(x,2)
inner_thread(x,y,j)
end
end
function threads_of_polyester(x,y)
@threads for j ∈ axes(x,2)
inner_polyester(x,y,j)
end
end
function threads_of_polyester_inner_disable(x,y)
@threads for j ∈ axes(x,2)
Polyester.disable_polyester_threads() do
inner_polyester(x,y,j)
end
end
end
function threads_of_thread(x,y)
@threads for j ∈ axes(x,2)
inner_thread(x,y,j)
end
end
function threads_of_thread(x,y)
@threads for j ∈ axes(x,2)
inner_thread(x,y,j)
end
end
function threads_of_sequential(x,y)
@threads for j ∈ axes(x,2)
inner(x,y,j)
end
end
# Big inner problem, repeated only a few times
y = rand(10000000,4);
x = rand(size(y)...);
@Btime inner($x,$y,1) # 73.319 ms (0 allocations: 0 bytes)
@Btime inner_polyester($x,$y,1) # 8.936 ms (0 allocations: 0 bytes)
@Btime inner_thread($x,$y,1) # 11.206 ms (49 allocations: 4.56 KiB)
@Btime sequential_sequential($x,$y) # 274.926 ms (0 allocations: 0 bytes)
@Btime sequential_polyester($x,$y) # 36.963 ms (0 allocations: 0 bytes)
@Btime sequential_thread($x,$y) # 49.373 ms (196 allocations: 18.25 KiB)
@Btime threads_of_polyester($x,$y) # 78.828 ms (58 allocations: 4.84 KiB)
@Btime threads_of_polyester_inner_disable($x,$y) # 70.182 ms (47 allocations: 4.50 KiB)
@Btime Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end; # 71.141 ms (47 allocations: 4.50 KiB)
@Btime threads_of_sequential($x,$y) # 70.857 ms (46 allocations: 4.47 KiB)
@Btime threads_of_thread($x,$y) # 45.116 ms (219 allocations: 22.00 KiB)
# Small inner problem, repated many times
y = rand(1000,1000);
x = rand(size(y)...);
@Btime inner($x,$y,1) # 7.028 μs (0 allocations: 0 bytes)
@Btime inner_polyester($x,$y,1) # 1.917 μs (0 allocations: 0 bytes)
@Btime inner_thread($x,$y,1) # 7.544 μs (45 allocations: 4.44 KiB)
@Btime sequential_sequential($x,$y) # 6.790 ms (0 allocations: 0 bytes)
@Btime sequential_polyester($x,$y) # 2.070 ms (0 allocations: 0 bytes)
@Btime sequential_thread($x,$y) # 9.296 ms (49002 allocations: 4.46 MiB)
@Btime threads_of_polyester($x,$y) # 2.090 ms (42 allocations: 4.34 KiB)
@Btime threads_of_polyester_inner_disable($x,$y) # 1.065 ms (42 allocations: 4.34 KiB)
@Btime Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end; # 997.918 μs (49 allocations: 4.56 KiB)
@Btime threads_of_sequential($x,$y) # 1.057 ms (48 allocations: 4.53 KiB)
@Btime threads_of_thread($x,$y) # 4.105 ms (42059 allocations: 4.25 MiB)
```
Codecov Report
@@ Coverage Diff @@
## master #86 +/- ##
=======================================
Coverage 88.46% 88.46%
=======================================
Files 2 2
Lines 416 416
=======================================
Hits 368 368
Misses 48 48 Continue to review full report at Codecov.
|
src/utility.jl
Outdated
| t, r = request_threads(num_threads()) | ||
| f() | ||
| foreach(free_threads!, r) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be
t, r = request_threads(num_threads())
try
f()
finally
foreach(free_threads!, r)
endAlso, maybe this should be in PolyesterWeave itself?
It could still be re-exported here.
Should also document that it turns off threading for LoopVectorization.@tturbo and Octavian.matmul, even though neither depend on Polyester (but they do depend on PolyesterWeave).
Do benchmark my suggestion though.
If it is slower, feel free to ignore it.
The advantage of my suggestion is that, if your code throws an error, it'll still free the threads. Otherwise, they'll be turned off until you manually call PolyesterWeave.reset_workers!().
Polyester.@batch doesn't use try/finally either, meaning it requires the manual calls after you get errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do:
- documenting disabling threads in tturbo and matmul
- checking whether a try block is much slower
- move to PolyesterWeave after the above is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After moving to PolyesterWeave, you can of course import and reexport in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks did not show any slowdown with a try block (especially and obviously if the disable command is executed once in the outermost scope).
| @threads for j ∈ axes(x,2) | ||
| Polyester.disable_polyester_threads() do | ||
| inner_polyester(x,y,j) | ||
| end | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @threads for j ∈ axes(x,2) | |
| Polyester.disable_polyester_threads() do | |
| inner_polyester(x,y,j) | |
| end | |
| end | |
| Polyester.disable_polyester_threads() do | |
| @threads for j ∈ axes(x,2) | |
| inner_polyester(x,y,j) | |
| end | |
| end |
May as well hoist the disabling and re-enabling out of the loop, to avoid doing atomic operations to the same address in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although that is effectively what you're benchmarking with Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end;, my concern is that people are just going to copy/paste the examples, which means they'll likely use this slightly worse form, because it is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! The two versions (inside and outside the loop) were given exactly so that this can be compared. I will add a note to the readme to warn people about using it outside the loop.
| @btime threads_of_polyester_inner_disable($x,$y) # 70.182 ms (47 allocations: 4.50 KiB) | ||
| @btime Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end; # 71.141 ms (47 allocations: 4.50 KiB) | ||
| @btime threads_of_sequential($x,$y) # 70.857 ms (46 allocations: 4.47 KiB) | ||
| @btime threads_of_thread($x,$y) # 45.116 ms (219 allocations: 22.00 KiB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is fastest because you have more than 4 threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I thought it is a reasonable edge case to show.
|
|
||
| function inner_thread(x,y,j) | ||
| @threads for i ∈ axes(x,1) | ||
| y[i,j] = sin(x[i,j]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously just an example, but @tturbo should win these benchmarks (and would also be single threaded from this!). ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! I will mention it as a caveat in the benchmarks readme.
|
The tests will probably fail due to the dependence of the new PolyesterWeave. See here JuliaSIMD/PolyesterWeave.jl#4 |
|
Great, thanks! |
As mentioned on discourse: https://discourse.julialang.org/t/nesting-threads-thread-and-polyester-batch-or-context-manager-to-limit-polyester-threads/84490/4
Benchmarks with and without it below: