As discussed on Slack, I get really bad or suboptimal performance for a simple reduction (Multithreaded Monte Carlo).
using BenchmarkTools
using FLoops

# FLoops version using the `+=` form of @reduce
function estimate_pi_floop_1(attempts)
    hits = 0
    @floop for i in 1:attempts
        x = rand()
        y = rand()
        if (x^2 + y^2) <= 1
            @reduce(hits += 1)
        end
    end
    return 4.0 * (hits / attempts)
end

# FLoops version using the `acc = init + x` form of @reduce
function estimate_pi_floop_2(attempts)
    hits = 0
    @floop for i in 1:attempts
        x = rand()
        y = rand()
        if (x^2 + y^2) <= 1
            @reduce(hits = 0 + 1)
        end
    end
    return 4.0 * (hits / attempts)
end

# Manual version: one chunk of attempts per thread, summed at the end
function estimate_pi_threads_partitioned(attempts)
    nt = Threads.nthreads()
    attempts_per_thread = ceil(Int, attempts ÷ nt)
    hits = zeros(Int, nt)
    Threads.@threads for i in 1:nt
        h = 0
        for i in 1:attempts_per_thread
            x = rand()
            y = rand()
            if (x^2 + y^2) <= 1
                h += 1
            end
        end
        hits[Threads.threadid()] = h
    end
    return 4.0 * (sum(hits) / attempts)
end
julia> @btime estimate_pi_floop_1(500_000_000)
  2.664 s (125000108 allocations: 1.86 GiB)
julia> @btime estimate_pi_floop_2(500_000_000)
  258.906 ms (64 allocations: 3.88 KiB)
julia> @btime estimate_pi_threads_partitioned(500_000_000)
  208.475 ms (42 allocations: 4.00 KiB)
So:
- Using @reduce(hits = 0 + 1) instead of @reduce(hits += 1) makes a huge difference: roughly 10x faster (258.906 ms vs. 2.664 s) and 64 allocations instead of about 125 million.
- Even with the former, we don't reach the performance of estimate_pi_threads_partitioned (258.906 ms vs. 208.475 ms), which, IIUC, should be similar to what FLoops produces under the hood.
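For reference, here is a minimal sketch of another hand-written partitioned reduction that could serve as a baseline. It is only an illustration in Base Julia and is not meant to show what FLoops actually generates. The names count_hits and estimate_pi_spawn are hypothetical, and the choice of one task per thread is an assumption. It returns each chunk's count from its task instead of indexing an array by Threads.threadid(), and it splits the attempts so that the chunk sizes sum exactly to attempts.

```julia
using Base.Threads: @spawn, nthreads

# Count how many of `n` random points fall inside the unit quarter circle.
function count_hits(n)
    h = 0
    for _ in 1:n
        x = rand()
        y = rand()
        h += (x^2 + y^2) <= 1   # Bool is promoted to Int (0 or 1)
    end
    return h
end

# Hypothetical baseline: one task per chunk, combined with a plain sum.
function estimate_pi_spawn(attempts; ntasks = nthreads())
    # Chunk sizes differ by at most one and sum exactly to `attempts`.
    base, extra = divrem(attempts, ntasks)
    sizes = [base + (k <= extra) for k in 1:ntasks]
    tasks = map(sizes) do n
        @spawn count_hits(n)
    end
    hits = sum(fetch, tasks)
    return 4.0 * (hits / attempts)
end
```

It can be timed the same way as the functions above, e.g. with @btime estimate_pi_spawn(500_000_000).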
Thanks for taking a look!