Open
Labels: multithreading (Base.Threads and related functionality)
Description
On an i7-8550U, OpenBLAS defaults to 8 threads. I was comparing it to RecursiveFactorization.jl and saw the following performance:
using BenchmarkTools
import LinearAlgebra, RecursiveFactorization

# Query how many threads OpenBLAS is currently using.
ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
# Thread count under test; the plots below use 1, 4, and 8.
LinearAlgebra.BLAS.set_num_threads(4)
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 0.5

# Flop count used to convert timings into GFLOPS.
luflop(m, n) = n^3÷3 - n÷3 + m*n^2
luflop(n) = luflop(n, n)

bas_gflops = Float64[]
rec_gflops = Float64[]
ns = 50:50:800
for n in ns
    A = rand(n, n)
    # lu! overwrites its argument, so each benchmark gets its own copy.
    bt = @belapsed LinearAlgebra.lu!($(copy(A)))
    rt = @belapsed RecursiveFactorization.lu!($(copy(A)))
    push!(bas_gflops, luflop(n)/bt/1e9)
    push!(rec_gflops, luflop(n)/rt/1e9)
end

using Plots
plt = plot(ns, bas_gflops, legend=:bottomright, lab="OpenBLAS", title="LU Factorization Benchmark", marker=:auto)
plot!(plt, ns, rec_gflops, lab="RecursiveFactorization", marker=:auto)
xaxis!(plt, "size (N x N)")
yaxis!(plt, "GFLOPS")

[Plots: LU factorization benchmark, GFLOPS vs. size (N x N), for 1 thread, 4 threads, and 8 threads]
Conclusion: the default that Julia chooses, 8 threads, is the worst of the three, with even 1 thread doing better. Using the number of physical cores, 4, is best.
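For anyone who wants to check this on their own machine without the plotting setup, here is a rough sketch of the same comparison using only `LinearAlgebra` and `@elapsed`. The helper names `lu_time` and `sweep_threads` are made up for illustration, the matrix size 400 is arbitrary, and `BLAS.get_num_threads` assumes Julia 1.6 or newer. It is much cruder than the BenchmarkTools run above, so treat the numbers as indicative only.

```julia
using LinearAlgebra

# Hypothetical helper: best-of-`reps` wall time for an in-place LU of an n×n matrix.
function lu_time(n; reps = 5)
    A = rand(n, n)
    best = Inf
    for _ in 1:reps
        B = copy(A)              # lu! overwrites its argument
        t = @elapsed lu!(B)
        best = min(best, t)
    end
    return best
end

# Hypothetical helper: time the LU for each BLAS thread count, then restore
# whatever thread count was active before the sweep.
function sweep_threads(n, thread_counts)
    original = BLAS.get_num_threads()   # assumes Julia 1.6+
    times = Dict{Int,Float64}()
    try
        for t in thread_counts
            BLAS.set_num_threads(t)
            lu_time(n; reps = 1)        # warm-up run, result discarded
            times[t] = lu_time(n)
        end
    finally
        BLAS.set_num_threads(original)
    end
    return times
end

times = sweep_threads(400, (1, 4, 8))
for t in sort(collect(keys(times)))
    println(t, " thread(s): ", round(times[t] * 1e3, digits = 3), " ms")
end
```

The thread counts 1, 4, and 8 match the three plots; on a different CPU you would substitute its own physical and logical counts.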
There have been a lot of reports on Discourse and in Slack #gripes that essentially say "setting BLAS threads to 1 is better than the default!", but it looks like that's because the default should be the number of physical cores, not logical threads. I'm actually very surprised it isn't set that way, so I was wondering why, and also where this default is set (I couldn't find it).
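As a stopgap, something like the following could go at the top of a script. It is only a sketch: `assumed_physical_cores` is a hypothetical helper, and it simply assumes 2 hardware threads per physical core, which holds for the i7-8550U above but not for every CPU. A package that inspects the actual topology (Hwloc.jl or CpuId.jl, for example) would be more robust.

```julia
using LinearAlgebra

# ASSUMPTION: 2 hardware threads per physical core (SMT/Hyper-Threading enabled).
# `assumed_physical_cores` is a hypothetical helper, not part of Base or LinearAlgebra.
assumed_physical_cores(logical = Sys.CPU_THREADS; threads_per_core = 2) =
    max(1, logical ÷ threads_per_core)

n = assumed_physical_cores()
BLAS.set_num_threads(n)
println("logical threads: ", Sys.CPU_THREADS, ", BLAS threads set to: ", n)
```

On the machine above this would give 8 ÷ 2 = 4 BLAS threads. Pass `threads_per_core = 1` on a CPU without SMT.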


