
[OpenBLAS] update multithreading cutoff #7189

Conversation

oscardssmith
Contributor

400 is a much better cutoff than 50 for most modern machines. Note that 100 is way too small even for a modern 4-core machine (I think the 50 limit was found pre-AVX2, and possibly pre-FMA). 400 is probably a bit larger than optimal on small machines, but it only gives up ~13% performance running single-threaded compared to 8 threads (and on laptops it will probably do better, because a single core can turbo higher). It also mitigates the horrible performance cliff of using 16 or more threads on medium-sized matrices (between roughly 400 and 1600). Of course, the better answer would be to integrate BLAS's threading with Julia's (and pick an appropriate number of threads based on the matrix size), but for now this is a pretty noticeable improvement.

```
julia> BLAS.set_num_threads(32)

julia> peakflops(400)
1.1644410982935661e10

julia> BLAS.set_num_threads(16)

julia> peakflops(400)
1.5580026746524042e10

julia> BLAS.set_num_threads(8)

julia> peakflops(400)
2.210268354206555e10

julia> BLAS.set_num_threads(4)

julia> peakflops(400)
1.937951340161483e10

julia> BLAS.set_num_threads(1)

julia> peakflops(400)
1.740427478902416e10

julia> BLAS.set_num_threads(32)

julia> peakflops(100)
1.9949726688744364e9

julia> BLAS.set_num_threads(16)

julia> peakflops(100)
2.9579541605843735e9

julia> BLAS.set_num_threads(8)

julia> peakflops(100)
4.373630506947512e9

julia> BLAS.set_num_threads(4)

julia> peakflops(100)
3.924300248211991e9

julia> BLAS.set_num_threads(1)

julia> peakflops(100)
1.0693014253788e10
```
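Until BLAS threading is integrated with Julia's scheduler, callers can approximate size-aware threading themselves. The sketch below is a hypothetical helper (the function name and the cutoff-as-parameter are assumptions for illustration, not part of any library API) that mirrors the cutoff idea: below the threshold, threading overhead dominates and one thread wins.

```julia
using LinearAlgebra: BLAS

# Hypothetical helper (not part of LinearAlgebra or OpenBLAS): pick a
# BLAS thread count from the problem size n, using the ~400 cutoff
# argued for above. Below the cutoff, run single-threaded.
function blas_threads_for(n::Integer; max_threads::Integer = Sys.CPU_THREADS)
    return n < 400 ? 1 : max_threads
end

# Usage sketch: size-aware thread selection before a large matmul.
# BLAS.set_num_threads(blas_threads_for(size(A, 1)))
```

A real integration would also have to scale threads smoothly between the cutoff and the point where all cores pay off (roughly 400 to 1600 in the numbers above), rather than jumping straight to the maximum.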

@ViralBShah ViralBShah changed the title update multithreading cutoff [OpenBLAS] update multithreading cutoff Aug 8, 2023
@ViralBShah ViralBShah merged commit b02a6e7 into JuliaPackaging:master Aug 8, 2023
25 checks passed
@oscardssmith oscardssmith deleted the oscardssmith-change-threading-cuttoff branch August 8, 2023 19:54
ViralBShah pushed a commit to JuliaLang/julia that referenced this pull request Aug 8, 2023
ViralBShah added a commit to JuliaLang/julia that referenced this pull request Aug 9, 2023
KristofferC pushed a commit to JuliaLang/julia that referenced this pull request Aug 10, 2023