
Errors: No BLAS/LAPACK library loaded! #427

Closed
andreasvarga opened this issue Nov 9, 2023 · 5 comments

Comments

@andreasvarga

Recently, I started preparing a new version of PeriodicSystems with several commits. When running the tests, the following error pops up repeatedly:

Error: no BLAS/LAPACK library loaded!

(see here)

The cause of these error messages is apparently the new version of LinearSolve, where MKL is now loaded by default instead of OpenBLAS (see the discussion here). I implemented several wrappers to routines from the SLICOT library, using the automatically generated SLICOT_jll. Apparently, some of the BLAS/LAPACK calls made there are not covered by MKL. I assume that other people using OrdinaryDiffEq may face the same issue.

I would appreciate it if you could help me overcome this situation by indicating a way to preserve the behaviour from before the last update. Many thanks in advance.

@akirakyle

I also wanted to chime in and say that I started experiencing a different but related issue after updating today, which brought LinearSolve to v2.18.0: serious performance regressions due to #408. In my case I'm running a parallel workload via Distributed and realized that after updating, every worker was suddenly launching a bunch of threads, causing my system to be completely oversubscribed. It turns out that by default local distributed workers have OPENBLAS_NUM_THREADS=1 set (JuliaLang/julia@a8b3994), but since MKL_NUM_THREADS is unset, MKL launches its default number of threads for the system on each worker.

I'll open a separate issue about setting MKL_NUM_THREADS=1 on worker threads.

However, I'm also often comparing the difference between OpenBLAS and MKL for my use cases and hardware, and #408 makes it non-obvious how I can control this behavior. It's especially confusing since I now see that the first half of my computation uses OpenBLAS and the second half uses MKL, because in the first computationally expensive phase I'm apparently only calling functions in QuantumOptics that don't load OrdinaryDiffEq.
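For reference, the way I check which backend is active at a given point (this is just standard LinearAlgebra, nothing LinearSolve-specific):

```julia
using LinearAlgebra

# Lists the libraries libblastrampoline is currently forwarding to,
# e.g. OpenBLAS before MKL is loaded and libmkl_rt afterwards.
BLAS.get_config()
```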

Personally I think, with how many downstream packages are affected by #408 and given that there seems to be no official, documented way to switch back to OpenBLAS (see JuliaLinearAlgebra/MKL.jl#90), it may be wise to revert it?
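The closest thing to a switch-back that I'm aware of is manually forwarding LBT back to OpenBLAS; a minimal sketch, assuming OpenBLAS_jll has been added to the environment, and clearly not an official API:

```julia
using LinearAlgebra, OpenBLAS_jll

# Point libblastrampoline back at OpenBLAS, clearing the MKL forwards first.
BLAS.lbt_forward(OpenBLAS_jll.libopenblas_path; clear = true)
```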

@ChrisRackauckas
Member

[2999] signal (11.128): Segmentation fault
in expression starting at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/test/test_pschur.jl:12
mkl_lapack_xdlacpy at /home/runner/.julia/artifacts/d670351e2fcdac07d02cf73bda5f53e9bea796a6/lib/libmkl_core.so (unknown line)
mkl_lapack_dlacpy at /home/runner/.julia/artifacts/d670351e2fcdac07d02cf73bda5f53e9bea796a6/lib/libmkl_gnu_thread.so.2 (unknown line)
dlacpy_64 at /home/runner/.julia/artifacts/d670351e2fcdac07d02cf73bda5f53e9bea796a6/lib/libmkl_intel_lp64.so.2 (unknown line)
dlacpy_64 at /home/runner/.julia/artifacts/d670351e2fcdac07d02cf73bda5f53e9bea796a6/lib/libmkl_rt.so (unknown line)
mb03vw_ at /home/runner/.julia/artifacts/12f2530ed65eb689637e71283cbe1eeb31e2bddb/lib/libslicot.so (unknown line)
mb03vw! at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/src/SLICOTtools.jl:152
#phess!#887 at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/src/psfutils.jl:76
phess! at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/src/psfutils.jl:52 [inlined]
#phess#886 at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/src/psfutils.jl:41 [inlined]
phess at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/src/psfutils.jl:40
unknown function (ip: 0x7fc34afee006)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
macro expansion at ./timing.jl:273 [inlined]
macro expansion at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/test/test_pschur.jl:80 [inlined]
macro expansion at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
top-level scope at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/test/test_pschur.jl:17
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:903
jl_eval_module_expr at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:715
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1903
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1963
include at ./Base.jl:457
jfptr_include_35036.clone_1 at /opt/hostedtoolcache/julia/1.9.3/x64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
jl_f__call_latest at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/builtins.c:774
include at /home/runner/work/PeriodicSystems.jl/PeriodicSystems.jl/test/runtests.jl:1
unknown function (ip: 0x7fc34af0c932)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:533
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:533
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_eval_module_expr at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:715
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1903
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1963
include at ./client.jl:478
unknown function (ip: 0x7fc34af001a2)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
exec_options at ./client.jl:280
_start at ./client.jl:522
jfptr__start_40034.clone_1 at /opt/hostedtoolcache/julia/1.9.3/x64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
true_main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
unknown function (ip: 0x7fc351229d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 31787268 (Pool: 31780712; Big: 6556); GC: 51
ERROR: LoadError: Package PeriodicSystems errored during testing (received signal: 11)
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /opt/hostedtoolcache/julia/1.9.3/x64/share/julia/stdlib/v1.9/Pkg/src/Types.jl:69
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations /opt/hostedtoolcache/julia/1.9.3/x64/share/julia/stdlib/v1.9/Pkg/src/Operations.jl:2021
 [3] test
   @ /opt/hostedtoolcache/julia/1.9.3/x64/share/julia/stdlib/v1.9/Pkg/src/Operations.jl:1902 [inlined]
 [4] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Vector{String}, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, IOContext{Base.PipeEndpoint}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{Base.PipeEndpoint}}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.9.3/x64/share/julia/stdlib/v1.9/Pkg/src/API.jl:441
 [5] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{Base.PipeEndpoint}, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:coverage, :julia_args, :force_latest_compatible_version), Tuple{Bool, Vector{String}, Bool}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.9.3/x64/share/julia/stdlib/v1.9/Pkg/src/API.jl:156
 [6] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:coverage, :julia_args, :force_latest_compatible_version), Tuple{Bool, Vector{String}, Bool}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.9.3/x64/share/julia/stdlib/v1.9/Pkg/src/API.jl:171
 [7] top-level scope
   @ ~/work/_actions/julia-actions/julia-runtest/v1/test_harness.jl:15
 [8] include(fname::String)
   @ Base.MainInclude ./client.jl:478
 [9] top-level scope
   @ none:1

@ChrisRackauckas
Member

Let me start by saying this will be reverted, but for the right reasons, not the wrong ones. The right reason to revert it is the case at the head of this issue: MKL.jl is subtly breaking things, even though that is not intended by MKL.jl and it's not documented. This is addressed in JuliaLinearAlgebra/MKL.jl#138 and JuliaLinearAlgebra/MKL.jl#139, and until that is fixed we cannot use MKL.jl.

It's unclear to me whether the MKL thread setting is a "breaking change" to LinearSolve.jl, but either way it's not factoring into this decision, and Distributed.jl should fix its system to be BLAS-independent.

That said, we will need to do something about SuiteSparse. While Base Julia builds SuiteSparse with LBT and thus links with OpenBLAS, this is not recommended by the author and notably leads to a 100x slowdown (DrTimothyAldenDavis/SuiteSparse#1). Given the way the SuiteSparse binaries are built, the only way to make them use MKL is the global LBT trigger, so that's effectively what's blocking us here and putting us in a bind (JuliaSparse/SparseArrays.jl#453). As discussed in that issue, we will go with throwing a warning if people do sparse matrix factorizations with OpenBLAS, and hopefully down the line build dedicated SuiteSparse-MKL binaries.
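To sketch what gating that warning could look like (purely illustrative, with a hypothetical `mkl_loaded` helper, not the actual SparseArrays/LinearSolve code):

```julia
using LinearAlgebra

# Hypothetical helper: true if any LBT-forwarded library looks like MKL.
mkl_loaded() = any(lib -> occursin("mkl", lowercase(basename(lib.libname))),
                   BLAS.get_config().loaded_libs)

if !mkl_loaded()
    @warn "Sparse factorizations (UMFPACK/CHOLMOD) can be dramatically slower with OpenBLAS; consider using MKL."
end
```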

ChrisRackauckas added a commit that referenced this issue Nov 10, 2023
See the discussion in #427
@ChrisRackauckas
Member

For future reference, the data is that there were about 50 happy responses and 4 unhappy ones, 2 here and 2 elsewhere. Many of the happy responses reported speedups in the 2x-10x range, and a few in the 100x range when manually choosing UMFPACK as the factorization.

The 2 unhappy responses here were due to things subtly breaking with MKL that were previously not known. The 2 unhappy responses not reported here both came down to MKL being slower on AMD Epyc. What was shown to be slower was matrix multiplication on AMD Epyc with MKL, while LU factorization was only in the 20%-25% slowdown range, so for LU factorizations MKL is still a major net win overall. Benchmarks from other folks showed that MKL is at least as fast (if not faster) on all other tested platforms (including other AMD chips), and using MKL is effectively just ignored on Mac M-series chips, so in the end we have enough data to say "MKL is about 2x-100x faster for factorizations on all CPUs except AMD Epyc, where it's about 1.5x slower".

I want to highlight that because those with negative experiences seem more inclined to share their grievances publicly, so I want to make it clear that the data shows using MKL helped almost everybody. We thus need to figure out how to make better use of MKL without the global `using MKL` LBT behavior, since that will indeed lead to better results in almost every scenario.

This also shows that the cutoff below which KLU is faster, i.e. around the 10,000 x 10,000 mark as used in the defaults here https://github.com/SciML/LinearSolve.jl/blob/main/src/default.jl#L85, is actually just an artifact of OpenBLAS. If MKL is used, UMFPACK is generally much faster, and so already around the 500x500 sparse matrix case it can make sense to use UMFPACK over KLU. Thus we should refine that default to take into account whether someone is using OpenBLAS, where the result is effectively "if using MKL, then use a fast algorithm, otherwise use the slow algorithm because it's at least not as slow as calling OpenBLAS" 😅.
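To make that concrete, the factorization can already be chosen explicitly instead of relying on the default; a minimal sketch (matrix size and density are only for illustration):

```julia
using LinearSolve, SparseArrays, LinearAlgebra

A = sprand(500, 500, 0.01) + I        # moderately sized sparse system
b = rand(500)
prob = LinearProblem(A, b)

sol_klu = solve(prob, KLUFactorization())       # BLAS-free, reasonable with OpenBLAS
sol_umf = solve(prob, UMFPACKFactorization())   # BLAS-heavy, benefits greatly from MKL
```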

In the end, I'm glad I gave this a try. If no one ever tries to move away from OpenBLAS, then we'll be 100x slower in sparse matrix operations than we should be indefinitely. However, it's clear we need to do this another way for it to be safer, and it's also clear what the benefits will be.

@akirakyle

@ChrisRackauckas thank you for taking the time to explain the situation. I didn't know the context for why these changes were taking place when I suggested reverting them, as I didn't come across JuliaSparse/SparseArrays.jl#453 when looking through the relevant commits and issues here. It sounds like a solid reason to default to MKL, but hopefully Distributed can first be fixed to have consistent threading behavior in order to minimize surprises like the one I encountered. It's not difficult to work around at all, either by setting the env variable or via @everywhere BLAS.set_num_threads(1) (both sketched below), but it was a bit frustrating as it took a while to figure out exactly what was going on.
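For reference, a minimal sketch of both workarounds (the env-variable route assumes locally spawned workers inherit the parent process environment):

```julia
using Distributed

# Option 1: cap MKL's thread count before spawning local workers.
ENV["MKL_NUM_THREADS"] = "1"
addprocs(4)

# Option 2: cap the BLAS thread count on every worker after they start.
@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads(1)
```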

Thanks for continuing to push Julia to be faster!
