This issue was observed on an NVIDIA Grace CPU. The reproducer is in Julia, so it is not entirely minimal, but it should give you an idea of how to reproduce the problem with a direct call to the library:
$ OPENBLAS_VERBOSE=2 julia +nightly -q
julia> using Statistics, LinearAlgebra, Random
julia> Random.seed!(42);
julia> A = randn(Float32, 10_000_000);
julia> B = copy(A);
julia> BLAS.set_num_threads(2)
Core: neoversev1
julia> two_threads = BLAS.dot(A .- mean(A), B .- mean(B))
9.991972f6
julia> BLAS.set_num_threads(1)
julia> one_thread = BLAS.dot(A .- mean(A), B .- mean(B))
9.987286f6
julia> two_threads ≈ one_thread
false
julia> expected_result = sum((A .- mean(A)) .* (B .- mean(B)))
9.99456f6
julia> expected_result ≈ one_thread
false
julia> expected_result ≈ two_threads
true
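To put numbers on the ≈ comparisons in the session above: for Float32 scalars, isapprox defaults to a relative tolerance of sqrt(eps(Float32)), about 3.45e-4. The sketch below plugs in the values printed in the neoversev1 session. The helper name relerr is only for illustration and is not part of any library.

```julia
# Values copied from the neoversev1 session above.
two_threads     = 9.991972f6
one_thread      = 9.987286f6
expected_result = 9.99456f6

# Hypothetical helper: relative difference, in the form isapprox uses by default.
relerr(x, y) = abs(x - y) / max(abs(x), abs(y))

# Default relative tolerance of isapprox for Float32 arguments.
rtol = sqrt(eps(Float32))

@show rtol
@show relerr(two_threads, one_thread)
@show relerr(expected_result, one_thread)
@show relerr(expected_result, two_threads)
```

With these inputs, the single-threaded result is off from the other two by roughly 4.7e-4 and 7.3e-4, both above the tolerance, while the two-threaded result is within about 2.6e-4 of the expected value.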
The call to BLAS.dot reduces to a call to cblas_sdot, so that should be the offending OpenBLAS kernel. This is with OpenBLAS v0.3.29; I haven't tried v0.3.30. To reproduce the issue:
- the vectors need to have 10 million elements (it doesn't reproduce with 1 million or fewer);
- the two vectors need to have the same elements, but need not be the same vector in memory, which is why I used B = copy(A);
- subtracting the mean is important (without it the issue doesn't reproduce), which means there are many small numbers in the array.
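The expected_result in the session is itself a Float32 sum, so it carries some rounding error of its own. A minimal sketch of a stricter check, assuming the same setup as the reproducer, is to accumulate the reference in Float64 with a plain Julia sum (which does not go through BLAS) and report the relative error of BLAS.dot for each thread count. The variable names here are illustrative only.

```julia
using Statistics, LinearAlgebra, Random

Random.seed!(42)
A = randn(Float32, 10_000_000)
x = A .- mean(A)
y = copy(x)   # same values, different memory, as in the reproducer

# Float64 reference computed without BLAS.
reference = sum(Float64.(x) .* Float64.(y))

# Relative error of the Float32 BLAS result for each thread count.
errors = Dict{Int,Float64}()
for n in (1, 2)
    BLAS.set_num_threads(n)
    result = BLAS.dot(x, y)
    errors[n] = abs(result - reference) / reference
    println("threads = ", n, ": sdot = ", result, ", relative error = ", errors[n])
end
```

On an affected machine I would expect the single-threaded relative error to stand out, but the exact figures depend on the hardware, the OpenBLAS build, and the random stream of the Julia version in use.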
This is specific to the neoversev1 kernel type; armv8 and neoversen1 don't seem to have this accuracy issue. No one else has reported it before, even though this code comes from Julia's test suite and many people have been running it for a long time:
$ OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=armv8 julia +nightly -q
julia> using Statistics, LinearAlgebra, Random
julia> Random.seed!(42);
julia> A = randn(Float32, 10_000_000);
julia> B = copy(A);
julia> BLAS.set_num_threads(2)
Core: armv8
julia> two_threads = BLAS.dot(A .- mean(A), B .- mean(B))
9.991984f6
julia> BLAS.set_num_threads(1)
julia> one_thread = BLAS.dot(A .- mean(A), B .- mean(B))
9.991984f6
julia> two_threads ≈ one_thread
true
julia> expected_result = sum((A .- mean(A)) .* (B .- mean(B)))
9.99456f6
julia> expected_result ≈ one_thread
true
julia> expected_result ≈ two_threads
true
julia>
$ OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=neoversen1 julia +nightly -q
julia> using Statistics, LinearAlgebra, Random
julia> Random.seed!(42);
julia> A = randn(Float32, 10_000_000);
julia> B = copy(A);
julia> BLAS.set_num_threads(2)
Core: neoversen1
julia> two_threads = BLAS.dot(A .- mean(A), B .- mean(B))
9.994247f6
julia> BLAS.set_num_threads(1)
julia> one_thread = BLAS.dot(A .- mean(A), B .- mean(B))
9.993646f6
julia> two_threads ≈ one_thread
true
julia> expected_result = sum((A .- mean(A)) .* (B .- mean(B)))
9.99456f6
julia> expected_result ≈ one_thread
true
julia> expected_result ≈ two_threads
true
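For background on why the thread count can change the result at all: in Float32, the order of accumulation and the size of the partial sums both affect rounding. The pure-Julia sketch below is not the OpenBLAS kernel and does not identify the neoversev1 bug. It only compares a single sequential Float32 accumulator against the same sum split into contiguous chunks, with a Float64 reference. The function names seq_dot and chunked_dot are hypothetical.

```julia
using Statistics, Random

# Strictly sequential Float32 accumulation over the index range r.
function seq_dot(x, y, r)
    s = 0f0
    for i in r
        s += x[i] * y[i]
    end
    return s
end

# Split the vectors into nchunks contiguous pieces, sum each piece
# separately, then add the partial sums together.
function chunked_dot(x, y, nchunks)
    n = length(x)
    bounds = round.(Int, range(0, n; length = nchunks + 1))
    total = 0f0
    for k in 1:nchunks
        total += seq_dot(x, y, (bounds[k] + 1):bounds[k + 1])
    end
    return total
end

Random.seed!(42)
A = randn(Float32, 10_000_000)
x = A .- mean(A)

reference = sum(Float64.(x) .* Float64.(x))   # Float64 reference

for nchunks in (1, 2, 8)
    result = chunked_dot(x, x, nchunks)
    println("chunks = ", nchunks, ": result = ", result,
            ", relative error = ", abs(result - reference) / reference)
end
```

With one chunk the running sum grows to about 1e7, where adjacent Float32 values are 1 apart, so small products are rounded away. Smaller partial sums lose less, which is one reason a kernel's internal blocking and the number of threads can shift the result.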
Originally reported at JuliaLang/julia#59664.