Exception access violation / openblas / Julia 1.6.1 #40963

Closed
mttvtt opened this issue May 26, 2021 · 25 comments
Labels
domain:linear algebra Linear algebra kind:bug Indicates an unexpected problem or unintended behavior system:windows Affects only Windows

Comments

@mttvtt

mttvtt commented May 26, 2021

Hi all,

I've just started programming in Julia for numerical simulations, and while testing a function loaded with include, I got the message below. Does anyone have a clue about it? I've tried searching the web, but since I'm a noob with Julia I didn't really understand the possible solutions, and this seems very Windows 10-specific, so I hope someone can explain it. Thanks a lot in advance!

Exception: EXCEPTION_ACCESS_VIOLATION at 0x1fd608e0 -- at 0x1fd608e0 -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x1fd608e0 -- at 0x1fd608e0 -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x1fd608e0 -- at 0x1fd608e0 -- OLATION\AppData\Local\Programs\Julia-1.6.1\bin\libopenblas64_.DLL (unknown line)
their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x1fd608e0 -- at 0x1fd608e0 -- OLATION\AppData\Local\Programs\Julia-1.6.1\bin\libopenblas64_.DLL (unknown line)
in expression starting at REPL[10]:1
in expression starting at REPL[10]:1
cal\Programs\Julia-1.6.1\bin\libopenblas64_.DLL (unknown line)
their entirety). Thanks.
Exception:

@KristofferC
Sponsor Member

In order to look more into this, it is almost required for us to have a way of reproducing the problem. Could you post the code that you use that causes the error?

@ViralBShah
Member

Also: which processor and OS are you on, and is this running in the cloud, etc.?

@ViralBShah ViralBShah added the domain:linear algebra Linear algebra label May 28, 2021
@PetrKryslUCSD

PetrKryslUCSD commented Jun 19, 2021

Same thing here. Unfortunately, the code is involved. I will try to derive a reduced code example.
Win 10. Serial code.

@PetrKryslUCSD

My guess is that this happens in the thread management code: notice that the printout is repeated four times and jumbled together.
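If the crash really is in OpenBLAS's threading code, forcing single-threaded BLAS should make it disappear. The sketch below is one way to test that guess; it is an assumption, not a workaround confirmed in this thread, and it reuses the 10001-element ComplexF64 case that is reduced later in this issue.

```julia
using LinearAlgebra

# Assumption (untested here): if the fault lives in OpenBLAS's multithreaded
# code path, restricting BLAS to a single thread should avoid it.
BLAS.set_num_threads(1)

x = zeros(ComplexF64, 10_001)
y = zeros(ComplexF64, 10_001)
d = dot(x, y)  # conjugated dot product; dispatches to BLAS for ComplexF64 vectors
```

If the crash goes away with one thread and returns with several, that supports the threading hypothesis without identifying the exact bug.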

@haampie
Contributor

haampie commented Jun 22, 2021

Reported by @RaghuSivapuram in JuliaLinearAlgebra/IterativeSolvers.jl#301:

using LinearAlgebra, IterativeSolvers, SparseArrays

A = sprand(19400, 19400, 0.1) + 1.0im * sprand(19400, 19400, 0.1)
F = rand(ComplexF64, 19400)
x = zeros(ComplexF64, 19400)

minres!(x, A, F; maxiter=20) # blows up.

@RaghuSivapuram

The code above crashes on Windows with Julia 1.6.1, 16GB RAM, and an Intel Core i7-7500U CPU.

@ViralBShah
Member

FWIW, I can't reproduce @RaghuSivapuram's issue on my Mac (but I don't have a Core i7).

@PetrKryslUCSD

I can reproduce this on Win 10 with a Core i7. On the same machine with WSL2, this code runs fine.

@ViralBShah ViralBShah added system:windows Affects only Windows kind:bug Indicates an unexpected problem or unintended behavior labels Jun 22, 2021
@aviks
Member

aviks commented Jun 22, 2021

I can reproduce this on Win10 with Julia 1.6.1

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

@inkydragon
Sponsor Member

inkydragon commented Sep 25, 2021

I got it!

Simplified test code:

arrSize = 10000  # No error
arrSize = 10001  # Has error (only the last assignment takes effect)
arrType = ComplexF64

using LinearAlgebra: dot
using LinearAlgebra.BLAS: dotc

x = zeros(arrType, arrSize);
y = zeros(arrType, arrSize);

# dot(x, y)  # LinearAlgebra\src\matmul.jl:10
dotc(length(x), x, stride(x, 1), y, stride(y, 1))  # LinearAlgebra\src\blas.jl:451

Then dotc calls into OpenBLAS via ccall:

# LinearAlgebra\src\blas.jl:409
ccall((@blasfunc(:cblas_zdotc_sub), libblas), Cvoid,

https://github.com/JuliaLang/julia/blob/v1.6.3/stdlib/LinearAlgebra/src/blas.jl#L400

Some interesting findings:

  • The smallest array size that reproduces the error is 10001; 10000 works fine.
  • The array element type must be ComplexF64; ComplexF32 and Float64 work fine.
  • Sparse arrays have nothing to do with this error.
  • The error reproduces on native Windows, but not in WSL or on macOS.
  • In v1.7.0-rc1, the ccall goes to libblastrampoline instead of libblas.
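The findings above (a size threshold between 10000 and 10001, ComplexF64 only) suggest a regression check roughly like the following. It is a sketch, not the test that was added for this issue: ref_dotc is a hypothetical pure-Julia reference, and on an affected build the BLAS call crashes the whole process instead of failing the assertion.

```julia
using LinearAlgebra.BLAS: dotc

# Hypothetical reference: conjugated dot product computed in pure Julia,
# so it never calls into OpenBLAS.
function ref_dotc(x::AbstractVector, y::AbstractVector)
    s = zero(promote_type(eltype(x), eltype(y)))
    for i in eachindex(x, y)
        s += conj(x[i]) * y[i]
    end
    return s
end

# Compare BLAS.dotc with the reference on both sides of the reported threshold.
for n in (10_000, 10_001), T in (ComplexF32, ComplexF64)
    x = ones(T, n)
    y = ones(T, n)
    @assert dotc(n, x, stride(x, 1), y, stride(y, 1)) ≈ ref_dotc(x, y)
end
```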

Using WinDbg, I captured the call stack at the moment of the crash, found that the function libopenblas64_!cblas_zdotc_sub64_ caused the problem, and then used Debugger.jl to reduce the code.

Crash message & WinDbg stack trace

WinDBG

(9044.877c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
libopenblas64_!DCABS164_+0xc4fca0:
00000000`151f08e0 f20f105af8      movsd   xmm3,mmword ptr [rdx-8] ds:00000000`00000000=????????????????


# ---- Exception context
0:000> .ecxr
Unable to get exception context, HRESULT 0x8000FFFF


# ---- Exception Analysis
0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************


KEY_VALUES_STRING: 1

    Key  : AV.Dereference
    Value: NullPtr

    Key  : AV.Fault
    Value: Read

    Key  : Analysis.CPU.mSec
    Value: 2093

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 33742

    Key  : Analysis.Init.CPU.mSec
    Value: 859

    Key  : Analysis.Init.Elapsed.mSec
    Value: 416905

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 77

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 838112

    Key  : Timeline.Process.Start.DeltaSec
    Value: 416

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1

    Key  : WER.Process.Version
    Value: 1.6.3.0


NTGLOBALFLAG:  70

PROCESS_BAM_CURRENT_THROTTLED: 0

PROCESS_BAM_PREVIOUS_THROTTLED: 0

APPLICATION_VERIFIER_FLAGS:  0

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00000000151f08e0 (libopenblas64_!DCABS164_+0x0000000000c4fca0)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000

FAULTING_THREAD:  0000877c

PROCESS_NAME:  julia.exe

READ_ADDRESS:  0000000000000000 

ERROR_CODE: (NTSTATUS) 0xc0000005 - 0x%p            0x%p                    %s

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000000000000

STACK_TEXT:  
00000000`00c3a9a0 00000000`151f09df     : 00000000`00d302e8 00007ffe`eb18e249 00000000`00d30200 00000000`00000010 : libopenblas64_!DCABS164_+0xc4fca0
00000000`00c3aa50 00000000`145733ab     : 00000000`00000080 00007ffe`eb196d0f 00000000`00d30000 00000000`50000063 : libopenblas64_!DCABS164_+0xc4fd9f
00000000`00c3aa90 00000000`1457391b     : 00000000`00000070 00000010`00000000 00000001`00000000 00000000`2e042a00 : libopenblas64_!openblas_get_parallel_64_+0x2db
00000000`00c3ab40 00000000`14573f62     : 00000000`00000000 00000000`2e042a00 00000000`00000007 00000000`06040002 : libopenblas64_!openblas_get_parallel_64_+0x84b
00000000`00c3ab90 00000000`151f0abc     : 00000000`40000060 00000000`00000000 00000000`00000000 00000000`00000000 : libopenblas64_!openblas_set_num_threads64_+0x3e2
00000000`00c3cb20 00000000`1433acbe     : 00000000`00000000 00000000`0275fa92 00000000`00000000 00000000`00000000 : libopenblas64_!DCABS164_+0xc4fe7c
00000000`00c3cdd0 00000000`45ef9900     : 00000000`0e5e4550 00000000`00000001 00000000`00c3d090 00000000`45ef951b : libopenblas64_!cblas_zdotc_sub64_+0x4e
00000000`00c3ce20 00000000`45efb92c     : 00000000`a05b7130 00000000`a05bfbc0 00000000`a05bfc00 00000000`a05bfc60 : 0x45ef9900
00000000`00c3cf00 00000000`45efd0bd     : 00000010`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x45efb92c
00000000`00c3d0a0 00000000`45efd84c     : 0000dff7`133764e0 00000000`0ed23670 00000000`00c3d2e0 00000000`0275fa92 : 0x45efd0bd
00000000`00c3d240 00000000`45efd8c6     : 00007ffe`e9977d60 00000000`00000102 00000000`0ad7a110 00000000`0ec6fa80 : 0x45efd84c
00000000`00c3d2a0 00000000`0273229d     : 00000000`00c3d340 00000000`00c3d358 00000000`00c3d3c0 00000000`0273229d : 0x45efd8c6
00000000`00c3d300 00000000`02731e95     : 00000000`0280ef1f 00000000`00000000 00000000`00000020 00000000`00000004 : libjulia_internal!jl_clear_implicit_imports+0xa0d
00000000`00c3d3b0 00000000`027327e9     : 00000000`0eb23190 00000000`00c3d640 00000000`00c3d500 00000000`00000005 : libjulia_internal!jl_clear_implicit_imports+0x605
00000000`00c3d420 00000000`02733426     : 00000000`0eb23190 00000000`00c3d640 00000000`6b53d45c 00000000`00000001 : libjulia_internal!jl_clear_implicit_imports+0xf59
00000000`00c3d620 00000000`0274fa28     : 00000000`0e7d68f0 00000000`027386d5 00000000`00c3d720 00000000`00000001 : libjulia_internal!jl_interpret_toplevel_thunk+0xf6
00000000`00c3d710 00000000`02750562     : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`0e8db670 : libjulia_internal!jl_toplevel_eval_flex+0x2c8
00000000`00c3d840 00000000`027514b0     : 00000000`0ad7a110 00000000`00000187 00000000`00c3d9b0 00000000`00c3da28 : libjulia_internal!jl_toplevel_eval_flex+0xe02
00000000`00c3d970 00000000`6b4b5198     : 00000000`6b6b84f0 00000000`0e7d67d0 00000000`0001ffff 00000000`00000000 : libjulia_internal!jl_toplevel_eval_in+0xb0
00000000`00c3db00 00000000`6b4b5736     : 00000000`0e4f3850 00000000`00000002 00000000`6d492250 00000000`00000000 : sys!jl_sysimg_fvars_base+0x1033df8
00000000`00c3de20 00000000`6b0baaa2     : 00000000`00000004 00000000`00c3e080 00000000`0e793cf0 00000000`00c3e018 : sys!jl_sysimg_fvars_base+0x1034396
00000000`00c3df80 00000000`6b0d8653     : 00000000`6d2250e0 00000000`00c3e020 00000000`00000002 00000000`00000000 : sys!jl_sysimg_fvars_base+0xc39702
00000000`00c3dfe0 00000000`6b0d8a4f     : 00000000`00c3e1d0 00000000`6b7cf8e0 00000000`0e5dc220 00000000`0ad7a110 : sys!jl_sysimg_fvars_base+0xc572b3
00000000`00c3e160 00000000`6aeb390b     : 00000000`6c612bf0 00000000`00000000 00000000`6c612e70 00000000`00000187 : sys!jl_sysimg_fvars_base+0xc576af
00000000`00c3e1a0 00000000`6aeb39b2     : 00000000`000073d8 00000000`00000001 00000000`00000001 00000000`00000000 : sys!jl_sysimg_fvars_base+0xa3256b
00000000`00c3e2c0 00000000`02726187     : 00000000`00000080 00000000`00c3e360 00000000`00000000 00000000`00000001 : sys!jl_sysimg_fvars_base+0xa32612
00000000`00c3e2f0 00000000`6b3555ba     : 00000000`643d6276 00010000`000000d0 00000000`0e5e2850 00000000`6b6bb930 : libjulia_internal!jl_f__call_latest+0x47
00000000`00c3e340 00000000`6b36003a     : 00000000`00c3e7e0 00000000`6b360017 00000000`00000011 00000000`15040011 : sys!jl_sysimg_fvars_base+0xed421a
00000000`00c3e730 00000000`6ae8a280     : 00000000`00000000 00000000`00000000 00000000`00da9e90 00000000`00000001 : sys!jl_sysimg_fvars_base+0xedec9a
00000000`00c3f820 00000000`6ae8a40f     : 00000000`00000000 00000000`00da4c80 00000000`00000001 00000000`00000104 : sys!jl_sysimg_fvars_base+0xa08ee0
00000000`00c3fad0 00000000`02773676     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : sys!jl_sysimg_fvars_base+0xa0906f
00000000`00c3fb00 00000000`02774096     : 00000000`00000001 00000000`00c3fe88 00007ffe`eb19a9a0 00000000`68f82237 : libjulia_internal!jl_call2+0x376
00000000`00c3fe20 00000000`00401a64     : 00000000`00000000 00000000`00da4c88 00000000`00000000 00007ffe`eb190000 : libjulia_internal!repl_entrypoint+0x96
00000000`00c3fe80 00007ffe`e9797034     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : julia+0x1a64
00000000`00c3ff30 00007ffe`eb1c2651     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x14
00000000`00c3ff60 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21


SYMBOL_NAME:  libopenblas64_!DCABS164_+c4fca0

MODULE_NAME: libopenblas64_

IMAGE_NAME:  libopenblas64_.DLL

STACK_COMMAND:  dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~0s ; .cxr ; kb

FAILURE_BUCKET_ID:  NULL_POINTER_READ_c0000005_libopenblas64_.DLL!DCABS164_

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {be6f32a7-c6ce-d1ab-8dd2-44f2631c1d51}

Followup:     MachineOwner
---------

I can reproduce this on Win10 with Julia 1.6.3

>ver
Microsoft Windows [Version 10.0.19043.1237]

julia> versioninfo()
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

I cannot reproduce this on Win10 with Julia 1.7.0-rc1 or WSL or macOS.

Julia `versioninfo()` output:

Win10 with Julia 1.7.0-rc1

julia> versioninfo()
Julia Version 1.7.0-rc1
Commit 9eade6195e (2021-09-12 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

WSL

~$ uname -a
Linux wo 4.4.0-19041-Microsoft #1237-Microsoft Sat Sep 11 14:32:00 PST 2021 x86_64 x86_64 x86_64 GNU/Linux

julia> versioninfo()
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

macOS

~ uname -a
Darwin wo 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64

Built by Homebrew (v1.6.2_3)

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4* (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin20.4.0)
  CPU: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

@PetrKryslUCSD

Wow, fantastic detective work, @inkydragon!

@ViralBShah
Member

ViralBShah commented Sep 25, 2021

Julia 1.6 ships with OpenBLAS 0.3.10, whereas 1.7 ships with 0.3.13. A quick check would be to copy the OpenBLAS DLL from Julia 1.7 into Julia 1.6 and see whether the problem goes away.
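Before copying DLLs around, it helps to confirm which BLAS build a session is actually using. A minimal sketch, assuming BLAS.openblas_get_config() on Julia 1.6 (as used later in this thread) and BLAS.get_config() on 1.7; blas_info is a hypothetical helper name. On 1.7 this lists the loaded backend library rather than the OpenBLAS version string, which the thread obtains there with a direct ccall to openblas_get_config.

```julia
using LinearAlgebra

# Hypothetical helper: describe the BLAS library the current session uses.
# Julia 1.6 links OpenBLAS directly; Julia 1.7 goes through libblastrampoline,
# whose BLAS.get_config() lists the loaded backend libraries.
function blas_info()
    if VERSION >= v"1.7"
        return string(BLAS.get_config())
    else
        return BLAS.openblas_get_config()
    end
end

println(blas_info())
```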

@inkydragon
Sponsor Member

Julia + OpenBLAS test

Regarding copying the OpenBLAS DLL from Julia 1.7 into Julia 1.6 to see whether the problem goes away:

Using Julia's libopenblas64_.dll:

Julia Version 1.6.3 Commit ae8452a9e0 (2021-09-23 17:34 UTC)

julia> using LinearAlgebra; LinearAlgebra.BLAS.openblas_get_config()
        "OpenBLAS 0.3.10  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=32"
Size:   33457636 Bytes (31 MiB)
MD5:    24bfb4665fedc77cc31b3bd5e0ca861b
SHA256: 148bd70ff543ccb9b90afb53aa89e1d3548acf1b628535d8b39690cf7efe6469


Julia Version 1.7.0-rc1 Commit 9eade6195e (2021-09-12 06:45 UTC)

julia> using LinearAlgebra.BLAS: @blasfunc
julia> strip(unsafe_string(ccall((@blasfunc(openblas_get_config), Base.libblas_name), Ptr{UInt8}, () )))
        "OpenBLAS 0.3.13  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=32"
Size:   34975772 Bytes (33 MiB)
MD5:    36c52ed8728076d5e367884a3b345bb9
SHA256: e15785330b0a2c2d80494df474767b147e4798d38d28a38a8670532cef879a00

Results with the "Simplified test code":

  • 1.6.3 unmodified: crash
  • 1.7.0 unmodified: normal
  • 1.6.3 + OpenBLAS 0.3.13: normal
  • 1.7.0 + OpenBLAS 0.3.10: silent crash, no output

Using xianyi's OpenBLAS releases (libopenblas.dll):

ccall DLL test code:

arrSize = 10001  # Has error
arrType = ComplexF64
# Only the last blasPath assignment takes effect; reorder or comment out lines to pick a release.
blasPath = raw"D:\OpenBLAS-0.3.13-x64\bin\libopenblas.dll"
blasPath = raw"D:\OpenBLAS-0.3.12-x64\bin\libopenblas.dll"
blasPath = raw"D:\OpenBLAS-0.3.10-x64\bin\libopenblas.dll"

using LinearAlgebra.BLAS: BlasInt
using Libdl: dlopen

x = zeros(arrType, arrSize);
y = zeros(arrType, arrSize);

result = Ref{ComplexF64}()  # cblas_zdotc_sub writes the result here
dlopen(blasPath)            # load this DLL first so the ccall below uses it
ccall((:cblas_zdotc_sub, :libopenblas), Cvoid,
    (BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}),
    length(x), x, stride(x, 1), y, stride(y, 1), result)
result[]

| | Julia 1.6.3 | Julia 1.7.0-rc1 |
|---|---|---|
| OpenBLAS-0.3.10-x64 | crash | crash |
| OpenBLAS-0.3.12-x64 | | |
| OpenBLAS-0.3.13-x64 | | |

Note:

  • The OpenBLAS 0.3.11 release is broken, so it was skipped.
  • Restart Julia before testing a different OpenBLAS DLL.

Crash messages:

Julia 1.6.3 + OpenBLAS-0.3.10-x64

julia> ccall((:cblas_zdotc_sub, :libopenblas), Cvoid,
           (BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}),
           length(x), x, stride(x, 1), y, stride(x, 1), result)

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x47a321a0 --  at 0x47a321a0 -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x47a321a0 --  at 0x47a321a0 -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x47a321a0 --  at 0x47a321a0 -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x47a321a0 --  at 0x47a321a0 -- OLATION at 0x47a321a0 --  at 0x47a321a0 -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x47a321a0 -- DCABS1 at D:\proj\Julia\julia@github\40963-openblas\OpenBLAS-0.3.10-x64\bin\libopenblas.dll (unknown line)
DCABS1 at D:\proj\Julia\julia@github\DCABS1 at D:\proj\Julia\julia@github\40963-openblas\OpenBLAS-0.3.10-x64\bin\libopenblas.dll (unknown line)

Julia 1.7.0-rc1 + OpenBLAS-0.3.10-x64

julia> ccall((:cblas_zdotc_sub, :libopenblas), Cvoid,
           (BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}),
           length(x), x, stride(x, 1), y, stride(x, 1), result)

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception:
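A variant of the ccall test above that looks up cblas_zdotc_sub in the explicitly opened DLL with dlsym, instead of relying on the :libopenblas name resolving to the library loaded earlier. This is a sketch: zdotc_from is a hypothetical helper, and it assumes the DLL uses the same integer width as Julia's BlasInt (an LP64 build would declare Cint instead). Restarting Julia between DLL versions, as noted above, remains the safer way to compare releases.

```julia
using Libdl: dlopen, dlsym, dlclose
using LinearAlgebra.BLAS: BlasInt

# Hypothetical helper: compute sum(conj.(x) .* y) with cblas_zdotc_sub taken
# from the DLL at `libpath`, looked up explicitly with dlsym.
function zdotc_from(libpath::AbstractString, x::Vector{ComplexF64}, y::Vector{ComplexF64})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    lib = dlopen(libpath)
    try
        fptr = dlsym(lib, :cblas_zdotc_sub)  # throws if the symbol is not exported
        result = Ref{ComplexF64}()
        # Assumes the DLL's integer arguments have the same width as BlasInt.
        ccall(fptr, Cvoid,
              (BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}),
              length(x), x, stride(x, 1), y, stride(y, 1), result)
        return result[]
    finally
        dlclose(lib)
    end
end

# Usage (placeholder path):
# zdotc_from(raw"D:\OpenBLAS-0.3.13-x64\bin\libopenblas.dll", ones(ComplexF64, 10001), ones(ComplexF64, 10001))
```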

@ViralBShah
Member

ViralBShah commented Sep 26, 2021

Perhaps we need this: OpenMathLib/OpenBLAS#2881 (add emms to reset fpu registers before assembler routines, by mattip).

@inkydragon
Sponsor Member

inkydragon commented Sep 26, 2021

Build CBLAS with MSYS2 MinGW64: make -j6 TARGET=HASWELL BINARY=64 ONLY_CBLAS=1.

Build v0.3.10 + cherry-pick OpenMathLib/OpenBLAS#2881.

  • git checkout v0.3.10
  • git cherry-pick a5b164946ccc9dec037d4e0a1cd2f2202b1c918a
  • git cherry-pick 403eb513a0616020e7238b531bad739f6baef43a

It still crashes.


Between v0.3.10 and v0.3.12, the symbol exports on the OpenBLAS main branch are incomplete: cblas_zdotc_sub is missing. (You can check the list of exported symbols for v0.3.11.)

This made it hard to find the commit that fixed the crash directly with git bisect.

I'm going to find the commits that fixed the symbol export first, then look for the commits that fixed the crash.
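Before bisecting, a DLL can also be checked for the symbol programmatically. A minimal sketch, assuming the throw_error keyword of Libdl.dlopen and Libdl.dlsym available in recent Julia versions; exports_symbol is a hypothetical helper. Julia's own ILP64 build exports the suffixed name cblas_zdotc_sub64_ (visible in the WinDbg stack trace above), so that is the name to query in libopenblas64_.dll.

```julia
using Libdl: dlopen, dlsym, dlclose

# Hypothetical helper: true if the library at `libpath` can be loaded and
# exports a symbol named `sym`; false otherwise (never throws on a miss).
function exports_symbol(libpath::AbstractString, sym::Symbol)
    lib = dlopen(libpath; throw_error=false)
    lib === nothing && return false
    try
        return dlsym(lib, sym; throw_error=false) !== nothing
    finally
        dlclose(lib)
    end
end

# Usage (placeholder path):
# exports_symbol(raw"D:\OpenBLAS-0.3.12-x64\bin\libopenblas.dll", :cblas_zdotc_sub)
```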

@ViralBShah
Member

It would also be good to check this with MKL.jl on 1.7. I assume it shouldn't be a problem, but it's good to know. Install MKL.jl on 1.7, do using MKL, and then run the tests in this issue. That should run them through MKL.

@inkydragon
Sponsor Member

Julia 1.7.0-rc1 + MKL.jl works well.

julia> using LinearAlgebra
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.dll

julia> using MKL
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] mkl_rt.1.dll


julia> arrSzie = 10000;  # No error
julia> arrSzie = 10001;  # Has error
julia> arrType = ComplexF64;
julia> using LinearAlgebra: dot
julia> using LinearAlgebra.BLAS: dotc
julia> x = zeros(arrType, arrSzie);
julia> y = zeros(arrType, arrSzie);

julia> dot(x, y)
0.0 + 0.0im

julia> dotc(length(x), x, stride(x, 1), y, stride(y, 1))
0.0 + 0.0im


julia> x = ones(arrType, arrSzie);
julia> y = ones(arrType, arrSzie);

julia> dot(x, y)
10001.0 + 0.0im

julia> dotc(length(x), x, stride(x, 1), y, stride(y, 1))
10001.0 + 0.0im


julia> versioninfo()
Julia Version 1.7.0-rc1
Commit 9eade6195e (2021-09-12 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

(@v1.7) pkg> st
      Status `C:\Users\woclass\.julia\environments\v1.7\Project.toml`
  [42fd0dbc] IterativeSolvers v0.9.1
  [33e6dc65] MKL v0.4.2
  [2f01184e] SparseArrays

@inkydragon
Sponsor Member

git bisect result

Tested with DLL Export Viewer v1.66, Julia 1.6.3, and the ccall DLL test code.

| Commit | Has zdotc_sub | No crash | Commit message |
|---|---|---|---|
| v0.3.12 | | | Fix typos |
| ... | - | - | - |
| v0.3.11 | | | Update from develop for 0.3.11 |
| ... | - | - | - |
| b205323 | | | Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function |
| fb3d80c4 | | | rebase |
| ... | - | - | - |
| v0.3.10 | | | Merge develop into 0.3.0 for 0.3.10 release |

cherry-picking

  • git checkout v0.3.10
  • git cherry-pick b2053239fc3
  • make -j6 TARGET=HASWELL BINARY=64 ONLY_CBLAS=1

No crash!

Tested with Julia 1.6.3 and 1.7.0:

output

julia> arrSzie = 10001;
julia> arrType = ComplexF64;
julia> blasPath = raw"V:\OpenBLAS\libopenblas.dll";
julia> using LinearAlgebra.BLAS: BlasInt
julia> using Libdl: dlopen

julia> dlopen(blasPath)
Ptr{Nothing} @0x00007ffe5ceb0000
julia> strip(unsafe_string(ccall((:openblas_get_config, :libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.10 NO_LAPACK NO_LAPACKE NO_AFFINITY HASWELL MAX_THREADS=6"

julia> x = zeros(arrType, arrSzie);
julia> y = zeros(arrType, arrSzie);
julia> result = Ref{ComplexF64}();
julia> ccall((:cblas_zdotc_sub, :libopenblas), Cvoid,
           (BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}),
           length(x), x, stride(x, 1), y, stride(x, 1), result)

julia> result[]
0.0 + 0.0im


julia> x = ones(arrType, arrSzie);
julia> y = ones(arrType, arrSzie);
julia> ccall((:cblas_zdotc_sub, :libopenblas), Cvoid,
           (BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}, BlasInt, Ptr{ComplexF64}),
           length(x), x, stride(x, 1), y, stride(x, 1), result)

julia> result[]
10001.0 + 0.0im

@inkydragon
Sponsor Member

It looks like Julia uses the binaries built by OpenBLAS_jll.jl directly, so how should I submit the patch?

@ViralBShah
Member

You basically add a patch to all the 0.3.10 build recipes in https://github.com/JuliaPackaging/Yggdrasil/tree/master/O/OpenBLAS. You'll see that there is a patches directory for each variant in there, and you just add a new patch. The build_tarballs.jl is the build script. When you create a PR, it will automatically build the binaries and do the rest.

You also have to add the patch to the julia source (for people who build from source). That goes in here: https://github.com/JuliaLang/julia/tree/master/deps/patches

@inkydragon
Sponsor Member

I built Julia v1.6.3 with b2053239fc3 cherry-picked, and it seems to work well locally.

Some questions:

  1. Which branch should I fork and then merge into which branch?
  2. Should this patch be backported to previous versions in the Yggdrasil/O/OpenBLAS repo?
    (Of course, some testing needs to be done first.)

For question 1, my guess is: JuliaLang:backports-release-1.6 <== inkydragon:blas-zdot (based on v1.6.3)
It looks like it would be better to fork the backports branch directly.

For question 2: I don't know, but I will first add the patch to 0.3.10 and then test whether it is compatible with other versions.

@ViralBShah
Member

That's a question for @KristofferC

@KristofferC
Sponsor Member

Yes a PR against backports-release-1.6 will work well.

inkydragon added a commit to inkydragon/julia that referenced this issue Sep 27, 2021
inkydragon added a commit to inkydragon/julia that referenced this issue Sep 27, 2021
@ViralBShah
Member

About question 2: on Yggdrasil, the PR should target master. Yggdrasil will then build a new release for OpenBLAS 0.3.10. After that, we have to update the package versions in julia/deps in the Julia sources.

inkydragon added a commit to fork4jl/Yggdrasil that referenced this issue Sep 27, 2021
vchuravy pushed a commit to JuliaPackaging/Yggdrasil that referenced this issue Oct 1, 2021
vchuravy pushed a commit that referenced this issue Oct 2, 2021
…TION (#42397)

* [OpenBLAS] cherry pick one patch to fix `zdot` crash

xref issue: #40963
cherry pick: OpenMathLib/OpenBLAS@b205323

* [test/LinearAlgebra] test Issue #40963

* [OpenBLAS] Update version

* [OpenBLAS] Update checksums
KristofferC pushed a commit that referenced this issue Nov 11, 2021
@ViralBShah
Member

I believe this should be fixed, but please reopen if still an issue.

simeonschaub pushed a commit to simeonschaub/Yggdrasil that referenced this issue Feb 23, 2022
staticfloat pushed a commit that referenced this issue Dec 23, 2022
8 participants