This is the bug report from hell. It occurs only when the moons (openblas_nolapack, JavaCPP) and the stars (Win11 + Java + SBT + Play Framework) align.
I originally filed this against JavaCPP as bytedeco/javacpp-presets#1203, where the workaround was "don't use openblas". OK. That worked while MKL was available, but MKL is apparently now being dropped by JavaCPP (bytedeco/javacpp-presets#1575 (comment)), so I'm rapidly heading up that creek and have no idea where I left the paddle.
I can readily reproduce the issue on my local Win11 machine (and there's a small repro project in the JavaCPP issue). Great. It hangs on:
Debug: Loading C:\Users\admin\.javacpp\cache\openblas-0.3.28-1.5.12-20250124.032029-44-windows-x86_64.jar\org\bytedeco\openblas\windows-x86_64\libopenblas_nolapack.dll
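For anyone who wants to confirm that the load call itself never returns without attaching a debugger, here is a minimal sketch. It is not part of the repro project; `LoadWatchdog` and `finishesWithin` are hypothetical names, and the 10 second timeout is an arbitrary assumption. It runs the load on a daemon thread and reports whether it came back in time.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JavaCPP or OpenBLAS.
class LoadWatchdog {

    /** Runs the task on a daemon thread and returns true if it finished within the timeout. */
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "native-load-probe");
            t.setDaemon(true); // a stuck probe thread should not by itself keep the JVM alive
            return t;
        });
        Future<?> result = pool.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the task is still running, e.g. blocked inside the native loader
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The task failed outright (e.g. UnsatisfiedLinkError); surface the cause.
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdownNow(); // interrupts Java code only; a thread blocked in native code stays blocked
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: LoadWatchdog <absolute path to DLL>");
            return;
        }
        // System.load needs an absolute path; dependent DLLs must be resolvable as well.
        boolean loaded = finishesWithin(() -> System.load(args[0]), 10_000);
        System.out.println(loaded ? "load returned" : "load still blocked after 10 s");
    }
}
```

Since the hang only shows up under SBT and Play, calling `finishesWithin` from inside the application around whatever triggers the load is probably more telling than running `main` standalone.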
I have even attached WinDbg to the process, identified the relevant thread, and taken a stack dump:
0:152> k
# Child-SP RetAddr Call Site
00 00000054`a8ffa5d8 00007ffa`358c66a1 ntdll!NtFsControlFile+0x14
01 00000054`a8ffa5e0 00007ffa`357624ba KERNELBASE!PeekNamedPipe+0xf1
02 00000054`a8ffa6b0 00007ffa`357044a0 ucrtbase!common_stat_handle_file_opened<_stat64>+0x15e
03 00000054`a8ffa760 00007ffa`357618a8 ucrtbase!<lambda_3e61fc1153d2eec3991e8733eecb5419>::operator()+0x58
04 00000054`a8ffa7d0 00007ffa`35761b58 ucrtbase!__crt_seh_guarded_call<int>::operator()<<lambda_d6a03b27cb314eb65d447ab85fffcbf2>,<lambda_3e61fc1153d2eec3991e8733eecb5419> &,<lambda_8d9723598c44aced2bc47669cc68e4e1> >+0x44
05 00000054`a8ffa800 00007ff9`8a480143 ucrtbase!common_fstat<_stat64>+0xc0
06 00000054`a8ffa880 00007ff9`8a3ee4ae libgfortran_5!gfortrani_xrealloc+0xb273
07 00000054`a8ffa920 00007ff9`8a59034e libgfortran_5!gfortrani_init_units+0x5e
08 00000054`a8ffa960 00007ff9`8a2dc7f2 libgfortran_5!ynf+0x6e
09 00000054`a8ffa990 00007ff9`8a2d12dd libgfortran_5!backtrace_vector_release+0x202
0a 00000054`a8ffa9d0 00007ffa`38418b8f libgfortran_5+0x12dd
0b 00000054`a8ffaa20 00007ffa`3845d63d ntdll!LdrpCallInitRoutine+0x6b
0c 00000054`a8ffaa90 00007ffa`3845d3ee ntdll!LdrpInitializeNode+0x1c9
0d 00000054`a8ffabe0 00007ffa`3845d460 ntdll!LdrpInitializeGraphRecurse+0x42
0e 00000054`a8ffac20 00007ffa`3841db1d ntdll!LdrpInitializeGraphRecurse+0xb4
0f 00000054`a8ffac60 00007ffa`38418e30 ntdll!LdrpPrepareModuleForExecution+0xc5
10 00000054`a8ffaca0 00007ffa`384090cc ntdll!LdrpLoadDllInternal+0x20c
11 00000054`a8ffad40 00007ffa`3841a74a ntdll!LdrpLoadDll+0xb0
12 00000054`a8ffaf00 00007ffa`3587b732 ntdll!LdrLoadDll+0xfa
13 00000054`a8ffaff0 00007ffa`358777d1 KERNELBASE!LoadLibraryExW+0x172
14 00000054`a8ffb060 00007ffa`358d20ef KERNELBASE!LoadLibraryExA+0x31
15 00000054`a8ffb0a0 00007ff9`8f01225a KERNELBASE!LoadLibraryA+0x3f
16 00000054`a8ffb0d0 00007ff9`8edcecc0 jvm!c2v_getFlagValue+0x214cca
17 00000054`a8ffb130 00007ffa`20a23fa5 jvm!JVM_LoadLibrary+0xd0
18 00000054`a8ffb9a0 000001bd`368602d6 java!Java_jdk_internal_loader_NativeLibraries_load+0x89
19 00000054`a8ffbb30 000001cd`c28fcc41 0x000001bd`368602d6
1a 00000054`a8ffbb38 000001cd`c2066de0 0x000001cd`c28fcc41
1b 00000054`a8ffbb40 000001cd`c220d6a0 0x000001cd`c2066de0
1c 00000054`a8ffbb48 000001ce`1dcae438 0x000001cd`c220d6a0
1d 00000054`a8ffbb50 000001bd`00000000 0x000001ce`1dcae438
1e 00000054`a8ffbb58 000001cd`00000001 0x000001bd`00000000
1f 00000054`a8ffbb60 00000000`00000000 0x000001cd`00000001
o3-micro-high has this to say about that:
The stack trace shows that the hang occurs during the native library’s initialization—specifically, within the Fortran runtime code (libgfortran) that OpenBLAS uses. Key points include:
Where the hang occurs:
The thread is stuck in a Windows system call (NtFsControlFile via PeekNamedPipe) that’s invoked during a file-status check (common_stat_handle_file_opened). This call is part of the Fortran runtime’s unit initialization (e.g. in gfortrani_init_units).
Implication:
The Fortran runtime (libgfortran_5) appears to be performing some file/pipe I/O (likely to initialize Fortran I/O units) that hangs when it calls into the Windows API. This suggests that something in the initialization code is either incompatible with Windows 11 or is misbehaving in the context of your Java process.
Where the library is loaded:
The hang happens during the JVM’s library loading process (via JVM_LoadLibrary), so it’s not your Java code per se but the native library’s (OpenBLAS’s) initialization that is problematic.
In summary, the dump indicates that the native library (openblas_nolapack), via its Fortran runtime initialization, is hanging on a Windows file-system call. This points to a potential issue with the library’s (or Fortran runtime’s) initialization code on Windows 11, such as waiting indefinitely for a file/pipe status that never returns. You might need to check for known compatibility issues with OpenBLAS on Windows 11 or consider using a different build/configuration of the library.
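One experiment the stack suggests: the fstat call ends up in PeekNamedPipe, so the handle being inspected looks like a pipe. If that pipe is one of the standard handles inherited from SBT (an assumption, not something the dump confirms), running the same command with file-backed stdin and stdout should change the behaviour. A rough sketch, where `StdioProbe`, `withFileBackedStdio`, and the 60 second timeout are all hypothetical choices:

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: launches a command whose standard handles are files, not pipes.
class StdioProbe {

    /** Builds a process whose stdin is read from a file and whose stdout/stderr go to a log file. */
    static ProcessBuilder withFileBackedStdio(List<String> command, File stdin, File log) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectInput(stdin);      // stdin comes from a regular file
        pb.redirectErrorStream(true); // merge stderr into stdout
        pb.redirectOutput(log);       // stdout (and stderr) go to a regular file
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: StdioProbe <command> [args...]");
            return;
        }
        File stdin = File.createTempFile("probe-stdin", ".txt"); // empty input file
        File log = File.createTempFile("probe-out", ".log");
        Process child = withFileBackedStdio(List.of(args), stdin, log).start();
        if (child.waitFor(60, TimeUnit.SECONDS)) {
            System.out.println("child exited with code " + child.exitValue() + "; output in " + log);
        } else {
            System.out.println("child still running after 60 s; output so far in " + log);
            child.destroyForcibly();
        }
    }
}
```

If the load completes with file-backed handles but hangs with the inherited ones, that would point at the standard handles rather than at OpenBLAS itself. If it hangs either way, this guess is wrong.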
Any suggestions? Thanks!
Versions: it (still) happens with openblas-0.3.28.