crash on OS X 10.8.3 when using DYNAMIC_ARCH=1 USE_THREAD=1 #221

Closed
ViralBShah opened this issue May 21, 2013 · 29 comments

@ViralBShah
Contributor

OpenBLAS crashes in multi-threaded mode when built for the Julia distribution on the Mac. The relevant build flags are:

DYNAMIC_ARCH=1 NO_AFFINITY=1 INTERFACE64=1 BINARY=64 USE_THREAD=1

I am using OS X 10.8.3, clang 4.2, and gfortran 4.8.0 from Homebrew. Single-threaded runs work fine, but multi-threaded runs crash. I believe the crash is related to DYNAMIC_ARCH=1, since that flag is only used when building OpenBLAS for the Julia binary distribution on OS X.

@xianyi
Collaborator

xianyi commented May 21, 2013

Hi @ViralBShah ,

Does it fail at the build stage or at runtime?

Xianyi

@ViralBShah
Contributor Author

It fails at runtime: I get a segfault when multiplying two matrices.

@xianyi
Collaborator

xianyi commented May 30, 2013

Hi @ViralBShah ,

I just tried gcc & gfortran 4.7 with DYNAMIC_ARCH=1 NO_AFFINITY=1 INTERFACE64=1 BINARY=64 USE_THREAD=1 on OS X 10.8.3, and dgemm works fine.

Could you try it from http://xianyi.github.io/OpenBLAS/download/0.2.6/openblas_0.2.6_osx.tar.gz ?

I will try clang.

Xianyi

@ViralBShah
Contributor Author

Hi @xianyi

Is this the same as the released 0.2.6 openblas, or something you have patched?

@ViralBShah
Contributor Author

This may be the same issue as #225.

@ViralBShah
Contributor Author

Also, I do not see the error with multi-threaded dgemm when I build with INTERFACE64=0 BINARY=64.

Sorry that these reports are not very specific.

@ViralBShah
Contributor Author

I deleted everything and started from scratch, and this no longer crashes.

@ViralBShah
Contributor Author

Oops, this still crashes. It turns out I had set OPENBLAS_NUM_THREADS=1 in my .bashrc.

@xianyi
Collaborator

xianyi commented Jun 6, 2013

Hi @ViralBShah ,

Please try the tarball. It is built from the latest develop branch.

Xianyi

@ViralBShah
Contributor Author

Will do, @xianyi. Now that I have unset the environment variable for the number of threads, I also see the crash on 32-bit.

@ViralBShah
Contributor Author

I built the develop branch for now, and the crash persists. I will try the tarball tomorrow.

@ViralBShah
Contributor Author

I am unable to try the library you provided because it does not include LAPACK. Could you build it with LAPACK as well? There is also a libgfortran version mismatch, since I am on 4.8.1, but I figure I can just use the 4.8.1 files here.

@ViralBShah
Contributor Author

If you build with gcc -static-libgcc -Wl,-Bstatic -lstdc++ -lgfortran -lquadmath -Wl,-Bdynamic -lm, then all the gfortran dependencies can also be statically included in libopenblas.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46539

@ViralBShah
Contributor Author

This is the OpenBLAS build output (on the develop branch). Perhaps on the Mac the maximum number of threads could be capped at something like 16.

 OpenBLAS build complete.

  OS               ... Darwin             
  Architecture     ... x86_64               
  BINARY           ... 64bit                 
  Use 64 bits int    (equivalent to "-i8" in Fortran)      
  C compiler       ... CLANG  (command line : clang -mmacosx-version-min=10.6)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
-n   Library Name     ... libopenblasp-r0.2.6.a
 (Multi threaded; Max num-threads is 256)

@xianyi
Collaborator

xianyi commented Jun 13, 2013

Hi @ViralBShah ,

Could you try linking against the static OpenBLAS library, e.g. libopenblas.a?

Xianyi

@ViralBShah
Contributor Author

Good idea. I compiled it into a dynamic library with
gfortran -fPIC -shared -Wl,-all_load libopenblasp-r0.2.6.a -o libopenblasp-r0.2.6.dylib

This works fine and there are no crashes.

@xianyi
Collaborator

xianyi commented Jun 14, 2013

Interesting. Could you double check the result? Is it multi-threaded and DYNAMIC_ARCH=1?

@ViralBShah
Contributor Author

I am using the library you provided, which is built with DYNAMIC_ARCH and multi-threading. Moreover, I checked that it uses multiple cores: Julia has a peakflops() command, which does a matrix multiply and computes the flop rate, and I get the expected results with 2 cores.

The only difference between your build and mine seems to be that you are using gfortran 4.7 and I am using 4.8.

@xianyi
Collaborator

xianyi commented Jun 14, 2013

Could you also convert your own compiled static library as follows?

gfortran -fPIC -shared -Wl,-all_load libopenblasp-r0.2.6.a -o libopenblasp-r0.2.6.dylib

I want to check whether this is a bug in the dylib generation.

Xianyi

@ViralBShah
Contributor Author

I converted my develop-branch build that way, and it still crashes as before, so the dylib generation seems to be fine.

@ViralBShah
Contributor Author

This is the stack trace:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5f3f1800
0x00000001049511e4 in gemm_driver ()
(gdb) bt
#0  0x00000001049511e4 in gemm_driver ()
#1  0x00000001049511ca in dgemm_thread_nn ()
#2  0x000000010481ef73 in dgemm_ ()

Is the OpenMP version more stable or faster than the pthreads version?

@xianyi
Collaborator

xianyi commented Jun 17, 2013

@ViralBShah, could you try building with DEBUG=1?

@ViralBShah
Contributor Author

This is what I get inside julia:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5f3f1038
0x0000000105278160 in gemm_driver () at common_x86_64.h:145
145 common_x86_64.h: No such file or directory.
    in common_x86_64.h
(gdb) bt
#0  0x0000000105278160 in gemm_driver () at common_x86_64.h:145
#1  0x0000000105278096 in dgemm_thread_nn (args=0x7fff5fbfe278, range_m=0x0, range_n=0x0, sa=0x10fe0a000, sb=0x10ff0a000, mypos=0) at level3_thread.c:705
#2  0x0000000105031bce in dgemm_ (TRANSA=0x7fff5fbfe460 "N?_?", TRANSB=0x7fff5fbfe450 "N/?", M=0x7fff5fbfe440, N=0x7fff5fbfe430, K=0x7fff5fbfe420, alpha=0x7fff5fbfe410, a=0x10c100000, ldA=0x7fff5fbfe400, b=0x10c100000, ldB=0x7fff5fbfe3f0, beta=0x7fff5fbfe3e0, c=0x10df85000, ldC=0x7fff5fbfe3d0) at gemm.c:430

@ViralBShah
Contributor Author

These are the compilers I am using:

viral-laptop 11:45:47 {master} ~/julia$ clang -v
Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
viral-laptop 11:45:49 {master} ~/julia$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gfortran/4.8.1/gfortran/libexec/gcc/x86_64-apple-darwin12.3.0/4.8.1/lto-wrapper
Target: x86_64-apple-darwin12.3.0
Configured with: ../configure --prefix=/usr/local/Cellar/gfortran/4.8.1/gfortran --datarootdir=/usr/local/Cellar/gfortran/4.8.1/share --bindir=/usr/local/Cellar/gfortran/4.8.1/bin --enable-languages=fortran --with-system-zlib --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-cloog=/usr/local/opt/cloog --with-isl=/usr/local/opt/isl --enable-checking=release --disable-stage1-checking --disable-build-poststage1-with-cxx --disable-libstdcxx-pc --disable-nls
Thread model: posix
gcc version 4.8.1 (GCC) 

@ViralBShah
Contributor Author

I built again, this time removing the -mmacosx-version-min=10.6 option to clang and setting the maximum number of threads to 128, and I no longer get the crash.

@ViralBShah
Contributor Author

I finally narrowed it down. Compiling with a maximum of 128 threads instead of 256 fixes this for me on the Mac. Perhaps there is some kind of memory corruption when using a large number of threads, which for some reason only shows up with DYNAMIC_ARCH.

@xianyi
Collaborator

xianyi commented Jul 1, 2013

Thank you for the investigation.

I will debug this error.


xianyi added a commit that referenced this issue Jul 2, 2013
@xianyi
Collaborator

xianyi commented Jul 2, 2013

Hi @ViralBShah ,

I think I fixed this bug on the develop branch. Could you try it?

Xianyi

xianyi added a commit that referenced this issue Jul 7, 2013
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

the job array is 8 MB.

Thus, we use malloc instead of stack allocation.
@xianyi
Collaborator

xianyi commented Jul 7, 2013

@ViralBShah, I fixed this bug in the last commit, which does not change the stack limit on Mac OS X.
