
Segfault in relion_refine_mpi with --firstiter_cc and --gpu #7

Closed
bforsbe opened this issue Jun 21, 2016 · 9 comments


bforsbe commented Jun 21, 2016

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


I hope it's an actual bug this time ;-)

I'm running 3D refinement using

```bash
mpirun -n 3 `which relion_refine_mpi` --o RefineInitial/run1 --auto_refine --split_random_halves --i particles.star --ref emd_2984_280.mrc --firstiter_cc --ini_high 30 --dont_combine_weights_via_disc --pool 3 --ctf --ctf_corrected_ref --particle_diameter 200 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 10 --offset_step 2 --sym D2 --low_resol_join_halves 40 --norm --scale  --j 1 --gpu
```

(the template was created in the GUI, the names were modified, and it was launched from a terminal), and it crashes saying

```
KERNEL_ERROR: invalid argument in /home/dtegunov/Desktop/relion2beta/src/gpu_utils/cuda_helper_functions.cu at line 598 (error-code 11)
[dtegunov:09959] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10d10)[0x7f72177d9d10]
[dtegunov:09959] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(raise+0x29)[0x7f72177d9bd9]
[dtegunov:09959] [ 2] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_gpu_util.so(_Z20runDiff2KernelCoarseR19CudaProjectorKernelPfS1_S1_S1_S1_S1_S1_R21OptimisationParamtersP11MlOptimisermiiiiiP11CUstream_stb+0x9ce)[0x7f7216a4e5ae]
[dtegunov:09959] [ 3] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_gpu_util.so(_Z30getAllSquaredDifferencesCoarsejR21OptimisationParamtersR18SamplingParametersP11MlOptimiserP15MlOptimiserCudaR13CudaGlobalPtrIfLb1EE+0x13d0)[0x7f7216a583e0]
[dtegunov:09959] [ 4] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_gpu_util.so(_ZN15MlOptimiserCuda32doThreadExpectationSomeParticlesEi+0x2ea9)[0x7f7216a68779]
[dtegunov:09959] [ 5] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_lib.so(_Z11_threadMainPv+0x1d)[0x7f721867639d]
[dtegunov:09959] [ 6] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa)[0x7f72177d06aa]
[dtegunov:09959] [ 7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7217505e9d]
```

It doesn't crash on the GPU if I remove --firstiter_cc, and the CPU version runs fine with --firstiter_cc. Not sure if I can provide my test data due to its size, but maybe there are some debug flags I can set that will give you more information to work with?



bforsbe commented Jun 21, 2016

Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe):


I hope it WAS a bug. We noticed a recently introduced bug with --firstiter_cc, which should be fixed in v2.0.b1, pushed no more than 30 minutes ago. Try pulling the new code and running again. If the problem persists, I'll dig deeper.

Thanks again for reporting!


bforsbe commented Jun 21, 2016

Original comment by Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov):


Nope, still crashes with the same message.


bforsbe commented Jun 21, 2016

Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe):


Does it crash immediately? If so, is it possible to create a minimal example with input data that shows this error, like just a few particles? If so, I can have a look at it. If the files are "too" large, I could receive them some other way than through here. I'll try to reproduce it here on separate data in the meantime.


bforsbe commented Jun 22, 2016

Original comment by craigyk (Bitbucket: craigyk, GitHub: craigyk):


I just ran into this problem. I pulled the latest changes, reran, and everything looks OK, so the problem appears fixed for me with the latest bits.


bforsbe commented Jun 22, 2016

Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe):


I believe the error Craig observed is the one we did in fact fix in v2.0.b1. Dimitry appears to have found a wholly separate issue. Luckily, I seem to have been able to reproduce it here now, so hopefully there will be a fix for it later today.


bforsbe commented Jun 22, 2016

Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe):


I believe I know what the issue is now. Since cross-correlation is so infrequently used and is not a bottleneck, those functions were never adapted to the most recent difference-kernel layout. Consequently, they still use a layout that is limited by hardware capacity: it requests shared memory that can exceed what is available on the device. We could fix this in a number of ways, the easiest being to decrease the block size whenever the memory limit is exceeded, but that suffers from the same weakness, just at a later stage. I think the more reasonable fix is to update the cc-kernels to the new layout, which may take a day or two.
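
To make the failure mode concrete, here is a minimal host-side sketch (hypothetical numbers, not RELION's actual sizing) of how a shared-memory request that scales with the number of translations can outgrow the per-block limit the device reports:

```cuda
// Illustrative arithmetic only -- block_size and n_trans are made-up values,
// and the "one float per thread per translation" layout is hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int block_size = 128;   // threads per block (cf. BLOCK_SIZE)
    const int n_trans    = 200;   // hypothetical number of translations

    // Hypothetical layout: one float per thread per translation in shared memory.
    size_t requested = (size_t)block_size * n_trans * sizeof(float);

    printf("requested %zu bytes, device allows %zu bytes per block\n",
           requested, prop.sharedMemPerBlock);
    if (requested > prop.sharedMemPerBlock)
        printf("a kernel launch with this request would be rejected\n");
    return 0;
}
```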

For now, however, you can work around the issue in one of two ways (I had to do both...):

  • Compile for sm_52 (which has a higher shared-memory capacity). The RELION default is sm_35, the minimum supported architecture. Since I noticed in issue #3 (Problem with linux workstation install) that you have a TITAN X, your device is sm_52. To compile for sm_52, modify your cmake configuration command to

    ```bash
    cmake -DCUDA_ARCH=52 ..
    ```

  • Change the precompile variable BLOCK_SIZE in src/gpu_utils/cuda_settings.h to a lower value (see the sketch just after this list). In general WE DO NOT RECOMMEND CHANGING THESE VALUES, but as a temporary fix until I get an updated version pushed it should work. Always set it to a multiple of 32! It is currently 128, so reducing it to 96, 64 or 32 will reduce the shared memory currently required.
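
For reference, the second workaround is a one-line edit. This is only a sketch, assuming BLOCK_SIZE is a plain #define in src/gpu_utils/cuda_settings.h (the rest of that header is not reproduced here):

```cuda
// src/gpu_utils/cuda_settings.h (sketch; surrounding contents omitted)
// Temporary workaround only: keep the value a multiple of 32 and restore the
// default of 128 once the updated cc-kernels are available.
#define BLOCK_SIZE 64   // was 128; 96 or 32 would also reduce the shared-memory request
```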

Let me know if any of these measures help at all!


bforsbe commented Jun 22, 2016

Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe):


I just pushed a possible fix (v2.0.b2) by creating a new cross-correlation kernel that does not have shared-memory usage proportional to the number of translations. This should also do the trick. If not, let me know and I'll continue hacking away at it.
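
For anyone curious about the general shape of such a kernel, the sketch below is NOT the RELION code (names, arguments, launch configuration and the unnormalized correlation are invented for illustration); it only shows the idea: shared memory is a fixed BLOCK_SIZE-sized scratch buffer, and translations are covered by a grid-stride loop, so the shared-memory request no longer grows with the number of translations.

```cuda
// Illustrative sketch only -- not the RELION kernel. Normalization, CTFs and
// weighting are omitted; the point is the fixed-size shared-memory buffer.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128   // threads per block; must match the launch configuration

__global__ void cc_kernel_sketch(const float *ref,   // projected reference, image_size floats
                                 const float *imgs,  // n_trans translated copies of the image
                                 float *cc,          // one (unnormalized) cc value per translation
                                 int image_size,
                                 int n_trans)
{
    __shared__ float s_sum[BLOCK_SIZE];              // fixed size, independent of n_trans

    for (int t = blockIdx.x; t < n_trans; t += gridDim.x)
    {
        float acc = 0.f;
        for (int i = threadIdx.x; i < image_size; i += BLOCK_SIZE)
            acc += ref[i] * imgs[(size_t)t * image_size + i];

        s_sum[threadIdx.x] = acc;
        __syncthreads();

        // Standard in-block tree reduction (BLOCK_SIZE is a power of two).
        for (int stride = BLOCK_SIZE / 2; stride > 0; stride >>= 1)
        {
            if (threadIdx.x < stride)
                s_sum[threadIdx.x] += s_sum[threadIdx.x + stride];
            __syncthreads();
        }

        if (threadIdx.x == 0)
            cc[t] = s_sum[0];
        __syncthreads();                             // protect s_sum before the next translation
    }
}

int main()
{
    const int image_size = 1024, n_trans = 50;
    std::vector<float> h_ref(image_size, 1.f), h_imgs((size_t)n_trans * image_size, 2.f);

    float *d_ref, *d_imgs, *d_cc;
    cudaMalloc(&d_ref,  image_size * sizeof(float));
    cudaMalloc(&d_imgs, h_imgs.size() * sizeof(float));
    cudaMalloc(&d_cc,   n_trans * sizeof(float));
    cudaMemcpy(d_ref,  h_ref.data(),  image_size * sizeof(float),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_imgs, h_imgs.data(), h_imgs.size() * sizeof(float), cudaMemcpyHostToDevice);

    cc_kernel_sketch<<<32, BLOCK_SIZE>>>(d_ref, d_imgs, d_cc, image_size, n_trans);

    std::vector<float> h_cc(n_trans);
    cudaMemcpy(h_cc.data(), d_cc, n_trans * sizeof(float), cudaMemcpyDeviceToHost);
    printf("cc[0] = %f (expected %f)\n", h_cc[0], 2.f * image_size);

    cudaFree(d_ref); cudaFree(d_imgs); cudaFree(d_cc);
    return 0;
}
```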


bforsbe commented Jun 22, 2016

Original comment by Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov):


It appears fixed in 0be3990, thanks! On a side note: compiling with sm_52 won't solve issues with dynamic shared memory allocation. The hardware will already allocate everything it physically can, regardless of the compiler target.
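
A minimal sketch of that point (again, not RELION code): the dynamic shared-memory request is a launch-time parameter that the runtime checks against the device's per-block limit, independently of which architecture the code was compiled for.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy() {}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("per-block shared-memory limit: %zu bytes\n", prop.sharedMemPerBlock);

    // Ask for one byte more dynamic shared memory than the device allows;
    // the launch is rejected at run time regardless of the -arch used to build.
    size_t too_much = prop.sharedMemPerBlock + 1;
    dummy<<<1, 32, too_much>>>();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```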


bforsbe commented Jun 22, 2016

Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe):


Good to know! That's probably why I had to also adjust the block-size to get it working. Thanks!

bforsbe closed this as completed Jan 26, 2017
biochem-fan pushed a commit that referenced this issue Feb 13, 2019