error == cudaSuccess (77 vs. 0) an illegal memory access was encountered #226
Comments
Hello, which version of Caffe did you build from source?
I downloaded it from this link just yesterday, so I believe it was the latest version at the time.
Thanks. Can you provide the Git SHA1?
Sorry, I don't get what you mean by "Git SHA1".
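For anyone else unsure about the term: the Git SHA1 is the 40-character hash that identifies the exact commit that was checked out, and `git rev-parse HEAD` prints it. Below is a minimal sketch for reading it from a script. It assumes Caffe was obtained with `git clone` (a downloaded archive carries no Git metadata) and that `git` is on the PATH. The helper names `git_sha1` and `looks_like_sha1` are hypothetical.

```python
import subprocess


def git_sha1(repo_dir="."):
    """Return the full SHA1 of the commit currently checked out in repo_dir.

    Assumes repo_dir is a Git working tree and `git` is installed.
    """
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def looks_like_sha1(value):
    """Check that value has the shape of a full SHA1: 40 lowercase hex digits."""
    return len(value) == 40 and all(c in "0123456789abcdef" for c in value)
```

Calling `git_sha1("/path/to/caffe")` in the directory that was cloned gives the value the maintainer is asking for. A release number such as 0.15.13 is not a SHA1.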
Hello again. I tried to build Caffe with cuDNN 4 but I receive this error: Could you help with this? I can hardly train past epoch 20 now! The issue happens on every single training run!
I also cloned the latest NVIDIA Caffe with git clone https://github.com/NVIDIA/caffe.git $CAFFE_HOME and rebuilt it with cuDNN 5, but I still get the same error.
I hit this (error == cudaSuccess (77 vs. 0) an illegal memory access was encountered) too, running this fork of Caffe in a Docker container with cuDNN 4 and CUDA 7.5. (The host has CUDA 7.5 and cuDNN 5.) The driver on the host is: NVIDIA-SMI 352.39 Driver Version: 352.39
Isn't anyone going to suggest anything? I even rebuilt Caffe without cuDNN (removing all cuDNN packages via the software center and then running cmake and all that), but training stopped again (this time after epoch 57) while nvidia-smi shows Caffe still running and occupying more than 3 GB of memory. I don't understand what's going on here! My driver version: 352.63
@szm2015 Hello. You say "I built caffe nvidia fork from source". Could you clarify which particular branch you built? We recently fixed a memory access issue, and the fix is now in the 0.15.13 release. Could you try it out, please?
The one I mentioned in my last comment (which I built without cuDNN and which still suffers from this) is indeed 0.15.13.
Got it. In order to reproduce it, I'd need the Makefile.config you use as well as the exact output you get (including the call stack). Could you attach this info here?
Did you solve this problem? If yes, could you please say how?
I still can't reproduce it because my previous question was not answered.
I didn't exactly solve it; I just omitted cuDNN. I'm working on another system now (a GTX 1080), and there CUDA 8.0 and cuDNN 5.1.5 are working without a problem.
I encountered this problem in the following scenario: I had I excluded the 14th HDF5 file from the training set and the error was gone.
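One way to find a problematic file like this is to retrain with one file left out at a time. Below is a minimal sketch, assuming the training set is passed to Caffe's HDF5 data layer as a text file listing one HDF5 path per line. The function names and file names are hypothetical, and the sketch only generates the candidate lists; it does not run any training.

```python
def leave_one_out_lists(paths):
    """Yield (excluded_path, remaining_paths) for every file in paths."""
    for i, excluded in enumerate(paths):
        yield excluded, paths[:i] + paths[i + 1:]


def write_source_list(list_path, paths):
    """Write one HDF5 path per line, the layout assumed for the layer's source file."""
    with open(list_path, "w") as handle:
        handle.write("\n".join(paths) + "\n")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["train_01.h5", "train_02.h5", "train_03.h5"]
    for index, (excluded, remaining) in enumerate(leave_one_out_lists(files), start=1):
        write_source_list("train_without_%02d.txt" % index, remaining)
        print("list %d excludes %s" % (index, excluded))
```

If the crash disappears for exactly one of the generated lists, the excluded file is the one worth inspecting.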
Hi everyone
As the title suggests I have encountered this error while training:
error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
Another issue in NVIDIA/DIGITS (#598) exists for this error. There the problem seems to have been resolved by reinstalling DIGITS and caffe-nv, but in my case:
I first installed DIGITS following the normal procedure and worked with it for a while without significant problems. In order to use multi-class detection, yesterday I built caffe nvidia fork from source (as suggested in #157). I was then able to launch multi-class DetectNet training successfully, but now I get this error every now and then at apparently random iterations (once after epoch 26, then before epoch 20, and so on). It's either this, or the training just stops, again at a random iteration. In the latter case GPU utilization drops from more than 80 percent to less than 40 percent (most of the time zero), while nvidia-smi shows that Caffe is still running on the GPU and its memory is still occupied.
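The "stuck" symptom described above (utilization near zero while memory stays allocated) can be checked from a script. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports numeric values for both queried fields; some GeForce cards may report a field as unsupported, which this sketch does not handle. The thresholds and function names are hypothetical.

```python
import subprocess

# nvidia-smi query that prints one "utilization, memory" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse lines such as '83, 3120' into (utilization %, memory MiB) tuples."""
    stats = []
    for line in text.strip().splitlines():
        utilization, memory = (field.strip() for field in line.split(","))
        stats.append((int(utilization), int(memory)))
    return stats


def sample_gpus():
    """Run nvidia-smi once and return one (utilization, memory) tuple per GPU."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_stats(result.stdout)


def looks_stalled(samples, max_util=5, min_mem_mib=1024):
    """True if every sample for one GPU is idle while memory is still occupied.

    samples: (utilization %, memory MiB) readings taken over time.
    The default thresholds are illustrative guesses, not tuned values.
    """
    return bool(samples) and all(
        util <= max_util and mem >= min_mem_mib for util, mem in samples
    )
```

Collecting the first entry of `sample_gpus()` every few seconds and passing the recent readings to `looks_stalled` would flag a training run that has stopped making progress without exiting.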
my CUDA version:
Package: cuda
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 8
Maintainer: cudatools cudatools@nvidia.com
Architecture: amd64
Version: 7.5-18
Depends: cuda-7-5 (= 7.5-18)
Description: CUDA meta-package
Meta-package containing all the available packages required for native CUDA
development. Contains the toolkit, samples, driver and documentation.
my cuDNN version:
Package: libcudnn5
Status: install ok installed
Priority: optional
Section: multiverse/libs
Installed-Size: 59315
Maintainer: cudatools cudatools@nvidia.com
Architecture: amd64
Source: cudnn
Version: 5.1.3-1+cuda7.5
Description: cuDNN runtime libraries
cuDNN runtime libraries containing primitives for deep neural networks.
and my caffe version:
Package: caffe-nv
Status: install ok installed
Priority: optional
Section: universe/misc
Installed-Size: 120
Maintainer: Caffe Maintainers caffe-maint@googlegroups.com
Architecture: amd64
Version: 0.15.9-1+cuda7.5
I'm using a GeForce GTX 850M.
Note that before building Caffe from source I had run several trainings, and these issues (illegal memory access and getting stuck at an iteration) happened only once or twice. Now one or the other occurs on every training.
Here someone suggested replacing cuDNN 5 with 4. (I don't know how to do that. Would simply installing libcudnn4 do it? I thought it might cause a conflict, so I haven't tried it yet.)
I greatly appreciate your help!!!