RuntimeError: an illegal memory access was encountered at src/convolution.cu:259 device number specification and assertion #40
Comments
Hmm, could you post the engine version, CUDA version, and nvcc version? I confirmed it works on most Pascal-architecture chips, but not the P100. I'll check.
System: Host: C249-SYS-7048GR-TR; Kernel: 4.15.0-54-generic x86_64; Machine: Supermicro SYS-7048GR-TR (KVM); CPU: 2 × 8-core Intel Xeon E5-2620 v4, 40960 KB cache; Graphics: Card-1: NVIDIA GM206GL [Quadro M2000]; CUDA Version 10.0.130 (nvcc --version)
Have you tested on the Maxwell architecture?
I checked; it works on the K40, K80, GTX 1060, GTX 1080 Ti, RTX 2080 Ti, Titan X, Titan Xp, Titan RTX, and V100.
OK, I fixed it.
Hmm, interesting. I've always specified this before, so I never encountered it, but adding some assertion checks to prevent this would be a good idea.
I find that when I use the first GPU it always works, but when I use another GPU, the program also uses the first GPU in addition to the one I specified. So this is the real error. I don't know whether it is a mistake in my code (very likely); I will check it.
I always use
For PyTorch v1.3.1, we found out that the asynchronous memory allocation in PyTorch doesn't always emit the error in place, and instead propagates the error back to the engine. To verify that this is not an error in the engine, we placed … We found out that when an OOM was about to happen, we got this error: … This requires further investigation inside PyTorch as well, but for now I can assure you that there is no memory leak or invalid-pointer-access error in the Minkowski Engine.
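Because CUDA kernels launch asynchronously, an illegal access or OOM raised inside one kernel can be reported at a later, unrelated call such as the convolution above. A standard way to localize such errors (this is general PyTorch/CUDA practice, not a step the maintainer describes here) is to force synchronous launches:

```shell
# CUDA_LAUNCH_BLOCKING=1 makes every kernel launch synchronous,
# so an asynchronous CUDA error is reported at the line that
# actually triggered it, at the cost of slower execution.
CUDA_LAUNCH_BLOCKING=1 python train.py
```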
When I use
But I got an error:
When you set CUDA_VISIBLE_DEVICES=1, the only GPU your code can get is the second GPU in your system, and it is exposed as device 0. So I think if you set `a = torch.Tensor(1).to("cuda:0")`, it will be OK.
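The remapping described above can be sketched as follows. The helper `physical_gpu` is illustrative only (it is not a PyTorch API); the real logical-to-physical mapping is performed by the CUDA driver when it reads `CUDA_VISIBLE_DEVICES`:

```python
def physical_gpu(visible_devices: str, logical_index: int) -> int:
    """Map a logical CUDA device index (what PyTorch sees) to the
    physical GPU index, given a CUDA_VISIBLE_DEVICES string.
    Illustration only: the CUDA driver does this internally."""
    visible = [int(i) for i in visible_devices.split(",")]
    return visible[logical_index]

# With CUDA_VISIBLE_DEVICES=1, "cuda:0" is physical GPU 1:
assert physical_gpu("1", 0) == 1
# With CUDA_VISIBLE_DEVICES=2,3, "cuda:1" is physical GPU 3:
assert physical_gpu("2,3", 1) == 3

# In PyTorch this means (hypothetical usage; requires a GPU):
# import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set before CUDA is initialized
# import torch
# a = torch.Tensor(1).to("cuda:0")          # lands on physical GPU 1
```

Note that `CUDA_VISIBLE_DEVICES` must be set before the CUDA context is created (i.e., before the first CUDA call in the process), or it has no effect.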
I can run my code with a GTX 1060, but it doesn't work with a Tesla P100 on the server. The output:
```
Traceback (most recent call last):
  File "train.py", line 135, in <module>
    trainer.train(epoch)
  File "train.py", line 56, in train
    output_sparse = self.model(point)
  File "/home/gaoqiyu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gaoqiyu/PointCloudSeg_Minkowski/model/minkunet.py", line 122, in forward
    out = self.conv0p1s1(x)
  File "/home/gaoqiyu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gaoqiyu/PointCloudSeg_Minkowski/MinkowskiEngine/MinkowskiConvolution.py", line 269, in forward
    out_coords_key, input.coords_man)
  File "/home/gaoqiyu/PointCloudSeg_Minkowski/MinkowskiEngine/MinkowskiConvolution.py", line 91, in forward
    ctx.coords_man.CPPCoordsManager)
RuntimeError: an illegal memory access was encountered at src/convolution.cu:259
```