As a note for others who find this issue like me but don't want to use Docker for their training: we just have to add export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6" to our script before pip installing ME.
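For anyone who drives the install from Python rather than a shell script, here is a minimal sketch of the same workaround. The helper names (build_env, trim_to_toolkit, install) and the package spec MinkowskiEngine==0.5.4 are my own assumptions for illustration, not part of ME. The idea is only to make sure TORCH_CUDA_ARCH_LIST is set in the environment of the process that compiles the extension.

```python
import os
import subprocess
import sys

# Architecture list from the comment above; 8.0 is the A100's compute capability.
ARCH_LIST = ["6.0", "6.1", "6.2", "7.0", "7.2", "7.5", "8.0", "8.6"]


def _as_tuple(arch):
    """Turn '8.0' into (8, 0) so that versions compare numerically."""
    major, minor = arch.split(".")
    return int(major), int(minor)


def trim_to_toolkit(archs, max_supported):
    """Drop architectures newer than the local nvcc can target."""
    limit = _as_tuple(max_supported)
    return [a for a in archs if _as_tuple(a) <= limit]


def build_env(archs, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = " ".join(archs)
    return env


def install(package="MinkowskiEngine==0.5.4", archs=ARCH_LIST):
    """Hypothetical wrapper: run pip with the architecture list exported.

    --no-cache-dir keeps pip from reusing a wheel that was built earlier
    without the A100 architecture.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--no-cache-dir", package]
    subprocess.run(cmd, env=build_env(archs), check=True)
```

Calling install(archs=trim_to_toolkit(ARCH_LIST, "8.0")) would match a CUDA 11.0 toolkit, whose nvcc targets up to 8.0 as far as I know (8.6 arrived with CUDA 11.1). CUDA 10.2 predates 8.0, so the list may need trimming further for that toolkit.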
Sorry for still bothering you after reading some similar ME issues such as issue#330, issue#350, and issue#52.
My problem
I built ME 0.5.4 in an Anaconda virtual environment with:
pytorch=1.7.1;
cudatoolkit=11.0 or 10.2 (with system CUDA 11.0 or 10.2, respectively)
and on this system:
ubuntu 18.04
nvidia driver: 450.80.2 (or 450.102.04)
gcc 7.5.0
With the environment above, ME 0.5.4 runs successfully (including ME.Conv, ME.BN, ME.ReLU, ME.interpolation, and loss.backward) on T4 and P40 GPUs, but fails on the A100 with the error 'cudaErrorNoKernelImageForDevice no kernel image is available for execution on the device'.
The details of output error:
{"@timestamp":"2022-02-22 00:14:15.936","@message":" sparse_tensor = ME.SparseTensor(code, coord_sparse)"}
{"@timestamp":"2022-02-22 00:14:15.936","@message":" File "/opt/conda/envs/3dr3_cu113/lib/python3.8/site-packages/MinkowskiEngine/MinkowskiSparseTensor.py", line 275, in __init__"}
{"@timestamp":"2022-02-22 00:14:15.936","@message":" coordinates, features, coordinate_map_key = self.initialize_coordinates("}
{"@timestamp":"2022-02-22 00:14:15.936","@message":" File "/opt/conda/envs/3dr3_cu113/lib/python3.8/site-packages/MinkowskiEngine/MinkowskiSparseTensor.py", line 304, in initialize_coordinates"}
{"@timestamp":"2022-02-22 00:14:15.936","@message":" ) = self._manager.insert_and_map(coordinates, *coordinate_map_key.get_key())"}
{"@timestamp":"2022-02-22 00:14:15.936","@message":" File "/opt/conda/envs/3dr3_cu113/lib/python3.8/site-packages/MinkowskiEngine/MinkowskiCoordinateManager.py", line 179, in insert_and_map"}
{"@timestamp":"2022-02-22 00:14:15.936","@message":" return self._manager.insert_and_map(coordinates, tensor_stride, string_id)"}
{"@timestamp":"2022-02-22 00:14:15.936","@message":"RuntimeError: CUDA error encountered at: /tmp/pip-req-build-16c08htu/src/3rdparty/concurrent_unordered_map.cuh:595: 209 cudaErrorNoKernelImageForDevice no kernel image is available for execution on the device"}
At first, I guessed it might be caused by an incompatibility between PyTorch 1.7 and the compute capability of the A100. However, pytorch-1.7.1 + cuda-11.0 + driver-450.80.2 does support the A100 (I ran a simple network without ME and it passed).
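A quick way to test this kind of guess is to compare the GPU's compute capability with the architectures an extension was compiled for. The sketch below is only illustrative: parse_arch_list, is_covered, and check_current_gpu are hypothetical helpers, and they handle only numeric entries such as 7.5 or 8.0+PTX. It assumes the usual CUDA rule that a compiled binary runs on a device with the same major version and an equal or higher minor version, while PTX can be JIT-compiled for any newer device.

```python
def parse_arch_list(arch_list):
    """Parse 'X.Y' or 'X.Y+PTX' entries separated by spaces or semicolons."""
    entries = []
    for token in arch_list.replace(";", " ").split():
        has_ptx = token.upper().endswith("+PTX")
        version = token[:-4] if has_ptx else token
        major, minor = version.split(".")
        entries.append(((int(major), int(minor)), has_ptx))
    return entries


def is_covered(capability, arch_list):
    """Return True if a device with this (major, minor) can run the build."""
    major, minor = capability
    for (built_major, built_minor), has_ptx in parse_arch_list(arch_list):
        # Binary code: same major version, built minor <= device minor.
        if built_major == major and built_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any device at least as new.
        if has_ptx and (built_major, built_minor) <= (major, minor):
            return True
    return False


def check_current_gpu(arch_list, device=0):
    """Report the local GPU and whether the given build list covers it."""
    import torch  # imported lazily so the helpers above work without PyTorch

    capability = torch.cuda.get_device_capability(device)
    name = torch.cuda.get_device_name(device)
    return name, capability, is_covered(capability, arch_list)
```

For reference, the T4 is 7.5, the P40 is 6.1, and the A100 is 8.0. A build made for "6.1 7.5" covers the first two but not the A100, which would match the symptom here if ME was compiled only for the GPUs visible at build time.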
Have you tested ME on the A100, and does it work well?
Thank you very much~