I'm trying to use the new multi-GPU features in AD-GPU 1.5.1, compiled with OVERLAP=ON.
There are 8 GPU cards in my local machine. When I specify CUDA device numbers, the command-line parser in main.cpp does not seem to parse the input CUDA IDs correctly:
When I try autodock_gpu_128wi -B ligand_conf_batch.dat -n 10 -D 1 , it seems to work correctly and prints the following CUDA info:
Running Job #7:
Device: GeForce RTX 2080 Ti (#1 / 8)
Grid map file: protein.maps.fld
Ligand file: ligand_2_isomer_0_conf_0_split_0.pdbqt
Using heuristics: (capped) number of evaluations set to 2068966
Local-search chosen method is: ADADELTA (ad)
When I try autodock_gpu_128wi -B ligand_conf_batch.dat -n 10 -D 2 , it gives me a CUDA error like:
Running Job #13:
Device: GeForce RTX 2080 Ti (#2 / 8)
Grid map file: protein.maps.fld
Ligand file: ligand_8_isomer_0_conf_0_split_0.pdbqt
Using heuristics: (capped) number of evaluations set to 2068966
Local-search chosen method is: ADADELTA (ad)
gpu_calc_initpop_kernel an illegal memory access was encountered
autodock_gpu_128wi: ./cuda/kernel1.cu:65: void gpu_calc_initpop(uint32_t, uint32_t, float*, float*): Assertion `0' failed.
[1] 38987 abort (core dumped) autodock_gpu_128wi -B ligand_conf_batch.dat -n 10 -D 2
The error output is the same for every -D value from 2 to 8; no CUDA device index works except -D 1.
And when I try autodock_gpu_128wi -B ligand_conf_batch.dat -n 10 -D 2, (note the comma at the end), it seems to parse CUDA index 2 twice, but runs normally:
Cuda device: GeForce RTX 2080 Ti (#2 / 8)
Available memory on device: 10439 MB (total: 11019 MB)
CUDA Setup time 0.280641s
Cuda device: GeForce RTX 2080 Ti (#2 / 8)
Available memory on device: 8301 MB (total: 11019 MB)
CUDA Setup time 0.000852s
Running Job #14:
Device: GeForce RTX 2080 Ti (#2 / 8)
Grid map file: protein.maps.fld
Ligand file: ligand_9_isomer_0_conf_0_split_0.pdbqt
Using heuristics: (capped) number of evaluations set to 2068966
Local-search chosen method is: ADADELTA (ad)
Rest of Setup time 0.036681s
But if I try to use multiple cards, such as autodock_gpu_128wi -B ligand_conf_batch.dat -n 10 -D 2,3,7, it works correctly, with all specified GPU cards detected:
Cuda device: GeForce RTX 2080 Ti (#2 / 8)
Available memory on device: 10746 MB (total: 11019 MB)
CUDA Setup time 0.291592s
Cuda device: GeForce RTX 2080 Ti (#3 / 8)
Available memory on device: 8883 MB (total: 11019 MB)
CUDA Setup time 0.236695s
Cuda device: GeForce RTX 2080 Ti (#7 / 8)
Available memory on device: 9072 MB (total: 11019 MB)
CUDA Setup time 0.212491s
Running Job #24:
Device: GeForce RTX 2080 Ti (#7 / 8)
Grid map file: protein.maps.fld
Ligand file: ligand_17_isomer_0_conf_1_split_0.pdbqt
Using heuristics: (capped) number of evaluations set to 2280181
Local-search chosen method is: ADADELTA (ad)
Rest of Setup time 0.021547s
Finally, if I use -D all (autodock_gpu_128wi -B ligand_conf_batch.dat -n 10 -D all), everything works as expected:
Cuda device: GeForce RTX 2080 Ti (#1 / 8)
Available memory on device: 2816 MB (total: 11019 MB)
CUDA Setup time 0.175057s
Cuda device: GeForce RTX 2080 Ti (#2 / 8)
Available memory on device: 10441 MB (total: 11019 MB)
CUDA Setup time 0.157140s
Cuda device: GeForce RTX 2080 Ti (#3 / 8)
Available memory on device: 7385 MB (total: 11019 MB)
CUDA Setup time 0.229185s
Cuda device: GeForce RTX 2080 Ti (#4 / 8)
Available memory on device: 3330 MB (total: 11019 MB)
CUDA Setup time 0.194629s
Cuda device: GeForce RTX 2080 Ti (#5 / 8)
Available memory on device: 7873 MB (total: 11019 MB)
CUDA Setup time 0.257307s
Cuda device: GeForce RTX 2080 Ti (#6 / 8)
Available memory on device: 8565 MB (total: 11019 MB)
CUDA Setup time 0.204192s
Cuda device: GeForce RTX 2080 Ti (#7 / 8)
Available memory on device: 8097 MB (total: 11019 MB)
CUDA Setup time 0.226710s
Cuda device: GeForce RTX 2080 Ti (#8 / 8)
Available memory on device: 9240 MB (total: 11019 MB)
CUDA Setup time 0.351348s
Running Job #1:
Device: GeForce RTX 2080 Ti (#1 / 8)
Grid map file: protein.maps.fld
Ligand file: ligand_0_isomer_0_conf_0_split_0.pdbqt
Using heuristics: (capped) number of evaluations set to 2068966
Local-search chosen method is: ADADELTA (ad)
Rest of Setup time 0.022616s
I hope this info is helpful for identifying the bug.
Thanks.
@Hong-Rui Thank you for reporting this issue. Also a big thank you for going the extra mile and being very thorough :-)
I currently suspect that a character might be cut off when parsing the last argument (which would explain why -D 1 works but none of the others do, since GPU #1 is used by default), but on a smaller machine (2 OpenCL GPUs) I have so far not been able to reproduce this ...
I'll continue looking into it and hope to be able to reproduce and fix it soon once one of our 8x Cuda machines becomes available.
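For reference while reproducing, here is a minimal sketch of how a -D value in the forms used in this report ("2", "2,", "2,3,7", "all") could be turned into 1-based device numbers. The function name parse_device_list and its error handling are hypothetical and are not taken from AutoDock-GPU's main.cpp; in particular, this sketch ignores a trailing comma rather than listing the device twice as seen in the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, NOT the actual AutoDock-GPU parser.
// Converts a -D value such as "2", "2,", "2,3,7" or "all" into a list of
// 1-based device numbers, validating each against device_count.
std::vector<int> parse_device_list(const std::string& arg, int device_count) {
    assert(device_count > 0);
    std::vector<int> devices;
    if (arg == "all") {
        for (int d = 1; d <= device_count; ++d) devices.push_back(d);
        return devices;
    }
    std::stringstream ss(arg);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token.empty()) continue;  // tolerate stray commas such as "2,,3"
        std::size_t used = 0;
        int d = std::stoi(token, &used);  // throws std::invalid_argument on non-numeric input
        if (used != token.size() || d < 1 || d > device_count) {
            throw std::invalid_argument("bad device number: " + token);
        }
        devices.push_back(d);
    }
    if (devices.empty()) throw std::invalid_argument("no devices given");
    return devices;
}
```

Under these assumptions, "2" and "2," both yield the single device 2, and an out-of-range value such as "9" on an 8-GPU machine is rejected instead of being passed on.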
@Hong-Rui The fix for the bug is up as PR #153 and should be merged soon. The bug turned out to be the wrong Cuda device being set (the first) for some threads, which then led to the crash, as the memory pointer for device #2 wasn't valid on #1.
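The failure mode described here can be modeled without a GPU. The sketch below is a toy model in plain C++, not AutoDock-GPU or CUDA code: set_device stands in for cudaSetDevice, DeviceBuffer for a device allocation, and launch for a kernel launch; all of these names are hypothetical. It relies on one fact about the CUDA runtime, namely that the current device is tracked per host thread and starts out as the first device (index 0, shown as #1 in the logs above).

```cpp
#include <cassert>
#include <thread>

// Toy model: every host thread has its own "current device", starting at 0,
// mirroring how the CUDA runtime tracks the current device per host thread.
thread_local int current_device = 0;

// Stand-in for cudaSetDevice (hypothetical name).
void set_device(int dev) {
    assert(dev >= 0);
    current_device = dev;
}

// Stand-in for a device allocation: only valid on the device that owns it.
struct DeviceBuffer {
    int owner;
};

// Select a device on the calling thread and "allocate" a buffer on it.
DeviceBuffer alloc_on(int dev) {
    set_device(dev);
    return DeviceBuffer{current_device};
}

// Stand-in for a kernel launch: succeeds only if the buffer belongs to the
// calling thread's current device (a real launch would hit an illegal access).
bool launch(const DeviceBuffer& buf) {
    return buf.owner == current_device;
}

// Run the launch on a fresh worker thread, optionally binding the device first.
bool run_in_worker(const DeviceBuffer& buf, bool bind_device) {
    bool ok = false;
    std::thread worker([&] {
        if (bind_device) set_device(buf.owner);  // the step a worker must not skip
        ok = launch(buf);
    });
    worker.join();
    return ok;
}
```

In this model, a buffer from alloc_on(1) fails in a worker that skips set_device and succeeds once the worker binds the device, while a buffer from alloc_on(0) works either way. That matches the report: -D 1 happened to work because the first device is the default for every thread.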