
invalid device symbol #172

Open
Jianzhong2020 opened this issue Jan 11, 2022 · 10 comments

Comments

@Jianzhong2020

Hello,

I just installed AutoDock-GPU on Ubuntu 20.04 (two RTX 3080 cards, CUDA 11.5) with the "make DEVICE=GPU NUMWI=128" command.
"autodock_gpu_128wi" did appear in the bin directory.
But when I ran "ADU --ffile input/1stp/derived/1stp_protein.maps.fld -lfile input/1stp/derived/1stp_ligand.pdbqt" (I set an alias ADU for autodock_gpu_128wi), the following error kept popping up:

AutoDock-GPU version: v1.5-release

Running 1 docking calculation

Cuda device: NVIDIA GeForce RTX 3080 (#1 / 2)
Available memory on device: 9772 MB (total: 10014 MB)

CUDA Setup time 0.119027s

Running Job #1
Using heuristics: (capped) number of evaluations set to 1132076
Local-search chosen method is: ADADELTA (ad)
SetKernelsGpuData copy to cData failed invalid device symbol
autodock_gpu_128wi: ./cuda/kernels.cu:130: void SetKernelsGpuData(GpuData*): Assertion `0' failed.

I'm wondering if this is because I have two cards? And I should compile with extra flags? Any guidance would be appreciated.

@atillack
Collaborator

atillack commented Jan 11, 2022

@Jianzhong2020 AD-GPU by default only compiles for compute capabilities 52, 60, 61, and 70. To compile only for an RTX 3080 (compute capability 86) you could compile with:
make DEVICE=GPU NUMWI=128 TARGETS="86"
Please make sure to add more compute capabilities if you plan to run on other GPUs. For example, if you wanted a binary for both an RTX 3080 and a Quadro RTX 8000 (compute capability 75), you would use TARGETS="75 86".

Wikipedia has a very good list of GPU compute capabilities and their CUDA versions here:
https://en.wikipedia.org/wiki/CUDA#GPUs_supported
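If you are unsure which compute capability your card has, recent drivers also let nvidia-smi report it directly. A minimal sketch (the compute_cap query field requires a reasonably new driver, roughly R470 or later; the fallback branch is only there for machines without the tool):

```shell
# Print "name, compute capability" for each visible GPU, e.g.
# "NVIDIA GeForce RTX 3080, 8.6" -> build with TARGETS="86".
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
else
    echo "nvidia-smi not found"
fi
```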

@atillack
Collaborator

@Jianzhong2020 One more thing - if you have more than one docking job, I'd recommend using the --filelist feature and enabling multithreading by compiling with OVERLAP=ON. Then you could let AD-GPU run on both GPUs automatically using the AD-GPU command line options --filelist <your.lst> -D all ;-)
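The batch workflow above can be sketched as follows (file names are hypothetical; the filelist format - a receptor .fld file followed by ligand .pdbqt entries - and the -D all option are as described in this thread):

```shell
# Hypothetical filelist: the receptor map file first, then one ligand per line.
cat > batch.lst <<'EOF'
./1stp_protein.maps.fld
./ligand_a.pdbqt
./ligand_b.pdbqt
EOF

# With a binary built using OVERLAP=ON, run every job and let AD-GPU
# distribute them across all visible GPUs:
#   ./bin/autodock_gpu_128wi --filelist batch.lst -D all
```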

@Jianzhong2020
Author

Problem solved with TARGETS="86" and OVERLAP=ON. Many thanks @atillack

@BJWiley233

BJWiley233 commented Apr 20, 2022

I am also getting this in the cuda image nvidia/cuda:10.1-devel-ubuntu18.04 with CUDA version 10.1 and running with 2 V100 Tesla GPUs. Sorry I only get the hanging issue in #186

$ /opt/AutoDock-GPU/bin/autodock_gpu_64wi --ffile 1hsg.maps.fld --lfile indinavir.pdbqt
AutoDock-GPU version: v1.5.3-22-gf8a00853dd3fddd82d13866d3ba88c9137ebd5c0

Running 1 docking calculation

Cuda device:                              Tesla V100-SXM2-32GB (#1 / 2)
Available memory on device:               32162 MB (total: 32480 MB)

CUDA Setup time 0.191685s

@BJWiley233

Here is my Dockerfile. I am running on LSF as well.

FROM nvidia/cuda:10.1-devel-ubuntu18.04

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y wget check libssl-dev git build-essential \
        devscripts debhelper fakeroot pkg-config dkms libsubunit0 libsubunit-dev cuda-toolkit-10-1 && \
    apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends hwloc openssh-client && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends software-properties-common && \
    apt-add-repository ppa:ubuntu-toolchain-r/test -y && \
    apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends libgomp1 && \
    rm -rf /var/lib/apt/lists/*
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/libnvidia-compute-418_418.87.01-0ubuntu1_amd64.deb
RUN apt-get update && dpkg -i libnvidia-compute-418_418.87.01-0ubuntu1_amd64.deb

ENV GPU_INCLUDE_PATH=/usr/local/cuda-10.1/include
ENV GPU_LIBRARY_PATH=/usr/local/cuda-10.1/lib64
ENV CPU_INCLUDE_PATH=/usr/local/cuda-10.1/include
ENV CPU_LIBRARY_PATH=/usr/local/cuda-10.1/lib64

RUN cd /opt && git clone https://github.com/ccsb-scripps/AutoDock-GPU.git && cd AutoDock-GPU && make DEVICE=GPU NUMWI=128

# clean up
RUN apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    apt-get autoclean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/{apt,dpkg,cache,log}/

CMD ["/bin/bash"]

@BJWiley233

BJWiley233 commented Apr 20, 2022

So I tried building with make DEVICE=GPU NUMWI=128 TARGETS=70 for the V100s, but it still hangs. Is there by any chance a Docker image already created that I could test?

@BJWiley233

BJWiley233 commented Apr 20, 2022

Well, I tried with nvcr.io/hpc/autodock:2020.06 and nvcr.io/hpc/autodock:2020.06-x86_64, even with 296 GB RAM, and got a seg fault:

$ /opt/AutoDock-GPU/bin/autodock_gpu_128wi -ffile 1hsg.maps.fld -lfile indinavir.pdbqt
AutoDock-GPU version: 09773678fc7e39677061d765b767f4bae8930fb7-dirty

CUDA Setup time 0.261890s
(Thread 0 is setting up Job 0)
Segmentation fault (core dumped)

Go NVIDIA :(

@atillack
Collaborator

@BJWiley233 Thank you for reporting.

Sorry I only get the hanging issue in #186

Is what you are observing that the code hangs indefinitely, or does it eventually terminate (with or without an error message)?

#186 shows this error output (and subsequently triggered program exit) which occurs when the correct target isn't set:
SetKernelsGpuData copy to cData failed invalid device symbol

70 is one of the default targets so this doesn't apply to what's happening on your system (confirmed by your next post...).

@BJWiley233

Yes on my docker image the code hangs indefinitely.

@BJWiley233

I just checked again with NVIDIA's image nvcr.io/hpc/autodock:2020.06-x86_64 and realized my map files got corrupted while transferring from my personal computer to LSF storage, so that image actually works. My own image doesn't seem to work and still hangs.
