Failure on Nvidia devices with compute capability 8.6 #55
Comments
c1bf7af 7f0b5ad solve this issue with the Dockerfile changes. Thanks a lot @chrisroat |
Hi, I think I'm having a similar error. I'm on
|
I have the same problem with Ampere cards. Can you please fix this problem? |
Could you try changing https://github.com/deepmind/alphafold/blob/main/docker/Dockerfile#L15 to CUDA 11.1? According to https://en.wikipedia.org/wiki/CUDA, 11.1+ is required for CC 8.6 (some of the more recent Ampere series but not the A100). We will be looking at upgrading the CUDA version in this repository, but this requires careful benchmarking to ensure that there are no performance or accuracy regressions, so it will be faster for you to try this locally for now. |
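The version requirement described above can be written as a small lookup. This is a minimal sketch and not part of the AlphaFold repository: the table `MIN_CUDA` and the helper names are hypothetical, and the table only lists a few architectures, pairing each compute capability with the first CUDA toolkit release that supports it.

```python
# Hypothetical helper for illustration; not part of AlphaFold.
# Maps a compute capability to the first CUDA toolkit release that supports it.
MIN_CUDA = {
    (7, 0): (9, 0),    # Volta (V100)
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere GA100 (A100, A30)
    (8, 6): (11, 1),   # Ampere GA102 (A10, A40, RTX 3090)
}


def parse_version(text):
    """Turn a string such as '11.1' or '8.6' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, compute_capability):
    """Return True if the CUDA toolkit is new enough for the given device."""
    cc = parse_version(compute_capability)
    if cc not in MIN_CUDA:
        raise KeyError(f"compute capability {compute_capability} is not in the table")
    return parse_version(cuda_version) >= MIN_CUDA[cc]


print(toolkit_supports("11.0", "8.0"))  # A100 with CUDA 11.0
print(toolkit_supports("11.0", "8.6"))  # RTX 3090 with CUDA 11.0
print(toolkit_supports("11.1", "8.6"))  # RTX 3090 with CUDA 11.1
```

Under these assumptions, a CUDA 11.0 toolkit covers CC 8.0 but not CC 8.6, which matches the reports in this thread that GA100 cards work while GA102 cards need 11.1 or newer.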
Thank you @tfgg. I experimented a bit with the Dockerfile initially. I can confirm the current master works with Nvidia GPUs based on the GA100 chip (A100, A30). Upgraded the CUDA toolkit to 11.2 for GPUs based on the GA102 chip (A10, A40, RTX AX000) and everything worked fine so far. |
Thank you! I confirm to |
11.2 has compatibility issues, I believe. 11.1 works. |
I got rid of the error too. Thanks! Running Docker on CentOS 7.9.2009 (Core) with an NVIDIA GeForce RTX 3090. |
This was fixed in 57a2455. |
Just an FYI: I'm running on an Ampere A10 with CC 8.6.
I0725 13:13:22.066373 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.065889: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:235] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6
I0725 13:13:22.066596 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.065923: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:238] Used ptxas at ptxas
I0725 13:13:22.066839 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.066559: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to get PTX kernel "shift_right_logical_3" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I0725 13:13:22.067006 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.066620: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Internal: Could not find the corresponding function
I0725 13:13:22.068676 139761799472960 run_docker.py:200] Traceback (most recent call last):
I0725 13:13:22.068786 139761799472960 run_docker.py:200] File "/app/alphafold/run_alphafold.py", line 303, in
I0725 13:13:22.068876 139761799472960 run_docker.py:200] app.run(main)
I0725 13:13:22.068992 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0725 13:13:22.069080 139761799472960 run_docker.py:200] _run_main(main, args)
I0725 13:13:22.069165 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0725 13:13:22.069255 139761799472960 run_docker.py:200] sys.exit(main(argv))
I0725 13:13:22.069342 139761799472960 run_docker.py:200] File "/app/alphafold/run_alphafold.py", line 285, in main
I0725 13:13:22.069428 139761799472960 run_docker.py:200] random_seed=random_seed)
I0725 13:13:22.069509 139761799472960 run_docker.py:200] File "/app/alphafold/run_alphafold.py", line 149, in predict_structure
I0725 13:13:22.069588 139761799472960 run_docker.py:200] prediction_result = model_runner.predict(processed_feature_dict)
I0725 13:13:22.069675 139761799472960 run_docker.py:200] File "/app/alphafold/alphafold/model/model.py", line 134, in predict
I0725 13:13:22.069755 139761799472960 run_docker.py:200] result = self.apply(self.params, jax.random.PRNGKey(0), feat)
I0725 13:13:22.069834 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 75, in PRNGKey
I0725 13:13:22.069914 139761799472960 run_docker.py:200] k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
I0725 13:13:22.070003 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 382, in shift_right_logical
I0725 13:13:22.070081 139761799472960 run_docker.py:200] return shift_right_logical_p.bind(x, y)
I0725 13:13:22.070159 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 264, in bind
I0725 13:13:22.070236 139761799472960 run_docker.py:200] out = top_trace.process_primitive(self, tracers, params)
I0725 13:13:22.070315 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 604, in process_primitive
I0725 13:13:22.070394 139761799472960 run_docker.py:200] return primitive.impl(*tracers, **params)
I0725 13:13:22.070472 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 262, in apply_primitive
I0725 13:13:22.070549 139761799472960 run_docker.py:200] return compiled_fun(*args)
I0725 13:13:22.070631 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 378, in _execute_compiled_primitive
I0725 13:13:22.070705 139761799472960 run_docker.py:200] out_bufs = compiled.execute(input_bufs)
I0725 13:13:22.070770 139761799472960 run_docker.py:200] RuntimeError: Internal: Could not find the corresponding function
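The first warning in the log above is the clearest sign of this problem, so it can help to scan saved output for it before reading the full traceback. The following is a rough sketch, assuming the log has been captured as text; the function name `unsupported_ccs` is hypothetical, and the pattern is copied from the warning message shown in this issue.

```python
import re

# Pattern taken from the "ptxas does not support CC 8.6" warning in the log.
PTXAS_WARNING = re.compile(r"ptxas does not support CC (\d+\.\d+)")


def unsupported_ccs(log_text):
    """Return the sorted, de-duplicated compute capabilities ptxas rejected."""
    return sorted(set(PTXAS_WARNING.findall(log_text)))


sample = (
    "W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:235] "
    "Falling back to the CUDA driver for PTX compilation; "
    "ptxas does not support CC 8.6\n"
)
print(unsupported_ccs(sample))  # ['8.6']
```

An empty list means the warning did not appear in the captured text. It does not by itself prove that the installed toolkit supports the device.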