CUDA_ERROR_ILLEGAL_ADDRESS error with AlphaFold multimer 2.3.0 #667
Comments
Hi @leiterenato
A different, non-relax model issue seems to have been resolved by using CUDA 11.8 (#646 (comment)). If that doesn't work, you can try use_gpu_relax=False, or turn off relaxation entirely with run_relax=False. We will attempt to address the problem more fully in the new year.
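The two relax workarounds above can be sketched as extra command-line flags. This is only an illustration: it assumes the flags are passed to run_alphafold.py in absl style (--name=value) under the names quoted in the comment, and the helper `relax_workaround_args` is hypothetical. A Docker wrapper script may expose these options under different names.

```python
import shlex
from typing import List


def relax_workaround_args(disable_relax_entirely: bool = False) -> List[str]:
    """Return extra flags for the two workarounds suggested above.

    Assumption: flag names match those quoted in the comment
    (use_gpu_relax, run_relax) and are given in --name=value form.
    """
    if disable_relax_entirely:
        # Skip the relaxation step altogether.
        return ["--run_relax=false"]
    # Keep relaxation, but run it on the CPU instead of the GPU.
    return ["--use_gpu_relax=false"]


# The other required flags (input FASTA, data directory, ...) are omitted here.
base_cmd = ["python", "run_alphafold.py"]
print(shlex.join(base_cmd + relax_workaround_args()))
```

Trying CPU relax first keeps the relaxed structures; disabling relax entirely is the fallback if the error persists.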
We think this is due to the JAX version change from 0.3.17 to 0.3.25. We don't want to revert the JAX version, though, so we are looking for workarounds.
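To see which side of that version change an environment is on, a small check like the following may help. It is a sketch that uses only the standard library; `installed_versions` is a hypothetical helper, and it reports None for a package that is not installed.

```python
from importlib import metadata


def installed_versions(packages=("jax", "jaxlib")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this inside the same environment (or container) that runs AlphaFold.
print(installed_versions())
```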
The latest fix should solve this issue. Thank you for your patience.
Fixed in AlphaFold v2.3.1. Thank you!
Thank you!
Hi!
I am trying to run AlphaFold 2.3.0 multimer and encountered this error:
Execution of replica 0 failed: INTERNAL: Failed to load in-memory CUBIN: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
(see details below). I was wondering if you could help me resolve it? Thank you very much!
Machine spec etc.:
When I searched for related errors online, there seemed to be two commonly proposed solutions: (1) change to a newer CUDA version (https://github.com/deepmind/dm-haiku/issues/204), or (2) disable unified memory (https://github.com/deepmind/alphafold/issues/406).
I tried using CUDA 11.4.0 instead of 11.1.1 by changing the following lines in the Dockerfile, but the same error persists.
As for (2), disabling unified memory, I am worried that this would give me an out-of-memory error given the size of the protein.
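For reference, option (2) might look like the sketch below. It assumes unified memory is enabled through the environment variables TF_FORCE_UNIFIED_MEMORY and XLA_PYTHON_CLIENT_MEM_FRACTION, as described in the AlphaFold README; `env_without_unified_memory` is a hypothetical helper. How the environment actually reaches the AlphaFold process (for example through a Docker wrapper that sets these variables itself) depends on the setup.

```python
import os

# Assumption: these two variables are what turns on unified memory.
UNIFIED_MEMORY_VARS = (
    "TF_FORCE_UNIFIED_MEMORY",
    "XLA_PYTHON_CLIENT_MEM_FRACTION",
)


def env_without_unified_memory(base_env=None):
    """Return a copy of the environment with the unified-memory variables removed."""
    env = dict(os.environ if base_env is None else base_env)
    for var in UNIFIED_MEMORY_VARS:
        env.pop(var, None)
    return env


# The result could be passed as env=... to subprocess.run when launching a job.
example = {"TF_FORCE_UNIFIED_MEMORY": "1", "PATH": "/usr/bin"}
print(env_without_unified_memory(example))
```

Without unified memory the model has to fit in GPU memory alone, so the out-of-memory concern for large complexes still applies.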
I'm not sure if this is relevant, but this is a recent problem: prediction for this and other similarly sized complexes worked fine before. I was using v2.2.0 previously, so I wonder if this is an issue with, e.g., the version of jax or jaxlib.
Thank you very much!
Error message: