
OpenFold on Ampere Nvidia GPUs #143

Closed
WillExeter opened this issue Jun 29, 2022 · 9 comments

Comments

@WillExeter

WillExeter commented Jun 29, 2022

Hi,

I am trying to install OpenFold on a machine with two RTX A5000s, but running into issues with PyTorch not supporting cards with compute capability SM 86. I saw on a previous post that you had trained OF on A100s, which will have a similar compute capability. Is there a method for installing OpenFold on newer GPU architectures?

Many thanks!

@WillExeter WillExeter changed the title OpenFold on Ampere nVidia GPUs OpenFold on Ampere Nvidia GPUs Jun 29, 2022
@gahdritz
Collaborator

Which version of torch are you trying to install?

@liuyixin-louis

I am also facing the same problem. Has anybody found a solution? Thanks a lot!

@jamaliki

@WillExeter is the problem potentially BFloat16? What is the exact error message?

@WillExeter
Author

WillExeter commented Jun 30, 2022

Which version of torch are you trying to install?

I was on a PyTorch build compiled against CUDA 10.2, which supports only up to sm_70. I have since tried updating to a newer cudatoolkit (10.2.89 -> 11.6.0) and PyTorch (1.10.2-py3.7_cuda10.2_cudnn7.6.5_0 -> 1.12.0-py3.7_cuda11.6_cudnn8.3.2_0).

On 10.2 OpenFold runs for a while then fails complaining about compute capability (see response to jamaliki). On the updated versions it fails instantly:

python3 run_pretrained_openfold.py \
fastas/test \
~/alphafold-2.2.0/alphafold_libs/pdb_mmcif/mmcif_files/ \
--uniref90_database_path ~/alphafold-2.2.0/alphafold_libs/uniref90/uniref90.fasta \
--mgnify_database_path ~/alphafold-2.2.0/alphafold_libs/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path ~/alphafold-2.2.0/alphafold_libs/pdb70/pdb70 \
--uniclust30_database_path ~/alphafold-2.2.0/alphafold_libs/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path ~/alphafold-2.2.0/alphafold_libs/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:1" \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_1.pt
Traceback (most recent call last):
File "run_pretrained_openfold.py", line 34, in <module>
from openfold.config import model_config
File "/home/alphafold/openfold/openfold-main/openfold/__init__.py", line 1, in <module>
from . import model
File "/home/alphafold/openfold/openfold-main/openfold/model/__init__.py", line 11, in <module>
_modules = [(m, importlib.import_module("." + m, __name__)) for m in __all__]
File "/home/alphafold/openfold/openfold-main/openfold/model/__init__.py", line 11, in <listcomp>
_modules = [(m, importlib.import_module("." + m, __name__)) for m in __all__]
File "/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/home/alphafold/openfold/openfold-main/openfold/model/embedders.py", line 20, in <module>
from openfold.model.primitives import Linear, LayerNorm
File "/home/alphafold/openfold/openfold-main/openfold/model/primitives.py", line 27, in <module>
from openfold.utils.kernel.attention_core import attention_core
File "/home/alphafold/openfold/openfold-main/openfold/utils/kernel/attention_core.py", line 20, in <module>
attn_core_inplace_cuda = importlib.import_module("attn_core_inplace_cuda")
File "/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/openfold-1.0.0-py3.7-linux-x86_64.egg/attn_core_inplace_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

@WillExeter
Author

@WillExeter is the problem potentially BFloat16? What is the exact error message?

Hi, here is the error message:

python3 run_pretrained_openfold.py \

fastas/test \
~/alphafold-2.2.0/alphafold_libs/pdb_mmcif/mmcif_files/ \
--uniref90_database_path ~/alphafold-2.2.0/alphafold_libs/uniref90/uniref90.fasta \
--mgnify_database_path ~/alphafold-2.2.0/alphafold_libs/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path ~/alphafold-2.2.0/alphafold_libs/pdb70/pdb70 \
--uniclust30_database_path ~/alphafold-2.2.0/alphafold_libs/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path ~/alphafold-2.2.0/alphafold_libs/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:1" \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_1.pt

INFO:run_pretrained_openfold.py:Generating alignments for sp|Q83EE0|RL21_COXBU...
/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA RTX A5000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A5000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
INFO:run_pretrained_openfold.py:Loaded OpenFold parameters at openfold/resources/openfold_params/finetuning_ptm_1.pt...
INFO:run_pretrained_openfold.py:Running inference for sp|Q83EE0|RL21_COXBU...
Traceback (most recent call last):
File "run_pretrained_openfold.py", line 481, in <module>
main(args)
File "run_pretrained_openfold.py", line 351, in main
out = run_model(model, working_batch, tag, args)
File "run_pretrained_openfold.py", line 100, in run_model
out = model(batch)
File "/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/alphafold/openfold/openfold-main/openfold/model/model.py", line 503, in forward
_recycle=(num_iters > 1)
File "/home/alphafold/openfold/openfold-main/openfold/model/model.py", line 227, in iteration
pair_mask = seq_mask[..., None] * seq_mask[..., None, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
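
The warning above spells out the mismatch: the installed wheel ships kernels only for sm_37 through sm_70, so nothing can run on an sm_86 card. As a rough sketch, the compatibility rule behind that warning can be written standalone (build_supports is a hypothetical helper; in a live session the inputs would come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability()):

```python
# Hedged sketch of the arch-compatibility check behind the UserWarning above.
# build_supports is a hypothetical helper, not a torch API.

def build_supports(arch_list, capability):
    """Can a build compiled for arch_list serve a device of this capability?"""
    major, minor = capability
    for arch in arch_list:
        kind, num = arch.split("_")
        num = int(num)
        # Compiled binary (SASS) is compatible within the same major version,
        # for device minor revisions >= the compiled minor revision.
        if kind == "sm" and num // 10 == major and num % 10 <= minor:
            return True
        # Embedded PTX ("compute_XX" entries) can be JIT-compiled forward
        # onto any newer architecture.
        if kind == "compute" and num <= major * 10 + minor:
            return True
    return False

# The arch list reported in the warning above:
old_build = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(build_supports(old_build, (8, 6)))  # RTX A5000 (sm_86): False
print(build_supports(old_build, (7, 0)))  # e.g. V100 (sm_70): True
```

This is why a CUDA 11.x wheel (which includes sm_80/sm_86 targets) fixes the first error but not the second one: the separately compiled attn_core_inplace_cuda extension still has to be rebuilt against the new toolkit.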

@jamaliki

@WillExeter understood. The issue is, as you noted, that PyTorch needs to be rebuilt against the new CUDA toolkit for this to run. The error you mention here:
On 10.2 OpenFold runs for a while then fails complaining about compute capability (see response to jamaliki). On the updated versions it fails instantly:
is because you need to recompile the kernels with the new toolkit. Have you tried running python setup.py install again? You need to force it to recompile.
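
For reference, a forced rebuild might look roughly like this (a sketch only; the repo path comes from the traceback above and the environment name is this user's, so adjust both to your setup):

```shell
# Hedged sketch: force OpenFold's CUDA extension to rebuild against the
# newly installed toolkit. Paths are illustrative, taken from this thread.
cd /home/alphafold/openfold/openfold-main   # repo root
rm -rf build/ openfold.egg-info             # drop artifacts built under CUDA 10.2
pip uninstall -y openfold                   # remove the stale egg with the old .so
python setup.py install                     # recompiles attn_core_inplace_cuda
```

Without deleting the old build artifacts, setuptools may reuse the cached .so compiled against the previous toolkit, which reproduces the undefined-symbol ImportError.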

@gahdritz
Collaborator

gahdritz commented Jun 30, 2022

I'm running torch==1.10+cu11.3 on a system with CUDA version 11.4 and CUDA driver version 470.82.01. That works with my A100s.

@jamaliki

Right, but I think CUDA 10.2 is too old.

@WillExeter
Author

@WillExeter understood. The issue is, as you noted, that PyTorch needs to be updated with the new cuda toolkit for you to run. The error you mention here: On 10.2 OpenFold runs for a while then fails complaining about compute capability (see response to jamaliki). On the updated versions it fails instantly: is because you need to recompile the kernels with the new toolkit. Have you tried running python setup.py install again? You need to force it to recompile.

Hi, that seems to have worked - brilliant!

@gahdritz gahdritz closed this as completed Jul 8, 2022