
OpenFold on Ampere Nvidia GPUs #143

Closed
WillExeter opened this issue Jun 29, 2022 · 9 comments

Comments

@WillExeter

WillExeter commented Jun 29, 2022

Hi,

I am trying to install OpenFold on a machine with two RTX A5000s, but running into issues with PyTorch not supporting cards with compute capability SM 86. I saw on a previous post that you had trained OF on A100s, which will have a similar compute capability. Is there a method for installing OpenFold on newer GPU architectures?

Many thanks!

@WillExeter WillExeter changed the title OpenFold on Ampere nVidia GPUs OpenFold on Ampere Nvidia GPUs Jun 29, 2022
@gahdritz
Collaborator

Which version of torch are you trying to install?

@liuyixin-louis

I am also facing the same problem. Has anybody found a solution? Thanks a lot!

@jamaliki

@WillExeter is the problem potentially BFloat16? What is the exact error message?

@WillExeter
Author

WillExeter commented Jun 30, 2022

Which version of torch are you trying to install?

I was on a PyTorch build compiled against CUDA 10.2, which supports only up to sm_70. I have since tried updating to a newer cudatoolkit (10.2.89 -> 11.6.0) and PyTorch (1.10.2-py3.7_cuda10.2_cudnn7.6.5_0 -> 1.12.0-py3.7_cuda11.6_cudnn8.3.2_0).

On 10.2 OpenFold runs for a while then fails complaining about compute capability (see response to jamaliki). On the updated versions it fails instantly:

python3 run_pretrained_openfold.py \
fastas/test \
~/alphafold-2.2.0/alphafold_libs/pdb_mmcif/mmcif_files/ \
--uniref90_database_path ~/alphafold-2.2.0/alphafold_libs/uniref90/uniref90.fasta \
--mgnify_database_path ~/alphafold-2.2.0/alphafold_libs/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path ~/alphafold-2.2.0/alphafold_libs/pdb70/pdb70 \
--uniclust30_database_path ~/alphafold-2.2.0/alphafold_libs/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path ~/alphafold-2.2.0/alphafold_libs/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:1" \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_1.pt
Traceback (most recent call last):
File "run_pretrained_openfold.py", line 34, in <module>
from openfold.config import model_config
File "/home/alphafold/openfold/openfold-main/openfold/__init__.py", line 1, in <module>
from . import model
File "/home/alphafold/openfold/openfold-main/openfold/model/__init__.py", line 11, in <module>
_modules = [(m, importlib.import_module("." + m, __name__)) for m in __all__]
File "/home/alphafold/openfold/openfold-main/openfold/model/__init__.py", line 11, in <listcomp>
_modules = [(m, importlib.import_module("." + m, __name__)) for m in __all__]
File "/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/home/alphafold/openfold/openfold-main/openfold/model/embedders.py", line 20, in <module>
from openfold.model.primitives import Linear, LayerNorm
File "/home/alphafold/openfold/openfold-main/openfold/model/primitives.py", line 27, in <module>
from openfold.utils.kernel.attention_core import attention_core
File "/home/alphafold/openfold/openfold-main/openfold/utils/kernel/attention_core.py", line 20, in <module>
attn_core_inplace_cuda = importlib.import_module("attn_core_inplace_cuda")
File "/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/openfold-1.0.0-py3.7-linux-x86_64.egg/attn_core_inplace_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

@WillExeter
Author

@WillExeter is the problem potentially BFloat16? What is the exact error message?

Hi, here is the error message:

python3 run_pretrained_openfold.py \

fastas/test \
~/alphafold-2.2.0/alphafold_libs/pdb_mmcif/mmcif_files/ \
--uniref90_database_path ~/alphafold-2.2.0/alphafold_libs/uniref90/uniref90.fasta \
--mgnify_database_path ~/alphafold-2.2.0/alphafold_libs/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path ~/alphafold-2.2.0/alphafold_libs/pdb70/pdb70 \
--uniclust30_database_path ~/alphafold-2.2.0/alphafold_libs/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path ~/alphafold-2.2.0/alphafold_libs/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:1" \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_1.pt

INFO:run_pretrained_openfold.py:Generating alignments for sp|Q83EE0|RL21_COXBU...
/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA RTX A5000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A5000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
INFO:run_pretrained_openfold.py:Loaded OpenFold parameters at openfold/resources/openfold_params/finetuning_ptm_1.pt...
INFO:run_pretrained_openfold.py:Running inference for sp|Q83EE0|RL21_COXBU...
Traceback (most recent call last):
File "run_pretrained_openfold.py", line 481, in <module>
main(args)
File "run_pretrained_openfold.py", line 351, in main
out = run_model(model, working_batch, tag, args)
File "run_pretrained_openfold.py", line 100, in run_model
out = model(batch)
File "/home/alphafold/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/alphafold/openfold/openfold-main/openfold/model/model.py", line 503, in forward
_recycle=(num_iters > 1)
File "/home/alphafold/openfold/openfold-main/openfold/model/model.py", line 227, in iteration
pair_mask = seq_mask[..., None] * seq_mask[..., None, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
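
The warning above spells out the mismatch: the installed wheel ships kernels only for sm_37 through sm_70, so nothing can run on an sm_86 card. As a rough sketch, the compatibility rule behind that warning can be written standalone (build_supports is a hypothetical helper; in a live session the inputs would come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability()):

```python
# Hedged sketch of the arch-compatibility check behind the UserWarning above.
# build_supports is a hypothetical helper, not a torch API.

def build_supports(arch_list, capability):
    """Can a build compiled for arch_list serve a device of this capability?"""
    major, minor = capability
    for arch in arch_list:
        kind, num = arch.split("_")
        num = int(num)
        # Compiled binary (SASS) is compatible within the same major version,
        # for device minor revisions >= the compiled minor revision.
        if kind == "sm" and num // 10 == major and num % 10 <= minor:
            return True
        # Embedded PTX ("compute_XX" entries) can be JIT-compiled forward
        # onto any newer architecture.
        if kind == "compute" and num <= major * 10 + minor:
            return True
    return False

# The arch list reported in the warning above:
old_build = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(build_supports(old_build, (8, 6)))  # RTX A5000 (sm_86): False
print(build_supports(old_build, (7, 0)))  # e.g. V100 (sm_70): True
```

This is why a CUDA 11.x wheel (which includes sm_80/sm_86 targets) fixes the first error but not the second one: the separately compiled attn_core_inplace_cuda extension still has to be rebuilt against the new toolkit.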

@jamaliki

@WillExeter understood. The issue is, as you noted, that PyTorch needs to be rebuilt against the new CUDA toolkit for this to run. The error you mention here:
On 10.2 OpenFold runs for a while then fails complaining about compute capability (see response to jamaliki). On the updated versions it fails instantly:
is because you need to recompile the kernels with the new toolkit. Have you tried running python setup.py install again? You need to force it to recompile.
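
For reference, a forced rebuild might look roughly like this (a sketch only; the repo path comes from the traceback above and the environment name is this user's, so adjust both to your setup):

```shell
# Hedged sketch: force OpenFold's CUDA extension to rebuild against the
# newly installed toolkit. Paths are illustrative, taken from this thread.
cd /home/alphafold/openfold/openfold-main   # repo root
rm -rf build/ openfold.egg-info             # drop artifacts built under CUDA 10.2
pip uninstall -y openfold                   # remove the stale egg with the old .so
python setup.py install                     # recompiles attn_core_inplace_cuda
```

Without deleting the old build artifacts, setuptools may reuse the cached .so compiled against the previous toolkit, which reproduces the undefined-symbol ImportError.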

@gahdritz
Collaborator

gahdritz commented Jun 30, 2022

I'm running torch==1.10+cu11.3 on a system with CUDA version 11.4 and CUDA driver version 470.82.01. That works with my A100s.

@jamaliki

Right, but I think CUDA 10.2 is too old.

@WillExeter
Author

@WillExeter understood. The issue is, as you noted, that PyTorch needs to be updated with the new cuda toolkit for you to run. The error you mention here: On 10.2 OpenFold runs for a while then fails complaining about compute capability (see response to jamaliki). On the updated versions it fails instantly: is because you need to recompile the kernels with the new toolkit. Have you tried running python setup.py install again? You need to force it to recompile.

Hi, that seems to have worked - brilliant!

@gahdritz gahdritz closed this as completed Jul 8, 2022