how to unleash my full GPU power #7

Closed
WJluluxiu opened this issue Sep 19, 2021 · 10 comments

@WJluluxiu
Hi there. I have 2 RTX 2080 Ti (11 GB) GPUs with CUDA 11.2. When I input a sequence shorter than 1000 amino acids, it runs normally, but only one of my GPUs works. When I tried a ~1200-residue sequence or complex, it threw a 'ResourceExhausted' error. So the question is: how do I get all of my GPUs working so that I can calculate larger sequences or complexes?

@YoshitakaMo
Owner

In my environment (RTX3090 x 2), localcolabfold recognizes both GPUs:

2021-09-20 01:22:52.133719: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-20 01:23:16.382686: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-20 01:23:16.438671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:81:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-09-20 01:23:16.440621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-09-20 01:23:16.440666: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-20 01:23:17.154445: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-09-20 01:23:17.154595: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-09-20 01:23:17.554468: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-20 01:23:18.287336: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-20 01:23:19.093015: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-09-20 01:23:19.283513: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-09-20 01:23:19.519908: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-09-20 01:23:19.527578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2021-09-20 01:25:46.107795: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-20 01:25:46.110540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:

Adding visible gpu devices: 0, 1 appeared in my log. How about you?

@WJluluxiu
Author

> Adding visible gpu devices: 0, 1 appeared in my log. How about you?

Yes, both of my GPUs are detected, with the same log output. But whether I input a <1000-residue sequence or a ~1200-residue one, only one GPU's memory fills up while the other stays idle. And when a ~1200-residue sequence is uploaded, there is a 'ResourceExhausted' error.

@YoshitakaMo
Owner

YoshitakaMo commented Sep 20, 2021

There is a discussion about the use of multiple GPUs in google-deepmind/alphafold#149.
Setting the environment variable TF_FORCE_UNIFIED_MEMORY to 1 and XLA_PYTHON_CLIENT_MEM_FRACTION to 4.0 before executing colabfold-conda/bin/python3.7 runner.py (or runner_af2advanced.py) should solve this issue.
For bash,

#!/bin/bash
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0
colabfold-conda/bin/python3.7 runner.py

export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5 may be better depending on the environment.

See also:
https://sbgrid.org/wiki/examples/alphafold2
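
If you would rather set these inside runner.py than in a wrapper script, a minimal sketch is below. This assumes a typical runner.py layout, which may differ from yours; the key point is that the variables need to be in place before JAX/TensorFlow initialize the GPU, so the very top of the file is the safest place.

# Top of runner.py (or runner_af2advanced.py), before any jax/tensorflow import.
import os

os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"           # allow spilling GPU memory into host RAM
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"  # let XLA address up to 4x one GPU's memory

# Only import the frameworks after the variables are set.
import jax  # noqa: E402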

@couunderbarz

I'm using a g4dn.12xlarge instance on AWS.
It has four T4 GPUs, and localcolabfold seems to recognize all of them.
However, when I calculate a sequence of about 2,000 residues, it throws a 'Resource exhausted' error even though I defined those environment variables at the beginning of runner.py.

2021-09-21 02:04:47.037978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1, 2, 3
RuntimeError: Resource exhausted: Failed to allocate request for 32.80GiB (35220792856B) on device ordinal 0

@YoshitakaMo
Owner

YoshitakaMo commented Sep 21, 2021

I'm completely ignorant about AWS, but which OS does the AWS instance use, Linux or Windows?
I've heard of a bug where multiple GPUs can't be used on Windows even with the above environment variables set, although I haven't verified this myself because I don't have a Windows machine.

@couunderbarz

Thank you for your reply.
I'm using Linux.
Now I'm trying GCP instead of AWS, and these environment variables seem to work fine on GCP.
RAM is being used, so unified memory seems to work.

I'm using Google's deep learning image Deep Learning Image: TensorFlow Enterprise 2.6, m79 CUDA 110 on GCP and the Deep Learning AMI (Ubuntu 18.04) Version 49.0 on AWS.
The difference between these images might be the reason unified memory didn't work in my AWS environment, but I haven't investigated it deeply.

@YoshitakaMo
Owner

Adding the environment variable export TF_FORCE_GPU_ALLOW_GROWTH=true may solve this issue. See also: https://www.tensorflow.org/guide/gpu
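
If setting the variable in the shell is inconvenient, the linked guide also describes an in-Python equivalent for TensorFlow's allocator; a minimal sketch is below (it has to run before any GPU is initialized, so near the top of runner.py):

import tensorflow as tf

# Equivalent of TF_FORCE_GPU_ALLOW_GROWTH=true: grab GPU memory only as needed
# instead of reserving (almost) all of it up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)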

@WJluluxiu
Author

> Adding the environment variable export TF_FORCE_GPU_ALLOW_GROWTH=true may solve this issue. See also: https://www.tensorflow.org/guide/gpu

It seems that still only one GPU is working.

@WJluluxiu
Author

> Adding the environment variable export TF_FORCE_GPU_ALLOW_GROWTH=true may solve this issue. See also: https://www.tensorflow.org/guide/gpu

I noticed that runner.py seems to use both of the GPUs, but why is only GPU 0's memory fully used?

@zach-hensel

The first line here was the trick for me, on top of the other things in this thread.

In total:

export NVIDIA_VISIBLE_DEVICES='all'
export TF_FORCE_UNIFIED_MEMORY='1'
export XLA_PYTHON_CLIENT_MEM_FRACTION='4.0'

Taken from what's in run_docker.py from alphafold:

environment={
    'NVIDIA_VISIBLE_DEVICES': FLAGS.gpu_devices,
    # The following flags allow us to make predictions on proteins that
    # would typically be too long to fit into GPU memory.
    'TF_FORCE_UNIFIED_MEMORY': '1',
    'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
}

With 2x 1080 Ti that was OK for a 1500-residue tetramer while running another 400-residue prediction at the same time... however, the load wasn't split evenly between the two GPUs, and it was one of those AlphaFold predictions that gives two monomers with nearly identical coordinates and blows up the relaxation step!
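
For completeness, here is a sketch of launching the runner from a small Python wrapper with the same three variables, mirroring the run_docker.py environment pattern quoted above. The interpreter and runner paths are placeholders for a typical localcolabfold install; adjust them to your setup.

import os
import subprocess

env = dict(os.environ)
env.update({
    'NVIDIA_VISIBLE_DEVICES': 'all',
    'TF_FORCE_UNIFIED_MEMORY': '1',
    'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
})

# Placeholder paths -- point these at your localcolabfold installation.
subprocess.run(['colabfold-conda/bin/python3.7', 'runner.py'], env=env, check=True)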
