how to unleash my full GPU power #7

Closed
WJluluxiu opened this issue Sep 19, 2021 · 10 comments

@WJluluxiu
Hi there. I have 2 RTX 2080 Ti (11 GB) GPUs with CUDA 11.2. When I input a sequence shorter than 1000 amino acids, it runs normally, but only one of my GPUs works. When I tried a ~1200-residue sequence or complex, it threw a 'ResourceExhausted' error. So the question is: how do I get all of my GPUs working so that I can calculate larger sequences or complexes?

@YoshitakaMo
Owner

In my environment (RTX3090 x 2), localcolabfold recognizes both GPUs:

2021-09-20 01:22:52.133719: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-20 01:23:16.382686: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-20 01:23:16.438671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:81:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-09-20 01:23:16.440621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-09-20 01:23:16.440666: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-20 01:23:17.154445: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-09-20 01:23:17.154595: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-09-20 01:23:17.554468: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-20 01:23:18.287336: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-20 01:23:19.093015: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-09-20 01:23:19.283513: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-09-20 01:23:19.519908: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-09-20 01:23:19.527578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2021-09-20 01:25:46.107795: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-20 01:25:46.110540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:

Adding visible gpu devices: 0, 1 appeared in my log. How about you?

@WJluluxiu
Author

> Adding visible gpu devices: 0, 1 appeared in my log. How about you?

Yes, both of my GPUs are detected, with the same log output. But whether I input a <1000-residue sequence or a ~1200-residue one, only one GPU's memory fills up while the other stays idle. And when a ~1200-residue sequence is uploaded, there is a 'ResourceExhausted' error.

@YoshitakaMo
Owner

YoshitakaMo commented Sep 20, 2021

There is a discussion about the use of multiple GPUs in google-deepmind/alphafold#149.
Setting the environment variable TF_FORCE_UNIFIED_MEMORY to 1 and XLA_PYTHON_CLIENT_MEM_FRACTION to 4.0 before executing colabfold-conda/bin/python3.7 runner.py (or runner_af2advanced.py) should solve this issue.
For bash,

#!/bin/bash
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0
colabfold-conda/bin/python3.7 runner.py

export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5 may be better depending on the environment.

See also:
https://sbgrid.org/wiki/examples/alphafold2
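
If you would rather set these inside runner.py than in a wrapper script, a minimal sketch is below. This assumes a typical runner.py layout, which may differ from yours; the key point is that the variables need to be in place before JAX/TensorFlow initialize the GPU, so the very top of the file is the safest place.

# Top of runner.py (or runner_af2advanced.py), before any jax/tensorflow import.
import os

os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"           # allow spilling GPU memory into host RAM
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"  # let XLA address up to 4x one GPU's memory

# Only import the frameworks after the variables are set.
import jax  # noqa: E402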

@couunderbarz

I'm using a g4dn.12xlarge instance on AWS.
It has four T4 GPUs, and localcolabfold seems to recognize all of them.
However, when I calculate a sequence of about 2,000 residues, it throws a 'Resource exhausted' error even though I defined those environment variables at the beginning of runner.py.

2021-09-21 02:04:47.037978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1, 2, 3
RuntimeError: Resource exhausted: Failed to allocate request for 32.80GiB (35220792856B) on device ordinal 0

@YoshitakaMo
Owner

YoshitakaMo commented Sep 21, 2021

I'm completely ignorant about AWS, but which OS does the AWS instance use, Linux or Windows?
I've heard of a bug where multiple GPUs can't be used on Windows even with the above environment variables set, although I haven't verified this myself because I don't have a Windows machine.

@couunderbarz

Thank you for your reply.
I'm using Linux.
Now I'm trying GCP instead of AWS, and these environment variables seem to work fine on GCP.
RAM is being used, so unified memory seems to work.

I'm using Google's deep learning image Deep Learning Image: TensorFlow Enterprise 2.6, m79 CUDA 110 on GCP and the Deep Learning AMI (Ubuntu 18.04) Version 49.0 on AWS.
The difference between these images might be the reason unified memory didn't work in my AWS environment, but I haven't investigated it deeply.

@YoshitakaMo
Owner

Adding the environment variable export TF_FORCE_GPU_ALLOW_GROWTH=true may solve this issue. See also: https://www.tensorflow.org/guide/gpu
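
If setting the variable in the shell is inconvenient, the linked guide also describes an in-Python equivalent for TensorFlow's allocator; a minimal sketch is below (it has to run before any GPU is initialized, so near the top of runner.py):

import tensorflow as tf

# Equivalent of TF_FORCE_GPU_ALLOW_GROWTH=true: grab GPU memory only as needed
# instead of reserving (almost) all of it up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)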

@WJluluxiu
Author

> Adding the environment variable export TF_FORCE_GPU_ALLOW_GROWTH=true may solve this issue. See also: https://www.tensorflow.org/guide/gpu

It seems that still only one GPU is working.

@WJluluxiu
Author

> Adding the environment variable export TF_FORCE_GPU_ALLOW_GROWTH=true may solve this issue. See also: https://www.tensorflow.org/guide/gpu

I noticed that runner.py seems to use both of the GPUs, but why is only GPU 0's memory fully used?

@zach-hensel

The first line here was the trick for me, on top of the other things in this thread.

In total:

export NVIDIA_VISIBLE_DEVICES='all'
export TF_FORCE_UNIFIED_MEMORY='1'
export XLA_PYTHON_CLIENT_MEM_FRACTION='4.0'

Taken from what's in run_docker.py from alphafold:

environment={
    'NVIDIA_VISIBLE_DEVICES': FLAGS.gpu_devices,
    # The following flags allow us to make predictions on proteins that
    # would typically be too long to fit into GPU memory.
    'TF_FORCE_UNIFIED_MEMORY': '1',
    'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
}

With 2x 1080 Ti that was OK for a 1500-residue tetramer while running another 400-residue prediction at the same time... however, the load wasn't split evenly between the two GPUs, and it was one of those AlphaFold predictions that gives two monomers with nearly identical coordinates and blows up the relaxation step!
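
For completeness, here is a sketch of launching the runner from a small Python wrapper with the same three variables, mirroring the run_docker.py environment pattern quoted above. The interpreter and runner paths are placeholders for a typical localcolabfold install; adjust them to your setup.

import os
import subprocess

env = dict(os.environ)
env.update({
    'NVIDIA_VISIBLE_DEVICES': 'all',
    'TF_FORCE_UNIFIED_MEMORY': '1',
    'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
})

# Placeholder paths -- point these at your localcolabfold installation.
subprocess.run(['colabfold-conda/bin/python3.7', 'runner.py'], env=env, check=True)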
