
Low GPU Utilization when fitting MultitaskFitTransformRegressor #4296

Bruce20040502 opened this issue Feb 20, 2025 · 1 comment

Bruce20040502 commented Feb 20, 2025

❓ Questions & Help

I have just begun exploring AI-aided molecular design. I'm trying to run the QM9 example code on my own PC, but I've found that GPU utilization is really low.

Here's the hardware on my computer

  • CPU: AMD Ryzen 5 5500
  • GPU: NVIDIA GeForce RTX 4070 Ti Super
  • RAM: 2x Kingston 2400MHz 32GB

Here's my development environment

  • operating system: Windows 10
  • cudnn=8.2.1
  • deepchem=2.7.2
  • keras=2.6.0
  • tensorflow=2.6.0
  • python=3.7.16

I don't know whether it's a problem with my computer or whether my configuration is inappropriate somewhere.
PS: if I'm missing any information, please tell me.

Here's the code I'm running (pasted verbatim from the examples):

"""
Script that trains Tensorflow multitask models on QM9 dataset.
"""
from __future__ import print_function
from __future__ import division
from __future__ import unicode_literals

import os
import deepchem as dc
import numpy as np
from deepchem.molnet import load_qm9

np.random.seed(123)

print('loading dataset...')

qm9_tasks, datasets, transformers = load_qm9()

print('dataset loaded!')

train_dataset, valid_dataset, test_dataset = datasets

print('loading transformers...')

fit_transformers = [dc.trans.CoulombFitTransformer(train_dataset)]

print('loading metrics...')

regression_metric = [
    dc.metrics.Metric(dc.metrics.mean_absolute_error, mode="regression"),
    dc.metrics.Metric(dc.metrics.pearson_r2_score, mode="regression")
]

print('building model...')

model = dc.models.MultitaskFitTransformRegressor(
    n_tasks=len(qm9_tasks),
    n_features=[29, 29],
    learning_rate=0.001,
    momentum=.8,
    batch_size=32,
    weight_init_stddevs=[1 / np.sqrt(400), 1 / np.sqrt(100), 1 / np.sqrt(100)],
    bias_init_consts=[0., 0., 0.],
    layer_sizes=[400, 100, 100],
    dropouts=[0.01, 0.01, 0.01],
    fit_transformers=fit_transformers,
    seed=123)

print("fitting...")

# Fit trained model
model.fit(train_dataset, nb_epoch=5)

print('fitting complete!')

train_scores = model.evaluate(train_dataset, regression_metric, transformers)
print("Train scores [kcal/mol]")
print(train_scores)

valid_scores = model.evaluate(valid_dataset, regression_metric, transformers)
print("Valid scores [kcal/mol]")
print(valid_scores)

test_scores = model.evaluate(test_dataset, regression_metric, transformers)
print("Test scores [kcal/mol]")
print(test_scores)

@krtimisra67 commented:

1. Check TensorFlow GPU Usage

Run the following in Python to check if TensorFlow detects your GPU:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print(tf.config.list_physical_devices('GPU'))

If it doesn't detect your GPU, you may need to update CUDA or TensorFlow.
2. Upgrade TensorFlow & Keras

You're using TensorFlow 2.6.0 and Keras 2.6.0, which predate the RTX 40-series. The TensorFlow 2.6 binaries target CUDA 11.2, which has no native support for Ada GPUs (compute capability 8.9), so kernels may fall back to slow PTX JIT compilation or fail to use the GPU at all. Upgrade to a newer compatible release:

pip install --upgrade tensorflow keras

Ensure your CUDA and cuDNN versions are compatible with the new TensorFlow version. Note two constraints on "latest" here: recent TensorFlow releases (2.12 and later) require Python 3.8+, and TensorFlow dropped native Windows GPU support after 2.10, so on Windows 10 you should either stay at TensorFlow 2.10 or run under WSL2.
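To see which CUDA and cuDNN versions your installed TensorFlow wheel was actually built against, a quick check with the standard build-info API is:

import tensorflow as tf

# Print the CUDA/cuDNN versions this TensorFlow binary was compiled with.
# (The keys are only populated in GPU-enabled builds.)
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))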
3. Install Proper CUDA & cuDNN Versions

You listed cuDNN 8.2.1 but no CUDA toolkit version. TensorFlow 2.6.0 was built against CUDA 11.2 and cuDNN 8.1, so confirm which CUDA toolkit is actually installed and whether it matches what your TensorFlow build expects; a mismatched CUDA/cuDNN pair is a common reason TensorFlow silently falls back to the CPU.

Use:

pip show tensorflow

Then go to the TensorFlow GPU installation guide and match the CUDA/cuDNN versions required for your TensorFlow release.
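For example, for the TensorFlow 2.6–2.10 range, one common way to get a matching toolkit into a conda environment is the following (the versions here are taken from the official compatibility table for TF 2.6; double-check them against the guide for whichever release you install):

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0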
4. Force TensorFlow to Use GPU

Add this to your script before model training:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of reserving it all up front.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # Alternative: cap GPU memory with a virtual device. Use one approach per GPU;
        # memory growth and a fixed virtual-device memory limit conflict on the same device.
        # tf.config.experimental.set_virtual_device_configuration(
        #     gpus[0],
        #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=8192)])  # adjust if needed
    except RuntimeError as e:
        print(e)

This makes TensorFlow allocate GPU memory on demand rather than reserving the whole card up front.
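To confirm that ops are actually being placed on the GPU (memory settings alone don't guarantee that), you can also turn on device-placement logging before building the model; this is a standard TensorFlow call:

import tensorflow as tf

# Log the device (CPU or GPU) that each TensorFlow op is placed on.
tf.debugging.set_log_device_placement(True)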
5. Check GPU Utilization While Running

Run the following command while your script is executing:

nvidia-smi

It will show current GPU usage. If utilization stays low even though TensorFlow sees the GPU, the work per step is probably too small to keep the card busy: with batch_size=32, small dense layers, and the Coulomb-matrix fit transform running on the CPU for every batch, the GPU spends most of its time waiting on the CPU.
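A single nvidia-smi call is only a snapshot. To watch utilization continuously while the script trains, poll it once per second:

nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1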
6. Try Another Model

DeepChem’s MultitaskFitTransformRegressor may not be fully optimized for GPU. Try running another DeepChem model, such as GraphConvModel:

model = dc.models.GraphConvModel(
    len(qm9_tasks), mode='regression', batch_size=32, learning_rate=0.001)

If this improves GPU usage, the issue is with your specific model.
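Note that GraphConvModel expects graph-featurized molecules, while the script above loads QM9 with its default Coulomb-matrix featurizer, so the dataset has to be reloaded with a graph featurizer first. A minimal sketch, assuming load_qm9 in DeepChem 2.7.x accepts featurizer='GraphConv':

import deepchem as dc
from deepchem.molnet import load_qm9

# Reload QM9 with graph featurization so the inputs match GraphConvModel.
qm9_tasks, (train_dataset, valid_dataset, test_dataset), transformers = load_qm9(
    featurizer='GraphConv')

model = dc.models.GraphConvModel(
    len(qm9_tasks), mode='regression', batch_size=32, learning_rate=0.001)
model.fit(train_dataset, nb_epoch=1)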
7. Use Mixed Precision (Optional)

Since you have an RTX 4070 Ti Super, you can try mixed precision training for better GPU utilization:

from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
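One usage note: set the global policy before the DeepChem model is constructed so its Keras layers are built under mixed_float16; whether MultitaskFitTransformRegressor actually speeds up depends on how its layers are defined, so treat this as an experiment rather than a guaranteed win.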
