
Low GPU Utilization when fitting MultitaskFitTransformRegressor #4296

Bruce20040502 opened this issue Feb 20, 2025 · 1 comment

Bruce20040502 commented Feb 20, 2025

❓ Questions & Help

I have just begun exploring AI-aided molecular design. I'm trying to run the QM9 example code on my own PC, but I've found that GPU utilization is really low.

Here's the hardware on my computer

  • CPU: AMD Ryzen 5 5500
  • GPU: NVIDIA GeForce RTX 4070 Ti Super
  • RAM: 2x Kingston 2400MHz 32GB

Here's my development environment

  • operating system: Windows 10
  • cudnn=8.2.1
  • deepchem=2.7.2
  • keras=2.6.0
  • tensorflow=2.6.0
  • python=3.7.16

I don't know whether it's a problem with my computer or whether my configuration is inappropriate somewhere.
PS: if I'm missing any information, please tell me.

Here's the code I'm running (pasted verbatim from the examples):

"""
Script that trains Tensorflow multitask models on QM9 dataset.
"""
from __future__ import print_function
from __future__ import division
from __future__ import unicode_literals

import os
import deepchem as dc
import numpy as np
from deepchem.molnet import load_qm9

np.random.seed(123)

print('loading dataset...')

qm9_tasks, datasets, transformers = load_qm9()

print('dataset loaded!')

train_dataset, valid_dataset, test_dataset = datasets

print('loading transformers...')

fit_transformers = [dc.trans.CoulombFitTransformer(train_dataset)]

print('loading metrics...')

regression_metric = [
    dc.metrics.Metric(dc.metrics.mean_absolute_error, mode="regression"),
    dc.metrics.Metric(dc.metrics.pearson_r2_score, mode="regression")
]

print('building model...')

model = dc.models.MultitaskFitTransformRegressor(
    n_tasks=len(qm9_tasks),
    n_features=[29, 29],
    learning_rate=0.001,
    momentum=.8,
    batch_size=32,
    weight_init_stddevs=[1 / np.sqrt(400), 1 / np.sqrt(100), 1 / np.sqrt(100)],
    bias_init_consts=[0., 0., 0.],
    layer_sizes=[400, 100, 100],
    dropouts=[0.01, 0.01, 0.01],
    fit_transformers=fit_transformers,
    seed=123)

print("fitting...")

# Fit trained model
model.fit(train_dataset, nb_epoch=5)

print('fitting complete!')

train_scores = model.evaluate(train_dataset, regression_metric, transformers)
print("Train scores [kcal/mol]")
print(train_scores)

valid_scores = model.evaluate(valid_dataset, regression_metric, transformers)
print("Valid scores [kcal/mol]")
print(valid_scores)

test_scores = model.evaluate(test_dataset, regression_metric, transformers)
print("Test scores [kcal/mol]")
print(test_scores)

@krtimisra67 commented:

1. Check TensorFlow GPU Usage

Run the following in Python to check if TensorFlow detects your GPU:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print(tf.config.list_physical_devices('GPU'))

If it doesn't detect your GPU, you may need to update CUDA or TensorFlow.
2. Upgrade TensorFlow & Keras

You're using TensorFlow 2.6.0 and Keras 2.6.0, which predate the RTX 40-series. The TensorFlow 2.6 binaries target CUDA 11.2, which has no native support for Ada GPUs (compute capability 8.9), so kernels may fall back to slow PTX JIT compilation or fail to use the GPU at all. Upgrade to a newer compatible release:

pip install --upgrade tensorflow keras

Ensure your CUDA and cuDNN versions are compatible with the new TensorFlow version. Note two constraints on "latest" here: recent TensorFlow releases (2.12 and later) require Python 3.8+, and TensorFlow dropped native Windows GPU support after 2.10, so on Windows 10 you should either stay at TensorFlow 2.10 or run under WSL2.
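To see which CUDA and cuDNN versions your installed TensorFlow wheel was actually built against, a quick check with the standard build-info API is:

import tensorflow as tf

# Print the CUDA/cuDNN versions this TensorFlow binary was compiled with.
# (The keys are only populated in GPU-enabled builds.)
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))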
3. Install Proper CUDA & cuDNN Versions

You listed cuDNN 8.2.1 but no CUDA toolkit version. TensorFlow 2.6.0 was built against CUDA 11.2 and cuDNN 8.1, so confirm which CUDA toolkit is actually installed and whether it matches what your TensorFlow build expects; a mismatched CUDA/cuDNN pair is a common reason TensorFlow silently falls back to the CPU.

Use:

pip show tensorflow

Then go to the TensorFlow GPU installation guide and match the CUDA/cuDNN versions required for your TensorFlow release.
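For example, for the TensorFlow 2.6–2.10 range, one common way to get a matching toolkit into a conda environment is the following (the versions here are taken from the official compatibility table for TF 2.6; double-check them against the guide for whichever release you install):

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0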
4. Force TensorFlow to Use GPU

Add this to your script before model training:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of reserving it all up front.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # Alternative: cap GPU memory with a virtual device. Use one approach per GPU;
        # memory growth and a fixed virtual-device memory limit conflict on the same device.
        # tf.config.experimental.set_virtual_device_configuration(
        #     gpus[0],
        #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=8192)])  # adjust if needed
    except RuntimeError as e:
        print(e)

This makes TensorFlow allocate GPU memory on demand rather than reserving the whole card up front.
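To confirm that ops are actually being placed on the GPU (memory settings alone don't guarantee that), you can also turn on device-placement logging before building the model; this is a standard TensorFlow call:

import tensorflow as tf

# Log the device (CPU or GPU) that each TensorFlow op is placed on.
tf.debugging.set_log_device_placement(True)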
5. Check GPU Utilization While Running

Run the following command while your script is executing:

nvidia-smi

It will show current GPU usage. If utilization stays low even though TensorFlow sees the GPU, the work per step is probably too small to keep the card busy: with batch_size=32, small dense layers, and the Coulomb-matrix fit transform running on the CPU for every batch, the GPU spends most of its time waiting on the CPU.
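A single nvidia-smi call is only a snapshot. To watch utilization continuously while the script trains, poll it once per second:

nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1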
6. Try Another Model

DeepChem’s MultitaskFitTransformRegressor may not be fully optimized for GPU. Try running another DeepChem model, such as GraphConvModel:

model = dc.models.GraphConvModel(
    len(qm9_tasks), mode='regression', batch_size=32, learning_rate=0.001)

If this improves GPU usage, the issue is with your specific model.
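Note that GraphConvModel expects graph-featurized molecules, while the script above loads QM9 with its default Coulomb-matrix featurizer, so the dataset has to be reloaded with a graph featurizer first. A minimal sketch, assuming load_qm9 in DeepChem 2.7.x accepts featurizer='GraphConv':

import deepchem as dc
from deepchem.molnet import load_qm9

# Reload QM9 with graph featurization so the inputs match GraphConvModel.
qm9_tasks, (train_dataset, valid_dataset, test_dataset), transformers = load_qm9(
    featurizer='GraphConv')

model = dc.models.GraphConvModel(
    len(qm9_tasks), mode='regression', batch_size=32, learning_rate=0.001)
model.fit(train_dataset, nb_epoch=1)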
7. Use Mixed Precision (Optional)

Since you have an RTX 4070 Ti Super, you can try mixed precision training for better GPU utilization:

from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
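One usage note: set the global policy before the DeepChem model is constructed so its Keras layers are built under mixed_float16; whether MultitaskFitTransformRegressor actually speeds up depends on how its layers are defined, so treat this as an experiment rather than a guaranteed win.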
