Skip to content
This repository was archived by the owner on Jul 9, 2025. It is now read-only.
This repository was archived by the owner on Jul 9, 2025. It is now read-only.

train_folder_ff does not utilize GPU #115

Open
@rashigeek

Description

@rashigeek

I am trying to train a force fields model by using a variation of the following command that is mentioned in the readme to match my directories:

train_folder_ff.py --root_dir "alignn/examples/sample_data_ff" --config "alignn/examples/sample_data_ff/config_example_atomwise.json" --output_dir=temp
However, training is super slow and does not seem to utilize the GPU at all. This can be further confirmed by running nvidia-smi and viewing the output during training:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.43.02 Driver Version: 535.43.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:07:00.0 Off | N/A |
| 0% 42C P8 13W / 170W | 71MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1405 G /usr/lib/xorg/Xorg 56MiB |
| 0 N/A N/A 1571 G /usr/bin/gnome-shell 5MiB |
+---------------------------------------------------------------------------------------+





If I am training a model that does not utilize force fields, the GPU is used.
For example, running train_folder.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp and simultanously running nvidia-smi gives the following output:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.43.02 Driver Version: 535.43.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:07:00.0 Off | N/A |
| 0% 46C P2 62W / 170W | 921MiB / 12288MiB | 39% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1405 G /usr/lib/xorg/Xorg 56MiB |
| 0 N/A N/A 1571 G /usr/bin/gnome-shell 5MiB |
| 0 N/A N/A 29095 C .../miniconda3/envs/version/bin/python 848MiB |
+---------------------------------------------------------------------------------------+

I have done my best to check that all the dependencies are compatible and I can confirm that the device is switched to cuda in the train_folder_ff.py script.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions