Training is slow and not using GPU #176

Open
davidfstein opened this issue Dec 12, 2022 · 2 comments
Labels
sslgraph Self-supervised Learning on Graphs

Comments

@davidfstein

I'm attempting to run the GraphCL example with a custom dataset (n ≈ 150,000). I am passing device='cuda' and my GPU is available, but GPU utilization sits at 0% and the training loop inside evaluate() is projected to take ~12 hours. Is there a way to increase GPU utilization, and do you expect the implementation to scale to datasets of this size?

@ycremar
Collaborator

ycremar commented Jan 4, 2023

Hi @davidfstein,

Thank you for letting us know about this. This is not the performance we expect. Could you try setting log_interval equal to your total number of epochs and see whether GPU utilization increases? Also, could you confirm whether GPU memory is being used?
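
(For reference, one quick way to check this with plain PyTorch, independent of DIG, is something along these lines:)

```python
import torch

# Sanity checks that CUDA is visible to PyTorch at all.
print(torch.cuda.is_available())        # should print True
print(torch.cuda.get_device_name(0))    # name of the GPU PyTorch sees

# Run these after training has started: non-zero values mean that
# model/data tensors really live on the GPU.
print(torch.cuda.memory_allocated(0))
print(torch.cuda.max_memory_allocated(0))
```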

We will continue working on efficiency optimization.

ycremar added the sslgraph (Self-supervised Learning on Graphs) label on Jan 4, 2023
@davidfstein
Author

Hi, I added a data.to(device) call in the encoder training loop and now the models are using the GPU. I'll go back and take a look at why the data isn't being moved to the GPU in the first place, and update here later.
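
Roughly what the change looks like (the surrounding loop is paraphrased, so the names below are approximate placeholders rather than the exact DIG internals):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Paraphrased sketch of the encoder pre-training loop; `data_loader`, `model`,
# `optimizer`, and `loss_fn` are placeholders for the actual objects.
for data in data_loader:
    data = data.to(device)        # <-- the added line; batches were otherwise staying on the CPU
    optimizer.zero_grad()
    loss = loss_fn(model(data))   # placeholder for the GraphCL contrastive step
    loss.backward()
    optimizer.step()
```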
