Integrating FTorch with a distributed, CPU-based solver can lead to a scenario where there are N (`--ntasks-per-node`) MPI ranks and M (`torch::cuda::device_count()`) GPUs per node, with **M** <= **N**. The current implementation of FTorch appears to use only GPU:0 for all N MPI ranks. Giving the user the ability to decide which GPU each rank targets would ensure that all available GPUs are used.
An initial discussion regarding this potential feature: #84.
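As a rough illustration of the kind of control this would enable, here is a sketch (in C++ against the libtorch API and MPI, not current FTorch Fortran API) that maps each node-local MPI rank to a GPU round-robin via `rank % device_count`. The model path `"model.pt"` and the node-local communicator setup are assumptions for the example; FTorch would need to expose an equivalent device-index argument from Fortran.

```cpp
// Sketch: round-robin assignment of MPI ranks to GPUs on a node.
#include <mpi.h>
#include <torch/script.h>

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);

  // Split the world communicator so ranks are numbered per node.
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);

  // M GPUs on this node; map the N local ranks onto them round-robin.
  const int num_devices = static_cast<int>(torch::cuda::device_count());
  const int device_index = local_rank % num_devices;
  torch::Device device(torch::kCUDA, device_index);

  // Load the TorchScript model directly onto this rank's GPU.
  torch::jit::script::Module model = torch::jit::load("model.pt", device);

  // ... build input tensors on `device` and call model.forward(...) ...

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```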
Furthermore, there may still be multiple MPI ranks per GPU even after distributing the ranks uniformly across the available GPUs. In that case the GPU will most likely execute the per-rank copies of the ML model serially. CUDA MPS could be used to run these copies concurrently. An alternative would be to gather the inputs to a single task, run the ML model from that task, and scatter the results back to the respective tasks inside the Fortran code; a sketch of that pattern follows.
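For illustration only, the gather-infer-scatter alternative could look roughly like the following (shown in C++/MPI rather than Fortran, and assuming every rank owns `n_local` float inputs that the model maps one-to-one to outputs; the function name and data layout are hypothetical):

```cpp
// Sketch: gather inputs to rank 0, run one batched forward pass, scatter results.
#include <mpi.h>
#include <torch/script.h>
#include <algorithm>
#include <vector>

void gathered_inference(torch::jit::script::Module& model,
                        const std::vector<float>& local_in,
                        std::vector<float>& local_out,
                        MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);
  const int n_local = static_cast<int>(local_in.size());

  // Gather every rank's inputs onto rank 0.
  std::vector<float> all_in(rank == 0 ? n_local * size : 0);
  MPI_Gather(local_in.data(), n_local, MPI_FLOAT,
             all_in.data(), n_local, MPI_FLOAT, 0, comm);

  std::vector<float> all_out(rank == 0 ? n_local * size : 0);
  if (rank == 0) {
    // Run a single batched forward pass on GPU:0 for all ranks' data.
    auto input = torch::from_blob(all_in.data(),
                                  {size * n_local}, torch::kFloat)
                     .to(torch::Device(torch::kCUDA, 0));
    auto output = model.forward({input}).toTensor()
                      .to(torch::kCPU).contiguous();
    std::copy_n(output.data_ptr<float>(), all_out.size(), all_out.begin());
  }

  // Scatter the results back to the owning ranks.
  local_out.resize(n_local);
  MPI_Scatter(all_out.data(), n_local, MPI_FLOAT,
              local_out.data(), n_local, MPI_FLOAT, 0, comm);
}
```

This trades the extra MPI communication for a single, larger batch on one GPU, which may or may not be faster than per-rank inference depending on message sizes and model cost.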