multi-gpu lammps issue #322
Comments
Can you paste your input file?
Thanks for your attention. Here is the input for my LAMMPS-MACE simulation, which runs fine on a single GPU.
The
instead. This isn't very well documented, sorry. Please note that, right now, a single-GPU
Thanks for the heads-up. Indeed, I was trying to resolve the out-of-memory issue I hit in the single-GPU simulation when increasing the number of atoms in the system. Now, for a test run using two GPUs, after using
it turns out that I got an out-of-memory error. This error did not appear in the single-GPU simulation (same system size, 4086 atoms). I guess I have to stick to a small system size for now? Thanks in advance.
For a single species, on our A100 (80 GB memory), I'd normally expect to reach system sizes of 5000-10000 atoms before seeing memory problems, depending on how expressive the model is (L=0, L=1, L=2, etc.). So you may be able to reach larger systems on a single GPU by reducing your model size. It's also possible, but not guaranteed, that increasing to four GPUs (say) would be enough. But this wouldn't be my first choice if you can avoid it.
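As a rough back-of-the-envelope check on those figures, here is a minimal sketch of the per-atom memory budget they imply. It assumes memory use grows roughly linearly with atom count and treats 80 GB as 80 GiB; both are simplifications, and the variable names are only illustrative.

```shell
#!/bin/sh
# Rough per-atom memory budget implied by "5000-10000 atoms on an 80 GB A100".
# Assumption: memory use grows roughly linearly with the number of atoms.
GPU_MEM_MIB=$((80 * 1024))   # 80 GB card, approximated as 80 GiB

for ATOMS in 5000 10000; do
    # Integer division is precise enough for an order-of-magnitude estimate.
    PER_ATOM_MIB=$((GPU_MEM_MIB / ATOMS))
    echo "limit of $ATOMS atoms -> about $PER_ATOM_MIB MiB per atom"
done
```

Under these assumptions the budget comes out at about 16 MiB per atom at the 5000-atom limit and 8 MiB per atom at the 10000-atom limit, which gives a feel for how quickly a more expressive model eats into the available memory.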
OK. Many thanks for your advice. I will try that.
I'm having trouble running a multi-GPU simulation in LAMMPS with the MACE model. In a preliminary test on two GPUs, I ran the simulation with the following command: `mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp -in lmp.in -k on g 2 -sf kk`. However, it failed with the error `cudaFree(arg_alloc_ptr) error(cudaErrorAssert): device-side assert triggered`.
Would you have any advice on how to address this problem? Thank you in advance.
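For anyone trying to narrow down a device-side assert like this one, a minimal sketch of a launch wrapper follows. It keeps the MPI rank count and the Kokkos GPU count in one variable, and sets `CUDA_LAUNCH_BLOCKING=1`, a standard CUDA environment variable that makes kernel launches synchronous so the error is usually reported closer to the kernel that failed. The names `NGPU`, `LMP`, and `DRY_RUN` are hypothetical, and whether the exported variable reaches every rank depends on the MPI launcher, so treat this as a debugging aid under those assumptions rather than a fix.

```shell
#!/bin/sh
# Hypothetical debugging wrapper; NGPU, LMP and DRY_RUN are illustrative names.
NGPU=2
LMP="$HOME/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp"

# Use the same count for the MPI ranks (-np) and the Kokkos GPUs (-k on g),
# as in the command from the post (typically one rank per GPU).
CMD="mpirun -np $NGPU $LMP -in lmp.in -k on g $NGPU -sf kk"

# Synchronous kernel launches: slower, but errors surface nearer their source.
# Assumption: the MPI launcher forwards this variable to all ranks.
export CUDA_LAUNCH_BLOCKING=1

# Default to a dry run that only prints the command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
if [ "$DRY_RUN" -eq 1 ]; then
    echo "$CMD"
else
    $CMD
fi
```

Running it as is only prints the assembled command; running it with `DRY_RUN=0` launches LAMMPS with synchronous kernel launches enabled.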