multi-gpu lammps issue #322

Open
hwsheng opened this issue Feb 11, 2024 · 6 comments
@hwsheng
hwsheng commented Feb 11, 2024

I'm encountering difficulties running a multi-GPU simulation in LAMMPS with the MACE model. In a preliminary test with two GPUs, I executed the simulation with the following command: `mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp -in lmp.in -k on g 2 -sf kk`. However, it failed with the error `cudaFree(arg_alloc_ptr) error(cudaErrorAssert): device-side assert triggered`.

Would you have any advice on how to address this problem? Thank you in advance.

@wcwitt
Collaborator

wcwitt commented Feb 12, 2024

Can you paste your input file?

@hwsheng
Author

hwsheng commented Feb 12, 2024

Thanks for your attention. Here is the input for my LAMMPS-MACE simulation, which runs fine as a single-GPU execution.

```
# Test of MACE potential for C system

units           metal
boundary        p p p

atom_style      atomic
atom_modify     map yes
newton          on

read_data       C.dat

mass            1 12.011

pair_style      mace no_domain_decomposition
pair_coeff      * * ../carbon_swa.model-lammps.pt C

velocity        all create 10000 4928459 rot yes dist gaussian

fix             1 all npt temp 6300 300 0.2 iso 10000 10000 0.5
thermo          100
timestep        0.002
dump            dump all custom 10000 dump.dat id type xu yu zu
run             600000
unfix           1
fix             1 all npt temp 300 300 0.2 iso 0 0 0.5
run             100000
```

@wcwitt
Collaborator

wcwitt commented Feb 12, 2024

The no_domain_decomposition only works on a single GPU, so you need

pair_style mace

instead.

This isn't very well documented, sorry. Please note that, right now, a single-GPU `no_domain_decomposition` simulation will almost certainly be faster than a multi-GPU simulation. I don't recommend using multi-GPU unless you absolutely need it (e.g., for memory). We are working on this.
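For reference, the multi-GPU launch would then keep the Kokkos flags but use the plain pair style; the paths and GPU count below are taken from the command reported earlier in this thread, so adjust them for your setup:

```shell
# Multi-GPU launch: one MPI rank per GPU, with the plain
# "pair_style mace" (no_domain_decomposition removed from the input).
# Paths and GPU count follow the original report.
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp \
    -in lmp.in -k on g 2 -sf kk
```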

@hwsheng
Author

hwsheng commented Feb 12, 2024

Thanks for the heads-up. Indeed, I was trying to resolve the out-of-memory issue I encountered in the single-GPU simulation when increasing the number of atoms in the system.

Now, for a test run using two GPUs after switching to

pair_style mace

I got an out-of-memory error:

```
RuntimeError: CUDA out of memory. Tried to allocate 7.39 GiB (GPU 1; 79.15 GiB total capacity; 65.36 GiB already allocated; 5.00 GiB free; 72.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

This error did not appear in a single-GPU simulation of the same system size (4086 atoms).

I guess I have to stick to a smaller system size for now?

Thanks in advance.
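As an aside, the allocator hint in the error message itself can be tried by setting `PYTORCH_CUDA_ALLOC_CONF` before launching; this mitigates fragmentation rather than a true shortage of memory, and the 128 MB split size below is an illustrative value, not one suggested in this thread:

```shell
# Allocator hint from the PyTorch error message above; it can reduce
# fragmentation but does not create more physical GPU memory.
# The 128 MB value is illustrative only.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp \
    -in lmp.in -k on g 2 -sf kk
```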

@wcwitt
Collaborator

wcwitt commented Feb 12, 2024

For a single species, on our A100 (80 GB memory), I'd normally expect to reach system sizes of 5000-10000 atoms before seeing memory problems, depending on how expressive the model is (L=0, L=1, L=2, etc.). So you may be able to reach larger systems on a single GPU by reducing your model size.

It's also possible, but not guaranteed, that increasing to four GPUs (say) would be enough. But this wouldn't be my first choice if you can avoid it.

@hwsheng
Author

hwsheng commented Feb 12, 2024

OK, many thanks for your advice. I will try that.
