multi-gpu lammps issue #322

Open
hwsheng opened this issue Feb 11, 2024 · 6 comments
@hwsheng
hwsheng commented Feb 11, 2024

I'm encountering difficulties running a multi-GPU simulation in LAMMPS with the MACE model. In a preliminary test with two GPUs, I executed the simulation with the following command: `mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp -in lmp.in -k on g 2 -sf kk`. However, it failed with the error `cudaFree(arg_alloc_ptr) error(cudaErrorAssert): device-side assert triggered`.

Would you have any advice on how to address this problem? Thank you in advance.

@wcwitt
Collaborator

wcwitt commented Feb 12, 2024

Can you paste your input file?

@hwsheng
Author

hwsheng commented Feb 12, 2024

Thanks for your attention. Here is the input for my LAMMPS-MACE simulation, which runs fine as a single-GPU execution.

```
# Test of MACE potential for C system

units           metal
boundary        p p p

atom_style      atomic
atom_modify     map yes
newton          on

read_data       C.dat

mass            1 12.011

pair_style      mace no_domain_decomposition
pair_coeff      * * ../carbon_swa.model-lammps.pt C

velocity        all create 10000 4928459 rot yes dist gaussian

fix             1 all npt temp 6300 300 0.2 iso 10000 10000 0.5
thermo          100
timestep        0.002
dump            dump all custom 10000 dump.dat id type xu yu zu
run             600000
unfix           1
fix             1 all npt temp 300 300 0.2 iso 0 0 0.5
run             100000
```

@wcwitt
Collaborator

wcwitt commented Feb 12, 2024

The no_domain_decomposition only works on a single GPU, so you need

pair_style mace

instead.

This isn't very well documented, sorry. Please note that, right now, a single-GPU `no_domain_decomposition` simulation will almost certainly be faster than a multi-GPU simulation. I don't recommend using multi-GPU unless you absolutely need it (e.g., for memory). We are working on this.
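For reference, the multi-GPU launch would then keep the Kokkos flags but use the plain pair style; the paths and GPU count below are taken from the command reported earlier in this thread, so adjust them for your setup:

```shell
# Multi-GPU launch: one MPI rank per GPU, with the plain
# "pair_style mace" (no_domain_decomposition removed from the input).
# Paths and GPU count follow the original report.
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp \
    -in lmp.in -k on g 2 -sf kk
```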

@hwsheng
Author

hwsheng commented Feb 12, 2024

Thanks for the heads-up. Indeed, I was trying to resolve the out-of-memory issue I encountered in the single-GPU simulation when increasing the number of atoms in the system.

Now, for a test run using two GPUs after switching to

pair_style mace

I got an out-of-memory error:

```
RuntimeError: CUDA out of memory. Tried to allocate 7.39 GiB (GPU 1; 79.15 GiB total capacity; 65.36 GiB already allocated; 5.00 GiB free; 72.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

This error did not appear in a single-GPU simulation of the same system size (4086 atoms).

I guess I have to stick to a smaller system size for now?

Thanks in advance.
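As an aside, the allocator hint in the error message itself can be tried by setting `PYTORCH_CUDA_ALLOC_CONF` before launching; this mitigates fragmentation rather than a true shortage of memory, and the 128 MB split size below is an illustrative value, not one suggested in this thread:

```shell
# Allocator hint from the PyTorch error message above; it can reduce
# fragmentation but does not create more physical GPU memory.
# The 128 MB value is illustrative only.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp \
    -in lmp.in -k on g 2 -sf kk
```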

@wcwitt
Collaborator

wcwitt commented Feb 12, 2024

For a single species, on our A100 (80 GB memory), I'd normally expect to reach system sizes of 5000-10000 atoms before seeing memory problems, depending on how expressive the model is (L=0, L=1, L=2, etc.). So you may be able to reach larger systems on a single GPU by reducing your model size.

It's also possible, but not guaranteed, that increasing to four GPUs (say) would be enough. But this wouldn't be my first choice if you can avoid it.

@hwsheng
Author

hwsheng commented Feb 12, 2024

OK, many thanks for your advice. I will try that.
