Some problems encountered in using GPU to accelerate lammps #3033
Unanswered
SEU-NiuWenLong asked this question in Q&A
Replies: 1 comment, 1 reply
@Yi-FanLi Is this the error you got?
I have two GPU cards. When I run LAMMPS with GPU acceleration, it always crashes after running for a while and reports the following error:
```
Possible remote error message: ==> /home/gcniu/workspace/deepmd/23-32/run/temp/81c427e9fd55ff100029be97075854c91642ee29/task.002.000055/model_devi.log <==
ibdeepmd_1697184996481/work/source/lib/src/gpu/prod_env_mat.cu: 625, in file /home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/op/custom_op.cc:18
  [[{{node ProdEnvMatA}}]]
  [[o_energy/_31]]
(1) INTERNAL: Operation received an exception: DeePMD-kit Error: CUDA Runtime library throws an error: an illegal memory access was encountered, in file /home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/lib/src/gpu/prod_env_mat.cu: 625, in file /home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/op/custom_op.cc:18
  [[{{node ProdEnvMatA}}]]
0 successful operations.
0 derived errors ignored. (/home/conda/feedstock_root/build_artifacts/libdeepmd_1697184996481/work/source/lmp/pair_deepmd.cpp:634)
Last command: run ${NSTEPS} upto
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
```
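One way to narrow down an asynchronous CUDA failure like this is to re-run the single failing task by hand with `CUDA_LAUNCH_BLOCKING=1`, which forces each kernel launch to synchronize, so the illegal memory access is reported at the offending kernel instead of at a later sync point. A dry-run sketch (the task directory comes from the log above; the LAMMPS input file name `input.lammps` is a placeholder, not a verified path):

```shell
# Print the command that would re-run the one failing model_devi task
# with synchronous CUDA kernel launches (dry run; remove the echo to
# actually execute it inside the remote task directory).
TASK_DIR=task.002.000055
echo "cd $TASK_DIR && CUDA_LAUNCH_BLOCKING=1 lmp -in input.lammps"
```

Running the task in isolation also tells you whether the crash depends on two tasks sharing one GPU or reproduces on a single device.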
This is the `model_devi` section of my machine.json file:

```json
"model_devi": [
    {
        "command": "lmp",
        "machine": {
            "context_type": "local",
            "batch_type": "Slurm",
            "local_root": "./",
            "remote_root": "/home/gcniu/workspace/deepmd/23-32/run/temp"
        },
        "resources": {
            "number_node": 1,
            "cpu_per_node": 16,
            "gpu_per_node": 2,
            "queue_name": "GPU",
            "strategy": {"if_cuda_multi_devices": true},
            "custom_flags": [
                "#SBATCH -J gcniu",
                "#SBATCH -n 16",
                "#SBATCH -o %j.log",
                "#SBATCH -e %j.log"
            ],
            "group_size": 1000,
            "_source_list": ["/home/gcniu/workspace/deepmd/23-32/run/envs.sh"]
        }
    }
]
```
Is there something wrong with my parameter file configuration, or is the problem something else?
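For context on what `"if_cuda_multi_devices": true` with `"gpu_per_node": 2` is supposed to do: a multi-GPU dispatcher is expected to pin each concurrent task to one device by setting `CUDA_VISIBLE_DEVICES` in that task's environment. The sketch below is an illustration of that round-robin assignment only, under my own naming; `assign_gpu` is not DPDispatcher's actual API:

```python
import os

def assign_gpu(task_index, gpu_per_node=2):
    """Return an environment for one task with a single visible GPU,
    assigned round-robin across the node's devices (illustrative only)."""
    gpu_id = task_index % gpu_per_node
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env

# Two concurrent tasks on a 2-GPU node land on different devices:
envs = [assign_gpu(i) for i in range(2)]
print([e["CUDA_VISIBLE_DEVICES"] for e in envs])  # → ['0', '1']
```

If the assignment were not working and both tasks saw both GPUs, two LAMMPS processes could contend for the same device, which is one plausible route to the kind of crash shown above; checking `CUDA_VISIBLE_DEVICES` inside the generated Slurm job scripts would confirm or rule this out.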