I am training 2 models using dpgen, following the CH4 example. In the 00.train stage, I got the following error:
RuntimeError: Job ecbbfb3f-001e-4b19-8ee2-2abb65a182fc failed for more than 3 times
I checked the train.log file and found the following error:
{
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMM launch failed : a.shape=[1,16,1], b.shape=[1,1,10], m=16, n=10, k=1
[[node filter_type_1/MatMul (defined at /lib/python3.9/site-packages/deepmd/utils/network.py:176) ]]
[[add_14/_35]]
(1) Internal: Blas xGEMM launch failed : a.shape=[1,16,1], b.shape=[1,1,10], m=16, n=10, k=1
[[node filter_type_1/MatMul (defined at /lib/python3.9/site-packages/deepmd/utils/network.py:176) ]]
0 successful operations.
0 derived errors ignored.
}
How should I solve this problem?
Best regards!