tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found #528

@lauthirteen

I trained 2 models using dpgen, following the CH4 example. In the 00.train stage, I got the following error:

RuntimeError: Job ecbbfb3f-001e-4b19-8ee2-2abb65a182fc failed for more than 3 times

I checked the train.log file and found the following error:
{
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMM launch failed : a.shape=[1,16,1], b.shape=[1,1,10], m=16, n=10, k=1
[[node filter_type_1/MatMul (defined at /lib/python3.9/site-packages/deepmd/utils/network.py:176) ]]
[[add_14/_35]]
(1) Internal: Blas xGEMM launch failed : a.shape=[1,16,1], b.shape=[1,1,10], m=16, n=10, k=1
[[node filter_type_1/MatMul (defined at /lib/python3.9/site-packages/deepmd/utils/network.py:176) ]]
0 successful operations.
0 derived errors ignored.
}

How should I solve this problem?
Best regards!
