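This tutorial shows how to use deepmd with a trained model to obtain the descriptor output or the model output on a dataset. The descriptor output can be used as features in downstream tasks.  
Running this tutorial requires `deepmd-kit` and `dpdata`:
- `deepmd-kit` installation instructions: https://github.com/deepmodeling/deepmd-kit  
- `dpdata` can read input data from software such as VASP, LAMMPS, deepmd, Amber, and CP2K; for installation instructions see https://github.com/deepmodeling/dpdata  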

The code comments in the following file are a useful reference for this tutorial: https://github.com/deepmodeling/deepmd-kit/blob/v2.1.5/deepmd/infer/deep_pot.py

In [None]:
import dpdata
from deepmd.infer import DeepPot

import numpy as np

In [None]:
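# Path to the trained model
model_path = '../../model/se_e2_a/water-14000.pb'
# Path to the dataset; the dataset can be in the standard deepmd format or in VASP format
system_path = '../../data/water/data_0'
# Maximum number of frames passed to the model in a single inference call
max_nframe_each = 10
# Element names; atom type index i corresponds to type_map[i]
type_map = ["O", "H"]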

## Getting the descriptor output

In [None]:
# Load the model
dp = DeepPot(model_path)

# Read the data with dpdata
ms = dpdata.System(type_map=type_map)
ms.from_deepmd_npy(system_path, labeled=False)

# Number of frames in this system
nframe = ms.get_nframes()

# nframes x natoms x 3
coord = ms['coords']
# nframes x 9
cell = ms['cells'].reshape([nframe, -1])
# List[int]
# len(atype) == natoms, and the element of the i-th atom is type_map[atype[i]]
atype = ms['atom_types'].tolist()

start_idx = 0
end_idx = start_idx + max_nframe_each

descriptor_out = []
# Limit the number of frames per inference call to avoid running out of memory (OOM)
while start_idx < nframe:
    out = dp.eval_descriptor(
        coord[start_idx:end_idx, ...],
        cell[start_idx:end_idx, :],
        atype,
    )
    descriptor_out.append(out)
    start_idx += max_nframe_each
    end_idx += max_nframe_each

# nframes x natoms x feature_num
descriptor_out = np.concatenate(descriptor_out, axis=0)
descriptor_out.shape

The descriptor output has shape nframes x natoms x feature_num, where feature_num depends on the parameters in input.json; in this example it is 1600.
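
One possible way to use the descriptor as features downstream is to pool the per-atom descriptors into a fixed-length vector per frame. The sketch below is an illustration, not part of deepmd-kit: it uses a random stand-in array in place of `descriptor_out`, and the helper name `pool_descriptor` and the choice of mean pooling per element type are assumptions made for this example.

```python
import numpy as np


def pool_descriptor(descriptor, atype, ntypes):
    """Average per-atom descriptors over the atoms of each element type.

    descriptor: array of shape (nframes, natoms, feature_num)
    atype: list of int with length natoms
    returns: array of shape (nframes, ntypes * feature_num)
    """
    descriptor = np.asarray(descriptor)
    atype = np.asarray(atype)
    pooled = []
    for t in range(ntypes):
        mask = atype == t
        if not mask.any():
            raise ValueError(f"no atoms of type {t}")
        # Mean over the atom axis, restricted to atoms of this type
        pooled.append(descriptor[:, mask, :].mean(axis=1))
    return np.concatenate(pooled, axis=1)


# Stand-in data (hypothetical sizes): 4 frames, 6 atoms (2 O + 4 H), 8 features per atom
rng = np.random.default_rng(0)
fake_descriptor = rng.normal(size=(4, 6, 8))
fake_atype = [0, 0, 1, 1, 1, 1]

frame_features = pool_descriptor(fake_descriptor, fake_atype, ntypes=2)
print(frame_features.shape)  # (4, 16)
```

With the real data, `descriptor_out` and `atype` from the cell above would take the place of the stand-in arrays. Mean pooling keeps the feature length independent of the number of atoms, at the cost of discarding per-atom detail.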

## Getting the model output

In [None]:
# Load the model
dp = DeepPot(model_path)

# Read the data with dpdata
ms = dpdata.System(type_map=type_map)
ms.from_deepmd_npy(system_path, labeled=False)

nframe = ms.get_nframes()
# nframes x natoms x 3
coord = ms['coords']
# nframes x 9
cell = ms['cells'].reshape([nframe, -1])
# List[int]
# len(atype) == natoms, and the element of the i-th atom is type_map[atype[i]]
atype = ms['atom_types'].tolist()

start_idx = 0
end_idx = start_idx + max_nframe_each

energy = []
force = []
virial = []
while start_idx < nframe:
    e, f, v = dp.eval(
        coord[start_idx:end_idx, :],
        cell[start_idx:end_idx, :],
        atype,
    )

    energy.append(e)
    force.append(f)
    virial.append(v)

    start_idx += max_nframe_each
    end_idx += max_nframe_each

# nframe (reshape rather than squeeze, so the result stays 1-D even when nframe == 1)
energy = np.concatenate(energy, axis=0).reshape([nframe])
# nframe x (natoms * 3)
force = np.concatenate(force, axis=0).reshape([nframe, -1])
# nframe x 9
virial = np.concatenate(virial, axis=0).reshape([nframe, -1])

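The model outputs energy (shape [nframe]), force (shape [nframe, -1]), and virial (shape [nframe, 9]). A -1 in a shape means that the size of that dimension is not given explicitly and is inferred from the total number of elements once the other dimensions are fixed. For example, an array of shape [a, b, c] becomes [nframe, a * b * c / nframe] after reshape([nframe, -1]).  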
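
The -1 convention can be checked on a small array. This is a minimal sketch with made-up sizes and a zero-filled stand-in for the force array, not output from the model:

```python
import numpy as np

nframe, natoms = 5, 3
# Stand-in for a force array of shape (nframe, natoms, 3)
force_like = np.zeros((nframe, natoms, 3))

# NumPy infers the second dimension: 5 * 3 * 3 / 5 = 9
flat = force_like.reshape([nframe, -1])
print(flat.shape)  # (5, 9)
```

At most one dimension may be given as -1 in a single `reshape` call, since NumPy needs the remaining dimensions to determine it.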
The outputs can be saved in the standard npy format and read back by dpdata:

In [None]:
from pathlib import Path
save_path = Path('./system')
# Save the data (without energy, force, or virial) in the standard deepmd format
ms.to('deepmd/npy', save_path)
# Locate coord.npy
coord_path = list(save_path.glob('**/coord.npy'))[0]

# Save energy, force, and virial next to coord.npy so that together they form a standard deepmd dataset
with open(coord_path.parent / 'energy.npy', 'wb') as f:
    np.save(f, energy)
with open(coord_path.parent / 'force.npy', 'wb') as f:
    np.save(f, force)
with open(coord_path.parent / 'virial.npy', 'wb') as f:
    np.save(f, virial)

In [None]:
# Read the data that now includes energy, force, and virial
ls = dpdata.LabeledSystem()
ls.from_deepmd_npy('./system/')
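
A quick way to sanity-check label files like the ones written above is to reload them with `np.load` and compare their shapes. The sketch below is only an illustration: it writes zero-filled stand-in arrays with made-up sizes to a temporary directory instead of touching `./system`, and the `set.000` directory name is an assumption about the layout.

```python
import tempfile
from pathlib import Path

import numpy as np

nframe, natoms = 4, 6
# Stand-in labels with the shapes used in this tutorial
energy = np.zeros(nframe)
force = np.zeros((nframe, natoms * 3))
virial = np.zeros((nframe, 9))

with tempfile.TemporaryDirectory() as tmp:
    set_dir = Path(tmp) / "set.000"  # hypothetical set directory
    set_dir.mkdir()
    for name, arr in [("energy", energy), ("force", force), ("virial", virial)]:
        np.save(set_dir / f"{name}.npy", arr)

    # Reload every .npy file and key it by file name without the extension
    loaded = {p.stem: np.load(p) for p in sorted(set_dir.glob("*.npy"))}

shapes = {name: arr.shape for name, arr in loaded.items()}
print(shapes)
```

For the real data, the same check could be pointed at `coord_path.parent` from the earlier cell.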