Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calculate neighbor statistics from CLI #1476

Merged
merged 5 commits into from
Feb 17, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ A full [document](doc/train/train-input-auto.rst) on options in the training inp
- [Descriptor `"se_e2_r"`](doc/model/train-se-e2-r.md)
- [Descriptor `"se_e3"`](doc/model/train-se-e3.md)
- [Descriptor `"hybrid"`](doc/model/train-hybrid.md)
- [Descriptor `sel`](doc/model/sel.md)
- [Fit energy](doc/model/train-energy.md)
- [Fit `tensor` like `Dipole` and `Polarizability`](doc/model/train-fitting-tensor.md)
- [Train a Deep Potential model using `type embedding` approach](doc/model/train-se-e2-a-tebd.md)
Expand Down
2 changes: 2 additions & 0 deletions deepmd/entrypoints/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
from .transfer import transfer
from ..infer.model_devi import make_model_devi
from .convert import convert
from .neighbor_stat import neighbor_stat

__all__ = [
"config",
Expand All @@ -23,4 +24,5 @@
"doc_train_input",
"make_model_devi",
"convert",
"neighbor_stat",
]
33 changes: 33 additions & 0 deletions deepmd/entrypoints/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
transfer,
make_model_devi,
convert,
neighbor_stat,
)
from deepmd.loggers import set_log_handles

Expand Down Expand Up @@ -408,6 +409,36 @@ def parse_args(args: Optional[List[str]] = None):
type=str,
help='the output model',
)

# neighbor_stat
parser_neighbor_stat = subparsers.add_parser(
'neighbor-stat',
parents=[parser_log],
help='Calculate neighbor statistics',
)
parser_neighbor_stat.add_argument(
"-s",
"--system",
default=".",
type=str,
help="The system dir. Recursively detect systems in this directory",
)
parser_neighbor_stat.add_argument(
"-r",
"--rcut",
type=float,
required=True,
help="cutoff radius",
)
parser_neighbor_stat.add_argument(
"-t",
"--type-map",
type=str,
nargs='+',
required=True,
help="type map",
)

# --version
parser.add_argument('--version', action='version', version='DeePMD-kit v%s' % __version__)

Expand Down Expand Up @@ -456,6 +487,8 @@ def main():
make_model_devi(**dict_args)
elif args.command == "convert-from":
convert(**dict_args)
elif args.command == "neighbor-stat":
neighbor_stat(**dict_args)
elif args.command is None:
pass
else:
Expand Down
49 changes: 49 additions & 0 deletions deepmd/entrypoints/neighbor_stat.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
import logging
from typing import List

from deepmd.common import expand_sys_str
from deepmd.utils.data_system import DeepmdDataSystem
from deepmd.utils.neighbor_stat import NeighborStat

log = logging.getLogger(__name__)

def neighbor_stat(
*,
system: str,
rcut: float,
type_map: List[str],
**kwargs,
):
"""Calculate neighbor statistics.

Parameters
----------
system : str
system to stat
rcut : float
cutoff radius
type_map : list[str]
type map

Examples
--------
>>> neighbor_stat(system='.', rcut=6., type_map=["C", "H", "O", "N", "P", "S", "Mg", "Na", "HW", "OW", "mNa", "mCl", "mC", "mH", "mMg", "mN", "mO", "mP"])
min_nbor_dist: 0.6599510670195264
max_nbor_size: [23, 26, 19, 16, 2, 2, 1, 1, 72, 37, 5, 0, 31, 29, 1, 21, 20, 5]
"""
all_sys = expand_sys_str(system)
if not len(all_sys):
raise RuntimeError("Did not find valid system")
data = DeepmdDataSystem(
systems=all_sys,
batch_size=1,
test_size=1,
rcut=rcut,
type_map=type_map,
)
data.get_batch()
nei = NeighborStat(data.get_ntypes(), rcut)
min_nbor_dist, max_nbor_size = nei.get_stat(data)
log.info("min_nbor_dist: %f" % min_nbor_dist)
log.info("max_nbor_size: %s" % str(max_nbor_size))
return min_nbor_dist, max_nbor_size
1 change: 1 addition & 0 deletions doc/model/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
- [Descriptor `"se_e2_r"`](train-se-e2-r.md)
- [Descriptor `"se_e3"`](train-se-e3.md)
- [Descriptor `"hybrid"`](train-hybrid.md)
- [Descriptor `sel`](sel.md)
- [Fit energy](train-energy.md)
- [Fit `tensor` like `Dipole` and `Polarizability`](train-fitting-tensor.md)
- [Train a Deep Potential model using `type embedding` approach](train-se-e2-a-tebd.md)
Expand Down
1 change: 1 addition & 0 deletions doc/model/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Model
train-se-e2-r
train-se-e3
train-hybrid
sel
train-energy
train-fitting-tensor
train-se-e2-a-tebd
Expand Down
13 changes: 13 additions & 0 deletions doc/model/sel.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Determine `sel`

All descriptors require to set `sel`, which means the expected maximum number of type-i neighbors of an atom. DeePMD-kit will allocate memory according to `sel`.

`sel` should not be too large or too small. If `sel` is too large, the computing will become much slower and cost more memory. If `sel` is not enough, the energy will be not conserved, making the accuracy of the model worse.

To determine a proper `sel`, one can calculate the neighbor stat of the training data before training:
```sh
dp neighbor-stat -s data -r 6.0 -t O H
```
where `data` is the directory of data, `6.0` is the cutoff radius, and `O` and `H` is the type map. The program will give the `max_nbor_size`. For example, `max_nbor_size` of the water example is `[38, 72]`, meaning an atom may have 38 O neighbors and 72 H neighbors in the training data.

The `sel` should be set to a higher value than that of the training data, considering there may be some extreme geometries during MD simulations. As a result, we set to `[46, 92]` in the water example.
47 changes: 47 additions & 0 deletions source/tests/test_neighbor_stat.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
import shutil
import numpy as np
import unittest
import dpdata

from deepmd.entrypoints.neighbor_stat import neighbor_stat

def gen_sys(nframes):
natoms = 1000
data = {}
X, Y, Z = np.mgrid[0:9:10j, 0:9:10j, 0:9:10j]
positions = np.vstack([X.ravel(), Y.ravel(), Z.ravel()]).T #+ 0.1
data['coords'] = np.repeat(positions[np.newaxis, :, :], nframes, axis=0)
data['forces'] = np.random.random([nframes, natoms, 3])
data['cells'] = np.array([10., 0., 0., 0., 10., 0., 0., 0., 10.]).reshape(1,3,3)
data['energies'] = np.random.random([nframes, 1])
data['atom_names'] = ['TYPE']
data['atom_numbs'] = [1000]
data['atom_types'] = np.repeat(0, 1000)
return data


class TestNeighborStat(unittest.TestCase):
def setUp(self):
data0 = gen_sys(1)
sys0 = dpdata.LabeledSystem()
sys0.data = data0
sys0.to_deepmd_npy('system_0', set_size = 10)

def tearDown(self):
shutil.rmtree('system_0')

def test_neighbor_stat(self):
# set rcut to 0. will cause a core dumped
# TODO: check what is wrong
for rcut in (3., 6., 11.):
Comment on lines +34 to +36
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denghuilu setting rcut to 0. will cause "core dumped". Any ideas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Although, I don't think some one will set it to 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be caused by this line:

ncell[dd] = to_face[dd] / rc;

with self.subTest():
rcut += 1e-3 # prevent numerical errors
min_nbor_dist, max_nbor_size = neighbor_stat(system="system_0", rcut=rcut, type_map=["TYPE"])
upper = np.ceil(rcut)+1
X, Y, Z = np.mgrid[-upper:upper, -upper:upper, -upper:upper]
positions = np.vstack([X.ravel(), Y.ravel(), Z.ravel()]).T
# distance to (0,0,0)
distance = np.linalg.norm(positions, axis=1)
expected_neighbors = np.count_nonzero(np.logical_and(distance > 0, distance <= rcut))
self.assertAlmostEqual(min_nbor_dist, 1.0, 6)
self.assertEqual(max_nbor_size, [expected_neighbors])