add and run pre-commit to format codes (#120)
This PR replaces #117, which has too many conflicts with #118 and #119.

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
wanghan-iapcm and Han Wang committed Jan 27, 2023
1 parent c06fb0d commit 49a8ee5
Showing 126 changed files with 7,445 additions and 5,959 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/pub-docker.yml
@@ -23,28 +23,28 @@ jobs:

```yaml
    steps:
    - name: Check out the repo
      uses: actions/checkout@v3

    - name: Log in to Docker Hub
      uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
      with:
        username: ${{ secrets.DOCKER_USERNAME }}
        password: ${{ secrets.DOCKER_PASSWORD }}

    - name: Log in to the Container registry
      uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    - name: Extract metadata (tags, labels) for Docker
      id: meta
      uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
      with:
        images: |
          dptechnology/dpgen2
          ghcr.io/deepmodeling/dpgen2
    - name: Build and push Docker images
      uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
      with:
```
1 change: 0 additions & 1 deletion .github/workflows/pub-pypi.yml
@@ -36,4 +36,3 @@ jobs:

```diff
       uses: pypa/gh-action-pypi-publish@master
       with:
         password: ${{ secrets.PYPI_API_TOKEN }}
-
```
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -18,7 +18,7 @@ jobs:

```diff
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
-      run: |
+      run: |
         pip install -e .[test]
         pip install mock coverage pytest
     - name: Test
```
21 changes: 21 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,21 @@

```yaml
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
        exclude: "^tests/.*$"
      - id: end-of-file-fixer
        exclude: "^tests/fp/.*$"
      - id: check-yaml
      #- id: check-json
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-toml
  # Python
  - repo: https://github.com/psf/black
    rev: 22.12.0
    hooks:
      - id: black-jupyter
```
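With this configuration in place, contributors typically run `pip install pre-commit` and `pre-commit install` once, after which the hooks run automatically on every `git commit`; `pre-commit run --all-files` applies them to the entire tree, which is presumably how the bulk of the formatting changes in this commit were generated.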
1 change: 0 additions & 1 deletion README.md
@@ -1,4 +1,3 @@

```diff
 DPGEN2 is the 2nd generation of the Deep Potential GENerator.
 
 For developers please read the [developers guide](docs/developer.md)
-
```
2 changes: 1 addition & 1 deletion docs/.gitignore
@@ -1,2 +1,2 @@

```diff
 api/
-_build/
+_build/
```
54 changes: 34 additions & 20 deletions docs/conf.py
@@ -17,9 +17,9 @@

```diff
 
 # -- Project information -----------------------------------------------------
 
-project = 'DPGEN2'
-copyright = '2022-%d, DeepModeling' % date.today().year
-author = 'DeepModeling'
+project = "DPGEN2"
+copyright = "2022-%d, DeepModeling" % date.today().year
+author = "DeepModeling"
 
 
 # -- General configuration ---------------------------------------------------
```
@@ -28,53 +28,67 @@

```diff
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
-    'deepmodeling_sphinx',
-    'dargs.sphinx',
-    'myst_parser',
+    "deepmodeling_sphinx",
+    "dargs.sphinx",
+    "myst_parser",
     "sphinx_rtd_theme",
-    'sphinx.ext.viewcode',
-    'sphinx.ext.intersphinx',
-    'numpydoc',
-    'sphinx.ext.autosummary',
-    'sphinxarg.ext',
+    "sphinx.ext.viewcode",
+    "sphinx.ext.intersphinx",
+    "numpydoc",
+    "sphinx.ext.autosummary",
+    "sphinxarg.ext",
 ]
 
 # Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
+templates_path = ["_templates"]
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This pattern also affects html_static_path and html_extra_path.
-exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
 
 
 # -- Options for HTML output -------------------------------------------------
 
 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'sphinx_rtd_theme'
+html_theme = "sphinx_rtd_theme"
 
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+html_static_path = ["_static"]
 html_css_files = []
 
-autodoc_default_flags = ['members']
+autodoc_default_flags = ["members"]
 autosummary_generate = True
-master_doc = 'index'
+master_doc = "index"
 
 
 def run_apidoc(_):
     from sphinx.ext.apidoc import main
-    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
+
+    sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
     cur_dir = os.path.abspath(os.path.dirname(__file__))
     module = os.path.join(cur_dir, "..", "dpgen2")
-    main(['-M', '--tocfile', 'api', '-H', 'DPGEN2 API', '-o', os.path.join(cur_dir, "api"), module, '--force'])
+    main(
+        [
+            "-M",
+            "--tocfile",
+            "api",
+            "-H",
+            "DPGEN2 API",
+            "-o",
+            os.path.join(cur_dir, "api"),
+            module,
+            "--force",
+        ]
+    )
 
 
 def setup(app):
-    app.connect('builder-inited', run_apidoc)
+    app.connect("builder-inited", run_apidoc)
 
 
 intersphinx_mapping = {
```
28 changes: 14 additions & 14 deletions docs/developer.md
@@ -7,24 +7,24 @@

## The concurrent learning algorithm

DPGEN2 implements the concurrent learning algorithm named DP-GEN, described in [this paper](https://doi.org/10.1016/j.cpc.2020.107206). Note that other types of workflows, like active learning, should be easy to implement within the infrastructure of DPGEN2.

The DP-GEN algorithm is iterative. In each iteration, four steps are executed consecutively: training, exploration, selection, and labeling.

1. **Training**. A set of DP models is trained with the same dataset and the same hyperparameters. The only difference is the random seed used to initialize the model parameters.
2. **Exploration**. One of the DP models is used to explore the configuration space. The exploration strategy highly depends on the intended application of the model. The simulation technique for exploration can be molecular dynamics, Monte Carlo, structure search/optimization, enhanced sampling, or any combination of them. Currently, DPGEN2 only supports exploration based on the molecular simulation platform [LAMMPS](https://www.lammps.org/).
3. **Selection**. Not all explored configurations are labeled; rather, the model prediction errors on the configurations are estimated by the ***model deviation***, defined as the standard deviation among the predictions of the set of models (see the sketch after this list). The critical configurations, those with large but not-that-large errors, are selected for labeling. Configurations with very large errors are not selected, because such errors are usually caused by non-physical configurations, e.g. overlapping atoms.
4. **Labeling**. The selected configurations are labeled with energies, forces, and virials calculated by a method of first-principles accuracy, most commonly [density functional theory](https://doi.org/10.1103/PhysRev.140.A1133) as implemented in [VASP](https://www.vasp.at/), [Quantum Espresso](https://www.quantum-espresso.org/), [CP2K](https://www.cp2k.org/), etc. The labeled data are finally added to the training dataset to start the next iteration.
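As referenced in the selection step, a minimal sketch of the selection criterion. All names here are illustrative, not the dpgen2 API: `predict_forces` is a hypothetical helper, and the trust bounds are example values.

```python
import numpy as np


def select_candidates(confs, models, f_trust_lo=0.10, f_trust_hi=0.50):
    """Select critical configurations by force model deviation.

    `predict_forces(model, conf)` is a hypothetical helper returning an
    (natoms, 3) array of forces predicted by a single model.
    """
    candidates = []
    for conf in confs:
        # Predictions of all models for this configuration: (nmodels, natoms, 3)
        preds = np.stack([predict_forces(model, conf) for model in models])
        # Model deviation: per-atom std of the force vectors, maximized over atoms
        max_dev = np.max(np.linalg.norm(preds.std(axis=0), axis=-1))
        if f_trust_lo <= max_dev < f_trust_hi:
            candidates.append(conf)  # large, but not-that-large, error
    return candidates
```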

In each iteration, the quality of the model is improved by selecting and labeling more critical data and adding them to the training dataset. The DP-GEN iteration is converged when no more critical data can be selected.
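Put together, the iteration amounts to a loop of roughly this shape. This is a conceptual sketch only: `train`, `explore`, and `label` are hypothetical placeholders for the real operators, and `select_candidates` is the sketch from above.

```python
def dp_gen(dataset, n_models=4, max_iter=100):
    """Conceptual DP-GEN loop: train, explore, select, label, repeat."""
    models = []
    for _ in range(max_iter):
        # 1. Training: same data and hyperparameters, different random seeds
        models = [train(dataset, seed=i) for i in range(n_models)]
        # 2. Exploration: e.g. LAMMPS MD driven by one of the models
        confs = explore(models[0])
        # 3. Selection: model deviation picks the critical configurations
        candidates = select_candidates(confs, models)
        if not candidates:
            break  # converged: no more critical data can be selected
        # 4. Labeling: first-principles energies/forces extend the dataset
        dataset = dataset + label(candidates)
    return models
```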

## Overview of the DPGEN2 Implementation

The implementation of DPGEN2 is based on the workflow platform [dflow](https://github.com/dptech-corp/dflow), a Python wrapper of [Argo Workflows](https://argoproj.github.io/workflows/), an open-source container-native workflow engine on [Kubernetes](https://kubernetes.io/).

The DP-GEN algorithm is conceptually modeled as a computational graph. The implementation then proceeds along two lines: the operators and the workflow.
1. **Operators**. Operators are implemented in Python 3. The operators should be implemented and tested ***without*** the workflow.
2. **Workflow**. The workflow is implemented on [dflow](https://github.com/dptech-corp/dflow). Ideally, the workflow is implemented and tested with all operators mocked.


## The DPGEN2 workflow
@@ -33,16 +33,16 @@

The workflow of DPGEN2 is illustrated in the following figure:

![dpgen flowchart](./figs/dpgen-flowchart.jpg)

In the center is the `block` operator, a super-OP (an OP composed of several OPs) for one DP-GEN iteration, i.e. the super-OP of the training, exploration, selection, and labeling steps. The inputs of the `block` OP are `lmp_task_group`, `conf_selector`, and `dataset`.
- `lmp_task_group`: definition of a group of LAMMPS tasks that explore the configuration space.
- `conf_selector`: defines the rule by which the configurations are selected for labeling.
- `dataset`: the training dataset.

The outputs of the `block` OP are
- `exploration_report`: a report recording the result of the exploration, for example, how many configurations are accurate enough and how many are selected as candidates for labeling.
- `dataset_incr`: the increment of the training dataset.

The `dataset_incr` is added to the training `dataset`.
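As an illustration of this interface only — these are not the actual dflow OP signatures in dpgen2 — the inputs and outputs of `block` could be typed as:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockInputs:
    lmp_task_group: object  # definition of the LAMMPS exploration tasks
    conf_selector: object   # rule for selecting configurations for labeling
    dataset: List[str]      # the training dataset (e.g. paths to data systems)


@dataclass
class BlockOutputs:
    exploration_report: object  # accuracy statistics and candidate counts
    dataset_incr: List[str]     # the increment of the training dataset


def run_block(inp: BlockInputs) -> BlockOutputs:
    """Hypothetical driver: one DP-GEN iteration (train, explore, select, label)."""
    ...
```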

The `exploration_report` is passed to the `exploration_strategy` OP, which implements the strategy of exploration. It reads the `exploration_report` generated by each iteration (`block`) and decides whether the iteration has converged. If not, it generates a group of LAMMPS tasks (`lmp_task_group`) and the criteria for selecting configurations (`conf_selector`), which are then used by the `block` of the next iteration. This closes the iteration loop.

@@ -58,14 +58,14 @@

The inside of the super-OP `block` is displayed on the right-hand side of the figure.

### The exploration strategy

The exploration strategy defines how the configuration space is explored by the concurrent learning algorithm. The design of the exploration strategy is graphically illustrated in the following figure. The exploration is composed of stages: only when the DP-GEN exploration has converged at one stage (no configuration with a large error is explored) does the exploration move on to the next stage. The whole procedure is controlled by the `exploration_scheduler`. Each stage has its own scheduler, which talks to the `exploration_scheduler` to generate the schedule for the DP-GEN algorithm.

![exploration strategy](./figs/exploration-strategy.jpg)

Some concepts are explained below:

- **Exploration group**. A group of LAMMPS tasks that share similar settings, for example, a group of NPT MD simulations in a certain thermodynamic space.
- **Exploration stage**. The `exploration_stage` contains a list of exploration groups. It contains all information needed to define the `lmp_task_group` used by the `block` in the DP-GEN iteration.
- **Stage scheduler**. It guarantees the convergence of the DP-GEN algorithm within each `exploration_stage`. If the exploration is not converged, the `stage_scheduler` generates `lmp_task_group` and `conf_selector` from the `exploration_stage` for the next iteration (probably with different initial conditions, i.e. different initial configurations and randomly generated initial velocities); the interplay is sketched after this list.
- **Exploration scheduler**. The scheduler for the DP-GEN algorithm. When DP-GEN is converged in one of the stages, it goes to the next stage until all planned stages are used.
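A minimal sketch of this two-level scheduling, with all class and method names hypothetical (the real dpgen2 classes differ):

```python
class ExplorationScheduler:
    """Walks through the planned stages; each stage must converge
    before the exploration moves on to the next one."""

    def __init__(self, stage_schedulers):
        self.stage_schedulers = stage_schedulers  # one scheduler per stage
        self.current = 0

    def plan_next_iteration(self, exploration_report):
        if exploration_report is not None and exploration_report.converged():
            self.current += 1  # stage converged: go to the next stage
            if self.current >= len(self.stage_schedulers):
                return None    # all planned stages used: DP-GEN is done
        # The stage scheduler proposes tasks and selection criteria for the
        # next iteration, possibly with new initial configurations/velocities.
        stage = self.stage_schedulers[self.current]
        lmp_task_group, conf_selector = stage.plan_next_iteration(exploration_report)
        return lmp_task_group, conf_selector
```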
