deepmodeling · AnguseZhang · Jul 11, 2022 · Apr 26, 2022 · May 9, 2022 · May 23, 2022
diff --git a/.gitignore b/.gitignore
@@ -41,3 +41,4 @@ dbconfig.json
 .idea/*
 _build
 tests/generator/calypso_test_path
+doc/api/
diff --git a/README.md b/README.md
@@ -373,7 +373,6 @@ In `PARAM`, you can specialize the task as you expect.
 
   "_comment": " that's all ",
   "numb_models": 4,
-  "train_param": "input.json",
   "default_training_param": {
      "model": {
             "type_map": [
@@ -499,9 +498,8 @@ The bold notation of key (such aas **type_map**) means that it's a necessary key
 | **use_ele_temp** | int | 0 | Currently only supports fp_style vasp. 0 (default): no electron temperature. 1: electron temperature as frame parameter. 2: electron temperature as atom parameter.
 | *#Data*
  | init_data_prefix | String | "/sharedext4/.../data/" | Prefix of initial data directories
- | ***init_data_sys*** | List of string|["CH4.POSCAR.01x01x01/.../deepmd"] |Directories of initial data. You may use either absolute or relative path here.
+ | ***init_data_sys*** | List of string|["CH4.POSCAR.01x01x01/.../deepmd"] |Directories of initial data. You may use either absolute or relative path here. Systems will be detected recursively in the directories.
  | ***sys_format*** | String | "vasp/poscar" | Format of initial data. It will be `vasp/poscar` if not set.
- | init_multi_systems | Boolean | false | If set to `true`, `init_data_sys` directories should contain sub-directories of various systems. DP-GEN will regard all of these sub-directories as inital data systems.
  | init_batch_size   | String of integer     | [8]                                                            | Each number is the batch_size of the corresponding system for training in `init_data_sys`. One recommended rule for setting `sys_batch_size` and `init_batch_size` is that `batch_size` multiplied by the number of atoms of the structure should be larger than 32. If set to `auto`, the batch size will be 32 divided by the number of atoms. |
   | sys_configs_prefix | String | "/sharedext4/.../data/" | Prefix of `sys_configs`
  | **sys_configs**   | List of list of string         | [<br />["/sharedext4/.../POSCAR"], <br />["....../POSCAR"]<br />] | Containing directories of structures to be explored in iterations.Wildcard characters are supported here. |
@@ -515,10 +513,10 @@ The bold notation of key (such aas **type_map**) means that it's a necessary key
 | *#Exploration*
 | **model_devi_dt** | Float | 0.002 (recommend) | Timestep for MD |
 | **model_devi_skip** | Integer | 0 | Number of structures skipped for fp in each MD
-| **model_devi_f_trust_lo** | Float or List of float | 0.05 | Lower bound of forces for the selection. If List, should be set for each index in `sys_configs`, respectively. |
-| **model_devi_f_trust_hi** | Float or List of float | 0.15 | Upper bound of forces for the selection. If List, should be set for each index in `sys_configs`, respectively. |
-| **model_devi_v_trust_lo**  | Float or List of float | 1e10 | Lower bound of virial for the selection. If List, should be set for each index in `sys_configs`, respectively. Should be used with DeePMD-kit v2.x. |
-| **model_devi_v_trust_hi**  | Float or List of float | 1e10 | Upper bound of virial for the selection. If List, should be set for each index in `sys_configs`, respectively. Should be used with DeePMD-kit v2.x. |
+| **model_devi_f_trust_lo** | Float or List of float or Dict[str, float] | 0.05 | Lower bound of forces for the selection. If List, should be set for each index in `sys_configs`, respectively. |
+| **model_devi_f_trust_hi** | Float or List of float or Dict[str, float] | 0.15 | Upper bound of forces for the selection. If List, should be set for each index in `sys_configs`, respectively. |
+| **model_devi_v_trust_lo**  | Float or List of float or Dict[str, float] | 1e10 | Lower bound of virial for the selection. If List, should be set for each index in `sys_configs`, respectively. Should be used with DeePMD-kit v2.x. |
+| **model_devi_v_trust_hi**  | Float or List of float or Dict[str, float] | 1e10 | Upper bound of virial for the selection. If List, should be set for each index in `sys_configs`, respectively. Should be used with DeePMD-kit v2.x. |
 | model_devi_adapt_trust_lo  | Boolean | False | Adaptively determines the lower trust levels of force and virial. This option should be used together with `model_devi_numb_candi_f`,  `model_devi_numb_candi_v` and optionally with `model_devi_perc_candi_f` and `model_devi_perc_candi_v`. `dpgen` will make two sets: 1. From the frames with force model deviation lower than `model_devi_f_trust_hi`, select `max(model_devi_numb_candi_f, model_devi_perc_candi_f*n_frames)` frames with largest force model deviation. 2. From the frames with virial model deviation lower than `model_devi_v_trust_hi`, select `max(model_devi_numb_candi_v, model_devi_perc_candi_v*n_frames)` frames with largest virial model deviation. The union of the two sets is made as candidate dataset|
 | model_devi_numb_candi_f  | Int | 10 | See `model_devi_adapt_trust_lo`.|
 | model_devi_numb_candi_v  | Int | 0  | See `model_devi_adapt_trust_lo`.|
@@ -537,7 +535,8 @@ The bold notation of key (such aas **type_map**) means that it's a necessary key
 | **model_devi_jobs["ensembles"]** | String             | "nvt"                                    | Determining which ensemble used in MD, **options** include “npt” and “nvt”. |
 | model_devi_jobs["neidelay"] | Integer             | "10"                                    | delay building until this many steps since last build |
 | model_devi_jobs["taut"] | Float          | "0.1"                                    | Coupling time of thermostat (ps) |
-| model_devi_jobs["taup"] | Float             | "0.5"                                    | Coupling time of barostat (ps)
+| model_devi_jobs["taup"] | Float             | "0.5"                                    | Coupling time of barostat (ps) |
+| model_devi_jobs["model_devi_f_trust_lo"] <br> model_devi_jobs["model_devi_f_trust_hi"] <br> model_devi_jobs["model_devi_v_trust_lo"] <br> model_devi_jobs["model_devi_v_trust_hi"] | Float or Dict[str, float] | | See global model_devi config above like **model_devi_f_trust_lo**. For dict, should be set for each index in sys_idx, respectively. |
 | *#Labeling*
 | **fp_style** | string                | "vasp"                                                       | Software for First Principles. **Options** include “vasp”, “pwscf”, “siesta” and “gaussian” up to now. |
 | **fp_task_max** | Integer            | 20                                                           | Maximum of  structures to be calculated in `02.fp` of each iteration. |
@@ -571,7 +570,7 @@ The bold notation of key (such aas **type_map**) means that it's a necessary key
 | **user_fp_params** | Dict |  |Parameters for cp2k calculation. find detail in manual.cp2k.org. only the kind section must be set before use.  we assume that you have basic knowledge for cp2k input.
 | **external_input_path** | String |  | Conflict with key:user_fp_params, use the template input provided by user, some rules should be followed, read the following text in detail.
 | *fp_style == ABACUS*
-| **user_fp_params** | Dict |  |Parameters for ABACUS INPUT. find detail [Here](https://github.com/deepmodeling/abacus-develop/blob/develop/docs/input-main.md#out-descriptor). If `deepks_model` is set, the model file should be in the pseudopotential directory. 
+| **user_fp_params** | Dict |  |Parameters for ABACUS INPUT. Find details [here](https://github.com/deepmodeling/abacus-develop/blob/develop/docs/input-main.md#out-descriptor). If `deepks_model` is set, the model file should be in the pseudopotential directory. You can also set the `KPT` file by adding a `k_points` key, whose value is a list of six integers, to this dictionary.
 | **fp_orb_files** | List |  |List of atomic orbital files. The files should be in pseudopotential directory. 
 | **fp_dpks_descriptor** | String |  |DeePKS descriptor file name. The file should be in pseudopotential directory. 
 
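The adaptive rule described for `model_devi_adapt_trust_lo` in the table above can be sketched in Python as follows. This is only an illustration of the documented selection logic, not DP-GEN's implementation: the function names `pick_largest_below` and `adaptive_candidates` are hypothetical, and truncating `perc_candi * n_frames` to an integer is an assumption.

```python
from typing import Sequence, Set


def pick_largest_below(devi: Sequence[float], trust_hi: float,
                       numb_candi: int, perc_candi: float) -> Set[int]:
    """Pick frame indices with the largest deviation among those below trust_hi.

    Hypothetical helper: the number of picked frames is
    max(numb_candi, perc_candi * n_frames), truncated to an integer (assumption).
    """
    n_frames = len(devi)
    n_pick = max(numb_candi, int(perc_candi * n_frames))
    # Keep only frames whose deviation is lower than the upper trust level.
    below = [i for i, d in enumerate(devi) if d < trust_hi]
    # Largest deviation first.
    below.sort(key=lambda i: devi[i], reverse=True)
    return set(below[:n_pick])


def adaptive_candidates(devi_f: Sequence[float], devi_v: Sequence[float],
                        f_trust_hi: float, v_trust_hi: float,
                        numb_candi_f: int, numb_candi_v: int,
                        perc_candi_f: float = 0.0,
                        perc_candi_v: float = 0.0) -> Set[int]:
    """Union of the force-based and virial-based candidate sets."""
    set_f = pick_largest_below(devi_f, f_trust_hi, numb_candi_f, perc_candi_f)
    set_v = pick_largest_below(devi_v, v_trust_hi, numb_candi_v, perc_candi_v)
    return set_f | set_v


# Made-up per-frame model deviations for five frames.
force_devi = [0.01, 0.20, 0.08, 0.12, 0.03]
virial_devi = [0.5, 0.1, 0.9, 0.2, 0.7]
candidates = adaptive_candidates(force_devi, virial_devi,
                                 f_trust_hi=0.15, v_trust_hi=0.8,
                                 numb_candi_f=2, numb_candi_v=1)
print(sorted(candidates))  # [2, 3, 4]
```

With these sample values, frames 2 and 3 are picked by force deviation and frame 4 by virial deviation, so the candidate set is their union.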
@@ -1016,7 +1015,6 @@ Here is an example of `param.json` for QM7 dataset:
         "auto"
     ],
     "numb_models": 4,
-    "train_param": "input.json",
     "default_training_param": {
         "model": {
             "type_map": [
@@ -1086,7 +1084,6 @@ Here is an example of `param.json` for QM7 dataset:
         },
         "_comment": "that's all"
     },
-    "use_clusters": true,
     "fp_style": "gaussian",
     "shuffle_poscar": false,
     "fp_task_max": 1000,
@@ -1109,7 +1106,7 @@ Here is an example of `param.json` for QM7 dataset:
 }
 ```
 
-Here `pick_data` is the data to simplify and currently only supports `MultiSystems` containing `System` with `deepmd/npy` format, and `use_clusters` should always be `true`. `init_pick_number` and `iter_pick_number` are the numbers of picked frames. `e_trust_lo`, `e_trust_hi` mean the range of the deviation of the frame energy, and `f_trust_lo` and `f_trust_hi` mean the range of the max deviation of atomic forces in a frame. `fp_style` can only be `gaussian` currently. Other parameters are as the same as those of generator.
+Here `pick_data` is the directory containing the data to simplify; the program recursively detects systems (`System`) in `deepmd/npy` format inside it. `init_pick_number` and `iter_pick_number` are the numbers of picked frames. `e_trust_lo` and `e_trust_hi` give the range of the deviation of the frame energy, and `f_trust_lo` and `f_trust_hi` give the range of the maximum deviation of atomic forces in a frame. `fp_style` can currently only be `gaussian`. Other parameters are the same as those of the generator.
 
 
 ## Set up machine
@@ -1139,7 +1136,7 @@ an example of new dpgen's machine.json
 ```json
 {
   "api_version": "1.0",
-  "train": [
+  "train":
     {
       "command": "dp",
       "machine": {
@@ -1163,9 +1160,8 @@ an example of new dpgen's machine.json
         "para_deg": 3,
         "source_list": ["/home/user1234/deepmd.1.2.4.env"]
       }
-    }
-  ],
-  "model_devi":[
+    },
+  "model_devi":
     {
       "command": "lmp",
       "machine":{
@@ -1186,9 +1182,8 @@ an example of new dpgen's machine.json
         "group_size": 5,
         "source_list": ["/home/user1234/deepmd.1.2.4.env"]
       }
-    }
-  ],
-  "fp":[
+    },
+  "fp":
     {
       "command": "vasp_std",
       "machine":{
@@ -1210,7 +1205,6 @@ an example of new dpgen's machine.json
         "source_list": ["~/vasp.env"]
       }
     }
-  ]
 }
 ```
 note1: the key "local_root" in dpgen's machine.json is always `./`
@@ -1222,7 +1216,7 @@ When switching into a new machine, you may modifying the `MACHINE`, according to
 An example for `MACHINE` is:
 ```json
 {
-  "train": [
+  "train":
     {
       "machine": {
         "batch": "slurm",
@@ -1245,9 +1239,8 @@ An example for `MACHINE` is:
         "qos": "data"
       },
       "command": "USERPATH/dp"
-    }
-  ],
-  "model_devi": [
+    },
+  "model_devi":
     {
       "machine": {
         "batch": "slurm",
@@ -1271,9 +1264,8 @@ An example for `MACHINE` is:
       },
       "command": "lmp_serial",
       "group_size": 1
-    }
-  ],
-  "fp": [
+    },
+  "fp":
     {
       "machine": {
         "batch": "slurm",
@@ -1300,7 +1292,6 @@ An example for `MACHINE` is:
       "command": "vasp_gpu",
       "group_size": 1
     }
-  ]
 }
 ```
 The following table illustrates which keys are needed for the three types of machine: `train`, `model_devi` and `fp`. Each of them is a list of dicts. Each dict can be considered as an independent environment for calculation.

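The `machine.json` hunks above replace the single-element list under each of `train`, `model_devi` and `fp` with a plain dict. A minimal sketch of migrating an old-style file by hand is given below. The helper `unwrap_machine_stages` is hypothetical and not part of DP-GEN, and rejecting lists with more than one entry is an assumption made to keep the example unambiguous.

```python
import json
from typing import Any, Dict

# The three stage keys shown in the machine.json examples.
STAGES = ("train", "model_devi", "fp")


def unwrap_machine_stages(mdata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of mdata where each stage given as [ {...} ] becomes {...}.

    Hypothetical helper for illustration only. Stages that are already
    dicts, and all other keys, are left untouched.
    """
    converted = dict(mdata)
    for stage in STAGES:
        value = converted.get(stage)
        if isinstance(value, list):
            if len(value) != 1:
                # Assumption: only the single-entry case is handled here.
                raise ValueError(f"{stage!r} has {len(value)} entries; expected 1")
            converted[stage] = value[0]
    return converted


old_style = {
    "api_version": "1.0",
    "train": [{"command": "dp"}],
    "model_devi": [{"command": "lmp"}],
    "fp": [{"command": "vasp_std"}],
}
new_style = unwrap_machine_stages(old_style)
print(json.dumps(new_style, indent=2))
```

Applying the helper to an already converted file returns it unchanged, so it can be run safely more than once.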
diff --git a/conda/meta.yaml b/conda/meta.yaml
@@ -28,6 +28,7 @@ requirements:
     - ase
     - GromacsWrapper
     - custodian
+    - netCDF4
 
   run:
     - python >=3.6
@@ -40,6 +41,7 @@ requirements:
     - ase
     - GromacsWrapper
     - custodian
+    - netCDF4
 
 test:
   imports:

diff --git a/doc/conf.py b/doc/conf.py
@@ -40,17 +40,20 @@
 
 extensions = [
     'deepmodeling_sphinx',
+    'dargs.sphinx',
     "sphinx_rtd_theme",
     'myst_parser',
     'sphinx.ext.autosummary',
+    'sphinx.ext.viewcode',
+    'sphinxarg.ext',
 ]
 
 
 # Tell sphinx what the primary language being documented is.
-primary_domain = 'cpp'
+primary_domain = 'py'
 
 # Tell sphinx what the pygments highlight language should be.
-highlight_language = 'cpp'
+highlight_language = 'py'
 
 # 
 myst_heading_anchors = 4
@@ -81,9 +84,28 @@
 autosummary_generate = True
 master_doc = 'index'
 
+intersphinx_mapping = {
+    "python": ("https://docs.python.org/", None),
+    "dargs": ("https://docs.deepmodeling.com/projects/dargs/en/latest/", None),
+    "dpdata": ("https://docs.deepmodeling.com/projects/dpdata/en/latest/", None),
+    "dpdispatcher": ("https://docs.deepmodeling.com/projects/dpdispatcher/en/latest/", None),
+    "ase": ("https://wiki.fysik.dtu.dk/ase/", None),
+    "numpy": ("https://docs.scipy.org/doc/numpy/", None),
+    "pymatgen": ("https://pymatgen.org/", None),
+    "monty": ("https://guide.materialsvirtuallab.org/monty/", None),
+    "paramiko": ("https://docs.paramiko.org/en/stable/", None),
+    "custodian": ("https://cloudcustodian.io/docs/", None),
+    "GromacsWrapper": ("https://gromacswrapper.readthedocs.io/en/latest/", None),
+}
+
+
+def run_apidoc(_):
+    from sphinx.ext.apidoc import main
+    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
+    cur_dir = os.path.abspath(os.path.dirname(__file__))
+    module = os.path.join(cur_dir, "..", "dpgen")
+    main(['-M', '--tocfile', 'api', '-H', 'DP-GEN API', '-o', os.path.join(cur_dir, "api"), module, '--force'])
 
-def generate_arginfo(app):
-    subprocess.check_output((sys.executable, "gen_arginfo.py"), universal_newlines=True)
 
 def setup(app):
-    app.connect('builder-inited', generate_arginfo)
+    app.connect('builder-inited', run_apidoc)
diff --git a/doc/gen_arginfo.py b/doc/gen_arginfo.py
diff --git a/doc/index.rst b/doc/index.rst
@@ -2,13 +2,60 @@
 DPGEN's documentation
 ==========================
 
-.. _parameters::
+.. _overview::
 
 .. toctree::
    :maxdepth: 2
-   :caption: Parameters
+   :caption: Overview
+
+   overview/cli
+
+
+.. _installation::
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Installation 
+
+
+.. _run::
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Run
+
+   run/run-process.rst
+   run/param.rst
+   run/mdata.rst
+
+.. _init::
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Init   
+
+   init/init-bulk-mdata
+   init/init-surf-mdata
+   init/init-reaction
+   init/init-reaction-jdata
+   init/init-reaction-mdata
+
+.. _autotest::
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Autotest   
+
+
+.. _simplify::
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Simplify  
+
+   simplify/simplify-jdata
+   simplify/simplify-mdata
 
-   run-mdata.rst
 
 .. _tutorial:
 
@@ -17,16 +64,17 @@ DPGEN's documentation
    :caption: Tutorial
    :glob:  
 
-   toymodels/*
+   Tutorials <https://tutorials.deepmodeling.com/en/latest/Tutorials/DP-GEN/>
 
 
 .. _Contribution:
 
 .. toctree::
    :maxdepth: 2
-   :caption: Contribution Guild
+   :caption: Contribution Guide
 
    README.md
+   api/api
 
 * :ref:`genindex`
 * :ref:`modindex`

diff --git a/doc/init/init-bulk-mdata.rst b/doc/init/init-bulk-mdata.rst
@@ -0,0 +1,6 @@
+dpgen init_bulk machine parameters
+==================================
+
+.. dargs::
+   :module: dpgen.data.arginfo
+   :func: init_bulk_mdata_arginfo
diff --git a/doc/init/init-reaction-jdata.rst b/doc/init/init-reaction-jdata.rst
@@ -0,0 +1,6 @@
+dpgen init_reaction parameters
+======================================
+
+.. dargs::
+   :module: dpgen.data.arginfo
+   :func: init_reaction_jdata_arginfo
diff --git a/doc/init/init-reaction-mdata.rst b/doc/init/init-reaction-mdata.rst
@@ -0,0 +1,6 @@
+dpgen init_reaction machine parameters
+======================================
+
+.. dargs::
+   :module: dpgen.data.arginfo
+   :func: init_reaction_mdata_arginfo
diff --git a/doc/init/init-reaction.md b/doc/init/init-reaction.md
@@ -0,0 +1,21 @@
+# init_reaction
+
+`dpgen init_reaction` is a workflow to initialize data for reactive systems of small gas-phase molecules. The workflow was introduced in the "Initialization" section of [Energy & Fuels, 2021, 35 (1), 762–769](https://doi.org/10.1021/acs.energyfuels.0c03211).
+
+To start the workflow, one needs a box containing reactive systems. The following packages are required for each step:
+- Exploring: [LAMMPS](https://github.com/lammps/lammps)
+- Sampling: [MDDatasetBuilder](https://github.com/tongzhugroup/mddatasetbuilder)
+- Labeling: [Gaussian](https://gaussian.com/)
+
+The Exploring step uses LAMMPS [pair_style reaxff](https://docs.lammps.org/latest/pair_reaxff.html) to run a short ReaxFF NVT MD simulation. In the Sampling step, molecular clusters are extracted and the k-means clustering algorithm is applied to remove redundancy, as described in [Nature Communications, 11, 5713 (2020)](https://doi.org/10.1038/s41467-020-19497-z). The Labeling step calculates energies and forces using the Gaussian package.
+
+An example of `reaction.json` is given below:
+
+```{literalinclude} ../../examples/init/reaction.json
+:language: json
+:linenos:
+```
+
+For detailed parameters, see [parameters](init-reaction-jdata.rst) and [machine parameters](init-reaction-mdata.rst).
+
+The generated data can be used to continue the DP-GEN concurrent learning workflow. Read [Energy & Fuels, 2021, 35 (1), 762–769](https://doi.org/10.1021/acs.energyfuels.0c03211) for details.
diff --git a/doc/init/init-surf-mdata.rst b/doc/init/init-surf-mdata.rst
@@ -0,0 +1,6 @@
+dpgen init_surf machine parameters
+==================================
+
+.. dargs::
+   :module: dpgen.data.arginfo
+   :func: init_surf_mdata_arginfo
diff --git a/doc/overview/cli.rst b/doc/overview/cli.rst
@@ -0,0 +1,7 @@
+Command line interface
+======================
+
+.. argparse::
+   :module: dpgen.main
+   :func: main_parser
+   :prog: dpgen
diff --git a/doc/requirements.txt b/doc/requirements.txt
@@ -2,6 +2,8 @@ sphinx>=4.0.2
 recommonmark
 sphinx_rtd_theme
 sphinx_markdown_tables
+sphinx-argparse
 myst-parser
 deepmodeling_sphinx
+dargs>=0.3.1
 .
diff --git a/doc/run-mdata.rst b/doc/run-mdata.rst