docs: improve checkpoint description #2784

Merged: 1 commit, Sep 5, 2023
4 changes: 2 additions & 2 deletions deepmd/entrypoints/train.py
@@ -86,9 +86,9 @@ def train(
INPUT : str
json/yaml control file
init_model : Optional[str]
-path to checkpoint folder or None
+path prefix of checkpoint files or None
restart : Optional[str]
-path to checkpoint folder or None
+path prefix of checkpoint files or None
output : str
path for dump file with arguments
init_frz_model : str
2 changes: 1 addition & 1 deletion deepmd/model/frozen.py
@@ -74,7 +74,7 @@ def build(
frz_model : str, optional
The path to the frozen model
ckpt_meta : str, optional
-The path to the checkpoint and meta file
+The path prefix of the checkpoint and meta files
suffix : str, optional
The suffix of the scope
reuse : bool or tf.AUTO_REUSE, optional
2 changes: 1 addition & 1 deletion deepmd/model/linear.py
@@ -166,7 +166,7 @@ def build(
frz_model : str, optional
The path to the frozen model
ckpt_meta : str, optional
-The path to the checkpoint and meta file
+The path prefix of the checkpoint and meta files
suffix : str, optional
The suffix of the scope
reuse : bool or tf.AUTO_REUSE, optional
4 changes: 2 additions & 2 deletions deepmd/model/model.py
@@ -191,7 +191,7 @@ def build(
frz_model : str, optional
The path to the frozen model
ckpt_meta : str, optional
-The path to the checkpoint and meta file
+The path prefix of the checkpoint and meta files
suffix : str, optional
The suffix of the scope
reuse : bool or tf.AUTO_REUSE, optional
@@ -259,7 +259,7 @@ def build_descrpt(
frz_model : str, optional
The path to the frozen model
ckpt_meta : str, optional
-The path to the checkpoint and meta file
+The path prefix of the checkpoint and meta files
suffix : str, optional
The suffix of the scope
reuse : bool or tf.AUTO_REUSE, optional
2 changes: 1 addition & 1 deletion deepmd/utils/argcheck.py
@@ -1601,7 +1601,7 @@ def training_args(): # ! modified by Ziyao: data configuration isolated.
doc_disp_file = "The file for printing learning curve."
doc_disp_freq = "The frequency of printing learning curve."
doc_save_freq = "The frequency of saving check point."
-doc_save_ckpt = "The file name of saving check point."
+doc_save_ckpt = "The path prefix of saving check point files."
doc_disp_training = "Displaying verbose information during training."
doc_time_training = "Timing durining training."
doc_profiling = "Profiling during training."
6 changes: 3 additions & 3 deletions deepmd_cli/main.py
@@ -150,14 +150,14 @@ def main_parser() -> argparse.ArgumentParser:
"--init-model",
type=str,
default=None,
-help="Initialize the model by the provided checkpoint.",
+help="Initialize the model by the provided path prefix of checkpoint files.",
)
parser_train_subgroup.add_argument(
"-r",
"--restart",
type=str,
default=None,
-help="Restart the training from the provided checkpoint.",
+help="Restart the training from the provided path prefix of checkpoint files.",
)
parser_train_subgroup.add_argument(
"-f",
@@ -549,7 +549,7 @@ def main_parser() -> argparse.ArgumentParser:
"--restart",
type=str,
default=None,
-help="Restart the training from the provided checkpoint.",
+help="Restart the training from the provided prefix of checkpoint files.",
)
parser_train_nvnmd.add_argument(
"-s",
4 changes: 2 additions & 2 deletions doc/nvnmd/nvnmd.md
@@ -162,7 +162,7 @@ where items are defined as:
| numb_test | the accuracy is test by using {numb_test} sample | a positive integer |
| disp_file | the log file where the training message display | a string |
| disp_freq | display frequency | a positive integer |
-| save_ckpt | check point file | a string |
+| save_ckpt | path prefix of check point files | a string |
| save_freq | save frequency | a positive integer |
| systems | a list of data directory which contains the dataset | string list |
| set_prefix | the prefix of dataset | a string |
@@ -181,7 +181,7 @@ dp train-nvnmd train_qnn.json -s s2

After the training process, you will get two folders: `nvnmd_cnn` and `nvnmd_qnn`. The `nvnmd_cnn` contains the model after continuous neural network (CNN) training. The `nvnmd_qnn` contains the model after quantized neural network (QNN) training. The binary file `nvnmd_qnn/model.pb` is the model file that is used to perform NVNMD in the server [http://nvnmd.picp.vip].

-You can also restart the CNN training from the checkpoint (`nvnmd_cnn/model.ckpt`) by
+You can also restart the CNN training from the path prefix of checkpoint files (`nvnmd_cnn/model.ckpt`) by

``` bash
dp train-nvnmd train_cnn.json -r nvnmd_cnn/model.ckpt -s s1
2 changes: 1 addition & 1 deletion doc/train/training-advanced.md
@@ -121,7 +121,7 @@ optional arguments:
--skip-neighbor-stat Skip calculating neighbor statistics. Sel checking, automatic sel, and model compression will be disabled. (default: False)
```

-**`--init-model model.ckpt`**, initializes the model training with an existing model that is stored in the checkpoint `model.ckpt`, the network architectures should match.
+**`--init-model model.ckpt`**, initializes the model training with an existing model that is stored in the path prefix of checkpoint files `model.ckpt`, the network architectures should match.

**`--restart model.ckpt`**, continues the training from the checkpoint `model.ckpt`.
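
To illustrate why the wording changes from "checkpoint" to "path prefix of checkpoint files", here is a minimal Python sketch. It assumes a TensorFlow 1.x-style saver layout, where a prefix such as `model.ckpt` typically names no file itself and the saver writes siblings such as `model.ckpt.index`, `model.ckpt.meta`, and `model.ckpt.data-00000-of-00001`. The helper `checkpoint_files` is hypothetical and not part of DeePMD-kit; the file suffixes in the demo are assumptions about that layout.

```python
import glob
import os
import tempfile
from typing import List


def checkpoint_files(prefix: str) -> List[str]:
    """Return the files on disk that share the given checkpoint path prefix.

    Hypothetical helper for illustration only; not a DeePMD-kit API.
    """
    # Escape the prefix so characters like "[" in a directory name are
    # matched literally, then match every "<prefix>.<suffix>" sibling.
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "model.ckpt")
        # Assumed suffixes, mimicking what a TF1-style saver usually writes.
        for suffix in (".index", ".meta", ".data-00000-of-00001"):
            open(prefix + suffix, "w").close()

        # The prefix itself is not a file, only its siblings are.
        print(os.path.exists(prefix))
        for path in checkpoint_files(prefix):
            print(os.path.basename(path))
```

In this sketch the value passed to `--init-model` or `--restart` would play the role of `prefix`: it identifies a group of files rather than a single file or a folder.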
