[Feature Request] Support different backends for DeePMD-kit #1462

Closed
njzjz opened this issue Jan 27, 2024 · 3 comments · Fixed by #1545
Labels
enhancement New feature or request

Comments

@njzjz
Member

njzjz commented Jan 27, 2024

Summary

Support PyTorch backend for DeePMD-kit.

Detailed Description

Although I have submitted deepmodeling/deepmd-kit#3191 to make most things consistent between the backends, the suffixes of the model and checkpoint files differ by design. These suffixes are hardcoded in several places:

```python
backward_files = ["frozen_model.pb", "lcurve.out", "train.log"]
backward_files += [
    "model.ckpt.meta",
    "model.ckpt.index",
    "model.ckpt.data-00000-of-00001",
    "checkpoint",
]
if jdata.get("dp_compress", False):
    backward_files.append("frozen_model_compressed.pb")
```

```python
if not jdata.get("dp_compress", False):
    model_name = "frozen_model.pb"
else:
    model_name = "frozen_model_compressed.pb"
task_file = os.path.join(train_task_fmt % ii, model_name)
ofile = os.path.join(work_path, "graph.%03d.pb" % ii)
```

```python
all_models = glob.glob(os.path.join(work_path, "graph*pb"))
```

```python
models = glob.glob(os.path.join(train_path, "graph*pb"))
```
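The hardcoded names above could instead be derived from a single backend-dependent suffix. A minimal sketch, assuming a hypothetical helper `model_paths` that receives the suffix (for example `.pb` for TensorFlow) as a parameter; this is an illustration, not the project's actual code:

```python
import os


def model_paths(work_path, train_task_fmt, ii, suffix, dp_compress=False):
    """Return (task_file, ofile, pattern) for model ``ii``.

    Hypothetical helper: mirrors the hardcoded names in the snippets above,
    but takes the model file suffix (e.g. ".pb") as a parameter.
    """
    # Frozen (optionally compressed) model written by the training task.
    stem = "frozen_model_compressed" if dp_compress else "frozen_model"
    task_file = os.path.join(train_task_fmt % ii, stem + suffix)
    # Model copied into the work directory, e.g. graph.001.pb.
    ofile = os.path.join(work_path, "graph.%03d%s" % (ii, suffix))
    # Glob pattern that replaces the hardcoded "graph*pb".
    pattern = os.path.join(work_path, "graph*" + suffix)
    return task_file, ofile, pattern
```

With `suffix=".pb"`, the pattern `graph*.pb` matches the same `graph.NNN.pb` files as the original `graph*pb`, so TensorFlow behavior would be unchanged.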

In addition, glob patterns cannot be used in `backward_files`, because some dpdispatcher contexts, such as Bohrium, do not support them.
So, could we add a parameter to switch between backends?
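Since glob patterns are unavailable in some dpdispatcher contexts, one option is to list the exact file names for each backend. A minimal sketch, assuming a hypothetical helper `backward_files` and a switch named `train_backend`; the TensorFlow names are copied from the hardcoded list above, while the PyTorch names (`frozen_model.pth`, `model.ckpt.pt`) are assumptions:

```python
def backward_files(train_backend="tensorflow", dp_compress=False):
    """List the exact files to download after training (no glob patterns).

    Hypothetical helper. TensorFlow names come from the hardcoded list in
    this issue; the PyTorch names are assumptions.
    """
    if train_backend == "tensorflow":
        suffix = ".pb"
        checkpoint = [
            "model.ckpt.meta",
            "model.ckpt.index",
            "model.ckpt.data-00000-of-00001",
            "checkpoint",
        ]
    elif train_backend == "pytorch":
        suffix = ".pth"
        checkpoint = ["model.ckpt.pt"]  # assumed PyTorch checkpoint name
    else:
        raise ValueError(f"unsupported train_backend: {train_backend!r}")
    files = ["frozen_model" + suffix, "lcurve.out", "train.log"] + checkpoint
    if dp_compress:
        files.append("frozen_model_compressed" + suffix)
    return files
```

Enumerating names keeps the list usable by every context, at the cost of maintaining one explicit list per backend.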

Further Information, Files, and Links

No response

@njzjz
Member Author

njzjz commented Mar 24, 2024

This is blocked by deepmodeling/deepmd-kit#3475.

@thangckt

thangckt commented May 6, 2024

Hi @njzjz,

Did you solve this problem?
I made some changes in this file to add an option for choosing between the TF and PT backends.

If you find it helpful, I will make a PR. Otherwise, feel free to ignore it.

@njzjz
Member Author

njzjz commented May 6, 2024

> Did you solve this problem?

I don't have plans for this issue.

Contributions are welcome.

njzjz linked a pull request May 7, 2024 that will close this issue
njzjz linked a pull request May 10, 2024 that will close this issue
wanghan-iapcm pushed a commit that referenced this issue May 11, 2024
Reopen PR #1541, since its branch was deleted.

Add a new key, `train_backend`, to the `param.json` file; its value is either `"pytorch"` or `"tensorflow"`:

```
"train_backend": "pytorch",
```
Related to issue #1462.

## Summary by CodeRabbit

- **New Features**
  - Improved model management by dynamically generating model suffixes based on the selected backend, enhancing compatibility.

- **Enhancements**
  - Updated model-related functions to incorporate backend-specific model suffixes for accurate file handling during training processes.


---------

Signed-off-by: C. Thang Nguyen <46436648+thangckt@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
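The `train_backend` key can then drive the suffix choice in one place. A minimal sketch, assuming a hypothetical helper `get_model_suffix`, a default of `"tensorflow"` when the key is absent, and `.pth` as the PyTorch model suffix; the merged code may differ:

```python
import json


def get_model_suffix(jdata):
    """Map the ``train_backend`` key of param.json to a model file suffix.

    Hypothetical helper: the default backend and the ".pth" suffix are
    assumptions, not taken from the merged PR.
    """
    suffix_map = {"tensorflow": ".pb", "pytorch": ".pth"}
    backend = jdata.get("train_backend", "tensorflow")
    try:
        return suffix_map[backend]
    except KeyError:
        raise ValueError(f"unknown train_backend: {backend!r}") from None


# Example: a param.json fragment selecting the PyTorch backend.
jdata = json.loads('{"train_backend": "pytorch", "dp_compress": false}')
model_name = "frozen_model" + get_model_suffix(jdata)  # "frozen_model.pth"
```

Defaulting to TensorFlow keeps existing `param.json` files working without changes.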
njzjz closed this as completed May 12, 2024