Skip to content

fix(pt): typo in epoch training#5410

Merged
njzjz merged 2 commits into
deepmodeling:masterfrom
OutisLi:pr/epochbug
Apr 22, 2026
Merged

fix(pt): typo in epoch training#5410
njzjz merged 2 commits into
deepmodeling:masterfrom
OutisLi:pr/epochbug

Conversation

@OutisLi
Copy link
Copy Markdown
Collaborator

@OutisLi OutisLi commented Apr 22, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Fixed configuration lookup for the epoch count so training now reads and respects the configured number of epochs.

Copilot AI review requested due to automatic review settings April 22, 2026 01:14
@OutisLi OutisLi requested a review from njzjz April 22, 2026 01:15
@dosubot dosubot Bot added the bug label Apr 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes PyTorch trainer configuration parsing so epoch-based training works when configs are normalized via deepmd.utils.argcheck.normalize().

Changes:

  • Read training.numb_epoch (canonical argcheck key) instead of training.num_epoch in the PyTorch Trainer initialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 702b5d84-6b1c-4e33-8b6b-ee13e431331f

📥 Commits

Reviewing files that changed from the base of the PR and between 716a15b and 86a93af.

📒 Files selected for processing (1)
  • deepmd/pd/train/training.py

📝 Walkthrough

Walkthrough

Trainer initialization now reads epoch count from the "numb_epoch" key instead of "num_epoch" in training parameter dictionaries. No other iteration-related fields or control flow were modified.

Changes

Cohort / File(s) Summary
Configuration Parameter Rename
deepmd/pt/train/training.py, deepmd/pd/train/training.py
Replaced lookup of training parameter key from "num_epoch" to "numb_epoch" in Trainer.__init__; no other logic or public interfaces changed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title refers to a typo fix in epoch training configuration, which aligns with the changes in both pt and pd training modules that correct 'num_epoch' to 'numb_epoch' lookup.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.47%. Comparing base (1a1dc59) to head (86a93af).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5410   +/-   ##
=======================================
  Coverage   80.47%   80.47%           
=======================================
  Files         820      820           
  Lines       86005    86005           
  Branches     4139     4140    +1     
=======================================
+ Hits        69209    69210    +1     
  Misses      15521    15521           
+ Partials     1275     1274    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Member

@njzjz njzjz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched num_epoch and found many other places using num_epoch

https://github.com/search?q=repo%3Adeepmodeling%2Fdeepmd-kit%20num_epoch&type=code

@OutisLi
Copy link
Copy Markdown
Collaborator Author

OutisLi commented Apr 22, 2026

I searched num_epoch and found many other places using num_epoch

https://github.com/search?q=repo%3Adeepmodeling%2Fdeepmd-kit%20num_epoch&type=code

I searched num_epoch and found many other places using num_epoch

https://github.com/search?q=repo%3Adeepmodeling%2Fdeepmd-kit%20num_epoch&type=code

Thanks for the catch! I went through every num_epoch hit in that search.

The only other occurrence that had the same bug was in the Paddle backend, which I've now fixed in the latest commit:

deepmd/pd/train/training.py: training_params.get("num_epoch") → training_params.get("numb_epoch") (same root cause — the canonical key after normalize() is numb_epoch, while num_epoch is only an alias).
The remaining num_epoch mentions are intentional and not bugs:

deepmd/tf/entrypoints/train.py and change_bias.py: num_epoch is just a local variable name; the .get(...) call already uses the canonical "numb_epoch".
self.num_epoch, num_epoch_dict_config, test_sampler.py::num_epoch = 1.5, etc.: internal attribute/variable/parameter names, unrelated to the config key.
num_epoch_dict is itself the canonical name (not an alias of numb_epoch_dict), so all usages are correct.
num_epoch in docstrings and error messages in argcheck.py / TF entrypoints is user-facing text; since num_epoch is a documented alias, keeping it matches what most users actually write in their input files.
examples/water/se_e2_a/input_torch_num_epoch.json uses "num_epoch": 100, which is a legal alias and still works

@OutisLi OutisLi requested a review from njzjz April 22, 2026 05:47
Copy link
Copy Markdown
Member

@njzjz njzjz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@njzjz njzjz enabled auto-merge April 22, 2026 06:48
@njzjz njzjz added this pull request to the merge queue Apr 22, 2026
Merged via the queue into deepmodeling:master with commit e98e1c3 Apr 22, 2026
70 checks passed
@OutisLi OutisLi deleted the pr/epochbug branch April 23, 2026 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants