fix(pt): remove meaningless error raising #5411
Conversation
📝 Walkthrough
The Trainer constructor now automatically downgrades zero_stage to 0 when training is not launched in distributed mode, instead of raising an error.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks: ✅ 5 passed checks
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/pt/train/training.py`:
- Around line 185-186: The code silently resets self.zero_stage to 0 when
self.is_distributed is False; change this to emit a clear warning before
mutating the value so users know their requested ZeRO/FSDP stage was ignored
(e.g., call warnings.warn(...) or self.logger.warning(...) with a message like
"zero_stage X requested but distributed launch not detected; forcing
zero_stage=0"), then set self.zero_stage = 0 as before; update the block
containing self.zero_stage and self.is_distributed to perform the warning first
and ensure the message includes the original requested value and guidance about
using torchrun.
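The suggested fix can be sketched as follows. The attribute names `zero_stage` and `is_distributed` come from the review comment above, but the surrounding `Trainer` class here is a hypothetical stand-in, not the real class in `deepmd/pt/train/training.py`:

```python
import warnings


class Trainer:
    """Minimal stand-in illustrating the warn-then-fallback suggestion."""

    def __init__(self, zero_stage: int = 0, is_distributed: bool = False) -> None:
        self.is_distributed = is_distributed
        self.zero_stage = zero_stage
        if self.zero_stage > 0 and not self.is_distributed:
            # Warn first so users know their requested ZeRO/FSDP stage
            # was ignored, including the original value and guidance on
            # using torchrun; then fall back as before.
            warnings.warn(
                f"zero_stage {self.zero_stage} requested but distributed "
                "launch not detected; forcing zero_stage=0. Launch with "
                "torchrun to enable ZeRO/FSDP.",
                stacklevel=2,
            )
            self.zero_stage = 0
```

With this shape, a non-distributed run still proceeds (no exception), but the downgrade is visible in the logs instead of happening silently.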
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 23b375ec-8cc0-4f65-ba75-efbeb613411d
📒 Files selected for processing (1)
deepmd/pt/train/training.py
Pull request overview
This PR changes the PyTorch Trainer initialization behavior so that requesting training.zero_stage > 0 without an initialized distributed process group no longer raises an error and instead falls back to zero_stage = 0.
Changes:
- Removed the `ValueError` raised when `zero_stage > 0` but training is not running under distributed initialization.
- Added an implicit fallback that sets `self.zero_stage = 0` in that case.
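As a sketch, the behavior change described above amounts to replacing a raise with a silent fallback. The helper below is hypothetical (the real logic is inline in the `Trainer` constructor in `deepmd/pt/train/training.py`); it only mirrors the described semantics:

```python
def resolve_zero_stage(requested_stage: int, is_distributed: bool) -> int:
    """Hypothetical helper mirroring the new Trainer behavior.

    Before this PR, requesting ZeRO without an initialized distributed
    process group raised a ValueError; now the stage is downgraded to 0.
    """
    if requested_stage > 0 and not is_distributed:
        # Old behavior:
        # raise ValueError("zero_stage > 0 requires distributed training")
        return 0  # new behavior: implicit fallback to no ZeRO
    return requested_stage
```

For example, `resolve_zero_stage(2, False)` now returns `0` instead of raising, while distributed runs keep their requested stage.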
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #5411 +/- ##
==========================================
- Coverage 80.47% 80.46% -0.01%
==========================================
Files 820 820
Lines 86005 86005
Branches 4139 4139
==========================================
- Hits 69209 69208 -1
Misses 15521 15521
- Partials 1275 1276 +1
☔ View full report in Codecov by Sentry.
njzjz left a comment
This change is reasonable to me.
5d9cbdf