Add checkpoint and resume #1575

Merged
jayhenry merged 5 commits into InternLM:rl_design from jayhenry:rl06
Mar 17, 2026

@jayhenry
Collaborator

No description provided.

@jayhenry jayhenry changed the title Add checkpoint and resume [WIP] Add checkpoint and resume Mar 13, 2026
@jayhenry jayhenry changed the title [WIP] Add checkpoint and resume Add checkpoint and resume Mar 13, 2026
@jayhenry
Collaborator Author

@cursor review

@cursor

cursor bot commented Mar 16, 2026

Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings.

# self._maybe_save_hf()
# self._maybe_save_checkpoint()
self._maybe_save_checkpoint(rollout_idx)
# TODO: self._maybe_save_hf()
Collaborator

Why is save_hf no longer used?

- Implemented save and resume methods in AgentLoopManager and Sampler for managing dataloader state.
- Enhanced RLColocateTrainer to support checkpoint configuration, including saving and resuming training state.
- Updated configuration to include checkpoint parameters and integrated checkpoint handling in training workflow.
- Added logging for checkpoint operations to improve traceability.
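
The save-and-resume flow summarized in the commit message above can be illustrated with a minimal Python sketch. None of the names below come from the PR: `ToySampler`, `maybe_save_checkpoint`, the `interval` parameter, and the JSON file layout are hypothetical stand-ins, and the actual AgentLoopManager, Sampler, and RLColocateTrainer code may be structured quite differently. The sketch only shows the general idea of persisting a data position at a fixed rollout interval and restoring it in a fresh process.

```python
import json
import tempfile
from pathlib import Path


class ToySampler:
    """Hypothetical stand-in for a sampler that tracks its read position."""

    def __init__(self, data):
        self.data = list(data)
        self.cursor = 0  # index of the next item to hand out

    def sample(self, n):
        batch = self.data[self.cursor:self.cursor + n]
        self.cursor += len(batch)
        return batch

    def save(self, ckpt_dir):
        # Persist only what is needed to continue from the same position.
        path = Path(ckpt_dir) / "sampler_state.json"
        path.write_text(json.dumps({"cursor": self.cursor}))

    def resume(self, ckpt_dir):
        path = Path(ckpt_dir) / "sampler_state.json"
        self.cursor = json.loads(path.read_text())["cursor"]


def maybe_save_checkpoint(sampler, rollout_idx, interval, root):
    """Save every `interval` rollouts; return the checkpoint dir or None."""
    if interval <= 0 or rollout_idx % interval != 0:
        return None
    ckpt_dir = Path(root) / f"ckpt-rollout-{rollout_idx}"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    sampler.save(ckpt_dir)
    return ckpt_dir


def demo():
    with tempfile.TemporaryDirectory() as root:
        sampler = ToySampler(range(10))
        saved = None
        for rollout_idx in range(1, 4):
            sampler.sample(2)
            saved = maybe_save_checkpoint(sampler, rollout_idx, 2, root) or saved
        # Only rollout 2 hits the interval, so the saved cursor is 4.
        fresh = ToySampler(range(10))
        fresh.resume(saved)
        return fresh.sample(2)
```

In this toy setup, calling `demo()` resumes a fresh sampler from the state written at rollout 2, so it continues with the items that follow the last saved position instead of starting over from the beginning.
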
@jayhenry
Collaborator Author

@claude fix lint error

@jayhenry jayhenry merged commit 6706198 into InternLM:rl_design Mar 17, 2026
2 of 6 checks passed
@jayhenry jayhenry deleted the rl06 branch March 18, 2026 02:33

2 participants