RemoteIterator skip iter ckpt #2824

Merged
copybara-service[bot] merged 1 commit into main from aireen/colocated_skip_ckpt on Dec 15, 2025

@aireenmei aireenmei commented Dec 12, 2025

Description

Checkpointing the Grain iterator when using RemoteIterator (for Pathways + colocated Python) is not properly implemented yet, which causes this error: https://b.corp.google.com/issues/466407361#comment20. This PR skips data_iterator checkpointing to unblock testing of the data loading part of the Grain pipeline. Checkpointing of model weights is not affected. The iterator checkpointing implementation will be fixed later.
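
For readers who want the gist without opening the diff, here is a minimal sketch of the skip logic described above. It is not the code in this PR: the `RemoteIterator` class below is a hypothetical stand-in, and `items_to_checkpoint` and the `model_state` key are names invented for illustration only.

```python
from typing import Any, Dict, Optional


class RemoteIterator:
    """Hypothetical stand-in for an iterator that runs on a colocated remote host."""

    def __init__(self, batches):
        self._it = iter(batches)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)


def items_to_checkpoint(model_state: Any, data_iterator: Optional[Any]) -> Dict[str, Any]:
    """Collect what to save; leave out the data iterator when it is remote.

    Hypothetical helper: model weights are always included, and the iterator
    is included only when it is not a RemoteIterator.
    """
    items = {"model_state": model_state}
    if data_iterator is not None and not isinstance(data_iterator, RemoteIterator):
        items["data_iterator"] = data_iterator
    return items
```

With this shape, a run using a remote iterator still saves model weights on every checkpoint step, and only the iterator state is left out until the proper implementation lands.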

If the change fixes a bug or a Github issue, please include a link, e.g.,:
FIXES: https://b.corp.google.com/issues/466407361#comment20

Tests

Tested on v5e-32 and verified that checkpointing only the model weights works (log).

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@copybara-service copybara-service Bot merged commit 47bff86 into main Dec 15, 2025
78 of 84 checks passed
@copybara-service copybara-service Bot deleted the aireen/colocated_skip_ckpt branch December 15, 2025 23:25
3 participants