
Support S3 checkpointing for the torch strategy in distributed checkpointing #748

Open
wants to merge 1 commit into base: main
from

Commits on Mar 22, 2024

  1. Support cloud storage for the torch strategy in dist checkpointing

    This commit adds support for saving checkpoints to cloud storage
    (e.g., S3) and loading checkpoints from cloud storage for the
    torch strategy in distributed checkpointing. It does so by
    replacing pathlib.Path with cloudpathlib.AnyPath, FileSystemReader
    with FsspecSystemReader, and FileSystemWriter with
    FsspecSystemWriter.
    
    This commit enables cloud checkpointing but makes little attempt
    to optimize it.
    Jake Marcus committed Mar 22, 2024
    606b08c
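The commit replaces `pathlib.Path` with `cloudpathlib.AnyPath`. The following standard-library-only sketch shows one reason a plain `pathlib` path is not enough: `pathlib` collapses the `//` in an `s3://bucket/...` URI, so joining a file name onto a checkpoint directory silently produces a malformed location. The helpers `naive_join` and `uri_aware_join`, the `CLOUD_SCHEMES` set, and the shard file name are all hypothetical and exist only for illustration. They are not part of this PR. `uri_aware_join` imitates the kind of scheme-based dispatch `AnyPath` performs but does not reproduce its API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of URI schemes treated as cloud storage in this sketch.
CLOUD_SCHEMES = {"s3", "gs", "az"}


def naive_join(base: str, name: str) -> str:
    """Join the way pathlib-based code would; '//' after the scheme is collapsed."""
    return str(PurePosixPath(base) / name)


def uri_aware_join(base: str, name: str) -> str:
    """Hypothetical helper: keep cloud URIs intact, fall back to pathlib for local paths."""
    if urlparse(base).scheme in CLOUD_SCHEMES:
        # Treat the URI as an opaque string and append with a single '/'.
        return base.rstrip("/") + "/" + name
    return str(PurePosixPath(base) / name)


if __name__ == "__main__":
    base = "s3://my-bucket/checkpoints/iter_0001000"
    print(naive_join(base, "shard_0.distcp"))      # s3:/my-bucket/... (broken URI)
    print(uri_aware_join(base, "shard_0.distcp"))  # s3://my-bucket/... (intact)
    print(uri_aware_join("/tmp/ckpt", "shard_0.distcp"))
```

In the actual change, `AnyPath` builds a cloud path object for `s3://` strings and a regular `pathlib.Path` for local paths, so the same path-joining code in the checkpointing logic works for both kinds of storage.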