Add CheckpointIO classes to split checkpoints #12712

Open
carmocca opened this issue Apr 11, 2022 · 2 comments
Labels: design (Includes a design discussion), feature (Is an improvement or enhancement), io (IO plugin related)
Milestone: future

Comments

@carmocca
Contributor

carmocca commented Apr 11, 2022

🚀 Feature

Create new CheckpointIO classes that allow:

  • Splitting a checkpoint by keys, enabling #5339
  • Splitting a checkpoint by size

Motivation

See the existing discussion in huggingface/transformers#13548 for why these features are useful to end users.

Pitch

`Trainer(plugins=CheckpointKeySplitterIO())` or `Trainer(plugins=CheckpointSizeSplitterIO(max_size="10GB"))`
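A minimal sketch of the first option in the pitch, assuming the checkpoint is a dict whose top-level keys (such as `state_dict` and `optimizer_states`) are valid file names. The class name comes from the pitch; the on-disk layout (one `<key>.ckpt` file per top-level key inside a directory) and the method signatures are assumptions. It uses `pickle` so that it runs without PyTorch; a real plugin would subclass Lightning's `CheckpointIO`, use `torch.save`/`torch.load`, and match that class's method signatures for the installed Lightning version.

```python
import os
import pickle
import shutil
from typing import Any, Dict


class CheckpointKeySplitterIO:
    """Hypothetical plugin: stores each top-level checkpoint key in its own file.

    Mirrors the save/load/remove method names of Lightning's CheckpointIO, but
    does not subclass it so that the sketch runs without Lightning installed.
    """

    suffix = ".ckpt"

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        # `path` is treated as a directory; one file is written per top-level key.
        os.makedirs(path, exist_ok=True)
        for key, value in checkpoint.items():
            with open(os.path.join(path, key + self.suffix), "wb") as f:
                pickle.dump(value, f)

    def load_checkpoint(self, path: str) -> Dict[str, Any]:
        # Merge whatever key files are present; a deleted file simply leaves
        # its key out of the returned checkpoint.
        checkpoint: Dict[str, Any] = {}
        for name in sorted(os.listdir(path)):
            if name.endswith(self.suffix):
                with open(os.path.join(path, name), "rb") as f:
                    checkpoint[name[: -len(self.suffix)]] = pickle.load(f)
        return checkpoint

    def remove_checkpoint(self, path: str) -> None:
        # Delete the whole checkpoint directory.
        shutil.rmtree(path, ignore_errors=True)
```

Because `load_checkpoint` merges whatever files are present, deleting one of them (for example `optimizer_states.ckpt`) only drops that key from the loaded checkpoint.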

Alternatives

Have users create and maintain these solutions themselves, which the Trainer should allow.
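Under that alternative, a user-maintained size-based splitter in the spirit of `CheckpointSizeSplitterIO(max_size="10GB")` might look like the sketch below. It is not based on any particular existing implementation: it greedily packs the entries of a flat state dict into shards whose payload stays at or below `max_size`, and writes an `index.json` mapping each key to its shard. Decimal units (1 GB = 10^9 bytes), the shard file names, the method signatures, and `pickle` in place of `torch.save` are all assumptions.

```python
import json
import os
import pickle
import re
from typing import Any, Dict, List

_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(size: str) -> int:
    """Convert a string such as "10GB" into bytes (decimal units assumed)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", size.upper())
    if match is None:
        raise ValueError(f"Unrecognized size: {size!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


class CheckpointSizeSplitterIO:
    """Hypothetical plugin: splits a flat state dict into size-bounded shards."""

    def __init__(self, max_size: str = "10GB") -> None:
        self.max_size = parse_size(max_size)

    def save_checkpoint(self, state: Dict[str, Any], path: str) -> None:
        os.makedirs(path, exist_ok=True)
        shards: List[Dict[str, bytes]] = [{}]
        current = 0
        for key, value in state.items():
            blob = pickle.dumps(value)
            # Start a new shard if this entry would overflow the current one.
            # An entry larger than max_size still gets a shard of its own.
            if shards[-1] and current + len(blob) > self.max_size:
                shards.append({})
                current = 0
            shards[-1][key] = blob
            current += len(blob)
        index: Dict[str, str] = {}
        for i, shard in enumerate(shards):
            name = f"shard-{i:05d}.pkl"
            with open(os.path.join(path, name), "wb") as f:
                pickle.dump(shard, f)
            index.update({key: name for key in shard})
        # The index records which shard holds each key.
        with open(os.path.join(path, "index.json"), "w") as f:
            json.dump(index, f)

    def load_checkpoint(self, path: str) -> Dict[str, Any]:
        with open(os.path.join(path, "index.json")) as f:
            index = json.load(f)
        state: Dict[str, Any] = {}
        for name in sorted(set(index.values())):
            with open(os.path.join(path, name), "rb") as f:
                state.update({k: pickle.loads(b) for k, b in pickle.load(f).items()})
        return state
```

Packing whole entries keeps each tensor (here, each value) in a single file, at the cost of shards that may be somewhat smaller than `max_size`.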



cc @Borda @tchaton @justusschock @awaelchli @jjenniferdai @rohitgr7

carmocca added the feature, design, and io labels on Apr 11, 2022
carmocca added this to the future milestone on Apr 11, 2022
@rohitgr7
Contributor

> Splitting a checkpoint by keys, enabling #5339

How does it enable this?
As I understand it, #5339 is about reloading checkpoints.

@carmocca
Contributor Author

If a checkpoint directory contains a file called `optimizer_states.ckpt`, the user should be able to delete that file and still resume as expected, just without reloading the optimizer state.
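The behavior described above could be sketched as a resume helper that treats the optimizer file as optional. The `optimizer_states.ckpt` name comes from the comment; the helper name `load_for_resume`, the `state_dict.ckpt` file name, and the use of `pickle` instead of `torch.load` are assumptions for illustration.

```python
import os
import pickle
from typing import Any, Dict, Optional, Tuple


def load_for_resume(checkpoint_dir: str) -> Tuple[Dict[str, Any], Optional[Any]]:
    """Hypothetical helper: return the model state and, if present, the optimizer states.

    The model weights are required; `optimizer_states.ckpt` is optional, so a
    user can delete it to resume with freshly initialized optimizers.
    """
    with open(os.path.join(checkpoint_dir, "state_dict.ckpt"), "rb") as f:
        model_state = pickle.load(f)
    optimizer_path = os.path.join(checkpoint_dir, "optimizer_states.ckpt")
    if not os.path.exists(optimizer_path):
        # Missing file: the caller should skip restoring the optimizers.
        return model_state, None
    with open(optimizer_path, "rb") as f:
        return model_state, pickle.load(f)
```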

Maybe "enable" wasn't the right word; "support" would be more accurate.

Development

No branches or pull requests

3 participants