Add CheckpointIO classes to split checkpoints #12712
🚀 Feature
Create new `CheckpointIO` classes that allow:

- splitting a checkpoint into multiple files, one per top-level key
- splitting a checkpoint into multiple files under a maximum file size

Motivation
See the existing discussion in huggingface/transformers#13548 for why these features are valuable to end users.
Pitch

```python
Trainer(plugins=CheckpointKeySplitterIO())
# or
Trainer(plugins=CheckpointSizeSplitterIO(max_size="10GB"))
```
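A minimal sketch of what the key-splitting variant could look like. The class name comes from the pitch above, but its body is hypothetical: a real plugin would subclass `pytorch_lightning.plugins.io.CheckpointIO` and use `torch.save`/`torch.load`; plain `pickle` is used here only to keep the sketch self-contained.

```python
import os
import pickle


class CheckpointKeySplitterIO:
    """Hypothetical sketch: save each top-level checkpoint key to its own file."""

    def save_checkpoint(self, checkpoint: dict, path: str) -> None:
        # `path` becomes a directory holding one file per top-level key,
        # e.g. "state_dict.pkl", "optimizer_states.pkl", "epoch.pkl".
        os.makedirs(path, exist_ok=True)
        for key, value in checkpoint.items():
            with open(os.path.join(path, f"{key}.pkl"), "wb") as f:
                pickle.dump(value, f)

    def load_checkpoint(self, path: str) -> dict:
        # Reassemble the checkpoint dict from the per-key files.
        checkpoint = {}
        for name in os.listdir(path):
            key, ext = os.path.splitext(name)
            if ext == ".pkl":
                with open(os.path.join(path, name), "rb") as f:
                    checkpoint[key] = pickle.load(f)
        return checkpoint
```

A size-based splitter (`CheckpointSizeSplitterIO`) would follow the same shape but shard the largest entries, such as `state_dict`, across numbered files until each stays under `max_size`.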
Alternatives

Have users create and maintain these solutions themselves. The `Trainer` should allow it.

If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @Borda @tchaton @justusschock @awaelchli @jjenniferdai @rohitgr7