v2.0.0
What's new
This major release introduces a few breaking changes. We've provided more information here: OLMo-core v2 design and upgrade guide.
Added
- Added `TrainModule` abstraction with `TransformerTrainModule` implementation, which encapsulates both a model and optimizer.
- Added `namespace` argument to `Trainer.record_metric()`.
- Added support for context parallelism.
- Added support for expert parallelism with MoE models.
- Added in-loop evals for Minerva, GSM, HumanEval, MBPP (`ai2-olmo-eval==0.7.0`).
- Added `CosWithWarmupAndLinearDecay` learning rate scheduler.
- Added `WSD` learning rate scheduler.
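To illustrate the shape of the schedule named in the last item, here is a minimal sketch, assuming `WSD` refers to the usual warmup-stable-decay schedule. The function name `wsd_lr` and all of its parameters are hypothetical and are not taken from the OLMo-core API; the actual scheduler may use different options and a different decay curve.

```python
def wsd_lr(step, max_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay learning rate at `step` (illustrative sketch only)."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from min_lr up to peak_lr.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    decay_start = max_steps - decay_steps
    if decay_steps <= 0 or step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: ramp linearly from peak_lr down to min_lr (an assumed shape).
    progress = min(step - decay_start, decay_steps) / decay_steps
    return peak_lr - (peak_lr - min_lr) * progress


# Sample a 1000-step run with 100 warmup steps and 200 decay steps.
schedule = [wsd_lr(s, 1000, 1.0, 100, 200) for s in (0, 50, 500, 900, 1000)]
```

The three phases are kept as separate branches so that each one can be swapped out independently, for example replacing the linear decay with a cosine one.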
Changed
- The `Trainer` now takes a `TrainModule` instead of a model and optimizer, and several configuration options have been moved to `TransformerTrainModule`, including `rank_microbatch_size`, `fused_loss`, `compile_loss`, `z_loss_multiplier`, and `autocast_precision`.
- Several `TransformerModelConfig` options have been moved to `TransformerTrainModule` / `TransformerTrainModuleConfig`, including `dp_config`, `tp_config`, `float8_config`, and `compile`.
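When porting a v1 configuration, it can help to sort existing options by their new home. The sketch below is a plain-Python helper, not part of OLMo-core: `split_v1_options` is a hypothetical name, and the option sets contain only the names listed above, which the notes introduce with "including", so the lists may be incomplete. The sample keys `max_duration` and `d_model` are placeholders for options that stay where they were.

```python
# Option names copied from the release notes; everything else here is hypothetical.
MOVED_FROM_TRAINER = {
    "rank_microbatch_size", "fused_loss", "compile_loss",
    "z_loss_multiplier", "autocast_precision",
}
MOVED_FROM_MODEL_CONFIG = {"dp_config", "tp_config", "float8_config", "compile"}


def split_v1_options(trainer_opts, model_opts):
    """Partition v1-style option dicts into (trainer, model, train_module) dicts."""
    train_module_opts = {}
    new_trainer_opts = {}
    new_model_opts = {}
    for key, value in trainer_opts.items():
        # Options the notes list as moved off the Trainer go to the train module.
        target = train_module_opts if key in MOVED_FROM_TRAINER else new_trainer_opts
        target[key] = value
    for key, value in model_opts.items():
        # Same for options moved off TransformerModelConfig.
        target = train_module_opts if key in MOVED_FROM_MODEL_CONFIG else new_model_opts
        target[key] = value
    return new_trainer_opts, new_model_opts, train_module_opts


trainer, model, train_module = split_v1_options(
    {"rank_microbatch_size": 8192, "max_duration": 100},  # placeholder values
    {"compile": True, "d_model": 512},                    # placeholder values
)
```

The helper only regroups keys; it does not construct any OLMo-core objects, so the resulting dictionaries still need to be passed to the real classes by hand.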
Removed
- Removed the following callbacks: `MoEHandlerCallback`, `SchedulerCallback`, `MatrixNormalizerCallback`, `GradClipperCallback`, and `Float8HandlerCallback`. The functionality from all of those callbacks has been moved to the `TransformerTrainModule` class.
- Removed the callback methods `.pre_eval_batch()` and `.post_eval_batch()`.
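A quick way to find code affected by these removals is to check callback class names and custom callback classes against the lists above. This is a hedged sketch in plain Python: the helper names and the example class `MyEvalLogger` are hypothetical, and nothing here imports or calls OLMo-core.

```python
# Names copied from the release notes.
REMOVED_CALLBACKS = {
    "MoEHandlerCallback", "SchedulerCallback", "MatrixNormalizerCallback",
    "GradClipperCallback", "Float8HandlerCallback",
}
REMOVED_METHODS = ("pre_eval_batch", "post_eval_batch")


def removed_callbacks_in_use(callback_class_names):
    """Return the configured callback names that no longer exist in v2."""
    return sorted(set(callback_class_names) & REMOVED_CALLBACKS)


def removed_methods_defined(callback_cls):
    """Return removed hook names that `callback_cls` defines directly."""
    return [name for name in REMOVED_METHODS if name in vars(callback_cls)]


class MyEvalLogger:  # hypothetical user-defined callback
    def pre_eval_batch(self, batch):
        pass

    def log(self):
        pass


stale = removed_callbacks_in_use(
    ["SchedulerCallback", "MyCustomCallback", "GradClipperCallback"]
)
stale_hooks = removed_methods_defined(MyEvalLogger)
```

Because `removed_methods_defined` looks only at the class's own attributes, hooks inherited from a base class are not reported.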
Fixed
- Fixed the model ladder code when training on an MPS or CPU device.
Commits
dfa8f2b (chore) prepare for release v2.0.0
95fb084 add work-around for pytorch/ao#1871 (#205)
3ce0c58 32B Documentation (#210)
41f8ddc Add a public "official" version of our 32B train script (#214)
7e58d12 Update data paths in example to public URLs (#213)
4327bb9 upload data to r2 and updated their paths (#208)
0e6ea23 Assorted improvements (#207)
9ceb1e4 Add CUDA 12.6 images (#209)
eda3afb guard against wrapping MoE modules for AC (#206)
6e5b16f Bump ai2-olmo-eval==0.7.0 (in-loop Minerva, GSM, HumanEval, MBPP) (#204)
eccdc00 Make it easier for external users to run train scripts (#203)
da33f5b fix entrypoint steps
947a293 clean up changelog
725adf3 V2 (#202)