v2.0.0

github-actions released this 13 Mar 01:42 · 385 commits to main since this release

What's new

This major release introduces a few breaking changes. We've provided more information here: OLMo-core v2 design and upgrade guide.

Added 🎉

  • Added TrainModule abstraction with TransformerTrainModule implementation, which encapsulates both a model and optimizer.
  • Added namespace argument to Trainer.record_metric().
  • Added support for context parallelism.
  • Added support for expert parallelism with MoE models.
  • Added in-loop evals for Minerva, GSM, HumanEval, and MBPP (ai2-olmo-eval==0.7.0).
  • Added CosWithWarmupAndLinearDecay learning rate scheduler.
  • Added WSD learning rate scheduler.
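
For readers unfamiliar with the term, WSD usually stands for warmup-stable-decay: the learning rate ramps up linearly, holds at its peak, then decays near the end of training. The sketch below is a generic, framework-free illustration of that shape. It is not the OLMo-core implementation; the function name `wsd_lr` and all of its parameters are hypothetical, and the linear decay is an assumption (other decay shapes are common).

```python
def wsd_lr(
    step: int,
    max_steps: int,
    peak_lr: float,
    warmup_steps: int,
    decay_steps: int,
    min_lr: float = 0.0,
) -> float:
    """Return the learning rate at `step` for a warmup-stable-decay schedule.

    Hypothetical helper for illustration only; not part of OLMo-core.
    """
    if warmup_steps + decay_steps > max_steps:
        raise ValueError("warmup_steps + decay_steps must not exceed max_steps")

    # Clamp the step into the valid range [0, max_steps].
    step = min(max(step, 0), max_steps)
    decay_start = max_steps - decay_steps

    # Phase 1: linear warmup from 0 to peak_lr.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps

    # Phase 2: hold at peak_lr (also covers the no-decay case).
    if step < decay_start or decay_steps == 0:
        return peak_lr

    # Phase 3: linear decay from peak_lr to min_lr (assumed decay shape).
    progress = (step - decay_start) / decay_steps
    return peak_lr + (min_lr - peak_lr) * progress
```

With `max_steps=1000`, `warmup_steps=100`, and `decay_steps=200`, the rate climbs over the first 100 steps, stays at `peak_lr` until step 800, and reaches `min_lr` at step 1000.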

Changed ⚠️

  • The Trainer now takes a TrainModule instead of a model and optimizer, and several configuration options have been moved to TransformerTrainModule, including rank_microbatch_size, fused_loss, compile_loss, z_loss_multiplier, and autocast_precision.
  • Several TransformerModelConfig options have been moved to TransformerTrainModule / TransformerTrainModuleConfig, including dp_config, tp_config, float8_config, and compile.
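
The design shift described above, where the trainer receives a single object that owns both the parameters and the update rule, can be sketched in plain Python. This is only a toy illustration of the pattern, not the OLMo-core API: `SketchTrainModule`, `SketchTrainer`, `train_batch`, and `fit` are hypothetical names, and the one-weight linear model stands in for a real model and optimizer. The `rank_microbatch_size` attribute borrows its name from the release notes purely to show why such an option lives naturally on the train module.

```python
from typing import Iterable, List, Tuple

Example = Tuple[float, float]  # (input x, target y)


class SketchTrainModule:
    """Hypothetical stand-in: bundles 'model' state with its 'optimizer' step."""

    def __init__(self, weight: float, lr: float, rank_microbatch_size: int):
        self.weight = weight  # the "model": y_hat = weight * x
        self.lr = lr          # the "optimizer": plain gradient descent
        self.rank_microbatch_size = rank_microbatch_size

    def train_batch(self, batch: List[Example]) -> float:
        """Accumulate gradients over microbatches, then take one step."""
        grad = 0.0
        loss = 0.0
        for start in range(0, len(batch), self.rank_microbatch_size):
            micro = batch[start:start + self.rank_microbatch_size]
            for x, y in micro:
                err = self.weight * x - y
                loss += err * err / len(batch)      # mean squared error
                grad += 2.0 * x * err / len(batch)  # d(loss)/d(weight)
        self.weight -= self.lr * grad
        return loss


class SketchTrainer:
    """Hypothetical stand-in: drives the loop, knows nothing about the model."""

    def __init__(self, train_module: SketchTrainModule, max_steps: int):
        self.train_module = train_module
        self.max_steps = max_steps

    def fit(self, batches: Iterable[List[Example]]) -> List[float]:
        losses = []
        for step, batch in enumerate(batches):
            if step >= self.max_steps:
                break
            losses.append(self.train_module.train_batch(batch))
        return losses
```

In this sketch the trainer only handles loop control, so microbatching and the update rule can change without touching it, which is one plausible motivation for moving such options onto the train module.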

Removed 👋

  • Removed the following callbacks: MoEHandlerCallback, SchedulerCallback, MatrixNormalizerCallback, GradClipperCallback, and Float8HandlerCallback.
    The functionality from all of those callbacks has been moved to the TransformerTrainModule class.
  • Removed the callback methods .pre_eval_batch() and .post_eval_batch().

Fixed ✅

  • Fixed the model ladder code when training on an mps or cpu device.

Commits

dfa8f2b (chore) prepare for release v2.0.0
95fb084 add work-around for pytorch/ao#1871 (#205)
3ce0c58 32B Documentation (#210)
41f8ddc Add a public "official" version of our 32B train script (#214)
7e58d12 Update data paths in example to public URLs (#213)
4327bb9 upload data to r2 and updated their paths (#208)
0e6ea23 Assorted improvements (#207)
9ceb1e4 Add CUDA 12.6 images (#209)
eda3afb guard against wrapping MoE modules for AC (#206)
6e5b16f Bump ai2-olmo-eval==0.7.0 (in-loop Minerva, GSM, HumanEval, MBPP) (#204)
eccdc00 Make it easier for external users to run train scripts (#203)
da33f5b fix entrypoint steps
947a293 clean up changelog
725adf3 V2 (#202)