v2.4.0

github-actions released this 20 Nov 19:15
· 151 commits to main since this release

What's new

Added 🎉

  • Added option to skip ranges of steps in the trainer.
  • Send a Slack notification when a Beaker job appears to be stuck.
  • Added ignore_fingerprint_mismatch parameter to NumpyDataLoaderConfig to allow resuming training from a checkpoint with a different dataset mix.
  • Added helpful error messages when OLMo-mix-0625 files are not found, directing users to use OLMo-mix-0925 and the fingerprint override flag.
  • Added olmo_core.generate.chat module to allow interacting with OlmoCore models without conversion to other formats.
  • Added GAPMonitorCallback for monitoring gradients, activations, and parameters (GAP).
  • Added official Olmo 3 7B and 32B pretraining scripts and data mix.
  • Added official Olmo 3 7B and 32B midtraining scripts and data mix.
  • Added official Olmo 3 7B and 32B long-context scripts and data mix.
  • Added a NoOpOptimizer that does nothing, uses no memory, and can be used for debugging.
  • Added official config for Olmo 3 32B.
  • Added Olmo 3 model card and checkpoint manifests.
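
As an illustration of the step-skipping option in the first item above, here is a minimal sketch of how ranges of steps might be represented and checked. It is not the olmo_core implementation: `StepRange`, `should_skip`, and `run_steps` are hypothetical names invented for this example, and the half-open, 1-based range convention is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StepRange:
    """Hypothetical half-open range of training steps: start <= step < stop."""

    start: int
    stop: int

    def __post_init__(self) -> None:
        # Reject empty or negative ranges early so configuration mistakes surface at build time.
        if self.start < 0 or self.stop <= self.start:
            raise ValueError(f"invalid step range: [{self.start}, {self.stop})")

    def __contains__(self, step: int) -> bool:
        return self.start <= step < self.stop


def should_skip(step: int, ranges: Iterable[StepRange]) -> bool:
    """Return True if `step` falls inside any of the configured ranges."""
    return any(step in r for r in ranges)


def run_steps(total_steps: int, ranges: List[StepRange]) -> List[int]:
    """Return the 1-based steps that would actually be executed."""
    executed = []
    for step in range(1, total_steps + 1):
        if should_skip(step, ranges):
            # Whether a skipped step still consumes its batch is a design
            # choice that this sketch does not model.
            continue
        executed.append(step)
    return executed
```

For example, `run_steps(10, [StepRange(3, 5), StepRange(8, 9)])` leaves out steps 3, 4, and 8. Consult the trainer documentation for the actual field names and range semantics.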

Fixed ✅

  • Set the missing NCCL_NVLSTREE_MAX_CHUNKSIZE env var, which is now needed for running jobs on the Augusta cluster.
  • Fixed bug with RemoteFileSystemReader that caused excess memory usage.
  • No longer overrides random's RNG seed when building SourceMixtureDatasetConfig.
  • Fixed handling of URLs in olmo_core.nn.hf.checkpoint.save_hf_model and in examples/huggingface.
  • Fixed a potential NaN loss that can occur when using instance masking.
  • Stability improvements developed while training Olmo3 32B.

Changed ⚠️

  • Removed unused field in YaRNRoPEScalingConfig.

Commits

1ed8900 (chore) prepare for release v2.4.0
2c179c2 (chore) prepare for release v2.4.0 (#467)
7e0431f Fix link to 7B midtrain script (#469)
843fe3d Olmo3 model cards, checkpoint manifest, and readme (#468)
cbdc2f1 Olmo3 32B cleanup and checkin (#460)
20548a0 Official Olmo3 32B long-context script (#465)
14b15cc Official Olmo3 32B midtrain script(s) and mix(es) (#466)
a25a514 Official Olmo3 32B pretrain config and data mix (#464)
6b73ba0 Official Olmo3-7B long context script (#458)
bdc61e4 Official Olmo3-7B midtraining scripts (#445)
55804bf 32B official config (#454)
a86131d Slight refactor of Yarn Scaling Config (#456)
68c7409 Handle target URLs properly in HF conversion (#453)
2504cc2 Instance mask correction to avoid nan loss (#452)
137274e Add callback to monitor grads, activations, params (#446)
0959a54 Improve mem usage of RemoteFileSystemReader (#451)
600d2fe Official Olmo3-7B pretraining scripts (#443)
accc310 make launch timeout configurable from CLI
aa0e629 Avoid overriding RNG seed when building SourceMixtureDatasetConfig (#449)
98ba2e4 NoOp optimizer (#444)
03e6836 OlmoCore native chat interface (#439)
7a0bbd7 unset 2 NCCL env vars per Google's recommendation
aacb6eb only send local Slack notifications when callback is enabled (#441)
bfc8d7a Min python version to 3.10 (#442)
96d43d4 Set missing NCCL_NVLSTREE_MAX_CHUNKSIZE env var (#440)
5ad6db5 hot fix for listing gcs dirs
2186957 Allow manual bypass of fingerprint mismatch when switching datasets (#435)
043505d hot fix to step regex
dd7e747 Send a Slack notification when a Beaker job appears to be stuck (#431)
e27a9b4 Add WSDS (Warmup-Stable-Decay-Simplified) Scheduler (#419)
c92320f Use a dataclass for 'Trainer.steps_to_skip' (#430)
9669268 clean up checkpointing code to minimize distributed communication (#428)
87d64b9 fix changelog
269bf02 Add option to skip ranges of steps in the trainer (#425)