v2.4.0
What's new
Added 🎉
- Added option to skip ranges of steps in the trainer.
- Send a Slack notification when a Beaker job appears to be stuck.
- Added `ignore_fingerprint_mismatch` parameter to `NumpyDataLoaderConfig` to allow resuming training from a checkpoint with a different dataset mix.
- Added helpful error messages when OLMo-mix-0625 files are not found, directing users to use OLMo-mix-0925 and the fingerprint override flag.
- Added `olmo_core.generate.chat` module to allow interacting with OlmoCore models without conversion to other formats.
- Added `GAPMonitorCallback` for monitoring gradients, activations, and parameters (GAP).
- Added official Olmo 3 7B and 32B pretraining scripts and data mix.
- Added official Olmo 3 7B and 32B midtraining scripts and data mix.
- Added official Olmo 3 7B and 32B long-context scripts and data mix.
- Added a `NoOpOptimizer` that does nothing, uses no memory, and can be used for debugging.
- Added official config for Olmo 3 32B.
- Added Olmo 3 model card and checkpoint manifests.
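The notes don't describe the configuration surface for step skipping beyond the commit that turned `Trainer.steps_to_skip` into a dataclass (#430). As a rough illustration only, here is a minimal sketch of skipping ranges of steps in a training loop. `StepSkipRange`, `should_skip`, and `run` are hypothetical names, not OLMo-core's API, and the half-open `[start, stop)` convention is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StepSkipRange:
    """Hypothetical half-open range of trainer steps [start, stop) to skip."""

    start: int
    stop: int

    def __post_init__(self) -> None:
        # Reject empty or inverted ranges early, at config time.
        if self.stop <= self.start:
            raise ValueError(f"empty or inverted range: [{self.start}, {self.stop})")

    def __contains__(self, step: int) -> bool:
        return self.start <= step < self.stop


def should_skip(step: int, ranges: Iterable[StepSkipRange]) -> bool:
    """Return True if `step` falls inside any configured skip range."""
    return any(step in r for r in ranges)


def run(num_steps: int, ranges: list[StepSkipRange]) -> list[int]:
    """Toy training loop (assumption): returns the steps that actually ran."""
    executed: list[int] = []
    for step in range(1, num_steps + 1):
        if should_skip(step, ranges):
            continue  # skipped: this toy loop does no work for the step
        executed.append(step)
    return executed


if __name__ == "__main__":
    print(run(10, [StepSkipRange(3, 5), StepSkipRange(8, 9)]))  # [1, 2, 5, 6, 7, 9, 10]
```

Checking membership with `any(...)` is linear in the number of ranges, which is fine for a handful of them; keeping the ranges sorted and using `bisect` would scale better if there were many.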
Fixed ✅
- Set the missing `NCCL_NVLSTREE_MAX_CHUNKSIZE` env var, which is now needed for running jobs on the Augusta cluster.
- Fixed bug with `RemoteFileSystemReader` that caused excess memory usage.
- No longer overrides `random`'s RNG seed when building `SourceMixtureDatasetConfig`.
- Fixed handling of URLs in `olmo_core.nn.hf.checkpoint.save_hf_model` and in `examples/huggingface`.
- Fixed potential NaN loss that can occur when using instance masking.
- Stability improvements developed while training Olmo3 32B.
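The notes don't say how `SourceMixtureDatasetConfig` consumes randomness. As a generic illustration of the RNG fix, the sketch below contrasts reseeding Python's process-wide `random` module, which changes every later `random.*` call in the process, with drawing from a private `random.Random` instance. The function names are hypothetical and not part of OLMo-core.

```python
import random


def pick_sources_reseeding(seed: int, sources: list[str], k: int) -> list[str]:
    # Side effect: replaces the global RNG state for the whole process.
    random.seed(seed)
    return random.sample(sources, k)


def pick_sources_isolated(seed: int, sources: list[str], k: int) -> list[str]:
    # Private generator: the same seed gives the same picks,
    # and the global RNG state is left untouched.
    rng = random.Random(seed)
    return rng.sample(sources, k)


if __name__ == "__main__":
    sources = ["web", "code", "math", "books", "papers"]
    random.seed(123)
    state = random.getstate()
    pick_sources_isolated(0, sources, 2)
    print(random.getstate() == state)  # True: caller's seeding preserved
    pick_sources_reseeding(0, sources, 2)
    print(random.getstate() == state)  # False: caller's seeding clobbered
```

Using a local generator keeps a config builder free of hidden global side effects, which matters when other code in the same process relies on its own seeding.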
Changed ⚠️
- Removed unused field in `YaRNRoPEScalingConfig`.
Commits
1ed8900 (chore) prepare for release v2.4.0
2c179c2 (chore) prepare for release v2.4.0 (#467)
7e0431f Fix link to 7B midtrain script (#469)
843fe3d Olmo3 model cards, checkpoint manifest, and readme (#468)
cbdc2f1 Olmo3 32B cleanup and checkin (#460)
20548a0 Official Olmo3 32B long-context script (#465)
14b15cc Official Olmo3 32B midtrain script(s) and mix(es) (#466)
a25a514 Official Olmo3 32B pretrain config and data mix (#464)
6b73ba0 Official Olmo3-7B long context script (#458)
bdc61e4 Official Olmo3-7B midtraining scripts (#445)
55804bf 32B official config (#454)
a86131d Slight refactor of Yarn Scaling Config (#456)
68c7409 Handle target URLs properly in HF conversion (#453)
2504cc2 Instance mask correction to avoid nan loss (#452)
137274e Add callback to monitor grads, activations, params (#446)
0959a54 Improve mem usage of RemoteFileSystemReader (#451)
600d2fe Official Olmo3-7B pretraining scripts (#443)
accc310 make launch timeout configurable from CLI
aa0e629 Avoid overriding RNG seed when building SourceMixtureDatasetConfig (#449)
98ba2e4 NoOp optimizer (#444)
03e6836 OlmoCore native chat interface (#439)
7a0bbd7 unset 2 NCCL env vars per Google's recommendation
aacb6eb only send local Slack notifications when callback is enabled (#441)
bfc8d7a Min python version to 3.10 (#442)
96d43d4 Set missing NCCL_NVLSTREE_MAX_CHUNKSIZE env var (#440)
5ad6db5 hot fix for listing gcs dirs
2186957 Allow manual bypass of fingerprint mismatch when switching datasets (#435)
043505d hot fix to step regex
dd7e747 Send a Slack notification when a Beaker job appears to be stuck (#431)
e27a9b4 Add WSDS (Warmup-Stable-Decay-Simplified) Scheduler (#419)
c92320f Use a dataclass for 'Trainer.steps_to_skip' (#430)
9669268 clean up checkpointing code to minimize distributed communication (#428)
87d64b9 fix changelog
269bf02 Add option to skip ranges of steps in the trainer (#425)