Release v2.4.0 · allenai/OLMo-core

What's new

Added 🎉

Added option to skip ranges of steps in the trainer.
Send a Slack notification when a Beaker job appears to be stuck.
Added ignore_fingerprint_mismatch parameter to NumpyDataLoaderConfig to allow resuming training from a checkpoint with a different dataset mix.
Added helpful error messages when OLMo-mix-0625 files are not found, directing users to use OLMo-mix-0925 and the fingerprint override flag.
Added olmo_core.generate.chat module to allow interacting with OlmoCore models without conversion to other formats.
Added GAPMonitorCallback for monitoring gradients, activations, and parameters (GAP).
Added official Olmo 3 7B and 32B pretraining scripts and data mix.
Added official Olmo 3 7B and 32B midtraining scripts and data mix.
Added official Olmo 3 7B and 32B long-context scripts and data mix.
Added a NoOpOptimizer that does nothing, uses no memory, and can be used for debugging.
Added official config for Olmo 3 32B.
Added Olmo 3 model card and checkpoint manifests.
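
As an illustration of the step-skipping option listed above, the following is a minimal sketch of how a training loop might honor a set of step ranges to skip. It is not OLMo-core's implementation: the names `StepRange`, `should_skip`, and `run`, and the half-open range convention, are all assumptions made for this example. The commit list only indicates that `Trainer.steps_to_skip` is represented with a dataclass.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StepRange:
    """Hypothetical half-open range [start, stop) of training steps to skip."""

    start: int
    stop: int

    def __post_init__(self) -> None:
        # Reject empty or negative ranges early so mistakes surface at config time.
        if self.start < 0 or self.stop <= self.start:
            raise ValueError(f"invalid step range: [{self.start}, {self.stop})")

    def __contains__(self, step: int) -> bool:
        return self.start <= step < self.stop


def should_skip(step: int, ranges: Iterable[StepRange]) -> bool:
    """Return True if `step` falls inside any of the given ranges."""
    return any(step in r for r in ranges)


def run(total_steps: int, ranges: List[StepRange]) -> List[int]:
    """Return the 1-indexed steps that would actually be executed."""
    executed = []
    for step in range(1, total_steps + 1):
        if should_skip(step, ranges):
            continue  # no update is performed for this step
        executed.append(step)
    return executed


if __name__ == "__main__":
    print(run(6, [StepRange(2, 4)]))
```

In this sketch, skipping steps 2 and 3 of a 6-step run leaves steps 1, 4, 5, and 6 to be executed. Whether the real trainer treats range ends as inclusive or exclusive should be checked against the OLMo-core source.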

Fixed ✅

Set the missing NCCL_NVLSTREE_MAX_CHUNKSIZE env var, which is now needed for running jobs on the Augusta cluster.
Fixed bug with RemoteFileSystemReader that caused excess memory usage.
No longer overrides random's RNG seed when building SourceMixtureDatasetConfig.
Fixed handling of URLs in olmo_core.nn.hf.checkpoint.save_hf_model and in examples/huggingface.
Fixed a potential NaN loss that can occur when using instance masking.
Stability improvements developed while training Olmo 3 32B.

Changed ⚠️

Removed unused field in YaRNRoPEScalingConfig.

Commits

1ed8900 (chore) prepare for release v2.4.0
2c179c2 (chore) prepare for release v2.4.0 (#467)
7e0431f Fix link to 7B midtrain script (#469)
843fe3d Olmo3 model cards, checkpoint manifest, and readme (#468)
cbdc2f1 Olmo3 32B cleanup and checkin (#460)
20548a0 Official Olmo3 32B long-context script (#465)
14b15cc Official Olmo3 32B midtrain script(s) and mix(es) (#466)
a25a514 Official Olmo3 32B pretrain config and data mix (#464)
6b73ba0 Official Olmo3-7B long context script (#458)
bdc61e4 Official Olmo3-7B midtraining scripts (#445)
55804bf 32B official config (#454)
a86131d Slight refactor of Yarn Scaling Config (#456)
68c7409 Handle target URLs properly in HF conversion (#453)
2504cc2 Instance mask correction to avoid nan loss (#452)
137274e Add callback to monitor grads, activations, params (#446)
0959a54 Improve mem usage of RemoteFileSystemReader (#451)
600d2fe Official Olmo3-7B pretraining scripts (#443)
accc310 make launch timeout configurable from CLI
aa0e629 Avoid overriding RNG seed when building SourceMixtureDatasetConfig (#449)
98ba2e4 NoOp optimizer (#444)
03e6836 OlmoCore native chat interface (#439)
7a0bbd7 unset 2 NCCL env vars per Google's recommendation
aacb6eb only send local Slack notifications when callback is enabled (#441)
bfc8d7a Min python version to 3.10 (#442)
96d43d4 Set missing NCCL_NVLSTREE_MAX_CHUNKSIZE env var (#440)
5ad6db5 hot fix for listing gcs dirs
2186957 Allow manual bypass of fingerprint mismatch when switching datasets (#435)
043505d hot fix to step regex
dd7e747 Send a Slack notification when a Beaker job appears to be stuck (#431)
e27a9b4 Add WSDS (Warmup-Stable-Decay-Simplified) Scheduler (#419)
c92320f Use a dataclass for 'Trainer.steps_to_skip' (#430)
9669268 clean up checkpointing code to minimize distributed communication (#428)
87d64b9 fix changelog
269bf02 Add option to skip ranges of steps in the trainer (#425)
