v2.3.0

github-actions released this 17 Oct 16:22
· 185 commits to main since this release

What's new

Fixed ✅

  • Fixed parsing username+password git remote URLs in launch.beaker module.
  • Fixed bug with default setup steps in launch.beaker.BeakerLaunchConfig when a branch can't be resolved.
  • Updated for the new Beaker cluster names.
  • Fixed mixture rounding error with SourceMixtureDataset, which was previously causing samples to be repeated at the end of training.
  • Big jobs no longer DDoS Beaker.
  • A configuration error is now raised if you pass in a URL for the trainer or dataset's working directory.
    Previously the URL would just get mangled into a local path, leading to unexpected behavior.
  • Fixed an issue where the ConsoleLoggerCallback would attempt to log before the first step.
  • Only call teardown_distributed_environment() when training ends cleanly to avoid a hang for the duration of the distributed backend's timeout when there's an error from one rank.
  • Fixed tensor parallelism issue with torch 2.8.
  • More fixes for Beaker cluster names.
  • Callback.post_train() will still be called even if the run is canceled before the dry-run batch.
  • GarbageCollectorCallback will restore gc settings even when Trainer.fit() exits on an error.
  • Made move_to_device blocking for MPS devices to fix possible incorrect transfer of data from CPU to MPS.
  • Fixed bug where glob_directory() would fail to match certain glob patterns.
  • Added one more error type to retry on when the Google Storage API throws it.
  • A garbage collection is now performed after checkpointing to avoid running out of CPU memory.
  • Fixed an avoidable overflow error when using NumpyPackedFSLDataset.
  • Fixed issue with NumpyFSLDatasetMixture + SourceMixtureDataset where not all instances would have the same sequence length.
  • Attention backend will no longer default to flash in non-CUDA environments.
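
To illustrate the first fix in this list, here is a minimal sketch of parsing an HTTPS git remote that embeds a username and password. It is not the code used in launch.beaker; the function name parse_git_remote and the returned tuple shape are hypothetical, and only HTTPS-style remotes are handled.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_git_remote(url: str) -> Tuple[str, str, str]:
    """Return (host, account, repo) for an HTTPS git remote URL.

    Hypothetical helper for illustration only. Credentials embedded as
    "user:password@" are ignored rather than treated as part of the host.
    """
    parts = urlsplit(url)
    # .hostname excludes any "user:password@" prefix and any port.
    host = parts.hostname or ""
    path = parts.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    account, repo = path.split("/", 1)
    return host, account, repo
```

With this sketch, a remote with credentials and the same remote without them resolve to the same host, account, and repository.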

Changed ⚠️

  • The dir option to Trainer.maybe_load_checkpoint() is now optional and defaults to the save_folder.
  • Set fused_linear_cross_entropy_loss accum_dtype to fp32 in LMHead.
  • Increased NCCL_FASTRAK_PLUGIN_ACCEPT_TIMEOUT_MS from 10 minutes to 30 minutes.
  • SlackNotifierCallback will now notify on checkpoint saved and post epoch events.
  • BeakerLaunchConfig.launch() will now send Slack notifications by default when follow=True if the env var SLACK_WEBHOOK_URL is set.
  • src/examples/llama/ has been renamed to src/examples/llm/.
  • Refactored eval task groups into task_groups.py.
  • The use_flash argument to the Attention classes is deprecated. Use backend="flash_2" instead.
  • Refactored NumpyDatasetConfig by splitting it into a separate config per underlying dataset class.
  • Refactored internal/experiment module to facilitate modifying datasets or supplying a fully custom ExperimentConfig.
  • Simplified SourceMixtureDatasetConfig by removing redundant sequence_length and dtype fields.
  • The model_id argument to convert_state_from_hf is deprecated. Conversion information is deduced from the model type.
  • Refactored the example conversion scripts to/from HF, including decreasing false failures in validation.
  • Small refactor to source_mixture.py to make it easier to define data mixes in yaml.
  • Reorganized/cleaned up internal training scripts.
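
The use_flash deprecation above follows a common pattern: keep accepting the old boolean flag, warn, and map it onto the new backend option. The sketch below shows that pattern in isolation. It is not the Attention classes' actual code; resolve_backend is a hypothetical helper, and the fallback name "torch" is an assumption.

```python
import warnings
from typing import Optional


def resolve_backend(
    use_flash: Optional[bool] = None, backend: Optional[str] = None
) -> str:
    """Map the deprecated `use_flash` flag onto a `backend` name.

    Hypothetical helper for illustration; "torch" as the fallback
    backend name is an assumption, not taken from the library.
    """
    if use_flash is not None:
        warnings.warn(
            "'use_flash' is deprecated; use backend='flash_2' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicit `backend` always wins over the deprecated flag.
        if backend is None and use_flash:
            backend = "flash_2"
    return backend if backend is not None else "torch"
```

Callers that still pass use_flash=True get the flash_2 backend plus a DeprecationWarning, while callers that pass backend directly are unaffected.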

Added 🎉

  • Added CLI script src/scripts/unshard.py for converting distributed checkpoints to regular PyTorch or safetensors format.
  • Added a custom block that does LayerNorm scaling.
  • Added OLMo-mix-0625-150Bsample data mix.
  • Added alias support to DataMix enum.
  • Added the HalfCos learning rate scheduler.
  • Added CONTRIBUTING.md guidelines.
  • Added a lightweight, gantry-like Beaker launch CLI: python -m olmo_core.launch.beaker.
  • Added Beaker images with torch 2.8: olmo-core-tch280cu128-2025-09-18 and olmo-core-tch280cu129-2025-09-18 for CUDA 12.8 and 12.9, respectively.
  • Added TransformerEngine to Docker images and a TransformerEngine attention backend.
  • Added Callback.close() method, which is always called when exiting Trainer.fit().
  • Added flash-attention 3 to Docker images and a flash_3 attention backend.
  • Added support for sliding window attention to the Torch attention backend. Performance is not optimized, so other backends should be preferred.
  • Added RoPEScalingConfig.to_hf_config() for each RoPE scaling method to support automatic conversion to HuggingFace format.
  • Added a guide to dataset mixing in docs/source/guides/data_mixing.rst.
  • Added support for converting FlexOlmo models (with both dropless and default MoEs) between OLMo Core and HF formats.
  • Added olmo3_7B model config.
  • Added additional internal configuration tools.
  • Added a new named data mix that we used for the 32B run.
  • Added internal OLMo3 7B midtraining and long-context configs.
  • Added ability to convert OLMo3 models to/from HF format with support for rope scaling configs.
  • Added a script that can pull out a single training batch from a training job.
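
The guarantee that Callback.close() is always called when exiting Trainer.fit() can be pictured as a try/finally around the training loop. The sketch below is a stand-in under that assumption, not the Trainer's actual implementation; RecordingCallback and fit are hypothetical names.

```python
from typing import Callable, Iterable


class RecordingCallback:
    """Hypothetical callback that only records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def fit(callbacks: Iterable[RecordingCallback], train_fn: Callable[[], None]) -> None:
    """Run `train_fn`, then close every callback, even if training raises."""
    callbacks = list(callbacks)
    try:
        train_fn()
    finally:
        # Runs on clean exit and on error, so cleanup is never skipped.
        for callback in callbacks:
            callback.close()
```

Because the cleanup sits in the finally clause, an exception raised during training still propagates to the caller, but only after every callback has been closed.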

Commits

5b32459 (chore) prepare for release v2.3.0
3f21b77 Reorganize internal scripts for Olmo3 (#423)
67dd1d6 Use exec to start torchrun (#422)
206d25e Cookbook migration part 2 - long context config (#404)
eabb869 Raise timeout error if training job doesn't start in 5 mins (#421)
cc69286 Script to dump training tokens (#418)
b5ba7be Script for unsharding (#420)
464d01e olmo3 conversion w/ support for rope scaling (#415)
3afa7dd Add Rope scaling configs to rope module's exports (#414)
3ef0c05 Cookbook migration part 1 - midtraining config (#403)
cdb7922 Add Dolma 3 sample, and a way to alias data mixes (#412)
77adc0b pull Slack webhook URL from secret if available (#413)
c8e32ab NumpyFSLDatasetMixture + SourceMixtureDataset fix (#411)
6ce62cc Fix OverflowError in pack_documents (#409)
e229c17 Cookbook migration part 0 - more setup (#397)
8eef6f3 Data mix for the 32B (#405)
9ba6154 Retry more on GS failures (#406)
73d11f8 Do GC after checkpointing (#407)
cdfd201 Typo
f477a8d Fix glob_directory bug (#402)
bc2a3f1 Support conversion of dropless MoE to FlexOlmo (#401)
7fc49de Support rope scaling configs for hf conversion (#394)
0f38dc6 HF Conversion Refactor (#390)
a53f825 source mixture dataset simplification and documentation (#399)
69bd9d2 MPS bug fixes (#395)
601d336 refactor internal experiment configuration assembler (#386)
874be5d Add flash_3 attention backend (#377)
3aac611 Remove old beaker refs (#393)
1aeb369 Add sliding window support for torch attention backend (#388)
7bebea3 Fully migrate to new cluster names, fix internal experiment launching on Augusta (#391)
1d4b67f Add flash-attn-3 install to dockerfile (#392)
04d5d5d Add Callback.close() method, other minor callback improvements (#389)
dcf3a0e Add attention backend abstraction and integrate transformer engine's attention (#384)
0356c1b Np Dataset Config Refactor (#381)
afb5827 Improve distributed error handling (#380)
45d2031 Consolidate CLI code in public scripts (#379)
cbcc735 Port task groups from cookbook to OlmoCore (#378)
71f0023 bump torch and other dependencies in our Docker build (#360)
b3c3b4e Add an all-in-one guide for researchers (#374)
65d411b CONTRIBUTING.md (#376)
e469561 Treat cordoned beaker hosts as occupied (#375)
75d4d3a Avoid DDOS-ing Beaker for real (#372)
258a7a8 fix
b068044 Set accum_grad to fp32 for fused_linear_cross_entropy_loss, bump liger-kernel version (#370)
b550f9b Add the ability to send Slack notifications from launch.beaker (#371)
801de4b Makes it possible to override the common config builder (#369)
59a2b2f Adds a new, highly specific LR scheduler (#368)
d4ad23f fix rounding error with mixing datasets (#316)
46f7211 [Feat] Add LNS training example (#320)
3d9e9cd redo base dir calculation for new cluster names (#364)
b852ec0 Up NCCL_FASTRAK_PLUGIN_ACCEPT_TIMEOUT_MS to 30 minutes (#365)
72b7c1b Moved google-cloud-compute dependency from dev to beaker group. (#363)
c00d715 catch issues with dolma metadata files earlier (#361)
fa5a5dc Refine hostname constraints for beaker experiments on Google clusters (#355)
386b0a8 Fix parsing username+password git remote URLs (#356)
6726ffc make release process more robust