v2.0.1
What's new
Added
- Added information about the official 32B training run.
- Added automatic support for LL128 when running on Augusta.
Fixed
- The official config for the 32B had unrealistic batch size settings.
- Ignore group_overrides for frozen parameters instead of throwing an error.
Removed
- Removed the "fused" cross-entropy loss variant. It had a bug and consistently under-performed the native PyTorch version when compiled. See "Post Incident Report: bug with fused CE loss" for more information.
Commits
27b1ae8 (chore) prepare for release v2.0.1
79ebc7f Add hybrid MoE transformer architecture (#223)
bce2b5b authenticate with Docker Hub to avoid rate limits
b1e0bbd Remove fused CE loss, reorganize MoE kernels/ops (#221)
56e06ee Ignore group_overrides for frozen params (#219)
9d80e8d Update logo for README header. (#218)
974e555 fix some typos, consistent naming
45fe007 Updated documentation (#217)
51aedcf More working config (#216)
47b2ad5 add release PR comments back in