v0.1.2
What's Changed
- Add architecture ablation toggles: mask dropout, L2 QKNorm, residual gates, GEGLU, gated attention by @briney in #7
- Implement ResFormer value residual connections by @briney in #8
Full Changelog: v0.1.1...v0.1.2