
Conversation

@thomasw21 (Member)

Script to reproduce diverging layer_norm weights
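The linked script is not included in this scrape, but the check it describes can be sketched. The helper below is a hypothetical, minimal illustration (names like `find_diverged_layer_norms` and the dict-of-lists state format are illustrative, not from the PR): given per-replica weight dumps, it reports any layer_norm parameter whose values differ across replicas beyond a tolerance, which is how one would confirm that layer_norm weights have diverged between data-parallel ranks.

```python
# Hypothetical sketch, not the script from this PR: detect layer_norm
# parameters whose weights differ across data-parallel replicas.
# State dicts are represented as plain dicts of param name -> list of floats.

def max_abs_diff(a, b):
    """Element-wise maximum absolute difference between two weight vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def find_diverged_layer_norms(state_dicts, tol=1e-6):
    """Return names of layer_norm params that differ across replicas.

    state_dicts: one dict per replica, all with identical keys.
    tol: allowed element-wise deviation before a param counts as diverged.
    """
    diverged = []
    reference = state_dicts[0]
    for name, ref_weights in reference.items():
        if "layer_norm" not in name:
            continue  # only layer_norm params are checked here
        for other in state_dicts[1:]:
            if max_abs_diff(ref_weights, other[name]) > tol:
                diverged.append(name)
                break
    return diverged

# Example: replica 1's layer_norm weight has drifted from replica 0's.
rank0 = {"layers.0.layer_norm.weight": [1.0, 1.0], "layers.0.mlp.weight": [0.5]}
rank1 = {"layers.0.layer_norm.weight": [1.0, 1.001], "layers.0.mlp.weight": [0.5]}
print(find_diverged_layer_norms([rank0, rank1]))
```

In a real Megatron-DeepSpeed run the inputs would come from checkpoint files rather than inline dicts, but the comparison logic is the same.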

adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this pull request Dec 18, 2023
* Enable universal ckpting

* Update run scripts

* Address PR feedback

* Remove line

* Fix white lines

* Remove redundant changes

* Apply to gpt_model only

* Code cleanup

* Code cleanup

* Update training.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Update training.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Log loss_scale only valid for fp16

* Add README and bf16 scripts

* Visualization docstrings

* Support older DS

---------

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

3 participants