
Integrate FlashAttention into Megatron-LM #267

Merged: 2 commits merged into NVIDIA:main on Jan 11, 2023

Conversation

@tridao (Contributor) commented Dec 9, 2022

We add an option to use FlashAttention in Megatron-LM.
Cc @ekelsen
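
Below is a minimal sketch of the kind of attention core such an option typically gates. It assumes the flash_attn package's flash_attn_func entry point (the import path varies across flash-attn versions) and a hypothetical use_flash_attn switch; the names here are illustrative and are not the exact code added in this PR.

```python
# Illustrative sketch only (not the PR's actual code): a self-attention core that
# dispatches to FlashAttention when enabled, with a naive softmax-attention fallback.
import torch
from flash_attn import flash_attn_func  # assumed import path; check your flash_attn version


class SketchSelfAttention(torch.nn.Module):
    """Hypothetical attention core: uses FlashAttention if requested, else naive attention."""

    def __init__(self, use_flash_attn: bool = False, dropout_p: float = 0.0, causal: bool = True):
        super().__init__()
        self.use_flash_attn = use_flash_attn
        self.dropout_p = dropout_p
        self.causal = causal

    def forward(self, q, k, v):
        # q, k, v: (batch, seqlen, num_heads, head_dim); fp16/bf16 on GPU for the FlashAttention path.
        if self.use_flash_attn:
            return flash_attn_func(q, k, v, dropout_p=self.dropout_p, causal=self.causal)
        # Naive fallback: materializes the full (seqlen x seqlen) attention matrix.
        scale = q.shape[-1] ** -0.5
        scores = torch.einsum("bqhd,bkhd->bhqk", q, k) * scale
        if self.causal:
            mask = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool, device=q.device), 1)
            scores = scores.masked_fill(mask, float("-inf"))
        probs = torch.softmax(scores, dim=-1)
        probs = torch.nn.functional.dropout(probs, p=self.dropout_p, training=self.training)
        return torch.einsum("bhqk,bkhd->bqhd", probs, v)
```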

@bryancatanzaro

🎉

Thank you @tridao for sending this in!

@mayank31398

Can't wait :)

@jaredcasper (Collaborator)

Thanks for the PR! Sorry for the delay, I went on an extended holiday break. :) Reviewing this now.

@jaredcasper jaredcasper merged commit c92f10b into NVIDIA:main Jan 11, 2023
@bryancatanzaro

🎉

rraminen pushed a commit to rraminen/Megatron-LM that referenced this pull request Dec 12, 2023
* Enable universal ckpting

* Update run scripts

* Address PR feedback

* Remove line

* Fix white lines

* Remove redundant changes

* Apply to gpt_model only

* Code cleanup

* Code cleanup

* Update training.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Update training.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Log loss_scale only valid for fp16

* Add README and bf16 scripts

* Visualization docstrings

---------

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>