Support for Megatron-VLM training #806
Hi. Thanks for creating this PR. We (NVIDIA) are actually planning to release VLM training functionality in Megatron core in the next couple of weeks. As you may have seen, we've been pushing out some preparatory code to support this. Our initial example release is going to be pretraining and SFT for a llava-architecture model using llama3 and clip backbones, plus a general multimodal webdataset-based dataloader. We're reviewing your PR internally to see if we can incorporate any of your work alongside ours, and will be sure to credit you if we do. Thanks again!

Thank you for your attention! Looking forward to the official implementation!

Hello, I have a question about this PR: how will the ViT and LLM be split across PP stages with independent_parallel = True? Thank you!

@wangxiang2713 The ViT will be in the first stage of the LM.
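To make the stage placement above concrete, here is a minimal, hypothetical sketch (not the PR's actual code; the function and module names are illustrative) of how a llava-style model could assign modules to pipeline-parallel stages, with the ViT co-located on the first LM stage rather than occupying a stage of its own:

```python
# Hypothetical sketch: with independent_parallel=True, the vision encoder
# (ViT) does not get its own pipeline stage; it rides along with pipeline
# stage 0 of the language model.

def assign_pipeline_stages(num_lm_layers: int, pp_size: int) -> list[list[str]]:
    """Return, for each pipeline stage, the list of modules it hosts.

    Assumes the LM's transformer layers divide evenly across pp_size stages
    and the ViT is placed on stage 0 alongside the first LM layers.
    """
    assert num_lm_layers % pp_size == 0, "layers must divide evenly for this sketch"
    layers_per_stage = num_lm_layers // pp_size
    stages = []
    for rank in range(pp_size):
        modules = []
        if rank == 0:
            modules.append("vit")  # vision encoder lives on the first LM stage
        first = rank * layers_per_stage
        modules.append(f"lm_layers[{first}:{first + layers_per_stage}]")
        stages.append(modules)
    return stages

# Example: a 32-layer LM with pipeline parallel size 4.
print(assign_pipeline_stages(32, 4))
```

One consequence of this placement is that stage 0 carries both the ViT and its share of LM layers, so it tends to be the most memory-heavy stage; an uneven layer split could compensate, but that is outside this sketch.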
Tell me more about your questions.

On Thu, Jun 13, 2024, 05:48, Qingsong Lv wrote: "@wangxiang2713 ViT will be in the first stage of LM."
In this pull request, we open-source our solution for vision-language model training and inference in pure Megatron-style code. In this codebase, we support:

The running example is in the examples/llava folder. We hope our work can contribute to the open-source community. If there are any questions, feedback is welcome!