
[integrating megablocks with open_lm] Question about megablocks + FSDP #57

@kernelmachine

Hello! I'm trying to integrate megablocks with our open source LLM training library (open_lm), which uses native torch FSDP.

For some reason I am consistently seeing worse performance than my dense baselines at the same compute budget. I'm a bit stumped as to why, and I was wondering if you could provide any pointers on things to watch out for with respect to the integration.

To integrate your library:

Am I missing anything else?

One hypothesis I have is that something is going wrong when I use Megablocks with FSDP.

Here is our FSDP wrapper:

https://github.com/mlfoundations/open_lm/blob/main/open_lm/main.py#L454-L462
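For reference, the wrapper boils down to passing an auto-wrap policy keyed on our transformer block class. Here's a minimal sketch of that pattern (the `Block` class below is a hypothetical stand-in, not open_lm's actual module):

```python
import functools

import torch.nn as nn
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


class Block(nn.Module):
    """Hypothetical stand-in for the transformer block (attention + FFN/MoE)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.attn = nn.Linear(dim, dim)
        self.ffn = nn.Linear(dim, dim)


# FSDP calls this policy on every submodule; it wraps each Block in its
# own FSDP unit, so a MoE layer inside a Block gets flattened and sharded
# together with the rest of that Block's parameters.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={Block},
)
```

The resulting policy is what gets passed as `auto_wrap_policy=...` to the `FSDP(...)` constructor in our `main.py`.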

Is there anything I need to change in the FSDP arguments to make sure FSDP doesn't interfere with the all-to-alls? Currently it wraps the transformer block module, of which the MoE is a part.
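One thing I was considering trying is giving the MoE layer its own nested FSDP unit instead of letting its expert weights get flattened into the block's flat parameter. I'm not sure this is the right fix, but a sketch of it with `lambda_auto_wrap_policy` would look like this (both `MoE` and `Block` here are placeholder classes, not megablocks' or open_lm's actual ones):

```python
import functools

import torch.nn as nn
from torch.distributed.fsdp.wrap import lambda_auto_wrap_policy


class MoE(nn.Module):
    """Placeholder for the MoE layer; the real class comes from megablocks."""

    def __init__(self, dim: int = 16, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))


class Block(nn.Module):
    """Placeholder transformer block that contains the MoE."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.attn = nn.Linear(dim, dim)
        self.moe = MoE(dim)


def _wrap_fn(module: nn.Module) -> bool:
    # Wrap both the block and the MoE. FSDP's auto-wrap traversal is
    # bottom-up, so the MoE is wrapped first and becomes a nested FSDP
    # unit whose parameters are sharded separately from the rest of the
    # block, rather than being merged into the block's flat parameter.
    return isinstance(module, (Block, MoE))


auto_wrap_policy = functools.partial(lambda_auto_wrap_policy, lambda_fn=_wrap_fn)
```

Would something along these lines help keep FSDP's gather/scatter from overlapping badly with the expert all-to-all, or is the block-level wrapping fine as-is?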

Thanks for your help!
