Improve shard args in decentralized runtime #321

Merged

ZYHowell merged 4 commits into main from fix-attention-mask

Feb 4, 2022

Collaborator

ZYHowell commented Feb 2, 2022

When benchmarking with dummy input, we now create and shard the nonzero buffers on the mesh workers instead of on the driver. This gives a 3x speedup, because eval_shape is surprisingly costly.
When an arg is a batch_arg and is split into microbatches, we now shard it to the workers first and then split it into microbatches on the workers. In most cases this gives a speedup by a factor of the number of microbatches, since the tensor is small but the number of microbatches is large.
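
To illustrate the second point, here is a minimal sketch of why the order matters. It is not the code in this PR: the function names, the transfer counter, and the choice to shard along axis 1 are all assumptions made for the example, and plain NumPy arrays stand in for device buffers.

```python
import numpy as np


def split_then_shard(batch, num_microbatches, num_workers):
    """Old order (hypothetical): the driver splits the batch into
    microbatches, then sends one shard per microbatch to each worker."""
    transfers = 0
    per_worker = [[] for _ in range(num_workers)]
    for microbatch in np.split(batch, num_microbatches, axis=0):
        for w, shard in enumerate(np.split(microbatch, num_workers, axis=1)):
            per_worker[w].append(shard)  # one driver-to-worker send
            transfers += 1
    return per_worker, transfers


def shard_then_split(batch, num_microbatches, num_workers):
    """New order (hypothetical): the driver sends one shard per worker,
    and each worker splits its shard into microbatches locally."""
    transfers = 0
    per_worker = []
    for shard in np.split(batch, num_workers, axis=1):
        transfers += 1  # one driver-to-worker send
        # The split below would run on the worker, not on the driver.
        per_worker.append(np.split(shard, num_microbatches, axis=0))
    return per_worker, transfers


batch = np.arange(8 * 4).reshape(8, 4)
old, old_transfers = split_then_shard(batch, num_microbatches=4, num_workers=2)
new, new_transfers = shard_then_split(batch, num_microbatches=4, num_workers=2)
print(old_transfers, new_transfers)
```

In this toy setup both orders leave every worker holding the same microbatch shards, but the number of driver-to-worker sends drops from num_microbatches x num_workers to num_workers, which is where a speedup proportional to the number of microbatches would come from when each tensor is small.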

ZYHowell added 4 commits

January 28, 2022 01:40

shard dummy buffers on worker (d77b2cf)
Happy Lunar New Year (6fda3aa)
add function in decentralized rt (7963bdb)
delete original data in batch (f5566e6)

ZYHowell merged commit 4cc3083 into main

ZYHowell deleted the fix-attention-mask branch

February 4, 2022 01:46

ZYHowell mentioned this pull request

attention_mask should not be sent repeatedly #177

Closed
