
Support embedding & Test auto-sharding on the whole BERT model & Refine auto-sharding interface #49

Merged
3 commits merged into master from whole-bert on Jul 10, 2021

Conversation

@merrymercy (Member) commented Jul 6, 2021

  • Monkey patch `nn.Embed` in flax to use one-hot + matmul instead of gather/scatter
  • Test the auto-sharding solver on the whole BERT model (copied from huggingface). The result (transformer + embedding) has exactly the same partition strategy and communication cost as Megatron-LM's solution. I will do some benchmarks in the next PR.
  • Refine the auto-sharding interface to better fit Combining Manual Pipeline Parallelism & Automatic SPMD Parallelism #46

Dependency:
requires flax >= 0.3.4
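The one-hot + matmul trick mentioned in the first bullet rewrites an embedding lookup (a gather) as a dense matrix product, which is typically easier for an SPMD sharding solver to partition. A minimal sketch of the equivalence, written here with NumPy for clarity (the actual patch targets flax's `nn.Embed`; the function name `onehot_embed` is illustrative, not from the PR):

```python
import numpy as np

def onehot_embed(ids, table):
    """Embedding lookup expressed as one-hot + matmul.

    ids:   integer array of shape (batch, seq)
    table: embedding table of shape (vocab, dim)

    Equivalent to the gather `table[ids]`, but expressed as a
    matmul so a sharding solver can treat it like any other
    dense layer (e.g. partition the vocab dimension).
    """
    vocab = table.shape[0]
    # (batch, seq, vocab) one-hot encoding of the token ids
    onehot = np.eye(vocab, dtype=table.dtype)[ids]
    # Contract over the vocab dimension -> (batch, seq, dim)
    return onehot @ table

# Sanity check: the matmul form matches the plain gather.
table = np.arange(12, dtype=np.float32).reshape(4, 3)
ids = np.array([[0, 2], [3, 1]])
assert np.allclose(onehot_embed(ids, table), table[ids])
```

The trade-off is extra FLOPs and memory for the one-hot tensor, accepted here because it lets the solver find Megatron-LM-style partitions for the embedding.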

@merrymercy merrymercy changed the title Support embedding & Test auto-sharding on the whole BERT model Support embedding & Test auto-sharding on the whole BERT model & Refine auto-sharding interface Jul 10, 2021
@merrymercy merrymercy merged commit 47fb454 into master Jul 10, 2021
@merrymercy merrymercy deleted the whole-bert branch July 10, 2021 21:34