When I train a pretraining model, the vocabulary size is divided by the number of GPUs, so I can't directly load the checkpoint into the original model for downstream tasks.
What should I do? Thanks!
The vocabulary size is divided by MP_SIZE (which is the model parallel size). Did you set MP_SIZE to the number of GPUs? To use the original model, MP_SIZE should be set to 1.
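For intuition, here is a minimal sketch of why this happens. It assumes illustrative names (`mp_size`, `vocab_size`, `hidden_size`) rather than the project's actual API: with model parallelism, each rank stores only a 1/MP_SIZE slice of the word-embedding table, which is why a checkpoint trained with MP_SIZE equal to the number of GPUs does not match a single-GPU model.

```python
# Hypothetical sketch, not the project's real code: shows how a vocabulary
# embedding is sharded across model-parallel ranks and what it would take
# to merge the shards back for an MP_SIZE = 1 model.
import torch

vocab_size = 30522        # example vocabulary size (assumption)
hidden_size = 1024        # example hidden size (assumption)
mp_size = 4               # model parallel size used during pretraining

# pad the vocabulary so it divides evenly across model-parallel ranks
padded_vocab = ((vocab_size + mp_size - 1) // mp_size) * mp_size
per_rank_vocab = padded_vocab // mp_size

# each rank only holds its own shard of the embedding table
rank_embeddings = [torch.empty(per_rank_vocab, hidden_size) for _ in range(mp_size)]

# loading such a checkpoint into a single-GPU (MP_SIZE = 1) model would require
# concatenating the shards along the vocabulary dimension first
merged = torch.cat(rank_embeddings, dim=0)[:vocab_size]
print(merged.shape)       # torch.Size([30522, 1024])
```

In other words, if the pretraining run sets MP_SIZE to 1, the checkpoint already contains the full embedding table and can be loaded directly for downstream tasks.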