When I train a pretraining model, the vocabulary size is divided by the number of GPUs, so I can't directly load the checkpoint into the original model for downstream tasks.
What should I do? Thanks!
The vocabulary size is divided by MP_SIZE (which is the model parallel size). Did you set MP_SIZE to the number of GPUs? To use the original model, MP_SIZE should be set to 1.
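For intuition, here is a minimal sketch of why this happens. It assumes illustrative names (`mp_size`, `vocab_size`, `hidden_size`) rather than the project's actual API: with model parallelism, each rank stores only a 1/MP_SIZE slice of the word-embedding table, which is why a checkpoint trained with MP_SIZE equal to the number of GPUs does not match a single-GPU model.

```python
# Hypothetical sketch, not the project's real code: shows how a vocabulary
# embedding is sharded across model-parallel ranks and what it would take
# to merge the shards back for an MP_SIZE = 1 model.
import torch

vocab_size = 30522        # example vocabulary size (assumption)
hidden_size = 1024        # example hidden size (assumption)
mp_size = 4               # model parallel size used during pretraining

# pad the vocabulary so it divides evenly across model-parallel ranks
padded_vocab = ((vocab_size + mp_size - 1) // mp_size) * mp_size
per_rank_vocab = padded_vocab // mp_size

# each rank only holds its own shard of the embedding table
rank_embeddings = [torch.empty(per_rank_vocab, hidden_size) for _ in range(mp_size)]

# loading such a checkpoint into a single-GPU (MP_SIZE = 1) model would require
# concatenating the shards along the vocabulary dimension first
merged = torch.cat(rank_embeddings, dim=0)[:vocab_size]
print(merged.shape)       # torch.Size([30522, 1024])
```

In other words, if the pretraining run sets MP_SIZE to 1, the checkpoint already contains the full embedding table and can be loaded directly for downstream tasks.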