Describe the bug
As shown in the screenshot, `shared_embedding` is handled separately from the other parameters when the buckets are built. When the `data_end_index` of the parameter preceding `shared_embedding` is not divisible by `self.data_parallel_world_size`, this causes a problem.
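To make the condition concrete, here is a minimal sketch of the divisibility check. It is not Megatron-LM's actual code: the helper names `is_aligned` and `pad_to_multiple` and the numeric values are hypothetical, chosen only to show an end offset that is not a multiple of the data-parallel world size. Rounding the offset up is shown as one possible way to restore alignment, not as a confirmed fix.

```python
def is_aligned(data_end_index: int, data_parallel_world_size: int) -> bool:
    """Return True if the offset is a multiple of the data-parallel world size."""
    return data_end_index % data_parallel_world_size == 0


def pad_to_multiple(index: int, data_parallel_world_size: int) -> int:
    """Round index up to the next multiple of the data-parallel world size."""
    return (
        (index + data_parallel_world_size - 1) // data_parallel_world_size
    ) * data_parallel_world_size


# Hypothetical values: 8 data-parallel ranks, and an end offset for the
# parameter before shared_embedding that is not a multiple of 8.
data_parallel_world_size = 8
data_end_index = 1_000_003

aligned = is_aligned(data_end_index, data_parallel_world_size)
padded_index = pad_to_multiple(data_end_index, data_parallel_world_size)

print(f"aligned: {aligned}, padded index: {padded_index}")
```

With these assumed values the offset leaves a remainder of 3, so a boundary placed there would not fall on a multiple of the world size.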
To Reproduce
Expected behavior
Stack trace/logs
Environment (please complete the following information):
Megatron-LM commit ID: c3677e09aa4e2eec37048307bd795928b8f8324a
PyTorch version: 2.2.0a0+81ea7a4
CUDA version: -
NCCL version: -
Proposed fix
Additional context
Describe the bug
![image](https://private-user-images.githubusercontent.com/39549453/333088788-c1e3ea24-e371-4818-9d9f-b916bb34e0fe.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0MDA5OTgsIm5iZiI6MTcyMTQwMDY5OCwicGF0aCI6Ii8zOTU0OTQ1My8zMzMwODg3ODgtYzFlM2VhMjQtZTM3MS00ODE4LTlkOWYtYjkxNmJiMzRlMGZlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE5VDE0NTEzOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWYxYjZmYjA1MmZhMDgwMTBhZDk3ODE0ZGVhODE5YzExMmVkZGIyNGNlOWRjOWZjNjI1ZDQ0YzYxYmYwZGI3ZGMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.397nx30SD2kLcX8rSSnsWObnd1rCjTNiZLxMVCg-cfo)
Stack trace/logs
![image](https://private-user-images.githubusercontent.com/39549453/333090111-6eaa6826-ca3f-4c3d-ab1a-b25d6443c6d8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0MDA5OTgsIm5iZiI6MTcyMTQwMDY5OCwicGF0aCI6Ii8zOTU0OTQ1My8zMzMwOTAxMTEtNmVhYTY4MjYtY2EzZi00YzNkLWFiMWEtYjI1ZDY0NDNjNmQ4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE5VDE0NTEzOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdjODU3Y2NhYjc2OWRhZDlhNGYzYTliYTc4OGQ0YmQ3YTY2YTUyNWU4MzQ2OGE4Zjk1ZjExNjAwNmIxM2Y5Y2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.HzSCM01Sy5nH9vU2lV3AxtsmI3qNhw13OsyXIHuS2yU)