
[BUG] The problems with bucket and shared_embedding. #835

Open
Baibaifan opened this issue May 23, 2024 · 2 comments
@Baibaifan
Describe the bug
[screenshot: bucket-construction code where shared_embedding parameters are handled separately]
As shown in the screenshot above, shared_embedding parameters are separated from the other parameters when buckets are built. If the data_end_index of the parameter immediately before shared_embedding is not divisible by self.data_parallel_world_size, bucket construction breaks at that point.
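For context, here is a minimal Python sketch of the issue, not the actual Megatron-LM code: buckets are sharded evenly across data-parallel ranks, so every bucket boundary has to land on a multiple of data_parallel_world_size. The names pad_to_multiple, build_buckets, and force_new_bucket_at below are hypothetical and only illustrate where padding is needed when shared_embedding is forced into its own bucket.

```python
def pad_to_multiple(index: int, world_size: int) -> int:
    """Round index up to the next multiple of world_size."""
    return ((index + world_size - 1) // world_size) * world_size


def build_buckets(param_sizes, force_new_bucket_at, bucket_size, world_size):
    """Hypothetical bucket builder (illustration only). param_sizes are
    parameter numels in allocation order; indices in force_new_bucket_at
    (e.g. shared_embedding) must start a fresh bucket. Returns a list of
    (start, end) offsets, one per bucket."""
    buckets, bucket_start, data_end = [], 0, 0
    for i, numel in enumerate(param_sizes):
        if i in force_new_bucket_at and data_end > bucket_start:
            # Without this padding, (data_end - bucket_start) may not be
            # divisible by world_size, so the per-rank shard sizes no longer
            # match; this is the failure described above.
            data_end = pad_to_multiple(data_end, world_size)
            buckets.append((bucket_start, data_end))
            bucket_start = data_end
        data_end += numel
        if data_end - bucket_start >= bucket_size:
            data_end = pad_to_multiple(data_end, world_size)
            buckets.append((bucket_start, data_end))
            bucket_start = data_end
    if data_end > bucket_start:
        buckets.append((bucket_start, pad_to_multiple(data_end, world_size)))
    return buckets


# Example: two ordinary params (100 + 30 = 130 elements) followed by a
# shared_embedding of 50, with data_parallel_world_size = 8. The first
# bucket must be padded from 130 up to 136 before the new bucket starts.
print(build_buckets([100, 30, 50], force_new_bucket_at={2},
                    bucket_size=10**9, world_size=8))
# [(0, 136), (136, 192)]
```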

To Reproduce

Expected behavior

Stack trace/logs
[screenshot of the resulting error / stack trace]

Environment (please complete the following information):

  • Megatron-LM commit ID: c3677e09aa4e2eec37048307bd795928b8f8324a
  • PyTorch version: 2.2.0a0+81ea7a4
  • CUDA version: -
  • NCCL version: -

Proposed fix

Additional context

@wangxicoding
Contributor

I fixed it, but the Megatron team has not responded yet 👀
#762

@Baibaifan
Author

I fixed it, but the Megatron team has not responded yet 👀 #762

That’s so awesome, I must give you a Turing Award.
