
Likely bug in ALiBi positional embeddings together with tensor parallelism #227

Closed

DanielHesslow opened this issue on Jan 7, 2022 · 4 comments

Labels: bug (Something isn't working), Good First Issue (Good for newcomers)

@DanielHesslow (Collaborator)

The way ALiBi is implemented, we first generate the full alibi attention-bias tensor:

def _build_alibi_tensor(max_seq_len, num_attention_heads, batch_size):

and then extract a slice of it:

matmul_result = alibi[:output_size[0]*output_size[1], :, :output_size[3]]
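
For orientation, here is a minimal, runnable sketch of what such a build function computes. The power-of-two slope formula and the (batch * heads, 1, seq) shape are assumptions taken from the ALiBi paper and the slicing above, not a copy of the Megatron-DeepSpeed code:

```python
import torch

def _build_alibi_tensor(max_seq_len, num_attention_heads, batch_size):
    """Sketch: build the full ALiBi bias for *all* heads on one rank."""
    # Per-head slopes form a geometric sequence (power-of-two head
    # counts shown for brevity, as in the ALiBi paper).
    start = 2 ** (-8 / num_attention_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(num_attention_heads)])
    # The bias grows linearly with key position: (heads, 1, max_seq_len).
    positions = torch.arange(max_seq_len).view(1, 1, -1).expand(num_attention_heads, -1, -1)
    alibi = slopes.view(-1, 1, 1) * positions
    # Tile over the batch: (batch_size * num_attention_heads, 1, max_seq_len).
    return alibi.repeat(batch_size, 1, 1)
```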

When we run with tensor parallelism, the attention heads are distributed across ranks, so we need to extract the part of the alibi tensor that corresponds to our tensor-parallel rank.

In order to fix this we need to:

  1. Create a test that verifies we get the same output with and without tensor parallelism.
  2. Extract the part of the alibi tensor corresponding to our tensor-parallel rank (see the sketch after this list).
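
Below is a hypothetical sketch of both steps, reusing _build_alibi_tensor from the sketch above. slice_alibi_for_rank is an illustrative helper name, not the actual fix; real Megatron code would read the rank from its mpu utilities, and the contiguous head split assumed here mirrors Megatron's column-parallel attention layout:

```python
import torch

def slice_alibi_for_rank(alibi, batch_size, num_attention_heads,
                         tp_rank, tp_world_size):
    """Step 2 (sketch): keep only the heads owned by this tensor-parallel rank.

    Assumes heads are split contiguously across ranks and that `alibi` is the
    (batch * heads, 1, seq) tensor from the sketch above.
    """
    heads_per_rank = num_attention_heads // tp_world_size
    seq_len = alibi.shape[-1]
    # Make the head axis explicit, take this rank's block, flatten back.
    alibi = alibi.view(batch_size, num_attention_heads, 1, seq_len)
    alibi = alibi[:, tp_rank * heads_per_rank:(tp_rank + 1) * heads_per_rank]
    return alibi.reshape(batch_size * heads_per_rank, 1, seq_len)

# Step 1 in miniature: stitching every rank's slice back together along the
# head axis must reproduce the unsharded tensor.
full = _build_alibi_tensor(max_seq_len=16, num_attention_heads=8, batch_size=2)
tp = 4
parts = [slice_alibi_for_rank(full, 2, 8, r, tp).view(2, 8 // tp, 1, 16)
         for r in range(tp)]
assert torch.equal(torch.cat(parts, dim=1).view(2 * 8, 1, 16), full)
```

The assertion is the same equivalence check that step 1 asks for, just run in-process instead of across an actual distributed job.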
DanielHesslow mentioned this issue on Jan 7, 2022
thomasw21 added the bug and Good First Issue labels on Jan 7, 2022
@thomasw21 (Member)

@stas00 for visibility in case we're going for Alibi and TP > 1

@stas00 (Member) commented on Jan 27, 2022

After yesterday's call we are definitely going for TP=8.

And it does sound like we are going for ALiBi - at least that's what @ibeltagy said in the channel.

@ibeltagy (Member)

Yes, we certainly want to use ALiBi in the final model, and we need to fix the ALiBi/TP bug.

@DanielHesslow (Collaborator, Author)

Closing this since the fix has been unit tested and is now merged into main.
