The way ALiBi is implemented, we first generate the full ALiBi attention bias tensor for all heads:
`Megatron-DeepSpeed/megatron/model/transformer.py`, line 591 (commit `7ab5c05`)
and then extract the relevant part of it:
`Megatron-DeepSpeed/megatron/model/transformer.py`, line 291 (commit `7ab5c05`)
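For context, here is a minimal sketch of how an ALiBi bias tensor is typically built, following the geometric slopes from Press et al.; `build_alibi_tensor` is a hypothetical name for illustration, and the actual code at line 591 may differ in shape, dtype, and handling of non-power-of-two head counts:

```python
import math
import torch

def build_alibi_tensor(max_seq_len: int, num_heads: int) -> torch.Tensor:
    """Sketch: per-head linear biases, shape (num_heads, 1, max_seq_len)."""
    def get_slopes(n: int):
        # Geometric slopes 2^-1, 2^-2, ... (assumes n is a power of two;
        # the real implementation also handles other head counts).
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * (start ** i) for i in range(n)]

    slopes = torch.tensor(get_slopes(num_heads))
    positions = torch.arange(max_seq_len, dtype=slopes.dtype)
    # Broadcast: (num_heads, 1, 1) * (1, 1, max_seq_len) -> (num_heads, 1, max_seq_len)
    return slopes[:, None, None] * positions[None, None, :]
```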
When we run with tensor parallelism, the attention heads are distributed across the tensor-parallel ranks, so we need to extract the part of the ALiBi tensor corresponding to our own tensor-parallel rank.
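A minimal sketch of that per-rank slicing is below; `slice_alibi_for_rank`, its parameters, and the contiguous head split are assumptions for illustration (matching how Megatron's column-parallel layers usually partition heads), not the repo's actual API:

```python
import torch

def slice_alibi_for_rank(alibi: torch.Tensor,
                         tp_rank: int,
                         tp_world_size: int) -> torch.Tensor:
    """Return the heads of `alibi` belonging to this tensor-parallel rank.

    `alibi` has shape (num_heads, 1, max_seq_len); heads are assumed to be
    split contiguously across ranks.
    """
    num_heads = alibi.size(0)
    assert num_heads % tp_world_size == 0, "num_heads must divide evenly"
    heads_per_rank = num_heads // tp_world_size
    start = tp_rank * heads_per_rank
    return alibi[start:start + heads_per_rank]
```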
In order to fix this we need to:

- Create a test that makes sure we get the same output with and without tensor parallelism (see the test sketch below).
- Extract the correct part of the alibi tensor corresponding to our tensor-parallel rank, e.g. with the slicing helper sketched above.
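As a starting point for the test, here is a single-process sanity check built on the two sketches above: it verifies that concatenating the per-rank slices reproduces the full ALiBi tensor. A full integration test would additionally run the model with `tp=1` and `tp>1` and compare outputs:

```python
import torch

def test_alibi_matches_across_tp():
    """Concatenating all per-rank slices must rebuild the full tensor."""
    num_heads, seq_len, tp_world_size = 16, 128, 4
    full = build_alibi_tensor(seq_len, num_heads)
    rebuilt = torch.cat(
        [slice_alibi_for_rank(full, rank, tp_world_size)
         for rank in range(tp_world_size)],
        dim=0,
    )
    assert torch.equal(full, rebuilt)
```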