The way ALiBi is implemented, we first generate the full ALiBi attention bias tensor for all heads:
`Megatron-DeepSpeed/megatron/model/transformer.py`, line 591 (commit `7ab5c05`)
and then extract the relevant part of it:
`Megatron-DeepSpeed/megatron/model/transformer.py`, line 291 (commit `7ab5c05`)
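For context, here is a minimal sketch of how an ALiBi bias tensor is typically built, following the geometric slopes from Press et al.; `build_alibi_tensor` is a hypothetical name for illustration, and the actual code at line 591 may differ in shape, dtype, and handling of non-power-of-two head counts:

```python
import math
import torch

def build_alibi_tensor(max_seq_len: int, num_heads: int) -> torch.Tensor:
    """Sketch: per-head linear biases, shape (num_heads, 1, max_seq_len)."""
    def get_slopes(n: int):
        # Geometric slopes 2^-1, 2^-2, ... (assumes n is a power of two;
        # the real implementation also handles other head counts).
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * (start ** i) for i in range(n)]

    slopes = torch.tensor(get_slopes(num_heads))
    positions = torch.arange(max_seq_len, dtype=slopes.dtype)
    # Broadcast: (num_heads, 1, 1) * (1, 1, max_seq_len) -> (num_heads, 1, max_seq_len)
    return slopes[:, None, None] * positions[None, None, :]
```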
When we run with tensor parallelism, the attention heads are distributed across the tensor-parallel ranks, so we need to extract the part of the ALiBi tensor corresponding to our own tensor-parallel rank.
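A minimal sketch of that per-rank slicing is below; `slice_alibi_for_rank`, its parameters, and the contiguous head split are assumptions for illustration (matching how Megatron's column-parallel layers usually partition heads), not the repo's actual API:

```python
import torch

def slice_alibi_for_rank(alibi: torch.Tensor,
                         tp_rank: int,
                         tp_world_size: int) -> torch.Tensor:
    """Return the heads of `alibi` belonging to this tensor-parallel rank.

    `alibi` has shape (num_heads, 1, max_seq_len); heads are assumed to be
    split contiguously across ranks.
    """
    num_heads = alibi.size(0)
    assert num_heads % tp_world_size == 0, "num_heads must divide evenly"
    heads_per_rank = num_heads // tp_world_size
    start = tp_rank * heads_per_rank
    return alibi[start:start + heads_per_rank]
```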
In order to fix this we need to:

- Create a test that makes sure we get the same output with and without tensor parallelism (see the test sketch below).
- Extract the correct part of the alibi tensor corresponding to our tensor-parallel rank, e.g. with the slicing helper sketched above.
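As a starting point for the test, here is a single-process sanity check built on the two sketches above: it verifies that concatenating the per-rank slices reproduces the full ALiBi tensor. A full integration test would additionally run the model with `tp=1` and `tp>1` and compare outputs:

```python
import torch

def test_alibi_matches_across_tp():
    """Concatenating all per-rank slices must rebuild the full tensor."""
    num_heads, seq_len, tp_world_size = 16, 128, 4
    full = build_alibi_tensor(seq_len, num_heads)
    rebuilt = torch.cat(
        [slice_alibi_for_rank(full, rank, tp_world_size)
         for rank in range(tp_world_size)],
        dim=0,
    )
    assert torch.equal(full, rebuilt)
```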