
bug fix - alibi causal: for ith query, bias is m · [−(i − 1), ..., −2, −1, 0] for the first i keys #7105

Closed

Conversation

LydiaXiaohongLi

What does this PR do?

Fix alibi position embedding for causal attention.

Collection: NLP

Changelog

Existing: returns a bias of shape (1, num_heads, 1, key_length)
Fixed: returns a bias of shape (1, num_heads, query_length, key_length), where for the ith query the bias is m · [−(i − 1), ..., −2, −1, 0] over the first i keys, as described in the ALiBi paper (see the sketch below)
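
For reference, a minimal sketch (not the actual NeMo implementation; the slope formula assumes num_heads is a power of two, as in the ALiBi paper) of how such a full causal bias can be built:

```python
import torch

def build_causal_alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Sketch: full causal ALiBi bias of shape (1, num_heads, seq_len, seq_len)."""
    # Per-head slopes m_h = 2 ** (-8 * (h + 1) / num_heads); assumes num_heads
    # is a power of two, as in the ALiBi paper.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

    # relative[i, j] = j - i, clamped to 0 above the diagonal (those keys are
    # masked anyway), so row i reads [-i, ..., -2, -1, 0, 0, ..., 0].
    pos = torch.arange(seq_len)
    relative = (pos.view(1, -1) - pos.view(-1, 1)).clamp(max=0).float()

    # bias[0, h, i, j] = m_h * (j - i) for the keys j <= i
    return (slopes.view(num_heads, 1, 1) * relative).unsqueeze(0)
```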

PR Type:

  • Bugfix

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@hsiehjackson
Collaborator

If it is causal attention, we can apply a singleton trick to the ALiBi attention bias. We don't need a (1, num_heads, query_length, key_length) bias; (1, num_heads, 1, key_length) is enough, because after softmax the results are the same.
The original bias, for a sequence of length 5, is the following:

[ 0,  0,  0,  0, 0]
[-1,  0,  0,  0, 0]
[-2, -1,  0,  0, 0]
[-3, -2, -1,  0, 0]
[-4, -3, -2, -1, 0]

A singleton trick bias can be the following:

[-4, -3, -2, -1, 0]
[-4, -3, -2, -1, 0]
[-4, -3, -2, -1, 0]
[-4, -3, -2, -1, 0]
[-4, -3, -2, -1, 0]

Applying the causal mask to the singleton bias keeps only the entries on and below the diagonal (masked positions, shown here as 0, actually receive −∞ from the mask):

[-4,  0,  0,  0, 0]
[-4, -3,  0,  0, 0]
[-4, -3, -2,  0, 0]
[-4, -3, -2, -1, 0]
[-4, -3, -2, -1, 0]

Within each row, the unmasked entries differ from the original bias only by a constant, and softmax is invariant to adding a constant to a row, so the attention weights after softmax are identical.
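
For example, here is a quick self-contained check (illustrative single head, arbitrary slope and random scores; not the NeMo code) that the full per-query bias and the singleton bias give identical attention weights under a causal mask:

```python
import torch

torch.manual_seed(0)
seq_len, m = 5, 0.5                      # one head, illustrative slope m
scores = torch.randn(seq_len, seq_len)   # raw attention scores q·k^T

# Full per-query bias: bias[i, j] = m * (j - i), clamped to 0 above the diagonal
pos = torch.arange(seq_len)
full_bias = m * (pos.view(1, -1) - pos.view(-1, 1)).clamp(max=0).float()

# Singleton bias: a single row m * [-(seq_len - 1), ..., -1, 0] broadcast to all queries
singleton_bias = m * (pos - (seq_len - 1)).float().view(1, -1)

# Causal mask: queries may not attend to future keys
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

def masked_softmax(x):
    return torch.softmax(x.masked_fill(mask, float("-inf")), dim=-1)

# Per row the two biases differ only by a constant, which cancels in softmax
print(torch.allclose(masked_softmax(scores + full_bias),
                     masked_softmax(scores + singleton_bias)))  # True
```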

For more details, see the author's code and the related discussion:
Code: https://github.com/ofirpress/attention_with_linear_biases/blob/master/fairseq/models/transformer.py#L760-L762
Discussion: ofirpress/attention_with_linear_biases#5

One small difference from the author's singleton trick: the author uses positions

[0, 1, 2, 3, 4, 5, ..., n]

while our implementation uses

[-n, ..., -5, -4, -3, -2, -1, 0]

The two vectors differ only by the constant n, so after softmax they give the same attention weights.
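
A tiny check of that equivalence (illustrative, with n = 4):

```python
import torch

n = 4
author = torch.arange(n + 1).float()   # [0, 1, 2, 3, 4]
ours = torch.arange(-n, 1).float()     # [-4, -3, -2, -1, 0]

# ours = author - n, and softmax is invariant to a constant shift,
# so both give the same attention distribution
print(torch.allclose(torch.softmax(author, -1), torch.softmax(ours, -1)))  # True
```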

@LydiaXiaohongLi
Author

Thank you so much!
