Usage of alpha #10
Hi, we defined a trainable parameter for each head in each layer. Here's a small code snippet that we used:
However, it's also possible to have a single alpha per layer, or per transformer block!
Thanks so much! @goncalomcorreia
Hello! How can I compute entmax_bisect when the size of alpha is larger than 1?
Hi,
May I know whether we need to define a new trainable parameter for the alpha value for each head in each layer? Could anyone kindly show a simple example of how it could be used in a standard transformer?
Thanks!