Hi,

I have a few questions about the node padding.

Firstly, is my assumption correct that adding the -inf values in `pad_attn_bias_unsqueeze` serves the same purpose as the attention_mask in BERT, i.e. it prevents any attention to padded nodes?

If this is correct, why do you add +1 to x in the padding functions? Since attention is restricted from attending to the padded nodes anyway, they could hold arbitrary values, so 0 could still be used as a regular feature value.
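To illustrate the assumption I am making about the mask (a toy example, not the repo's code): adding -inf to the attention scores of padded positions before the softmax drives their attention weights to exactly 0.

```python
import torch

scores = torch.tensor([1.0, 2.0, 3.0])          # raw attention scores for 3 nodes
bias = torch.tensor([0.0, 0.0, float('-inf')])  # node 2 is padding
weights = torch.softmax(scores + bias, dim=0)
print(weights)  # tensor([0.2689, 0.7311, 0.0000]) -> no attention to the padded node
```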
I am referring to padding like the following, which is used to pad x:

```python
def pad_2d_unsqueeze(x, padlen):
    x = x + 1  # pad id = 0 -> THIS LINE
    xlen, xdim = x.size()
    if xlen < padlen:
        # allocate a zero tensor of the target length and copy the real rows in
        new_x = x.new_zeros([padlen, xdim], dtype=x.dtype)
        new_x[:xlen, :] = x
        x = new_x
    return x.unsqueeze(0)  # add a batch dimension
```
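My current guess for the +1 is that x holds categorical feature indices that are later passed through an embedding layer, so index 0 has to be reserved for padding. A minimal sketch of that reading (the embedding layer and its sizes here are hypothetical, not taken from the repo):

```python
import torch
import torch.nn as nn

# Hypothetical: without the +1 shift, a real feature with value 0 would be
# indistinguishable from the zero-padding written by new_zeros above.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)

x = torch.tensor([[0, 3], [2, 5]])         # raw features; 0 is a real value here
padded = torch.zeros(3, 2, dtype=torch.long)
padded[:2, :] = x + 1                      # shift so that index 0 means "padding"
out = embedding(padded)                    # row 2 maps to the (zero) pad embedding
```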