I find in function attn_head() (in utils/layers.py)
'''
simplest self-attention possible
f_1 = tf.layers.conv1d(seq_fts, 1, 1)
f_2 = tf.layers.conv1d(seq_fts, 1, 1)
logits = f_1 + tf.transpose(f_2, [0, 2, 1])
coefs = tf.nn.softmax(tf.nn.leaky_relu(logits) + bias_mat)
'''
In my understanding, this code is equivalent to $$f_1 W_1 + f_2 W_2,$$
but in the paper, the chosen attention mechanism uses concatenation, with $$W_1 = W_2 = W.$$
Did I get something wrong?
Hello, I have encountered the same problem as you.
Also, in ./utils/layers.py, I don't understand how this code captures the correlation between f_1 and f_2.
I have read the PyTorch version of the code, and I think the two implementations differ at this point.
The PyTorch version is also sensitive to the choice of random seed; switching to a different seed can change the results substantially.
The way attention heads are implemented here is exactly equivalent to the one in the paper; it just relies heavily on TensorFlow's broadcasting semantics.
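To see the equivalence, note that the paper's attention vector $a$ can be split into two halves $a_1, a_2$, so that $a^T [W h_i \| W h_j] = a_1^T W h_i + a_2^T W h_j$. The two `conv1d(seq_fts, 1, 1)` calls play the roles of $a_1$ and $a_2$, and broadcasting `f_1 + tf.transpose(f_2, [0, 2, 1])` builds all pairwise sums at once. A minimal NumPy sketch (not the repo's TensorFlow code; all names here are illustrative) checks this equivalence:

```python
import numpy as np

# Sketch: broadcasting f_1 + f_2^T reproduces the paper's
# concatenation-based attention logits a^T [W h_i || W h_j].
rng = np.random.default_rng(0)
N, F = 4, 8                          # nodes, transformed feature size
Wh = rng.standard_normal((N, F))     # plays the role of seq_fts (W h_i per node)
a = rng.standard_normal(2 * F)       # the paper's attention vector a

a1, a2 = a[:F], a[F:]                # split a into its two halves
f_1 = Wh @ a1                        # shape (N,): a_1^T W h_i (the first conv1d)
f_2 = Wh @ a2                        # shape (N,): a_2^T W h_j (the second conv1d)

# Broadcasting: (N, 1) + (1, N) -> (N, N) matrix of pairwise logits
logits = f_1[:, None] + f_2[None, :]

# Paper's formulation, computed pair by pair: a^T [W h_i || W h_j]
concat_logits = np.array([[a @ np.concatenate([Wh[i], Wh[j]])
                           for j in range(N)] for i in range(N)])

assert np.allclose(logits, concat_logits)
```

So there is no missing weight sharing: the single shared $W$ is applied when producing `seq_fts`, and splitting $a$ into two separate 1x1 convolutions is just an algebraic rewrite of the concatenation.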