
Confusion about some codes #15

Closed
DexterZeng opened this issue Dec 5, 2018 · 4 comments

@DexterZeng

Hi, thank you for sharing the code!
I find the code below (in the sp_attn_head function) a bit hard to understand, since it does not correspond directly to the description in the paper:

f_1 = tf.layers.conv1d(seq_fts, 1, 1)
f_2 = tf.layers.conv1d(seq_fts, 1, 1)

f_1 = tf.reshape(f_1, (nb_nodes, 1))
f_2 = tf.reshape(f_2, (nb_nodes, 1))

f_1 = adj_mat*f_1
f_2 = adj_mat * tf.transpose(f_2, [1,0])

logits = tf.sparse_add(f_1, f_2)

lrelu = tf.SparseTensor(indices=logits.indices, 
values=tf.nn.leaky_relu(logits.values), 
dense_shape=logits.dense_shape)

coefs = tf.sparse_softmax(lrelu)

I understand that seq_fts = tf.layers.conv1d(seq, out_sz, 1, use_bias=False) can be regarded as multiplying by a weight matrix W (converting F features into F’ features). However, I do not understand the following steps:

f_1 = tf.layers.conv1d(seq_fts, 1, 1)
f_2 = tf.layers.conv1d(seq_fts, 1, 1)
f_1 = adj_mat*f_1
f_2 = adj_mat * tf.transpose(f_2, [1,0])
logits = tf.sparse_add(f_1, f_2)

I am aware that the code is different because it operates at the “matrix” level rather than the node level, but I cannot see how the attention mechanism is achieved by these steps. Could you help explain a bit?

Many thanks,
Weixin.

@PetarV-
Owner

PetarV- commented Dec 5, 2018

Hi Weixin,

Thanks for your interest in the GAT work and for reaching out. I'll try to elaborate on those lines in more detail.

As our attention mechanism is a single-layer NN, we can decompose W[h_i || h_j] into W_1 h_i + W_2 h_j (with W_1 and W_2 being separate linear transformations).

As W_1 and W_2 are applied pointwise to each node, we may express them as 1D convolutions with kernel size 1 across the node feature matrix. Applying these two convolutions gives us f_1 (which is W_1 h_i) and f_2 (which is W_2 h_j).
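
To make that concrete, here is a minimal NumPy sketch of the equivalence (the shapes and the names a_1, a_2 are illustrative, not taken from the repo). A 1D convolution with kernel size 1 and a single output channel is just a pointwise linear map that reduces each node's transformed feature vector to a scalar:

import numpy as np

# Toy shapes: N nodes, F' transformed features (rows of seq_fts play the role of W h_i).
N, F_out = 5, 8
seq_fts = np.random.randn(N, F_out)

# Weights of the two kernel-size-1 convolutions (illustrative names).
a_1 = np.random.randn(F_out, 1)
a_2 = np.random.randn(F_out, 1)

# Pointwise linear maps, applied to every node independently.
f_1 = seq_fts @ a_1   # shape (N, 1); entry i corresponds to W_1 h_i
f_2 = seq_fts @ a_2   # shape (N, 1); entry j corresponds to W_2 h_j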

In the dense GAT layer (https://github.com/PetarV-/GAT/blob/master/utils/layers.py#L13-L17), we compute W_1h_i + W_2h_j for all (i, j) pairs, by simply adding f_1 to transpose(f_2). This implies adding something of shape (batch, nodes, 1) to something of shape (batch, 1, nodes), and through the power of broadcasting we obtain the full (unnormalised) attention coefficient matrix of shape (batch, nodes, nodes), with TensorFlow automatically performing this sum for all pairs of positions. You can then normalise the coefficients by applying LeakyReLU, bias and softmax.
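
Continuing the sketch, the dense all-pairs computation is a single broadcasted addition followed by LeakyReLU and a row-wise softmax (the bias matrix discussed later in this thread would be added to the logits before the softmax; it is left out of this toy example):

# Broadcasting (N, 1) + (1, N) yields every pairwise sum at once:
# logits[i, j] = W_1 h_i + W_2 h_j.
logits = f_1 + f_2.T   # shape (N, N)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0.0, x, alpha * x)

def row_softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Unmasked attention coefficients over all pairs of nodes.
coefs = row_softmax(leaky_relu(logits))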

For the sparse layer the idea is much the same, only we do not wish to use more memory than necessary. This is why we multiply the values by the sparse adjacency matrix's entries: it preserves only the (i, j) positions that are actually used.
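
Sketching the same idea densely with a 0/1 adjacency matrix (the real layer keeps only the edge values inside a SparseTensor, which is where the memory saving comes from; leaky_relu and row_softmax are the toy helpers defined above):

# Toy 0/1 adjacency with self-loops; in the sparse layer this is a SparseTensor.
adj = (np.random.rand(N, N) < 0.4).astype(float)
np.fill_diagonal(adj, 1.0)

# Multiplying by the adjacency keeps the per-edge terms only.
g_1 = adj * f_1          # entry (i, j) = W_1 h_i wherever an edge exists
g_2 = adj * f_2.T        # entry (i, j) = W_2 h_j wherever an edge exists
edge_logits = g_1 + g_2  # the sparse_add step: logits defined on edges only

# tf.sparse_softmax normalises each row over the stored (edge) entries only;
# densely, that corresponds to masking out non-edges before the softmax.
masked = np.where(adj > 0.0, leaky_relu(edge_logits), -1e9)
coefs_sparse = row_softmax(masked)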

Does that help? Feel free to reach out if you need more info.

Thanks,
Petar

@DexterZeng
Author

Hi Petar:

Thank you for your detailed explanation, which helps a lot!

Many thanks,
Weixin.

@KnightOfTheMoonlight

Hi Petar, thanks for your explanation! However, I have some questions about the bias matrix: how is it generated, what is the physical meaning of the bias, and why did you choose this particular construction?
This looks like the default bias-matrix generation for a dense adjacency matrix:

def adj_to_bias(adj, sizes, nhood=1):
    nb_graphs = adj.shape[0]
    mt = np.empty(adj.shape)
    for g in range(nb_graphs):
        mt[g] = np.eye(adj.shape[1])
        for _ in range(nhood):
            mt[g] = np.matmul(mt[g], (adj[g] + np.eye(adj.shape[1])))
        for i in range(sizes[g]):
            for j in range(sizes[g]):
                if mt[g][i][j] > 0.0:
                    mt[g][i][j] = 1.0
    return -1e9 * (1.0 - mt)

@songhuiming

> How is the bias matrix generated, what is the physical meaning of the bias, and why choose this particular construction?

Up to the mt[g][i][j] = 1.0 line, the code adds self-loops to the adjacency matrix and marks every pair of nodes reachable within nhood hops.
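
If I read the dense attn_head layer correctly, the returned matrix is then added to the attention logits right before the softmax, so pairs reachable within nhood hops keep their logits (bias 0) while all other pairs are pushed to roughly -1e9 and receive ~0 attention. A small sketch with a made-up 3-node graph (node 2 isolated), using the adj_to_bias above:

import numpy as np

adj = np.array([[[0., 1., 0.],
                 [1., 0., 0.],
                 [0., 0., 0.]]])              # one toy graph with 3 nodes
bias = adj_to_bias(adj, sizes=[3], nhood=1)   # 0 for reachable pairs, -1e9 otherwise

def row_softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.random.randn(1, 3, 3)             # unnormalised attention for every pair
coefs = row_softmax(logits + bias)            # non-neighbour coefficients come out as ~0
print(np.round(coefs, 3))                     # e.g. row 2 attends only to node 2 itself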
