
Is it possible to retrieve the attention weights of a specific node? #608

Closed
sgdantas opened this issue Jun 5, 2019 · 3 comments

@sgdantas commented Jun 5, 2019

Hey!
I was wondering if it's possible to retrieve the attention weights of a specific node. Printing the shape of alpha shows that the attention is batched (number of nodes with the same degree x degree x 1):

def reduce_func(self, nodes):
    # reduce UDF for equation (3) & (4)
    # equation (3)
    alpha = F.softmax(nodes.mailbox['e'], dim=1)
    print(alpha.shape)
    # equation (4)
    h = torch.sum(alpha * nodes.mailbox['z'], dim=1)
    return {'h': h}

However, a node may be connected to other nodes with different degrees. I want to retrieve the attention that other nodes pay to a specific node; is that possible?
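To make the batching concrete, here is a minimal sketch (plain PyTorch, not DGL itself; the bucket sizes below are made up) that mimics DGL's degree bucketing: reduce_func is invoked once per in-degree bucket, and alpha in each call has shape (number of nodes in that bucket, degree, 1).

import torch
import torch.nn.functional as F

# Hypothetical un-normalized scores for two degree buckets of a toy graph:
# two destination nodes with in-degree 2, and one node with in-degree 3.
buckets = {
    2: torch.randn(2, 2, 1),  # stands in for nodes.mailbox['e'] of the in-degree-2 bucket
    3: torch.randn(1, 3, 1),  # stands in for nodes.mailbox['e'] of the in-degree-3 bucket
}

for degree, e in buckets.items():
    alpha = F.softmax(e, dim=1)  # normalize over the incoming edges, as in equation (3)
    print(degree, alpha.shape)   # torch.Size([num_nodes_in_bucket, degree, 1])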

Thanks!

@mufeili (Member) commented Jun 5, 2019

It's possible. Take our PyTorch GAT implementation as an example: the edge attention values are stored in g.edata['a_drop'], and we can reorganize them per destination node as follows:

from scipy.sparse import lil_matrix
# normalize comes from scikit-learn and performs row-wise L1 normalization
from sklearn.preprocessing import normalize

def preprocess_attention(edge_atten, g, to_normalize=True):
    """Organize attention on edges into one sparse adjacency
    matrix per attention head.

    Parameters
    ----------
    edge_atten : torch.Tensor of shape (# edges, # heads, 1)
        Un-normalized attention on edges.
    g : dgl.DGLGraph
    to_normalize : bool
        Whether to normalize attention values over incoming
        edges for each node.

    Returns
    -------
    list of scipy.sparse.csr_matrix of shape (# nodes, # nodes),
    one matrix per head.
    """
    n_nodes = g.number_of_nodes()
    num_heads = edge_atten.shape[1]
    all_head_A = [lil_matrix((n_nodes, n_nodes)) for _ in range(num_heads)]
    for i in range(n_nodes):
        predecessors = list(g.predecessors(i))
        edges_id = g.edge_ids(predecessors, i)
        for j in range(num_heads):
            all_head_A[j][i, predecessors] = edge_atten[edges_id, j, 0].data.cpu().numpy()
    if to_normalize:
        for j in range(num_heads):
            all_head_A[j] = normalize(all_head_A[j], norm='l1')
    # Return CSR matrices in both cases so row slicing is cheap.
    return [A.tocsr() for A in all_head_A]

# Take the attention from one layer as an example
# num_edges x num_heads x 1
A = self.g.edata['a_drop']
# list of length num_heads, each entry is csr of shape (num_nodes, num_nodes)       
A = preprocess_attention(A, self.g)       

Now A[h][i, j] gives the attention on edge j -> i in head h. For non-existent edges the value is zero.
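For example, to list the attention a particular node receives from each of its predecessors in one head (node id 5 and head 0 below are arbitrary placeholders), one row of the sparse matrix suffices:

# Attention paid to node 5 by its predecessors in head 0
# (node id 5 and head index 0 are arbitrary examples).
row = A[0].getrow(5)                       # 1 x num_nodes sparse row
for src, w in zip(row.indices, row.data):  # column indices and values are aligned in CSR
    print('attention from node {} to node 5: {:.4f}'.format(src, w))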

@sgdantas (Author) commented Jun 5, 2019

Thank you, it works!

@sgdantas closed this Jun 5, 2019

@mufeili (Member) commented Jun 5, 2019

Glad to hear that, @sgdantas :). Let me know if you have any follow-up questions. Also, for questions like this we encourage users to post on our discussion forum so that more users can benefit.
