Extract features from Graph attention network #40

Closed

monk1337 opened this issue Nov 19, 2019 · 13 comments

@monk1337

monk1337 commented Nov 19, 2019

I am trying to extract only the features from a graph attention network. I was using GCN as a feature extractor and I want to replace it with GAT:

gc1 = GraphConvolution(input_dim=300,  output_dim=1024, name='first_layer')(features_matrix, adj_matrix)
gc2 = GraphConvolution(input_dim=1024, output_dim=10,   name='second_layer')(gc1, adj_matrix)

Where the GraphConvolution layer is defined as:

class GraphConvolution():
    """Basic graph convolution layer for undirected graph without edge labels."""
    def __init__(self, input_dim, output_dim, name, dropout=0., act=tf.nn.relu):
        self.name = name
        self.vars = {}

        with tf.variable_scope(self.name + '_vars'):
            self.vars['weights'] = weight_variable_glorot(input_dim, output_dim, name='weights')
        self.dropout = dropout
        
        self.act = act

    def __call__(self, inputs, adj):
        
        with tf.name_scope(self.name):        
            x = inputs
            x = tf.nn.dropout(x, 1 - self.dropout)
            x = tf.matmul(x, self.vars['weights'])
            x = tf.matmul(adj, x)
            outputs = self.act(x)
        return outputs
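
(For context, weight_variable_glorot is assumed here to be a standard Glorot-uniform initializer, roughly along these lines; it may not be the exact helper in use:)

import numpy as np
import tensorflow as tf

def weight_variable_glorot(input_dim, output_dim, name=""):
    # Glorot / Xavier uniform: U(-r, r) with r = sqrt(6 / (fan_in + fan_out))
    init_range = np.sqrt(6.0 / (input_dim + output_dim))
    initial = tf.random_uniform([input_dim, output_dim],
                                minval=-init_range, maxval=init_range,
                                dtype=tf.float32)
    return tf.Variable(initial, name=name)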

Now, to replace the GCN layers with GAT, I tried this:

from gat import GAT

# Because GAT expects 3D input [batch, nodes, features]
features_matrix     = tf.expand_dims(features_matrix, axis = 0)
adj_matrix          = tf.expand_dims(adj_matrix, axis = 0)

gat_logits = GAT.inference( inputs = features_matrix, 
                                 nb_classes  = 10, 
                                 nb_nodes    = 22, 
                                 training    = True,
                                 attn_drop   = 0.0, 
                                 ffd_drop    = 0.0,
                                 bias_mat    = adj_matrix,
                                 hid_units   = [8], 
                                 n_heads     = [8, 1],
                                 residual    = False, 
                                 activation  = tf.nn.elu)

Now I want to get just the logits from GAT as features, and it should learn the features too, so I set training = True.

But with GCN features I was getting around 90% accuracy, while with GAT features I am not able to get more than 80%; instead, it should increase the accuracy compared to GCN.

Is there anything I am missing in the network, or are my hyperparameters not comparable to the ones I was using with GCN?

@PetarV- @gcucurull Can you suggest how I can extract features from GAT, and, if I am doing it the correct way, why I am not getting good accuracy?

Thank you

@gcucurull
Contributor

gcucurull commented Nov 19, 2019

Looking at your GCN, it looks like the first layer has a hidden size of 1024 units, whereas for GAT you have set hid_units to [8]. This means that the first layer has 8 heads with a hidden size of 8 each, which is much smaller than the hidden size used in the GCN.

You can try changing hid_units to [128] or [256], which will increase the hidden size of each head in the first layer, increasing the capacity of the model.
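
For example, the same call as in your snippet with only hid_units changed (a sketch; everything else kept as you had it):

gat_logits = GAT.inference( inputs = features_matrix,
                            nb_classes  = 10,
                            nb_nodes    = 22,
                            training    = True,
                            attn_drop   = 0.0,
                            ffd_drop    = 0.0,
                            bias_mat    = adj_matrix,
                            hid_units   = [128],  # 8 heads x 128 features each in the first layer
                            n_heads     = [8, 1],
                            residual    = False,
                            activation  = tf.nn.elu)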

@monk1337
Author

@gcucurull And is the code I am using correct? I had a doubt that maybe I am not using the network properly.

@gcucurull
Contributor

@monk1337 Yes, it looks good.

@monk1337
Author

@gcucurull Thanks for the quick response. I have one more doubt: for the attention heads, we are passing a list [8, 1].

I went through the code and got the idea that the last entry is for the output layer:

for i in range(n_heads[-1]):
    out.append(layers.attn_head(h_1, bias_mat=bias_mat,
                                out_sz=nb_classes, activation=lambda x: x,
                                in_drop=ffd_drop, coef_drop=attn_drop, residual=False))
logits = tf.add_n(out) / n_heads[-1]

But what should the ratio between input heads and output heads be? How does it affect the output?

@gcucurull
Contributor

The output layer is the one computing the logits; if you use multiple heads, the final logits will be the average over the logits produced by each output head.

However, in our experiments we always used only 1 output head, which is why it is set to [8, 1].

There isn't really a ratio between input heads and output heads, since the number of output heads should be 1. The number of input heads basically controls the number of parameters of the model and its expressive power, so you might want to increase or decrease it depending on your task.
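
To make the mapping concrete, the two lists line up like this (illustrative values, same argument names as GAT.inference):

hid_units = [8]     # one entry per hidden attention layer: features computed by each head
n_heads   = [8, 1]  # 8 attention heads for the hidden layer, plus 1 head for the output layer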

@monk1337
Author

@gcucurull Thanks for the response. Are the input heads the same as the number of classes? Or what default number of heads should I use if I have a big graph?

@gcucurull
Contributor

The number of input heads and the number of classes are not related.

8 input heads worked well for our case, so I suggest starting with that value and tweaking it empirically. Increasing it will increase the capacity of the model; decreasing it will reduce it, but also speed things up and lower the memory consumption.

@monk1337
Author

@gcucurull I tried experimenting with the number of heads and hidden units in the range 2 to 1024, but couldn't get accuracy anywhere near the GCN layers I showed above. GCN is producing 90% accuracy and GAT is not crossing 85% after many combinations of hidden units and numbers of heads. I also tried stacking two layers of GAT; let me know if this is correct:

logits_graph = GAT.inference( inputs = realtion_batch,
                              nb_classes  = 800,
                              nb_nodes    = 22,
                              training    = True,
                              attn_drop   = 0.0,
                              ffd_drop    = 0.0,
                              bias_mat    = adj_batch,
                              hid_units   = [8],
                              n_heads     = [8,1],
                              residual    = False,
                              activation  = tf.nn.elu)

logits_graph_s = GAT.inference( inputs = logits_graph,
                                nb_classes  = 256,
                                nb_nodes    = 22,
                                training    = True,
                                attn_drop   = 0.0,
                                ffd_drop    = 0.0,
                                bias_mat    = adj_batch,
                                hid_units   = [8],
                                n_heads     = [8,1],
                                residual    = False,
                                activation  = tf.nn.elu)

But when I tried these two layers, the accuracy was 0.0 for 100 epochs.

Why is GAT not performing better than GCN?

@gcucurull
Contributor

gcucurull commented Nov 22, 2019

The code is not quite right.

First of all, if you want to have multiple GAT layers, you don't call GAT.inference twice; you increase the number of elements in the hid_units list. Also, why do you set nb_classes to 800? Do you really have 800 classes? You also seem to be working with very small graphs, with nb_nodes set to 22.

The correct way to have a GAT model with 2 layers, 8 heads per layer, and 128 units per head is the following:

    logits_graph = GAT.inference( inputs = realtion_batch, 
                             nb_classes  = NUMBER_OF_OUTPUT_CLASSES, 
                             nb_nodes    = NUMBER_OF_NODES, 
                             training    = True,
                             attn_drop   = 0.0, 
                             ffd_drop    = 0.0,
                             bias_mat    = adj_batch,
                             hid_units   = [128, 128],
                             n_heads     = [8, 8, 1],
                             residual    = False, 
                             activation  = tf.nn.elu)
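
If you then want to treat the output as per-node features, a minimal sketch (assuming the returned logits keep the [batch, nb_nodes, nb_classes] shape used above):

# Hypothetical follow-up: drop the batch dimension of 1 so the node
# representations can be consumed downstream as [nb_nodes, nb_classes].
node_features = tf.squeeze(logits_graph, axis=0)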

@monk1337
Author

@gcucurull Should n_heads be [8, 8, 1] or [128, 128, 1]?

@gcucurull
Contributor

Sorry, you are right, it is [8, 8, 1]; I edited the message to correct it.

@gcucurull
Contributor

Did this work?

@monk1337
Author

Yup

PetarV- closed this as completed Feb 16, 2020