Extract features from Graph attention network #40

Closed

monk1337 opened this issue Nov 19, 2019 · 13 comments

@monk1337

monk1337 commented Nov 19, 2019

I am trying to extract only the features from a graph attention network. I was using GCN as a feature extractor and I want to replace it with GAT:

gc1 = GraphConvolution(input_dim=300,  output_dim=1024, name='first_layer')(features_matrix, adj_matrix)
gc2 = GraphConvolution(input_dim=1024, output_dim=10,   name='second_layer')(gc1, adj_matrix)

Where the GraphConvolution layer is defined as:

class GraphConvolution():
    """Basic graph convolution layer for undirected graph without edge labels."""
    def __init__(self, input_dim, output_dim, name, dropout=0., act=tf.nn.relu):
        self.name = name
        self.vars = {}

        with tf.variable_scope(self.name + '_vars'):
            self.vars['weights'] = weight_variable_glorot(input_dim, output_dim, name='weights')
        self.dropout = dropout
        
        self.act = act

    def __call__(self, inputs, adj):
        
        with tf.name_scope(self.name):        
            x = inputs
            x = tf.nn.dropout(x, 1 - self.dropout)
            x = tf.matmul(x, self.vars['weights'])
            x = tf.matmul(adj, x)
            outputs = self.act(x)
        return outputs
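
(For context, weight_variable_glorot is assumed here to be a standard Glorot-uniform initializer, roughly along these lines; it may not be the exact helper in use:)

import numpy as np
import tensorflow as tf

def weight_variable_glorot(input_dim, output_dim, name=""):
    # Glorot / Xavier uniform: U(-r, r) with r = sqrt(6 / (fan_in + fan_out))
    init_range = np.sqrt(6.0 / (input_dim + output_dim))
    initial = tf.random_uniform([input_dim, output_dim],
                                minval=-init_range, maxval=init_range,
                                dtype=tf.float32)
    return tf.Variable(initial, name=name)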

Now, to replace the GCN layers with GAT, I tried this:

from gat import GAT

# Because GAT expects 3D input [batch, nodes, features]
features_matrix     = tf.expand_dims(features_matrix, axis = 0)
adj_matrix          = tf.expand_dims(adj_matrix, axis = 0)

gat_logits = GAT.inference( inputs = features_matrix, 
                                 nb_classes  = 10, 
                                 nb_nodes    = 22, 
                                 training    = True,
                                 attn_drop   = 0.0, 
                                 ffd_drop    = 0.0,
                                 bias_mat    = adj_matrix,
                                 hid_units   = [8], 
                                 n_heads     = [8, 1],
                                 residual    = False, 
                                 activation  = tf.nn.elu)

Now I want to get just the logits from GAT as features, and it should learn the features too, so I set training = True.

But with GCN features I was getting around 90% accuracy, while with GAT features I am not able to get more than 80%; instead, it should increase the accuracy compared to GCN.

Is there anything I am missing in the network, or are my hyperparameters not comparable to the ones I was using with GCN?

@PetarV- @gcucurull Can you suggest how I can extract features from GAT, and, if I am doing it the correct way, why I am not getting good accuracy?

Thank you

@gcucurull
Contributor

gcucurull commented Nov 19, 2019

Looking at your GCN, it looks like the first layer has a hidden size of 1024 units, whereas for GAT you have set hid_units to [8]. This means that the first layer has 8 heads with a hidden size of 8 each, which is much smaller than the hidden size used in the GCN.

You can try changing hid_units to [128] or [256], which will increase the hidden size of each head in the first layer, increasing the capacity of the model.
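
For example, the same call as in your snippet with only hid_units changed (a sketch; everything else kept as you had it):

gat_logits = GAT.inference( inputs = features_matrix,
                            nb_classes  = 10,
                            nb_nodes    = 22,
                            training    = True,
                            attn_drop   = 0.0,
                            ffd_drop    = 0.0,
                            bias_mat    = adj_matrix,
                            hid_units   = [128],  # 8 heads x 128 features each in the first layer
                            n_heads     = [8, 1],
                            residual    = False,
                            activation  = tf.nn.elu)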

@monk1337
Author

@gcucurull And is the code I am using correct? I had a doubt that maybe I am not using the network properly.

@gcucurull
Contributor

@monk1337 Yes, it looks good.

@monk1337
Author

@gcucurull Thanks for the quick response. I have one more doubt: for the attention heads, we are passing a list [8, 1].

I went through the code and got the idea that the last entry is for the output layer:

for i in range(n_heads[-1]):
    out.append(layers.attn_head(h_1, bias_mat=bias_mat,
                                out_sz=nb_classes, activation=lambda x: x,
                                in_drop=ffd_drop, coef_drop=attn_drop, residual=False))
logits = tf.add_n(out) / n_heads[-1]

But what should the ratio between input heads and output heads be? How does it affect the output?

@gcucurull
Contributor

The output layer is the one computing the logits; if you use multiple heads, the final logits will be the average over the logits produced by each output head.

However, in our experiments we always used only 1 output head, which is why it is set to [8, 1].

There isn't really a ratio between input heads and output heads, since the number of output heads should be 1. The number of input heads basically controls the number of parameters of the model and its expressive power, so you might want to increase or decrease it depending on your task.
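
To make the mapping concrete, the two lists line up like this (illustrative values, same argument names as GAT.inference):

hid_units = [8]     # one entry per hidden attention layer: features computed by each head
n_heads   = [8, 1]  # 8 attention heads for the hidden layer, plus 1 head for the output layer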

@monk1337
Author

@gcucurull Thanks for the response. Are the input heads the same as the number of classes? Or what default number of heads should I use if I have a big graph?

@gcucurull
Contributor

The number of input heads and the number of classes are not related.

8 input heads worked well for our case, so I suggest starting with that value and tweaking it empirically. Increasing it will increase the capacity of the model; decreasing it will reduce it, but also speed things up and lower the memory consumption.

@monk1337
Author

@gcucurull I tried experimenting with the number of heads and hidden units in the range 2 to 1024, but couldn't get accuracy anywhere near the GCN layers I showed above. GCN is producing 90% accuracy and GAT is not crossing 85% after many combinations of hidden units and numbers of heads. I also tried stacking two layers of GAT; let me know if this is correct:

logits_graph = GAT.inference( inputs = realtion_batch,
                              nb_classes  = 800,
                              nb_nodes    = 22,
                              training    = True,
                              attn_drop   = 0.0,
                              ffd_drop    = 0.0,
                              bias_mat    = adj_batch,
                              hid_units   = [8],
                              n_heads     = [8,1],
                              residual    = False,
                              activation  = tf.nn.elu)

logits_graph_s = GAT.inference( inputs = logits_graph,
                                nb_classes  = 256,
                                nb_nodes    = 22,
                                training    = True,
                                attn_drop   = 0.0,
                                ffd_drop    = 0.0,
                                bias_mat    = adj_batch,
                                hid_units   = [8],
                                n_heads     = [8,1],
                                residual    = False,
                                activation  = tf.nn.elu)

But when I tried these two layers, the accuracy was 0.0 for 100 epochs.

Why is GAT not performing better than GCN?

@gcucurull
Contributor

gcucurull commented Nov 22, 2019

The code is not quite right.

First of all, if you want to have multiple GAT layers, you don't call GAT.inference twice; you increase the number of elements in the hid_units list. Also, why do you set nb_classes to 800? Do you really have 800 classes? You also seem to be working with very small graphs, with nb_nodes set to 22.

The correct way to have a GAT model with 2 layers, 8 heads per layer, and 128 units per head is the following:

    logits_graph = GAT.inference( inputs = realtion_batch, 
                             nb_classes  = NUMBER_OF_OUTPUT_CLASSES, 
                             nb_nodes    = NUMBER_OF_NODES, 
                             training    = True,
                             attn_drop   = 0.0, 
                             ffd_drop    = 0.0,
                             bias_mat    = adj_batch,
                             hid_units   = [128, 128],
                             n_heads     = [8, 8, 1],
                             residual    = False, 
                             activation  = tf.nn.elu)
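
If you then want to treat the output as per-node features, a minimal sketch (assuming the returned logits keep the [batch, nb_nodes, nb_classes] shape used above):

# Hypothetical follow-up: drop the batch dimension of 1 so the node
# representations can be consumed downstream as [nb_nodes, nb_classes].
node_features = tf.squeeze(logits_graph, axis=0)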

@monk1337
Author

@gcucurull Should n_heads be [8, 8, 1] or [128, 128, 1]?

@gcucurull
Contributor

Sorry, you are right, it is [8, 8, 1]; I edited the message to correct it.

@gcucurull
Contributor

Did this work?

@monk1337
Author

Yup

PetarV- closed this as completed Feb 16, 2020