Question: Why Use metapath2vec for Input Features? #17
Comments
Hi Alex,

Thanks for your interest in our work. Regarding input features for training a GNN (or, more broadly, any neural network), there is still no single best way to handle every situation, but I can offer some simple intuitions for choosing input features for a GNN:

(1) For node types that may emerge in the test data, like new papers or new authors, we should choose "inductive" features: text, degree, other attributes, etc. The reason we use XLNet as a feature extractor is that people have shown that a powerful pre-trained contextualized embedding model like BERT or XLNet can already capture the linguistic and semantic meaning of the text, so our GNN model can focus on the interactions between nodes instead of learning all that linguistic knowledge from scratch. (Previously, people leveraged shallow word embeddings for this task.) An even better approach would be to fine-tune the XLNet model in an end-to-end manner, which is very common in NLP, but due to limited computational resources we simply treat XLNet as a fixed feature extractor.

(2) For node types that are always in the graph, like conferences and topics, we can instead assign each node a learnable embedding and train it end-to-end. I've run such an experiment, and on some small graphs it shows superior results to a fixed input vector. For the Microsoft Academic Graph, however, the numbers of topics and conferences are still very large, so, again because of computational resources, we made a trade-off and initialized them with vectors learned by a shallow embedding technique (i.e., metapath2vec). I'd still highly recommend that others try learning embeddings from scratch for these node types.
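The two strategies above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repo's actual code; the sizes and the random tensors standing in for XLNet and metapath2vec outputs are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the real graph has far more venues and larger vectors.
num_venues, dim = 100, 32

# Strategy (1): treat a pretrained encoder's output as a frozen input
# feature, so the GNN learns node interactions, not linguistics.
paper_feats = torch.randn(10, dim)   # stand-in for XLNet-extracted features
paper_feats.requires_grad_(False)    # kept fixed, not fine-tuned

# Strategy (2): a learnable embedding table for always-present node types
# (venues, topics), warm-started from metapath2vec vectors rather than
# random noise; freeze=False keeps it trainable end-to-end.
metapath2vec_vecs = torch.randn(num_venues, dim)   # stand-in vectors
venue_emb = nn.Embedding.from_pretrained(metapath2vec_vecs, freeze=False)

ids = torch.tensor([3, 7])
feats = venue_emb(ids)               # trainable (2, dim) input features
```

The trade-off is exactly the one described: with `freeze=False` the venue vectors keep updating during training, while the warm start avoids learning them entirely from scratch on a huge vocabulary.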
Yeah, I can see that point. In my own work using graph and language models jointly to predict events (https://iopscience.iop.org/article/10.1088/2632-072X/aba83d) I used metapath2vec and doc2vec for input features for authors, submissions, and subreddits; however, I can see your point that if you are able to use the metapath2vec embeddings to initialize the node features for the HGT model, it will help speed up training. (I also got excited about the possibility of avoiding having to code metapaths, in part because in networks like Reddit they can be quite complicated depending on what one wants to emphasize in the network.)

I think my major question/concern was just that running the sampling for metapath2vec can take *forever*, so I wasn't sure whether the time it takes to do the sampling and then train the embedding speeds up HGT training enough to offset those costs. I saw in another post you linked to the original C++ code, and I know elsewhere people have improved the sampling to be 4-16 times faster, but even so, on my network of 35.5 million nodes and 190 million edges it took a *while* (and the sample file for the online algorithm was ~50GB+). Again, I can certainly see it helping – it certainly helps avoid a cold-start problem, which is nice.

On a somewhat unrelated note, do you all have any ideas on how the model could be updated to make inductive predictions on previously unseen nodes (like GraphSAGE/PinSAGE)? I saw you had inductive modeling for time, but didn't see anything in the aggregation that implied inductive learning for new nodes. If that were possible, this would be extremely awesome. I'm presently working on a project and am trying to decide between HGT and PinSAGE because I want it to include inductive learning for time (HGT) as well as new nodes (PinSAGE), but I don't know if someone has figured out how to do both yet.

Thanks again for your time! So nice chatting with someone working on networks and language!

---

Our method does support inductive prediction. For any new node, you can just run the sampling to get its neighborhood and calculate its embedding with HGT. (In fact, every GNN model that uses fixed attributes should support inductive prediction.)
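The point about fixed attributes enabling inductive prediction can be illustrated with a toy one-layer aggregation. This is a deliberately simplified mean-aggregation (GraphSAGE-style) sketch, not HGT's heterogeneous attention; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed input attributes for the nodes seen at training time.
feats = rng.normal(size=(6, 4))        # 6 seen nodes, 4-dim attributes

def embed_new_node(neighbor_ids, feats, W):
    """Embed an unseen node with one mean-aggregation layer.

    Because the inputs are fixed attributes (not per-node free
    parameters), an unseen node needs no retraining: sample its
    neighbors, pull their attributes, and run the trained layers.
    """
    h = feats[neighbor_ids].mean(axis=0)   # aggregate neighbor attributes
    return np.tanh(W @ h)                  # trained transformation

W = rng.normal(size=(4, 4))               # stands in for trained weights

# A brand-new node that links to seen nodes 1, 2, and 3:
z = embed_new_node([1, 2, 3], feats, W)
```

By contrast, a model whose "features" are free per-node embeddings (as in strategy (2) above, or in transductive metapath2vec itself) has no vector for an unseen node, which is why fixed attributes are the ingredient that makes inference inductive.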
While reading through your fascinating paper, I noticed that you all do a huge amount of work initializing the input features. For example, you noted that "For the field, venue, and institute nodes, we use the metapath2vec model [3] to train their node embeddings by reflecting the heterogeneous network structures."

Having worked with metapath2vec and knowledge graphs quite a bit myself, I know this must have taken a good deal of time and quite a bit of RAM. It confused me to see this kind of preprocessing in the paper, given that you said the HGT model should learn metapaths itself; I was expecting the HGT model to learn these feature representations without requiring all that work up front.

So my question is this: why bother with these steps? Was it simply to speed up training? (The same question applies, abstractly, to why you used XLNet for papers.)
Many thanks in advance!
Best,
Alex