Parameters for Les Miserables dataset #2

Closed
mewwts opened this issue Oct 13, 2016 · 15 comments

Comments

@mewwts

mewwts commented Oct 13, 2016

Hi,

Thanks for node2vec - such an interesting idea.

Could I ask you to specify some additional parameters for case study 4.1 in your paper so that I can reproduce the community result?

For the top example you set p=1, q=0.5, but I'm wondering what you specified for num_walks and walk_length in the random walk generation, as well as for size, window, min_count, sg and iter in Word2Vec.

Hope this isn't too cumbersome to reply to. Thanks again!

@aditya-grover
Owner

aditya-grover commented Oct 13, 2016

Please direct all questions regarding the paper to adityag@cs.stanford.edu. Feel free to open an issue if there is any clarification specific to the node2vec implementation provided in this repository.

@mewwts
Author

mewwts commented Oct 14, 2016

Sure ¯_(ツ)_/¯

@Tixierae

@mewwts did you get the answer?

@mewwts
Author

mewwts commented Jan 23, 2018

Hey @Tixierae - I got some parameters from @aditya-grover back then. For word2vec: size=8, window=2, sg=1, iter=1. However, I was not able to replicate the results.
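
For anyone plugging these values in today, here is a minimal sketch, assuming gensim is the word2vec implementation. `size` and `iter` are the gensim 3.x keyword names; gensim 4 renamed them to `vector_size` and `epochs`. The helper `to_gensim4_kwargs` is a hypothetical name, and `min_count=0` is an assumption that was not part of the reported parameters.

```python
# Parameters reported in this thread for the Les Miserables case study
# (gensim 3.x keyword names, as used when this was posted).
REPORTED = {"size": 8, "window": 2, "sg": 1, "iter": 1}

# gensim 4 renamed two keywords; the others are unchanged.
RENAMED_IN_GENSIM4 = {"size": "vector_size", "iter": "epochs"}


def to_gensim4_kwargs(params):
    """Translate gensim 3.x Word2Vec keyword names to gensim 4 names."""
    return {RENAMED_IN_GENSIM4.get(name, name): value
            for name, value in params.items()}


kwargs = to_gensim4_kwargs(REPORTED)
# Assumption, not from the thread: keep every node, however rare in the walks.
kwargs["min_count"] = 0
print(kwargs)

# With gensim 4 installed, and `walks` being a list of walks where each walk
# is a list of node ids converted to strings, training would look like:
#     from gensim.models import Word2Vec
#     model = Word2Vec(sentences=walks, **kwargs)
```

The random-walk parameters (num_walks, walk_length) are not covered here because they were never reported.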

@Tixierae

@mewwts many thanks for the quick reply! So they did use a non-default window size (the default is 10). It does indeed seem to be a critical tuning parameter that really depends on the graph (e.g. see Figure 2 of "Watch your step: Learning graph embeddings through attention", from Google).
My guess is that the window size should be to some extent proportional to the size of the graph and to its diameter. It may be harmful to use a window of size 10 if the shortest path between any two nodes in the graph is, say, 3.
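
A rough way to check this intuition on a given graph is to compare the window with the diameter. The sketch below is only an illustration on a made-up toy graph (plain BFS, unweighted, connected graph assumed). Two nodes that co-occur within `window` steps of a walk are at most `window` hops apart, so once the window reaches the diameter that bound no longer excludes any pair.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_mean_distance(adj):
    """Longest and average shortest path over all connected ordered pairs."""
    longest, total, pairs = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


# Toy graph (made up for illustration): two triangles joined by one edge.
toy = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
diameter, mean_distance = diameter_and_mean_distance(toy)
for window in (2, 10):
    # True means the window is too wide to separate near pairs from far ones.
    window_spans_graph = window >= diameter
    print(window, diameter, round(mean_distance, 2), window_spans_graph)
```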
Do you know by any chance what values of num_walks and walk_length they used?

@mewwts
Author

mewwts commented Jan 23, 2018

Exactly, @Tixierae! Thanks for linking to that paper, looks like a good read. Printed it now.

I was not able to find the values of those parameters sadly. The email I got from @aditya-grover said the random-walk parameters were set to "very low values" due to network size being small.

@Tixierae

thanks @mewwts !
@aditya-grover What would you recommend for num_walks, walk_length and window when the graph is small/very dense? Any rule of thumb to set window size based on graph density/diameter?
PS: I know it may not be the best place to ask, but some quick feedback would be very welcome and would benefit more people than private messaging. Thanks in advance!

@mewwts
Author

mewwts commented Jan 23, 2018

@Tixierae I think the best thing you can do for now is to grid search these parameters. The network is quite small, right?

@Tixierae

Tixierae commented Jan 23, 2018

@mewwts yes, each network is small, but I have thousands of them, for several datasets. The final task is graph classification, for which I am 10-fold cross validating a 2D CNN, with many epochs for each fold (I'm using this approach). So, I can do a coarse grid search, but each combination of parameters is quite costly to test. Hence, getting good priors would help a lot.
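
A coarse grid like this can be sketched as follows. The grid values are placeholders rather than recommendations, `dummy_score` is a stand-in for the real (costly) evaluation, and the rule of skipping windows that are not shorter than the walk is my own assumption for pruning pointless combinations.

```python
from itertools import product

# Hypothetical coarse grid; the values are placeholders, not recommendations.
GRID = {
    "num_walks": [10, 100],
    "walk_length": [8, 40],
    "window": [2, 5],
}


def iter_configs(grid):
    """Yield one dict per combination, skipping windows not shorter than the walk."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        if config["window"] < config["walk_length"]:
            yield config


def grid_search(grid, score_fn):
    """Return (best_score, best_config) for a caller-supplied score_fn."""
    best = None
    for config in iter_configs(grid):
        score = score_fn(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best


# Stand-in scorer so the sketch runs; replace with a real evaluation such as
# cross-validated accuracy on a subsample of the graphs.
def dummy_score(config):
    return -abs(config["window"] - 2) - abs(config["walk_length"] - 8) / 10


configs = list(iter_configs(GRID))
best_score, best_config = grid_search(GRID, dummy_score)
print(len(configs), best_config)
```

Scoring each configuration on a small subsample of graphs first, and only cross-validating the top few, is one way to keep the cost of each combination down.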

@Tixierae

Tixierae commented Feb 7, 2018

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

@mewwts
Author

mewwts commented Feb 8, 2018

Thanks @Tixierae - interesting!

@annaguldberg

annaguldberg commented May 2, 2020

Hi, I have a network of 311 nodes. It is quite dense, with an average shortest path of 2. I have used p=1, q=2 and kept the window size and walk length very small, but I am not getting great results. Does anyone have any suggestions as to what could be wrong?

@bianxintong

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

I was having a hard time replicating the homophily result (structural equivalence was somehow easier to replicate, idk why). Thanks to this study, I was finally able to go from this:
image

to:
image
If I resize the nodes by node degree, I obtain the best approximation so far of the image in the paper:
image

I guess when the graph is so small, we need to repeat the walk many times to make word2vec actually learn something; and since the window size is so small, we need to walk a long way to capture the surrounding community structure. And the window size is definitely important.
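
To make the roles of num_walks, walk_length, p and q concrete, here is a minimal sketch of node2vec-style biased walks. It assumes an unweighted, undirected graph stored as a dict of neighbor sets and samples each step directly with `random.choices` instead of the alias tables used by the reference implementation. The toy graph and the function names are made up for illustration.

```python
import random


def biased_walk(adj, start, walk_length, p, q, rng):
    """One second-order walk with unnormalized weights 1/p, 1 and 1/q."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(adj[current])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)       # stay at distance 1 from previous
            else:
                weights.append(1.0 / q)   # move away from previous
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, walk_length, p, q, seed=0):
    """Start num_walks walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, walk_length, p, q, rng))
    return walks


# Toy graph (made up for illustration): two triangles joined by one edge.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
walks = generate_walks(toy, num_walks=3, walk_length=8, p=1, q=0.5)
print(len(walks), walks[0])
```

The corpus word2vec sees has num_walks times the number of nodes "sentences", each walk_length tokens long, which is why a tiny graph needs many repeats before there is enough text to learn from.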

@sarmad-MOAHAMMED

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

Hi,
Could you share the code for this project ?

Thanks.

@bianxintong

bianxintong commented Feb 27, 2021

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

Hi,
Could you share the code for this project ?

Thanks.

Edited on 24-03-2021:
First I compiled the node2vec binary, then ran:
!./node2vec -i:lesmisDir.edgelist -o:lesmisDir.emb -d:16 -l:8 -r:100 -k:2 -p:1 -q:0.5 -e:1
Then I did a 5-cluster k-means clustering,
then exported the result to Gephi for graphing.
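
The clustering and export steps could look roughly like the sketch below. It is not the code used for the results above. It assumes the .emb file is in word2vec text format (a header line with node count and dimension, then one node id plus vector per line), uses a plain Lloyd's k-means with a deterministic farthest-point initialization as a stand-in for whatever k-means implementation you prefer, and prints a node table with an `Id` column for Gephi's spreadsheet import. The sample embeddings are made up.

```python
def parse_embeddings(text):
    """Parse word2vec-style text: header 'n d', then 'node x1 ... xd' rows."""
    lines = [line.split() for line in text.strip().splitlines()]
    count, dim = int(lines[0][0]), int(lines[0][1])
    vectors = {row[0]: [float(x) for x in row[1:]] for row in lines[1:]}
    assert len(vectors) == count and all(len(v) == dim for v in vectors.values())
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50):
    """Plain Lloyd's algorithm; returns {node: cluster index}."""
    nodes = sorted(vectors)
    # Farthest-point initialization: deterministic, but a simplification.
    centers = [vectors[nodes[0]]]
    while len(centers) < k:
        farthest = max(
            nodes,
            key=lambda n: min(squared_distance(vectors[n], c) for c in centers),
        )
        centers.append(vectors[farthest])
    labels = {}
    for _ in range(iterations):
        new_labels = {
            n: min(range(k), key=lambda c: squared_distance(vectors[n], centers[c]))
            for n in nodes
        }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [vectors[n] for n in nodes if labels[n] == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up 2-d embeddings for four nodes, in the assumed file layout.
sample = """4 2
a 0.0 0.1
b 0.1 0.0
c 5.0 5.1
d 5.1 5.0
"""
labels = kmeans(parse_embeddings(sample), k=2)
print("Id,cluster")
for node, cluster in sorted(labels.items()):
    print(f"{node},{cluster}")
```

For the run above you would read lesmisDir.emb instead of `sample` and set k=5.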

I found that the node2vec binary worked better than the open source implementation (stellargraph in this case).

I stumbled upon my notes on replicating the results today, so I updated this comment. To be honest, I was frustrated by the amount of effort it took to replicate the result, which is why I didn't document my process well. But I think that's more a problem of node2vec itself: the hyperparameters are really sensitive and really depend on your graph.
