
node2vec uses CBOW instead of skip-gram #40

Open
ubalklen opened this issue Jun 2, 2021 · 4 comments
Comments


ubalklen commented Jun 2, 2021

The original node2vec and DeepWalk proposals are built upon the skip-gram model. By default, nodevectors does not set the parameter w2vparams["sg"] to 1, so the underlying Word2Vec model uses its default value of 0, which means CBOW is used instead of skip-gram. This has major consequences for the quality of the embeddings.
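
A minimal sketch of the fix on the caller's side, assuming the nodevectors Node2Vec constructor accepts a w2vparams dict that is forwarded to gensim's Word2Vec (the other parameter values here are illustrative):

```python
# Sketch only: w2vparams is assumed to be passed through to gensim.models.Word2Vec,
# whose default sg=0 means CBOW; sg=1 requests skip-gram as in the original node2vec paper.
from nodevectors import Node2Vec

n2v = Node2Vec(
    w2vparams={
        "sg": 1,        # 1 = skip-gram, 0 = CBOW (gensim's default)
        "window": 10,   # illustrative values for other Word2Vec options
        "negative": 5,
    }
)
# n2v.fit(G)  # G: the graph to embed
```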

VHRanger (Owner) commented Jun 2, 2021

Thanks, can you confirm empirically that 1 is better than 0?

If so, I'll change the default along with other updates this week.

ubalklen (Author) commented Jun 2, 2021

It is for my graphs, but I'm not sure if this is always the case. There is some discussion about which one is better, but in the context of NLP. I couldn't find anyone discussing that in the context of graph embeddings.

Anyway, I suggest not only using 1 as the default, but also forcing w2vparams["sg"] to 1 instead of leaving the decision to the programmer, or exposing it as a separate parameter the same way you did with w2vparams["workers"]. The reason is that it is very easy to forget to set this parameter when customizing other w2v parameters (which is exactly how I stumbled upon this), and node2vec was built explicitly with skip-gram in mind. A sketch of the separate-parameter idea is below.
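
A hypothetical sketch of the separate-parameter suggestion (the helper below is illustrative, not part of the existing nodevectors API): merge the caller's w2vparams with an explicit skip-gram keyword so that customizing other options cannot silently fall back to CBOW.

```python
# Hypothetical helper, not existing nodevectors code: an explicit sg keyword
# (defaulting to 1) always overrides whatever the caller's dict does or doesn't set.
def merged_w2vparams(user_params=None, sg=1):
    params = dict(user_params or {})
    params["sg"] = sg  # the explicit keyword alone controls the training mode
    return params

# Example: merged_w2vparams({"window": 10, "negative": 5})
# -> {"window": 10, "negative": 5, "sg": 1}
```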


rn123 commented Jun 22, 2021

Check the reference below for skip-gram vs CBOW. The quality difference between the two seems to be the fault of a longstanding implementation error in the original word2vec and Gensim implementations.

İrsoy, Ozan, Adrian Benton, and Karl Stratos. “Koan: A Corrected CBOW Implementation.” ArXiv:2012.15332 [Cs, Stat], December 30, 2020. http://arxiv.org/abs/2012.15332.


gojomo commented Feb 22, 2022

Just passing through, noticed this issue in the course of answering someone's Node2Vec/Word2Vec interaction question, thought I'd mention:

  • In the decade since word2vec arrived on the scene, I've not seen a strong consensus emerge on whether SG or CBOW is generally better. There are some reasons to suspect CBOW's batching of more operations allows more data/training in the same amount of time, but for some datasets/purposes/parameters/evaluations, SG still does better. So you might want to default to SG if the original specification of 'Node2Vec' implied it, creating a user expectation for that mode; otherwise there may not be a strong reason to nudge users in any particular direction, leaving it to the advanced optimizers to dig deeper.
  • I'm not convinced that the 'Koan' paper authors have identified anything 'incorrect' about the usual word2vec implementations (including the original word2vec code, & the later Facebook FastText code by some of the original word2vec paper authors). Their paper shows a mix of both better & worse evaluations after their change (depending on the evaluation), & some of the claimed improvements might be due to other changes in the performance or behavior of their code besides their reinterpretation of CBOW backpropagation. In particular, tinkering with the Gensim Word2Vec parameters cbow_mean, alpha, & (new since their paper) shrink_windows might match or exceed any benefits they've observed (on similar data/evals) without fully adopting their CBOW interpretation; a sketch of those knobs follows this list. See my longer comments in a Gensim issue for more details.
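
A minimal sketch of the CBOW knobs mentioned above, assuming gensim 4.1 or later (which added shrink_windows); the walk corpus and parameter values are purely illustrative:

```python
from gensim.models import Word2Vec

# Stand-in "sentences": for node embeddings these would be random walks over the graph.
walks = [["a", "b", "c", "d"], ["b", "c", "d", "e"]]

model = Word2Vec(
    sentences=walks,
    sg=0,                  # CBOW mode
    cbow_mean=0,           # sum context vectors instead of averaging them
    alpha=0.05,            # starting learning rate
    shrink_windows=False,  # always use the full window (available in gensim >= 4.1)
    vector_size=32,
    min_count=1,
    window=5,
)
```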
