Description
In line 523 of word2vec there is a formula:
g = (1 - vocab[word].code[d] - f) * alpha;
Can you please help me understand its logic?
Since f is the dot product of the embedding and context vectors (passed through the sigmoid), in the case of hierarchical softmax we want it to be as close as possible to the turn (0 or 1) we have to take in the Huffman tree for the previous word (embedding) and the current word's node index (context). In that case we would just need
g = (vocab[word].code[d] - f)*alpha
Given that vocab[word].code[d] can only be 0 or 1, "1 - vocab[word].code[d]" simply swaps the left and right branches at each node. What is its purpose?
I summed up some details here: https://datascience.stackexchange.com/questions/129865/intuition-behind-g-variable-calculation-in-the-original-word2vec-implementation