
Gensim's word2vec has a loss of 0 from epoch 1? #2920

Closed
LusKrew opened this issue Aug 20, 2020 · 1 comment

Comments

@LusKrew

LusKrew commented Aug 20, 2020

I am using the Word2vec module of the Gensim library to train word embeddings; the dataset is 400k sentences with 100k unique words (it's not English).

I'm using this code to monitor and calculate the loss:


from gensim.models.callbacks import CallbackAny2Vec

class MonitorCallback(CallbackAny2Vec):
    def __init__(self, test_words):
        self._test_words = test_words

    def on_epoch_end(self, model):
        print("Model loss:", model.get_latest_training_loss())  # print loss
        for word in self._test_words:  # show wv logic changes
            print(model.wv.most_similar(word))



monitor = MonitorCallback(["MyWord"])  # monitor with demo words

w2v_model = gensim.models.word2vec.Word2Vec(size=W2V_SIZE, window=W2V_WINDOW, min_count=W2V_MIN_COUNT, callbacks=[monitor])

w2v_model.build_vocab(tokenized_corpus)

words = w2v_model.wv.vocab.keys()
vocab_size = len(words)
print("Vocab size", vocab_size)

print("[*] Training...")

w2v_model.train(tokenized_corpus, total_examples=len(tokenized_corpus), epochs=W2V_EPOCH)



The problem is that from epoch 1 the loss is 0 and the vectors of the monitored words don't change at all!

[*] Training...
Model loss: 0.0
Model loss: 0.0
Model loss: 0.0
Model loss: 0.0

So what is the problem here? Is this normal? The tokenized corpus is a list of lists, where each entry looks like tokenized_corpus[0] = ["word1", "word2", ...].

I googled, and it seems some older versions of gensim had problems with calculating the loss, but those reports are from almost a year ago, so shouldn't this be fixed by now?

I tried the code provided in the answer to this question as well, but the loss is still 0:

https://stackoverflow.com/questions/52038651/loss-does-not-decrease-during-training-word2vec-gensim

@gojomo
Collaborator

gojomo commented Aug 20, 2020

You haven't used the compute_loss=True argument to the Word2Vec initialization to enable loss-tallying at all, per docs at https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec
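
For example, a minimal sketch staying close to the snippet above (the W2V_* constants and the monitor callback are the reporter's own placeholders, not gensim names; size= is the gensim 3.x parameter name):

import gensim

w2v_model = gensim.models.word2vec.Word2Vec(
    size=W2V_SIZE,
    window=W2V_WINDOW,
    min_count=W2V_MIN_COUNT,
    compute_loss=True,  # enable loss tallying so get_latest_training_loss() reports something
    callbacks=[monitor],
)
w2v_model.build_vocab(tokenized_corpus)
w2v_model.train(
    tokenized_corpus,
    total_examples=len(tokenized_corpus),
    epochs=W2V_EPOCH,
    compute_loss=True,  # train() accepts the flag as well
)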

After you do that, you may encounter other bugs with the current loss-tracking, which you can read about in detail via the open issues: https://github.com/RaRe-Technologies/gensim/issues?q=is%3Aissue+is%3Aopen+loss+in%3Atitle+
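
One quirk reported in those issues is that get_latest_training_loss() returns a running tally for the whole train() call rather than a per-epoch figure. Assuming that behaviour, a callback can log per-epoch deltas instead (LossLogger is a hypothetical name, not a gensim class):

from gensim.models.callbacks import CallbackAny2Vec

class LossLogger(CallbackAny2Vec):
    def __init__(self):
        self.epoch = 0
        self.previous_cumulative_loss = 0.0

    def on_epoch_end(self, model):
        cumulative_loss = model.get_latest_training_loss()  # running tally, per the assumption above
        epoch_loss = cumulative_loss - self.previous_cumulative_loss
        self.previous_cumulative_loss = cumulative_loss
        self.epoch += 1
        print("Epoch", self.epoch, "loss:", epoch_loss)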

Unless/until you're sure your concern is a bug, questions are better handled via Stack Overflow (where I also answered your question) or the project discussion list, to reserve this issue-tracker for bugs & feature requests.
