Optimize FastText.load_fasttext_model
#2340
Conversation
Thanks @mpenkov. What's still missing:
- ngram byte-based func (FB port) in "glued" form, I guess, to avoid "".join().split() (see the sketch after this list)
- more tests (especially for the previous item)
- final timing measurements for loading (before vs after vs the FB implementation) in 2 variants:
  - load the model and retrieve a vector for a word
  - load the model and start training
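On the ngram point, here is a minimal sketch of what such a "glued" computation could look like: a single pass over the word wrapped in fastText's `<`/`>` markers, with no intermediate `"".join()`/`.split()` step. This is an illustration, not the FB or gensim implementation; the actual FB port operates on UTF-8 bytes and handles multi-byte characters explicitly, while this simplified version works on Python characters.

```python
# Minimal sketch only: compute all character n-grams of a word in one pass.
def compute_ngrams(word, min_n, max_n):
    extended = '<%s>' % word   # fastText wraps words in begin/end-of-word markers
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(extended) - n + 1):
            ngrams.append(extended[i:i + n])
    return ngrams

print(compute_ngrams('кот', 3, 4))  # non-ASCII words work at the character level
```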
Backward compatibility fix for "fastText fixes in 3.7 break compatibility with old models" (#2341). Fixed by "Fix backward compatibility issue: loading FastTextKeyedVectors using KeyedVectors (missing attribute compatible_hash)" (#2349).
@@ -704,6 +708,14 @@ def train(self, sentences=None, corpus_file=None, total_examples=None, total_wor
        >>> model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)

        """
        cant_train = hasattr(self.trainables, 'syn1neg') and self.trainables.syn1neg is None
Stupid question: what if self.trainables does not have a syn1neg attribute at all? Can the model still train?
Yes. I don't see any other code that sets syn1neg to None. So, the new code uses that value to mean "cannot continue training".
If trainables does not have syn1neg at all, it is possible to start training.
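For context, a minimal sketch of how the flag from the diff above could gate training. The attribute names `self.trainables` and `syn1neg` come from the diff; the surrounding method body and the error message are illustrative assumptions, not necessarily gensim's actual code.

```python
# Illustrative sketch only: how the cant_train flag could be used inside train().
# `self.trainables.syn1neg is None` marks a model that cannot continue training.
def train(self, sentences=None, **kwargs):
    cant_train = hasattr(self.trainables, 'syn1neg') and self.trainables.syn1neg is None
    if cant_train:
        raise ValueError(
            'this model cannot be trained further: '
            'it lacks the hidden-layer weights needed to continue training'
        )
    # If trainables has no syn1neg attribute at all, cant_train is False and
    # training proceeds normally, which matches the answer above.
    ...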
I benchmarked the model loading in this PR against 3.7.0 using:

```python
from gensim.models import FastText
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(filename)s:%(lineno)s - %(message)s')

m = FastText.load_fasttext_format("cc.ru.300.bin")
```

Before: 13 min
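For reproducing such numbers, a simple end-to-end timing wrapper; the times quoted in this thread appear to come from reading the INFO log timestamps, so this is just an alternative way to measure (the model path is the one assumed above).

```python
import time
from gensim.models import FastText

start = time.perf_counter()
m = FastText.load_fasttext_format("cc.ru.300.bin")
print("loaded in %.1f s" % (time.perf_counter() - start))
```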
We're still considerably slower than the FB app.
@mpenkov how much of that time is loading vs access? (in gensim and in fb) EDIT: n/m, I see there are just a few words accessed, so this must be all loading.
@piskvorky you are right, all of it is loading; retrieving a vector by word works fast. Note: I guess the reason is the "retrieve vectors for vocab" step.
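To illustrate that guess, here is a rough sketch of what a "retrieve vectors for vocab" step amounts to: building a full vector for every vocabulary word by combining the vectors of its character ngrams. With a vocabulary of a couple of million words, a pure-Python loop like this easily dominates load time, which matches the observation above. The function and argument names are made up for illustration and are not gensim's actual API.

```python
import numpy as np

def precompute_word_vectors(vocab_ngrams, ngram_vectors, hash_fn):
    """vocab_ngrams: one list of ngram strings per vocabulary word.
    ngram_vectors: (num_buckets, dim) array of hashed ngram vectors."""
    num_buckets, dim = ngram_vectors.shape
    out = np.zeros((len(vocab_ngrams), dim), dtype=np.float32)
    for row, ngrams in enumerate(vocab_ngrams):
        for ng in ngrams:
            out[row] += ngram_vectors[hash_fn(ng) % num_buckets]
        out[row] /= max(len(ngrams), 1)   # average over the word's ngrams
    return out
```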
Should fix #1261