-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jackson's ObjectMapper is completely thread safe and should not be re-instantiated every time #2170
Comments
It's not really clear for me, how it can be your perf killer, since VectorsConfiguration object is instantiated only once per model creation. Can you show me your code, that reproduces such behaviour? |
Yes, indeed. I have just 3 models that I instantiate on startup. The code in WordVectorSerializer in loadModel iterates over each VocabWord and calls VocabularyWord.fromJson(wordJson). This literally takes 8 seconds to run (cumulatively of course). Which means that every time I run a test in IDE I pay an 8 sec penalty. It is literally the longest operation in test startup. I sure would love to remove the wait. The other important point is to perform a review of ObjectMapper usage in the code base as a whole. It may potentially reveal other problem areas, and the fix will be really simple. Small effort, big reward. |
|
Word2Vec w = WordVectorSerializer.loadFullModel(file.getAbsolutePath()); Load a model from a file? |
I mean ObjectMapper is expensive to instantiate. Really expensive. Try it for yourself and profile it. |
Aha. Don't use loadFullModel please, unless you're going to uptrain models. It's going to be deprecated soon, in favour of unified saving for w2v/d2v/glove model file. However, i'll make sure ObjectMapper won't be abused there. |
what should I use today? |
Are you going to uptrain models, or train once and use it as vectors lookup table for other tasks? |
Train once. |
The best thing is writeWordVectors(Word2Vec_object) to save, and loadTxtVectors(File). In this case you only save vectors, without syn1/syn1neg etc, so twice less ram used after restoration. |
Thank you! |
Question about why you are deprecating writeFullModel(). The problem with using writeWordVectors() to save a model is that then you lose the VocabCache that's part of the model, so can't really do any analysis on the model after it's been saved. If you deprecate that I have to go back to my custom serializer that serializes the word vectors and vocab cache in different files which is really a pain. |
No-no, no worries please. Nothing will be lost. For cases when people need full exact copy of w2v/d2v/whatever model (like for further additional training), there's already available methods for saving them. I just want to get rid of 100500 different signatures in WordVectorsSerializer, and make it as simple as possible. Like always save full model, and just have an option to restore model partially if only weights are needed, or restore full model if HS states are needed or huffman codes needed as well. |
And no, we're not going to remove any deprecated methods without reasonable grace period. |
Mapper reuse was added: #2171 |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
My application is using deeplearning4j. I was just profiling slow loading of Word2Vet model, and the profiler (YourKit) showed that literally all of the time is spent in com.fasterxml.jackson.databind.ObjectMapper instantiations. Please see org.deeplearning4j.models.embeddings.loader.VectorsConfiguration#mapper. ObjectMapper is a completely thread-safe service class, it is meant to be used as singleton across the lifetime of the application. It is also very expensive to create. It makes sense to fix VectorsConfiguration class. It is also prudent to do a review of all usages of ObjectMapper and ensure that it is not instantiated every time. This is a performance killer.
The text was updated successfully, but these errors were encountered: