
Jackson's ObjectMapper is completely thread safe and should not be re-instantiated every time #2170

Closed
kyrill007 opened this issue Oct 8, 2016 · 16 comments

Comments

@kyrill007 (Author)

My application uses deeplearning4j. I was profiling slow loading of a Word2Vec model, and the profiler (YourKit) showed that literally all of the time is spent in com.fasterxml.jackson.databind.ObjectMapper instantiations. Please see org.deeplearning4j.models.embeddings.loader.VectorsConfiguration#mapper. ObjectMapper is a completely thread-safe service class; it is meant to be used as a singleton for the lifetime of the application, and it is also very expensive to create. It makes sense to fix the VectorsConfiguration class, and it would also be prudent to review all usages of ObjectMapper and ensure it is not instantiated every time. This is a performance killer.
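
As a concrete illustration of the suggested fix, here is a minimal sketch of the reuse pattern; the holder class and its mapper() accessor are illustrative, not the actual VectorsConfiguration code:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative holder, not part of deeplearning4j.
public final class JsonMappers {

    // Built once; ObjectMapper is thread-safe after configuration,
    // so a single instance can be shared by all callers.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private JsonMappers() {}

    // Reuse the shared instance instead of calling `new ObjectMapper()`
    // on every (de)serialization.
    public static ObjectMapper mapper() {
        return MAPPER;
    }
}
```

If per-call tweaks are needed, ObjectReader/ObjectWriter views obtained from the shared mapper are immutable and cheap to create, so they don't require building a new ObjectMapper.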

raver119 self-assigned this Oct 8, 2016
@raver119 (Contributor) commented Oct 8, 2016

It's not really clear to me how this can be your performance killer, since the VectorsConfiguration object is instantiated only once per model creation.

Can you show me the code that reproduces this behaviour?

@kyrill007 (Author)

Yes, indeed. I have just 3 models that I instantiate on startup. The code in WordVectorSerializer's loadModel iterates over each VocabWord and calls VocabularyWord.fromJson(wordJson). This literally takes 8 seconds to run (cumulatively, of course), which means that every time I run a test in the IDE I pay an 8-second penalty. It is literally the longest operation in test startup, and I would love to remove that wait. The other important point is to review ObjectMapper usage across the code base as a whole; it may reveal other problem areas, and the fix is really simple. Small effort, big reward.
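
To make that hot path concrete, a hypothetical fromJson that reuses one shared mapper across all word deserializations might look like this; the class name and fields are made up for illustration and do not match the real VocabularyWord:

```java
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative stand-in for the vocab word class.
public class VocabularyWordSketch {
    public String word;
    public double wordFrequency;

    // One mapper shared by every fromJson call,
    // instead of `new ObjectMapper()` per word.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static VocabularyWordSketch fromJson(String json) {
        try {
            return MAPPER.readValue(json, VocabularyWordSketch.class);
        } catch (IOException e) {
            throw new RuntimeException("Failed to parse vocab word JSON", e);
        }
    }
}
```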

@raver119 (Contributor) commented Oct 8, 2016

Can you show me the code that reproduces this behaviour?

@kyrill007 (Author)

Word2Vec w = WordVectorSerializer.loadFullModel(file.getAbsolutePath());

Load a model from a file?

@kyrill007 (Author)

I mean ObjectMapper is expensive to instantiate. Really expensive. Try it for yourself and profile it.
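
For anyone who wants to try it, a naive timing sketch (not a proper JMH benchmark; the JSON and iteration count are arbitrary) shows the per-instantiation cost:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class MapperCostDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"word\":\"day\",\"count\":42}";
        int iterations = 100_000;

        // The pattern the profiler flagged: a fresh ObjectMapper per call.
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            new ObjectMapper().readTree(json);
        }
        long freshMs = (System.nanoTime() - t0) / 1_000_000;

        // One shared mapper, reused for every call.
        ObjectMapper shared = new ObjectMapper();
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            shared.readTree(json);
        }
        long sharedMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("fresh per call: " + freshMs + " ms, shared: " + sharedMs + " ms");
    }
}
```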

@raver119 (Contributor) commented Oct 8, 2016

Aha.

Please don't use loadFullModel unless you're going to continue training (uptrain) your models. It's going to be deprecated soon in favour of unified saving for w2v/d2v/GloVe model files.

However, I'll make sure ObjectMapper isn't abused there.

@kyrill007 (Author)

What should I use today?

@raver119 (Contributor) commented Oct 8, 2016

Are you going to continue training the models, or train once and use them as a vector lookup table for other tasks?

@kyrill007 (Author)

Train once.

@raver119 (Contributor) commented Oct 8, 2016

The best option is writeWordVectors(Word2Vec object) to save and loadTxtVectors(File) to load.

In this case you only save the vectors, without syn1/syn1neg etc., so half as much RAM is used after restoration. This format is also compatible with any other framework out there, since it's basically CSV.
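
A minimal sketch of that save/load flow, assuming the WordVectorSerializer methods named above keep roughly their 0.x signatures (exact signatures and return types may differ between releases):

```java
import java.io.File;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
import org.deeplearning4j.models.word2vec.Word2Vec;

public class VectorsOnlyPersistence {

    // Save only the lookup table (no syn1/syn1neg), as plain text.
    static void save(Word2Vec vec, File vectorsFile) throws Exception {
        WordVectorSerializer.writeWordVectors(vec, vectorsFile.getAbsolutePath());
    }

    // Restore the lookup table: enough for similarity / nearest-neighbour queries,
    // but not for further training.
    static WordVectors load(File vectorsFile) throws Exception {
        return WordVectorSerializer.loadTxtVectors(vectorsFile);
    }
}
```

Once restored, the usual lookup calls (similarity(...), wordsNearest(...)) work against the text-backed table.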

@kyrill007 (Author)

Thank you!

@dkincaid commented Oct 8, 2016

A question about why you're deprecating writeFullModel(): the problem with using writeWordVectors() to save a model is that you then lose the VocabCache that's part of the model, so you can't really do any analysis on the model after it's been saved. If you deprecate it, I'll have to go back to my custom serializer that writes the word vectors and vocab cache to separate files, which is a real pain.

@raver119 (Contributor) commented Oct 8, 2016

No, no, no worries. Nothing will be lost.

For cases where people need a full, exact copy of a w2v/d2v/whatever model (e.g. for further training), there are already methods available for saving them.

I just want to get rid of the countless different signatures in WordVectorSerializer and make it as simple as possible: always save the full model, and offer the option either to restore it partially when only the weights are needed, or to restore it fully when the HS states or Huffman codes are needed as well.

@raver119 (Contributor) commented Oct 8, 2016

And no, we're not going to remove any deprecated methods without a reasonable grace period.

@raver119 (Contributor) commented Oct 8, 2016

Mapper reuse was added: #2171

raver119 closed this as completed Oct 8, 2016
lock bot commented Jan 20, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Jan 20, 2019