
Jackson's ObjectMapper is completely thread safe and should not be re-instantiated every time #2170

Closed
kyrill007 opened this issue Oct 8, 2016 · 16 comments

Comments

@kyrill007 (Author)

My application uses deeplearning4j. I was profiling slow loading of a Word2Vec model, and the profiler (YourKit) showed that literally all of the time is spent in com.fasterxml.jackson.databind.ObjectMapper instantiations. Please see org.deeplearning4j.models.embeddings.loader.VectorsConfiguration#mapper. ObjectMapper is a completely thread-safe service class; it is meant to be used as a singleton for the lifetime of the application, and it is also very expensive to create. It makes sense to fix the VectorsConfiguration class, and it would also be prudent to review all usages of ObjectMapper and ensure it is not instantiated every time. This is a performance killer.
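
As a concrete illustration of the suggested fix, here is a minimal sketch of the reuse pattern; the holder class and its mapper() accessor are illustrative, not the actual VectorsConfiguration code:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative holder, not part of deeplearning4j.
public final class JsonMappers {

    // Built once; ObjectMapper is thread-safe after configuration,
    // so a single instance can be shared by all callers.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private JsonMappers() {}

    // Reuse the shared instance instead of calling `new ObjectMapper()`
    // on every (de)serialization.
    public static ObjectMapper mapper() {
        return MAPPER;
    }
}
```

If per-call tweaks are needed, ObjectReader/ObjectWriter views obtained from the shared mapper are immutable and cheap to create, so they don't require building a new ObjectMapper.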

raver119 self-assigned this Oct 8, 2016
@raver119 (Contributor) commented Oct 8, 2016

It's not really clear to me how this can be your performance killer, since the VectorsConfiguration object is instantiated only once per model creation.

Can you show me the code that reproduces this behaviour?

@kyrill007 (Author)

Yes, indeed. I have just 3 models that I instantiate on startup. The code in WordVectorSerializer's loadModel iterates over each VocabWord and calls VocabularyWord.fromJson(wordJson). This literally takes 8 seconds to run (cumulatively, of course), which means that every time I run a test in the IDE I pay an 8-second penalty. It is literally the longest operation in test startup, and I would love to remove that wait. The other important point is to review ObjectMapper usage across the code base as a whole; it may reveal other problem areas, and the fix is really simple. Small effort, big reward.
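
To make that hot path concrete, a hypothetical fromJson that reuses one shared mapper across all word deserializations might look like this; the class name and fields are made up for illustration and do not match the real VocabularyWord:

```java
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative stand-in for the vocab word class.
public class VocabularyWordSketch {
    public String word;
    public double wordFrequency;

    // One mapper shared by every fromJson call,
    // instead of `new ObjectMapper()` per word.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static VocabularyWordSketch fromJson(String json) {
        try {
            return MAPPER.readValue(json, VocabularyWordSketch.class);
        } catch (IOException e) {
            throw new RuntimeException("Failed to parse vocab word JSON", e);
        }
    }
}
```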

@raver119 (Contributor) commented Oct 8, 2016

Can you show me the code that reproduces this behaviour?

@kyrill007 (Author)

Word2Vec w = WordVectorSerializer.loadFullModel(file.getAbsolutePath());

Load a model from a file?

@kyrill007 (Author)

I mean ObjectMapper is expensive to instantiate. Really expensive. Try it for yourself and profile it.
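
For anyone who wants to try it, a naive timing sketch (not a proper JMH benchmark; the JSON and iteration count are arbitrary) shows the per-instantiation cost:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class MapperCostDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"word\":\"day\",\"count\":42}";
        int iterations = 100_000;

        // The pattern the profiler flagged: a fresh ObjectMapper per call.
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            new ObjectMapper().readTree(json);
        }
        long freshMs = (System.nanoTime() - t0) / 1_000_000;

        // One shared mapper, reused for every call.
        ObjectMapper shared = new ObjectMapper();
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            shared.readTree(json);
        }
        long sharedMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("fresh per call: " + freshMs + " ms, shared: " + sharedMs + " ms");
    }
}
```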

@raver119 (Contributor) commented Oct 8, 2016

Aha.

Please don't use loadFullModel unless you're going to continue training (uptrain) your models. It's going to be deprecated soon in favour of unified saving for w2v/d2v/GloVe model files.

However, I'll make sure ObjectMapper isn't abused there.

@kyrill007 (Author)

What should I use today?

@raver119 (Contributor) commented Oct 8, 2016

Are you going to continue training the models, or train once and use them as a vector lookup table for other tasks?

@kyrill007 (Author)

Train once.

@raver119 (Contributor) commented Oct 8, 2016

The best option is writeWordVectors(Word2Vec object) to save and loadTxtVectors(File) to load.

In this case you only save the vectors, without syn1/syn1neg etc., so half as much RAM is used after restoration. This format is also compatible with any other framework out there, since it's basically CSV.
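
A minimal sketch of that save/load flow, assuming the WordVectorSerializer methods named above keep roughly their 0.x signatures (exact signatures and return types may differ between releases):

```java
import java.io.File;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
import org.deeplearning4j.models.word2vec.Word2Vec;

public class VectorsOnlyPersistence {

    // Save only the lookup table (no syn1/syn1neg), as plain text.
    static void save(Word2Vec vec, File vectorsFile) throws Exception {
        WordVectorSerializer.writeWordVectors(vec, vectorsFile.getAbsolutePath());
    }

    // Restore the lookup table: enough for similarity / nearest-neighbour queries,
    // but not for further training.
    static WordVectors load(File vectorsFile) throws Exception {
        return WordVectorSerializer.loadTxtVectors(vectorsFile);
    }
}
```

Once restored, the usual lookup calls (similarity(...), wordsNearest(...)) work against the text-backed table.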

@kyrill007 (Author)

Thank you!

@dkincaid commented Oct 8, 2016

A question about why you're deprecating writeFullModel(): the problem with using writeWordVectors() to save a model is that you then lose the VocabCache that's part of the model, so you can't really do any analysis on the model after it's been saved. If you deprecate it, I'll have to go back to my custom serializer that writes the word vectors and vocab cache to separate files, which is a real pain.

@raver119 (Contributor) commented Oct 8, 2016

No, no, no worries. Nothing will be lost.

For cases where people need a full, exact copy of a w2v/d2v/whatever model (e.g. for further training), there are already methods available for saving them.

I just want to get rid of the countless different signatures in WordVectorSerializer and make it as simple as possible: always save the full model, and offer the option either to restore it partially when only the weights are needed, or to restore it fully when the HS states or Huffman codes are needed as well.

@raver119 (Contributor) commented Oct 8, 2016

And no, we're not going to remove any deprecated methods without a reasonable grace period.

@raver119 (Contributor) commented Oct 8, 2016

Mapper reuse was added: #2171

raver119 closed this as completed Oct 8, 2016
lock bot commented Jan 20, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Jan 20, 2019