Memory increase from 0.90.2 to 0.90.3 on java client API #3624
Comments
It seems like there is no leak, since 5 BulkRequests * 5000 end up being 25k index requests (which hold the source to be indexed, which is the byte array). How big is each index request you have, I mean, what does the size of each document end up being? Does it add up to the memory used? Nothing jumps to mind with changes we did around it. What is the behavior in 0.90.2? How much memory is it using? As a side note, 25k index requests is probably too much in terms of total size in bytes per request (assuming it's correct). You don't want to send a 100mb bulk request; it probably makes sense to make that smaller so it will be more efficient on the network (probably between 10-20mb).
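The sizing advice above (flush by accumulated bytes rather than by a fixed document count) can be sketched generically. This is a minimal, stdlib-only illustration, not the ES client API; the `flush` callback stands in for an actual bulk execute:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulates serialized documents and flushes whenever the pending
// payload would exceed a byte budget (e.g. 10-20 MB per bulk request).
public class SizeBoundedBatcher {
    private final long maxBytes;
    private final Consumer<List<byte[]>> flush;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;

    public SizeBoundedBatcher(long maxBytes, Consumer<List<byte[]>> flush) {
        this.maxBytes = maxBytes;
        this.flush = flush;
    }

    public void add(byte[] doc) {
        // Flush before this document would push us over the budget.
        if (!pending.isEmpty() && pendingBytes + doc.length > maxBytes) {
            flushPending();
        }
        pending.add(doc);
        pendingBytes += doc.length;
    }

    public void flushPending() {
        if (!pending.isEmpty()) {
            flush.accept(new ArrayList<>(pending));
            pending.clear();
            pendingBytes = 0;
        }
    }
}
```

With a 10-20 MB budget, batch size adapts to document size instead of always being 5000 requests regardless of payload.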
Thanks Kimchy for your answer. Here is a sample document:

{
  _index: customers
  _id: customer
  _version: 1378062062000
  _score: 1
  _source: {
    identifiantadb: 92492430
    identifiantclientagent: z1234
    identifiantintermediaire: 987654
    codepopulation: AZE
    statutclient: PP
    idxnaturepersonne: PART
    civilite: 01
    prenom: Jean
    codepostal: 75001
    localite: PARIS
    identifiantrcedisplay: 0124924655
    identifiantclientagentdisplay: C0156
    idstatutclient: 2
    nompersonne: DUPONT
    adressedisplay: 2 RUE DE PARIS
    codepostalprincipal: 78380
    localiteprincipale: BOUGIVAL
    telephoneprincipal: 0102030405
    telephones: 0102030405
  }
}

So 25k index requests (5 threads x 1 bulk request of 5000 index requests) of at most 1 KB each should require about 25 MB of memory. I really don't understand why 800 MB seems to be used. A BulkRequest contains 5000 index requests and thus should have a size of about 5 MB. Previously, with the ES 0.90.2 client, the memory allocation was much lower. I took memory snapshots at the same number of IndexRequests in both versions. How can we explain that 26 times as much memory is required?
That's very strange then... Since there are different ways to use the bulk API, is there a chance for a standalone simple recreation of this (with dummy data, for example)? This will make figuring this out on my end much faster (I will try nonetheless). If you can do it quickly, I can try and see if there is a problem, to get it fixed for the 0.90.4 release (slated for early next week...). [Update]: Also, I assume, but just double checking, that the code on your end is exactly the same, and you just replaced the ES jar file.
Mmm, I think I found where it's coming from... We changed the default value for our output stream (BytesStreamOutput).
Btw, you can work around it by creating your own XContentBuilder while providing the BytesStreamOutput with a smaller buffer size.
We changed the default of BytesStreamOutput (used in various places in ES) to 32k from 1k with the assumption that most streams tend to be large. This doesn't hold, for example, when indexing small documents and adding them using XContentBuilder (which then carries a large overhead). Default the buffer size to 2k now, but be relatively aggressive in expanding the buffer when below 256k (double it), and just use oversize (1/8th) when larger to try and minimize garbage and buffer copies. Relates to #3624, closes #3638
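The growth strategy described in that commit can be sketched as follows. This is an illustrative stand-in for the policy, not the actual BytesStreamOutput code:

```java
// Sketch of the buffer growth policy described above: start at 2 KB,
// double while under 256 KB, then oversize the needed capacity by 1/8th.
public class GrowthPolicy {
    static final int INITIAL = 2 * 1024;
    static final int DOUBLE_LIMIT = 256 * 1024;

    // Returns the new buffer capacity required to hold `needed` bytes.
    static int grow(int current, int needed) {
        int size = Math.max(current, INITIAL);
        while (size < needed) {
            if (size < DOUBLE_LIMIT) {
                size <<= 1;                     // aggressive: double it
            } else {
                size = needed + (needed >>> 3); // conservative: needed + 1/8th
            }
        }
        return size;
    }
}
```

Small documents now pay at most a 2 KB floor instead of 32 KB, while large streams still expand quickly.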
I fixed it in #3638, should be good now, will be part of the next 0.90.4. Feel free to reopen if you still have problems!
Thank you Kimchy for having found the origin of my problem and fixed it for the 0.90.4 version. As a workaround I now build my documents with:

XContentBuilder content = new XContentBuilder(JsonXContent.jsonXContent, new BytesStreamOutput(1024)).startObject();

I relaunched the batch with an Xmx of 512 MB and my OutOfMemoryError goes away.
Context
Our Java batch indexes 13 million documents with the ElasticSearch Java API.
Our ElasticSearch cluster contains a single data node and a single shard.
Problem
After upgrading the ElasticSearch server and client from version 0.90.2 to 0.90.3, the batch stops with an OutOfMemoryError. This memory error only occurs on the client side (i.e. the batch).
Until now, 512 MB was enough to run the batch without memory problems (including in the production environment). With the old 0.19.2 version, we did not have this problem either.
Even after raising the Xmx value to 1g, the batch still fails. Xmx has to be set to 1300 MB for the batch to finish successfully.
By downgrading the client from 0.90.3 to 0.90.2 (the ES cluster still running version 0.90.3), our batch problem goes away and 512 MB of memory is enough.
So I believe a change between the 0.90.2 and 0.90.3 versions causes ES to require more memory (for example, an increase in a default byte[] buffer size?).
More information
The batch uses the BulkRequestBuilder API. Each bulk request contains 5,000 index requests. At most 20 threads can run in parallel, but they do not all write to ES at the same time.
As you can see in the screenshot below, an OutOfMemory hprof dump indicates that 800 MB of byte[] comes from the org.elasticsearch.common.bytes.BytesArray structure.
We have 25,000 org.elasticsearch.action.index.IndexRequest instances in memory.
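A back-of-the-envelope check accounts for almost all of that 800 MB, assuming each of the 25,000 pending IndexRequests retains a 32 KB serialization buffer (the default that changed between these versions, per the fix discussed in this thread):

```java
// Rough arithmetic: per-request buffer overhead with a 1 KB vs 32 KB
// default BytesStreamOutput buffer, for 25,000 in-flight index requests.
public class OverheadEstimate {
    static long totalBytes(int requests, int bufferBytes) {
        return (long) requests * bufferBytes;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("1 KB buffers:  " + totalBytes(25_000, 1024) / mb + " MB");
        System.out.println("32 KB buffers: " + totalBytes(25_000, 32 * 1024) / mb + " MB");
        // ~24 MB vs ~781 MB: the larger figure is close to the 800 MB of
        // byte[] seen in the heap dump.
    }
}
```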
Do other ElasticSearch users have this kind of memory problem with the 0.90.3 version?
To solve our issue, I see several possibilities:
The last two solutions have a flaw: the batch will run longer.