
Memory increase from 0.90.2 to 0.90.3 on java client API #3624

Closed
arey opened this issue Sep 5, 2013 · 7 comments
arey commented Sep 5, 2013

Context

Our Java batch indexes 13 million documents with the ElasticSearch Java API.
Our ElasticSearch cluster contains a single data node and a single shard.

Problem

When we upgraded the ElasticSearch server and client from version 0.90.2 to 0.90.3, the batch started stopping with an OutOfMemoryError. This memory error only occurs on the client side (i.e. in the batch).
Until now, 512 MB was enough to run the batch without memory problems (in the production environment as well). With the old 0.19.2 version we did not have this problem either.
Even after raising Xmx to 1 GB, the batch still fails; Xmx has to be set to 1300 MB for the batch to finish successfully.
After downgrading the client from 0.90.3 to 0.90.2 (the ES cluster still runs 0.90.3), our batch problem goes away and 512 MB of memory is enough.
So I believe a change between the 0.90.2 and 0.90.3 versions causes ES to require more memory (for example, an increased default byte[] buffer size?).

More information

The batch uses the BulkRequestBuilder API. Each bulk request contains 5,000 index requests. At most 20 threads may be running in parallel, but they are not all writing to ES at the same time.
As you can see in the screenshot below, the hprof dump from an OutOfMemoryError indicates that 800 MB of byte[] comes from the org.elasticsearch.common.bytes.BytesArray structure.
We have 25,000 org.elasticsearch.action.index.IndexRequest objects in memory.

[Screenshot: batch_memory_live_objects]
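
For reference, a minimal sketch of this kind of bulk loop with the 0.90.x Java client (the address, index/type names, and document JSON are illustrative, not our actual batch code):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class BulkSketch {
    public static void main(String[] args) {
        // Connect to the single-node cluster (address is illustrative).
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        // One bulk of 5,000 small index requests, as in the batch.
        BulkRequestBuilder bulk = client.prepareBulk();
        for (int i = 0; i < 5000; i++) {
            bulk.add(client.prepareIndex("customers", "customer")
                    .setSource("{\"nompersonne\":\"DUPONT\",\"localite\":\"PARIS\"}"));
        }
        BulkResponse response = bulk.execute().actionGet();
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }
        client.close();
    }
}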

Do other ElasticSearch users have this kind of memory problem with the 0.90.3 version?

To solve our issue, I see several possibilities:

  • Increase the batch Xmx to 1300 MB. Our production environment has more than 40 million documents, so I fear this value will not be enough.
  • Use the 0.90.2 version of ElasticSearch for our batch.
  • Wait for a new version of ElasticSearch that fixes this problem.
  • Decrease the number of requests per bulk.
  • Decrease the number of threads.

The last two solutions have a flaw: the batch will take longer to run.

kimchy commented Sep 5, 2013

It seems like there is no leak, since 5 BulkRequests * 5000 end up being 25k index requests (which hold the source to be indexed, which is the bytes array). How big is each index request, i.e. what does the size of each document end up being? Does it add up to the memory used?

Nothing jumps to mind regarding changes we made around it. What is the behavior in 0.90.2? How much memory does it use?

As a side note, 25k index requests is probably too much once you add up the size in bytes per request (assuming that is correct). You don't want to send a 100 MB bulk request; it probably makes sense to make it smaller so it is more efficient on the network (probably between 10-20 MB).
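
For example, a sketch that flushes by accumulated payload size instead of a fixed request count (the 15 MB threshold and method name are illustrative, and the source length is used as an approximation of the wire size):

static void indexAll(Client client, Iterable<IndexRequest> requests) {
    BulkRequestBuilder bulk = client.prepareBulk();
    long pendingBytes = 0;
    for (IndexRequest request : requests) {
        bulk.add(request);
        pendingBytes += request.source().length(); // size of the document source
        if (pendingBytes > 15 * 1024 * 1024) {     // ~15 MB per bulk
            bulk.execute().actionGet();            // send this batch
            bulk = client.prepareBulk();           // start a fresh one
            pendingBytes = 0;
        }
    }
    if (bulk.numberOfActions() > 0) {
        bulk.execute().actionGet();                // flush the tail
    }
}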

ghost assigned kimchy Sep 5, 2013
arey commented Sep 5, 2013

Thanks Kimchy for your answer.
As you can see, the JSON document sources are very small (between 500 bytes and 1 KB in a text editor).
Here is an anonymized example:

{
    "_index": "customers",
    "_id": "customer",
    "_version": 1378062062000,
    "_score": 1,
    "_source": {
        "identifiantadb": 92492430,
        "identifiantclientagent": "z1234",
        "identifiantintermediaire": 987654,
        "codepopulation": "AZE",
        "statutclient": "PP",
        "idxnaturepersonne": "PART",
        "civilite": "01",
        "prenom": "Jean",
        "codepostal": "75001",
        "localite": "PARIS",
        "identifiantrcedisplay": "0124924655",
        "identifiantclientagentdisplay": "C0156",
        "idstatutclient": 2,
        "nompersonne": "DUPONT",
        "adressedisplay": "2 RUE DE PARIS",
        "codepostalprincipal": "78380",
        "localiteprincipale": "BOUGIVAL",
        "telephoneprincipal": "0102030405",
        "telephones": "0102030405"
    }
}

So 25k index requests (5 threads x 1 bulk request of 5,000 index requests) at 1 KB max each should require no more than about 25 MB of memory. I really don't understand why 800 MB seems to be used. A BulkRequest contains 5,000 index requests and thus should have a size of about 5 MB.

Previously, with the ES 0.90.2 client, memory allocation was much lower. I took some memory snapshots at the same number of IndexRequests.
The following screenshot shows 25k IndexRequests, like the one above. Compared to the 800 MB with 0.90.3, only 29 MB are used with 0.90.2:

[Screenshot: batch_memory_live_objects_0.19.2]

How can we explain that 26 times as much memory is required?

kimchy commented Sep 5, 2013

That's very strange then... Since there are different ways to use the bulk API, is there a chance of a simple standalone recreation of this (with dummy data, for example)? That would make figuring this out on my end much faster (I will try nonetheless). If you can do it quickly, I can try to see if there is a problem and get it fixed for the 0.90.4 release (slated for early next week...).

[Update]: Also, I assume, but am just double-checking, that the code on your end is exactly the same and you just replaced the ES jar file.

kimchy commented Sep 5, 2013

Mmm, I think I found where it's coming from... We changed the default initial array size of our output stream (BytesStreamOutput) to 32k, which makes sense in most cases, but potentially not in your case (this is the output stream we create for each XContentBuilder you create). It might make sense to reduce it when building a document using XContent.
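
(If that is the cause, the numbers line up: 25,000 in-flight index requests times a 32 KB initial buffer each comes to roughly 780 MB, which matches the ~800 MB of byte[] in the heap dump above.)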

kimchy commented Sep 5, 2013

Btw, you can work around it by creating your own XContentBuilder and providing the BytesStreamOutput for it yourself, with a smaller initial size.

kimchy added a commit that referenced this issue Sep 6, 2013
We changed the default of BytesStreamOutput (used in various places in ES) from 1k to 32k with the assumption that most streams tend to be large. This doesn't hold, for example, when indexing small documents and adding them using XContentBuilder (which then has a large overhead).

Default the buffer size to 2k now, but be relatively aggressive in expanding the buffer when below 256k (double it), and just use oversize (1/8th) when larger, to try to minimize garbage and buffer copies.

relates to #3624
closes #3638
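
In code, the growth policy described in the commit amounts to roughly the following (the method name is illustrative; this is a sketch, not the actual ES source):

// Start small (2 KB), double while the buffer is below 256 KB,
// then grow by 1/8th at a time to limit garbage and buffer copies.
static int newBufferSize(int current, int minNeeded) {
    int size = Math.max(current, 2 * 1024);
    while (size < minNeeded) {
        size += (size < 256 * 1024) ? size : (size >>> 3);
    }
    return size;
}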
kimchy commented Sep 6, 2013

I fixed it in #3638; it should be good now and will be part of the next 0.90.4. Feel free to reopen if you still have problems!

kimchy closed this as completed Sep 6, 2013
arey commented Sep 6, 2013

Thank you Kimchy for finding the origin of my problem and fixing it in the 0.90.4 version.
At the moment, I don't know whether we can wait for the ES release next week. I will check.
In the meantime, I used your suggestion and instantiated the BytesStreamOutput myself with a size of 1024:

XContentBuilder content = new XContentBuilder(JsonXContent.jsonXContent, new BytesStreamOutput(1024)).startObject();
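
The builder is then used as before; for completeness, a sketch of the rest (the field name and index/type are illustrative):

content.field("nompersonne", "DUPONT").endObject();
bulk.add(client.prepareIndex("customers", "customer").setSource(content));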

I relaunched the batch with an Xmx of 512 MB and my OutOfMemoryError went away.
So, if we can't wait for the 0.90.4 release, I have a workaround.

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015 (same change as the commit above; relates to elastic#3624, closes elastic#3638)