
More memory limiting for HttpPostEmitter #5300

Merged: gianm merged 4 commits into apache:master on Jan 26, 2018

Conversation

jon-wei (Contributor) commented on Jan 26, 2018:

This PR adjusts the default maxBatchSize for the HttpPostEmitter so that, in the default configuration (no maxBatchSize or batchQueueSizeLimit specified), the product maxBatchSize * batchQueueSizeLimit is limited to 10% of the available JVM heap.

This is useful because not all Druid nodes are provisioned with the same amount of memory. In particular, the middle manager is often provisioned with a much smaller heap than the other nodes, which can result in OOM errors from buffered events when the default values are used cluster-wide.

This PR also adjusts large event handling so that large events increment and decrement the approximateBuffersToEmitCount counter. Previously, it was theoretically possible for the large event queue to keep growing past the configured queue limit.
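
As a rough illustration of that second fix, here is a minimal sketch (not the actual HttpPostEmitter code; the class and every name other than approximateBuffersToEmitCount and batchQueueSizeLimit are invented) of large events participating in the same queue-size accounting as regular batches:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class LargeEventQueueSketch
{
  // Shared counter of buffers waiting to be emitted; after this change,
  // large events participate in this accounting instead of bypassing it.
  private final AtomicInteger approximateBuffersToEmitCount = new AtomicInteger();
  private final ConcurrentLinkedQueue<byte[]> largeEventsToEmit = new ConcurrentLinkedQueue<>();
  private final int batchQueueSizeLimit;

  LargeEventQueueSketch(int batchQueueSizeLimit)
  {
    this.batchQueueSizeLimit = batchQueueSizeLimit;
  }

  boolean offerLargeEvent(byte[] eventBytes)
  {
    // Increment before enqueueing so the queue can no longer grow past the limit.
    if (approximateBuffersToEmitCount.incrementAndGet() > batchQueueSizeLimit) {
      approximateBuffersToEmitCount.decrementAndGet();
      return false; // caller drops or retries the event
    }
    largeEventsToEmit.add(eventBytes);
    return true;
  }

  byte[] pollLargeEvent()
  {
    byte[] event = largeEventsToEmit.poll();
    if (event != null) {
      // Decrement once the buffer is handed off for emission.
      approximateBuffersToEmitCount.decrementAndGet();
    }
    return event;
  }
}
```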

@@ -222,7 +222,7 @@ The Druid servers [emit various metrics](../operations/metrics.html) and alerts
 |`druid.emitter.http.basicAuthentication`|Login and password for authentification in "login:password" form, e. g. `druid.emitter.http.basicAuthentication=admin:adminpassword`|not specified = no authentification|
 |`druid.emitter.http.flushTimeOut`|The timeout after which an event should be sent to the endpoint, even if internal buffers are not filled, in milliseconds.|not specified = no timeout|
 |`druid.emitter.http.batchingStrategy`|The strategy of how the batch is formatted. "ARRAY" means `[event1,event2]`, "NEWLINES" means `event1\nevent2`, ONLY_EVENTS means `event1event2`.|ARRAY|
-|`druid.emitter.http.maxBatchSize`|The maximum batch size, in bytes.|5191680 (i.e. 5 MB)|
+|`druid.emitter.http.maxBatchSize`|The maximum batch size, in bytes.|the minimum of (10% of JVM heap size divided by 50) and (5191680, i.e. 5 MB)|
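
For reference, overriding these defaults on a memory-constrained node might look like the snippet below. The values are illustrative, and the `recipientBaseUrl` line (with its URL) is an assumption added only to make the example complete; it is not part of this diff:

```properties
# Hypothetical runtime.properties for a small-heap middle manager
# (values are illustrative, not recommendations from this PR):
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://metrics-collector.example.com:8080/events
# roughly 2.5 MB per batch
druid.emitter.http.maxBatchSize=2621440
# keep at most 2 queued batches
druid.emitter.http.batchQueueSizeLimit=2
```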
Review comment on the diff above (Contributor):
In memory-constrained settings I think it would be better to limit the batchQueueSizeLimit rather than the maxBatchSize, because very small batches might be an inefficient way to transfer metrics to the http server. We might want to limit both, though. It's common to set Xmx to something small (like 64MB) for middleManagers, and in that case, probably 2 batches in the queue * 2.5 MB per batch makes sense.

Reply from jon-wei (Contributor, Author):
Adjusted this to lower the queue limit first, while also keeping a minimum of 2 queue items.
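
A minimal sketch of that final heuristic, assuming the defaults discussed in this thread (5 MB batches, a 50-batch queue, a 10%-of-heap budget); the class and method names are invented and are not the actual HttpPostEmitter config API:

```java
class EmitterSizingSketch
{
  static final long DEFAULT_MAX_BATCH_SIZE = 5191680;   // documented default, ~5 MB
  static final int DEFAULT_BATCH_QUEUE_SIZE_LIMIT = 50;
  static final int MIN_BATCH_QUEUE_SIZE_LIMIT = 2;

  // 10% of the maximum JVM heap, the budget this PR targets.
  static long memoryBudget()
  {
    return Runtime.getRuntime().maxMemory() / 10;
  }

  static int defaultBatchQueueSizeLimit()
  {
    // Lower the queue limit first, but never below 2 queued batches.
    int fitting = (int) (memoryBudget() / DEFAULT_MAX_BATCH_SIZE);
    return Math.max(MIN_BATCH_QUEUE_SIZE_LIMIT,
                    Math.min(DEFAULT_BATCH_QUEUE_SIZE_LIMIT, fitting));
  }

  static long defaultMaxBatchSize()
  {
    // Only once the queue limit has hit its floor does the batch size shrink,
    // keeping maxBatchSize * batchQueueSizeLimit within the budget.
    return Math.min(DEFAULT_MAX_BATCH_SIZE, memoryBudget() / defaultBatchQueueSizeLimit());
  }

  public static void main(String[] args)
  {
    // With -Xmx64m: budget ~6.4 MB -> queue limit 2, batch size ~3.2 MB,
    // matching the "2 batches * 2.5 MB" ballpark suggested above.
    System.out.printf("batchQueueSizeLimit=%d maxBatchSize=%d%n",
                      defaultBatchQueueSizeLimit(), defaultMaxBatchSize());
  }
}
```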

gianm (Contributor) left a review:
LGTM

@gianm gianm merged commit 2a89270 into apache:master Jan 26, 2018
jon-wei added a commit to implydata/druid-public that referenced this pull request Mar 12, 2018
* More memory limiting for HttpPostEmitter

* Less aggressive large events test

* Fix tests

* Restrict batch queue size first, keep minimum of 2 queue items
@gianm gianm added this to the 0.12.1 milestone Apr 9, 2018
gianm pushed a commit to gianm/druid that referenced this pull request Apr 9, 2018
fjy pushed a commit that referenced this pull request Apr 10, 2018