Improve worker pool reservation to avoid starvation #36

jnadler · 2022-03-01T00:02:41Z

During Elasticsearch indexing, a batch indexing error is retried with exponential backoff. The worker pool checkout/checkin was managed outside the block containing the backoff sleep(), so a worker stayed checked out while it slept. When many bad documents arrive and cause batch errors, many workers become unavailable because they are sleeping in backoff, and worker pool exhaustion is likely.

This change attempts to address that by moving the worker pool checkout/checkin closer to the batch indexing API invocation, which is appropriate as the worker pool is intended to bound the concurrency that Elasticsearch sees.
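
As a rough illustration of the idea (not the code in this PR), the sketch below holds a pool slot only around the indexing call and releases it before the backoff sleep. Everything named here is an assumption for illustration: `index_with_retry`, `BatchIndexError`, the `index_batch` callable, and the use of a `threading.BoundedSemaphore` as a stand-in for the worker pool.

```python
import threading
import time


class BatchIndexError(Exception):
    """Hypothetical error raised when a batch indexing call fails."""


def index_with_retry(pool, index_batch, batch, max_attempts=5,
                     base_delay=0.1, sleep=time.sleep):
    """Index a batch, holding a pool slot only during the indexing call.

    `pool` is any context manager that reserves a slot on enter and frees
    it on exit (here a semaphore stands in for the worker pool).
    """
    for attempt in range(max_attempts):
        # Reserve a worker slot just for the duration of the API call.
        with pool:
            try:
                return index_batch(batch)
            except BatchIndexError:
                # Out of attempts: let the error propagate to the caller.
                if attempt == max_attempts - 1:
                    raise
        # The slot has been released by now, so the backoff sleep does not
        # hold a worker that another batch could use.
        sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    pool = threading.BoundedSemaphore(4)  # assumed pool size
    print(index_with_retry(pool, lambda b: len(b), ["doc-1", "doc-2"]))
```

With the reservation inside the loop, the pool size still bounds the number of concurrent indexing calls, while retries that are waiting out a backoff no longer count against it.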

Move worker pool reservations down stack to avoid starvation during retry backoff

303c172

marc-lebourdais approved these changes Mar 1, 2022

jnadler merged commit 2e182d4 into master Mar 1, 2022

jnadler deleted the reduce-retry-worker-starvation branch March 1, 2022 16:31
