
Limit request size #16011

Closed
danielmitterdorfer opened this issue Jan 15, 2016 · 17 comments
Labels
:Distributed/CRUD (a catch-all label for issues around indexing, updating and getting a doc by id; not search), >enhancement, resiliency

Comments

@danielmitterdorfer
Member

Overly large bulk requests can threaten the stability of Elasticsearch. Hence, we want to limit the size of a bulk request.

Although the need originally arose for bulk requests, the solution will apply to requests in general and not just bulk requests.

There will be two (configurable) limits:

  • A limit on the size (in bytes) of an individual request
  • A limit on the size (in bytes) of all requests that are currently in flight
@dadoonet
Member

Just a thought: I wonder if the default limit should be something like the default http.max_content_length divided by two or something along those lines?
Maybe we should (also?) check that this max size is actually not bigger than http.max_content_length and reject the settings in such a case?

@danielmitterdorfer
Member Author

Thanks for your input, @dadoonet! I am not sure about your suggestion to check against http.max_content_length, though, as that relies on a protocol-level setting, whereas I'd like to enforce the request size limit in a protocol-independent fashion. It should also work if somebody uses only the transport protocol, and in that case I wouldn't want to validate the setting against an HTTP-related one. To be honest, I have not looked into the code yet. What do you think?

@dadoonet
Member

Ha! That is correct. I thought http.max_content_length was applied to the transport layer as well...
Ignore me then! :D

@danielmitterdorfer
Member Author

Ok, thanks for the clarification.

@danielmitterdorfer
Member Author

Design Parameters

After looking at the source, I have identified a few options which I'd like to discuss before implementing this feature. Below are some design parameters which are worth considering from my point of view.

Type of endpoint

A (bulk) request can hit the cluster via the following endpoints:

  1. HTTP: note that the REST handler reissues the HTTP request as a transport request via the client interface.
  2. Transport (or, more precisely, remote transport)
  3. Local transport: An optimization to avoid the overhead of a remote call when just forwarding a request within the same process.

Applicability of the limit

This concerns whether we want to apply the limit to all types of requests or only to a limited number of explicitly defined request types (such as bulk requests).

When to check the limit

  • Apply during deserialization: This would allow us to abort request handling as soon as we know the limit is hit, but at the expense of implementing this separately for each protocol. For the transport protocol, the right place appears to be MessageChannelHandler#handleRequest().
  • Apply before a (transport) request is handled (i.e. in the appropriate subclass of TransportAction): This would simplify the validation as we would check only once for all endpoint types (since HTTP requests are reissued as transport requests). However, at this point the request has already been deserialized, thus consuming memory, and all we can do is save additional system resources by aborting request processing.

Request size calculation

To know whether a request has breached the limit, we need to calculate its size. Considering BulkRequest for now, there are two options for determining the request size:

Proposal

Based on the options above, I want to sketch a simple solution proposal:

Considering that we will likely want limit checks not only for bulk requests but also for similar ones, such as multi-gets, we should not tie this too specifically to bulk requests.

Hence, for each request type that should be size-limited, the corresponding TransportRequest class implements a new interface, RequestSizeEstimator (draft name):

public interface RequestSizeEstimator {
  /** Returns an estimate of the request size in bytes. */
  int getRequestSizeInBytes();
}

We define one configurable request size limit (default e.g. 50MB) for all request types and implement limit breach detection in a high-level fashion as an ActionFilter. This ActionFilter checks whether the TransportRequest implements RequestSizeEstimator and, if so, verifies the estimated size against the limit. If the limit is breached, we throw an appropriate exception indicating a client error (4xx in HTTP speak).
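For illustration, here is a minimal sketch of such a check as a standalone helper that an ActionFilter implementation could call before passing the request down the filter chain. The class name and the exception type are assumptions; the exact ActionFilter signature is omitted since it varies between versions.

// Hypothetical helper invoked from an ActionFilter before the request proceeds.
// RequestSizeEstimator is the draft interface proposed above.
public final class RequestSizeLimitCheck {
    private final long maxRequestSizeInBytes;

    public RequestSizeLimitCheck(long maxRequestSizeInBytes) {
        this.maxRequestSizeInBytes = maxRequestSizeInBytes;
    }

    public void check(Object request) {
        if (request instanceof RequestSizeEstimator) {
            long size = ((RequestSizeEstimator) request).getRequestSizeInBytes();
            if (size > maxRequestSizeInBytes) {
                // Would be mapped to a 4xx response on the REST layer.
                throw new IllegalArgumentException(
                        "request size [" + size + "] exceeds the configured limit [" + maxRequestSizeInBytes + "]");
            }
        }
    }
}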

In summary, the pros and cons of this solution are:

Pros:

  • The limit is only checked in a single place in the code base
  • Easily extensible: We can add checks for new request types simply by having the corresponding transport request class implement RequestSizeEstimator.

Cons:

  • We deserialize the full request before we check the limit. This means we still hit a potential memory spike but we avoid using any other resources (CPU, I/O) after that point.
  • The size calculation may not be entirely accurate (see my comments above).

Feedback is very much appreciated.

@danielmitterdorfer
Member Author

Based on feedback from @bleskes, here's a revised approach:

We limit the size of requests at the protocol level during deserialization. We consider two cases:

  • Receiving a request: We will apply one global limit for all requests that are in flight
  • Sending a request (internally in the cluster): We will apply a limit for each individual request

Limiting at the protocol level has a couple of advantages:

  • We cancel request processing as soon as we hit the configured memory limit
  • It allows us to introduce a limit per action (not in the scope of this ticket though)

In the first step, we will implement an action-independent limit which applies to all actions, not just bulk actions.

Transports that need to be considered are Netty transport and local transport (for testing purposes). HTTP is unaffected because we just stream data from the HTTP layer to the transport layer (no up-front allocation).

We need to check whether circuit breakers are a feasible option to detect limit breaches.
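Assuming circuit breakers turn out to be feasible, the reservation at deserialization time could look roughly like the following sketch. Only the CircuitBreaker methods used here (addEstimateBytesAndMaybeBreak and addWithoutBreaking) are existing API; how the breaker is obtained, the label, and the way the message length is determined are assumptions.

import org.elasticsearch.common.breaker.CircuitBreaker;

// Sketch: reserve the message size in a breaker before deserializing a request
// and release the bytes once the request has been handled.
public final class InFlightRequestAccounting {
    public static void handle(CircuitBreaker inFlightRequestsBreaker,
                              long messageLengthInBytes,
                              Runnable deserializeAndHandle) {
        // Throws CircuitBreakingException if the global in-flight limit would be exceeded.
        inFlightRequestsBreaker.addEstimateBytesAndMaybeBreak(messageLengthInBytes, "<transport_request>");
        try {
            deserializeAndHandle.run();
        } finally {
            // Release the reserved bytes; addWithoutBreaking never trips the breaker.
            inFlightRequestsBreaker.addWithoutBreaking(-messageLengthInBytes);
        }
    }
}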

@danielmitterdorfer
Member Author

@dakrone: Could you share your thoughts on whether using CircuitBreaker is feasible for limiting memory usage in the following two scenarios?

  1. Limiting the memory usage of all in-flight requests on a single node level.
  2. Limiting the memory usage of each individual outgoing request of a node.

In both cases the memory usage will be determined by the size of the deserialized request payload.

(more details above; that's just the summary for you)

Scenario 1

I have seen that there is already a "request" circuit breaker but its BreakerSettings are (a) cluster-wide and (b) relative to the amount of available heap memory. I'd tackle this by introducing new BreakerSettings for all requests in flight at node level (which would have to be a new value in org.elasticsearch.common.settings.Settings.Scope) and with an absolute but user-configurable limit (e.g. 50mb).

Scenario 2

We also want to limit memory usage on a single-request basis (not across all requests that are in flight) and tbh I think that circuit breakers are not a good fit because, as far as I understand it, we would need a new circuit breaker instance for each request. However, it would make the implementation more uniform.

It would be great to hear your input on this.

@dakrone
Member

dakrone commented Feb 1, 2016

@danielmitterdorfer

Could you share your thoughts on whether using CircuitBreaker is feasible for limiting memory usage in the following two scenarios?

Limiting the memory usage of all in-flight requests on a single node level.
Limiting the memory usage of each individual outgoing request of a node.

I have seen that there is already a "request" circuit breaker but its BreakerSettings are (a) cluster-wide and (b) relative to the amount of available heap memory. I'd tackle this by introducing new BreakerSettings for all requests in flight at node level (which would have to be a new value in org.elasticsearch.common.settings.Settings.Scope) and with an absolute but user-configurable limit (e.g. 50mb).

This is exactly what the circuit breaker is for; you can easily define a new breaker (and breaker type) in HierarchyCircuitBreakerService to do just that. I would recommend this heartily :) (and I'm happy to help)
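(For illustration, a rough sketch of what defining such a breaker might look like; the breaker name, the limit and the exact BreakerSettings constructor arguments are assumptions, and the wiring into HierarchyCircuitBreakerService is omitted.)

import org.elasticsearch.common.breaker.CircuitBreaker;
import org.elasticsearch.indices.breaker.BreakerSettings;

// Sketch only: a child breaker with an absolute limit. The name, the limit and
// the constructor arguments shown here are assumptions about the BreakerSettings API.
public final class InFlightRequestsBreakerExample {
    public static BreakerSettings inFlightRequestsSettings() {
        return new BreakerSettings(
                "in_flight_requests",        // breaker name (placeholder)
                50 * 1024 * 1024,            // absolute limit of 50mb
                1.0,                         // overhead multiplier
                CircuitBreaker.Type.MEMORY); // regular memory-accounting breaker
    }
}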

We also want to limit memory usage on a single-request basis (not across all requests that are in flight) and tbh I think that circuit breakers are not a good fit because, as far as I understand it, we would need a new circuit breaker instance for each request. However, it would make the implementation more uniform.

To me, this doesn't sound like a good fit for the circuit breaker (since it is designed to be per-node, not per-request). Instead, this sounds like a better fit for the validation framework, since that works on the per-request level.

@danielmitterdorfer
Member Author

Thanks for your feedback! I'll implement it based on your pointers. It would be great if you could take a look at it then.

@danielmitterdorfer
Member Author

@dakrone,

I have started implementing request size limiting based on your pointers (see danielmitterdorfer/elasticsearch@802f6ed). I have added a new circuit breaker to HierarchyCircuitBreakerService but then stumbled over CircuitBreakerServiceIT#testParentChecking(). The test spuriously hangs because the parent circuit breaker trips and we cannot get any request through the transport layer afterwards. I am not sure whether we want this dependency among circuit breakers at all in this case. What do you think?

Other things I have noted:

  • I am setting an absolute limit for my new circuit breaker but it looks to me as if the intention is that all leaf circuit breaker limits add up to 100%, so I think I should reduce the other values accordingly and create a relative setting instead of an absolute one, right?
  • I have introduced an adapter interface Limiter in my implementation with two implementations: one that delegates to a circuit breaker (scenario 1, in-flight requests) and one where I manually check the bytes used (scenario 2, per-request limiting; see the sketch below). I throw a CircuitBreakingException in both implementations, but for the second case I think it is a bit too implementation-specific and does not fit well. I can either introduce a new exception or rename/move CircuitBreakingException to something like QuotaExceededException, which would fit both cases.
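A minimal sketch of the adapter idea, assuming interface and method names that are illustrative only (the actual implementation lives on the linked branch):

import org.elasticsearch.common.breaker.CircuitBreaker;

// Sketch of the Limiter adapter: one implementation backed by a circuit breaker
// (global in-flight limit) and one that checks a fixed per-request limit.
interface Limiter {
    void addBytes(long bytes);      // may throw if the limit would be exceeded
    void releaseBytes(long bytes);  // undo a previous reservation
}

final class CircuitBreakerLimiter implements Limiter {
    private final CircuitBreaker breaker;

    CircuitBreakerLimiter(CircuitBreaker breaker) {
        this.breaker = breaker;
    }

    @Override
    public void addBytes(long bytes) {
        // Throws CircuitBreakingException if the global in-flight limit is breached.
        breaker.addEstimateBytesAndMaybeBreak(bytes, "<request>");
    }

    @Override
    public void releaseBytes(long bytes) {
        breaker.addWithoutBreaking(-bytes);
    }
}

final class PerRequestLimiter implements Limiter {
    private final long limitInBytes;
    private long used;

    PerRequestLimiter(long limitInBytes) {
        this.limitInBytes = limitInBytes;
    }

    @Override
    public void addBytes(long bytes) {
        used += bytes;
        if (used > limitInBytes) {
            throw new IllegalStateException(
                    "request size [" + used + "] exceeds the per-request limit [" + limitInBytes + "]");
        }
    }

    @Override
    public void releaseBytes(long bytes) {
        used -= bytes;
    }
}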

@dakrone
Member

dakrone commented Feb 4, 2016

The test spuriously hangs because the parent circuit breaker trips and we cannot get any request through the transport layer afterwards. I am not sure whether we want this dependency among circuit breakers at all in this case. What do you think?

This came up when I added the request breaker on BigArrays, because we always want to be able to allocate a buffer to send an error response! The way that I worked around it before was splitting it into a circuit-breaking and a non-circuit-breaking category, where the non-circuit-breaking one did addWithoutBreaking(numBytes) so that the usage was still tracked; it just didn't prevent requests from being sent out.

So in your case, I think we may want to limit which requests fall into the circuit-breaking category (index requests, bulk requests, etc.), and which don't (errors, cluster health, cluster state, etc.).
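A sketch of that split, assuming a hypothetical predicate over action names; the two CircuitBreaker methods are existing API, everything else (class name, the set of actions) is illustrative:

import java.util.Set;

import org.elasticsearch.common.breaker.CircuitBreaker;

// Sketch: requests for "heavy" actions count against the breaker and may trip it,
// while everything else is only tracked via addWithoutBreaking so that errors,
// cluster health, etc. always get through.
public final class RequestAccounting {
    private static final Set<String> BREAKING_ACTIONS =
            Set.of("indices:data/write/bulk", "indices:data/write/index");

    public static void account(CircuitBreaker breaker, String action, long bytes) {
        if (BREAKING_ACTIONS.contains(action)) {
            // May throw CircuitBreakingException.
            breaker.addEstimateBytesAndMaybeBreak(bytes, action);
        } else {
            // Track usage but never reject.
            breaker.addWithoutBreaking(bytes);
        }
    }
}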

What do you think?

I am setting an absolute limit for my new circuit breaker but it looks to me as if the intention is that all leaf circuit breaker limits add up to 100%, so I think I should reduce the other values accordingly and create a relative setting instead of an absolute one, right?

No, they definitely don't have to add up to 100%; that's why the parent is there. The idea is that you can have something like this:

fielddata: 40%
big_arrays: 30%
requests (your new one): 50mb

parent: 60%

This allows us to limit individual parts to certain amounts (including absolute amounts), while still limiting circuit-breaking memory across the node as a whole (via the parent breaker).

I definitely think this new breaker should be absolute instead of relative.
You're on the right track with it.

I have introduced an adapter interface Limiter in my implementation with two implementations: one that delegates to a circuit breaker (scenario 1, in-flight requests) and one where I manually check the bytes used (scenario 2, per-request limiting). I throw a CircuitBreakingException in both implementations, but for the second case I think it is a bit too implementation-specific and does not fit well. I can either introduce a new exception or rename/move CircuitBreakingException to something like QuotaExceededException, which would fit both cases.

What makes you think it is too implementation-specific? You could always sub-class CircuitBreakingException if you wanted it to be a separate type, but I don't think it's necessarily bad to keep it as such. It is a circuit breaker, just on the per-request side instead of the node side, so with a good message I think that can be made clear to the user.

@bleskes
Contributor

bleskes commented Feb 4, 2016

because we always want to be able to allocate a buffer to send an error response!

I think it makes sense to never circuit break a response (both when sending and receiving).

@danielmitterdorfer
Member Author

@dakrone: First of all, thanks for your thoughts.

So in your case, I think we may want to limit which requests fall into the circuit breaking category (index requests, bulk requests, etc), and which don't (errors, cluster health, cluster state, etc).

With the current implementation this is not that easy because I've added a LimitingInputStream and a LimitingOutputStream which wrap the actual streams, so size limiting is transparent to the caller. We could make the client aware, though, and expose an additional method for disabling limit tracking.
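A minimal sketch of the wrapping idea, assuming a simple byte-counting output stream; the actual classes on the feature branch may look different:

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: an output stream wrapper that counts every written byte against a limit
// and fails fast once the limit is exceeded, which is why the check is transparent
// to callers of the wrapped stream.
public final class LimitingOutputStream extends FilterOutputStream {
    private final long limitInBytes;
    private long written;

    public LimitingOutputStream(OutputStream out, long limitInBytes) {
        super(out);
        this.limitInBytes = limitInBytes;
    }

    @Override
    public void write(int b) throws IOException {
        checkLimit(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        checkLimit(len);
        out.write(b, off, len);
    }

    private void checkLimit(long additionalBytes) throws IOException {
        written += additionalBytes;
        if (written > limitInBytes) {
            throw new IOException("request size [" + written + "] exceeds limit [" + limitInBytes + "]");
        }
    }
}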

Another approach I am thinking of is to decide at stream creation time, based on the action, whether it needs to break or not. If yes, we wrap the original stream in a limiting stream; otherwise we leave the stream as is. To me, the most appropriate place to add this support seems to be TransportRequestOptions.

For now I have stabilized the test in question by increasing the limit so it still breaks as intended but does not hit the request size limit. I think this is ok given that we need some minimum amount of memory to handle the request at all.

[The child circuit breaker limits] definitely don't have to add up to 100%
[...]
I definitely think this new breaker should be absolute instead of relative.
You're on the right track with it.

That's great, then I'll leave that part as is. Thanks for the clarification. :)

I think it is a bit too implementation specific [to throw a CircuitBreakingException in all cases]
What makes you think it is too implementation specific?

My thought was that we should throw CircuitBreakingException only when a "proper" circuit breaker is involved (as the name of the exception indicates) but I am fine with using CircuitBreakingException in the other case as well.

I think it makes sense to never circuit break a response (both when sending and receiving).

@bleskes: Thanks for the hint. I have reduced the scope of size limiting to requests only (i.e. TransportStatus.isRequest(status) == true).

@danielmitterdorfer
Member Author

I have pushed another commit on the feature branch. We exhibit the following behaviour when a (bulk) request hits the limit:

  • Transport client: CircuitBreakingException is thrown
  • Node client: We get errors on the individual items of a bulk request (i.e. BulkItemResponse.isFailure() == true). This is due to the fact that the node client does not hit the transport layer but starts executing TransportBulkAction directly (i.e. in the same process).
  • REST API: This bypasses the transport layer, so the recommended practice is to use http.max_content_length. While I think we could add support in NettyHttpServerTransport, these two mechanisms are quite similar, so I'd avoid reimplementing it there.

I am not too happy that we behave differently depending on the protocol, but as this is implemented at the (transport) protocol level, it is not much of a surprise. What do you think, @bleskes?

@danielmitterdorfer changed the title from "Limit the size of a bulk request" to "Limit request size" on Feb 29, 2016
@Dieken

Dieken commented Mar 5, 2016

@danielmitterdorfer I think http.max_content_length limits the content length of any single HTTP request, not the total size of in-flight HTTP requests.

I made a patch (Dieken@f2d487e) against v2.2.0 to limit the total size of in-flight HTTP bulk requests as a temporary solution, because it seems your patch isn't finished yet and may not be merged into 2.x.
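For illustration, tracking the total size of in-flight requests with a LongAdder could look roughly like this sketch; the class and method names are assumptions, and the linked commit is the authoritative version:

import java.util.concurrent.atomic.LongAdder;

// Sketch: account for the content length of each in-flight bulk request and
// reject new requests once a configured total is exceeded. The check is
// best-effort; concurrent requests may briefly overshoot the limit.
public final class InFlightBulkBytes {
    private final long limitInBytes;
    private final LongAdder inFlightBytes = new LongAdder();

    public InFlightBulkBytes(long limitInBytes) {
        this.limitInBytes = limitInBytes;
    }

    /** Returns true if the request was admitted, false if it should be rejected. */
    public boolean tryAdmit(long contentLength) {
        inFlightBytes.add(contentLength);
        if (inFlightBytes.sum() > limitInBytes) {
            inFlightBytes.add(-contentLength); // roll back and reject
            return false;
        }
        return true;
    }

    /** Call when the request completes, whether it succeeded or failed. */
    public void release(long contentLength) {
        inFlightBytes.add(-contentLength);
    }
}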

@danielmitterdorfer
Member Author

@Dieken You're right: http.max_content_length limits only individual requests so we need to tackle this differently.

Regarding my changes: support at the transport level is finished, but we still need to iron out some details (setting names, default values).

Your patch looks fine for the bulk use case, but I want to make you aware of one (minor) thing: you use LongAdder, which is a JDK 8 class, but Elasticsearch 2.x should still build with JDK 7.

@bleskes
Contributor

bleskes commented Mar 7, 2016

For people following: @danielmitterdorfer and I went through the code together, and Daniel is working on another iteration.

danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Apr 13, 2016
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.

By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting

network.breaker.inflight_requests.limit

Relates elastic#16011
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Apr 13, 2016
With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.

Relates elastic#16011
danielmitterdorfer added a commit that referenced this issue Apr 14, 2016
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.

By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting

network.breaker.inflight_requests.limit

Relates #16011
danielmitterdorfer added a commit that referenced this issue Apr 14, 2016
With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.

Relates #16011
@danielmitterdorfer removed their assignment on Apr 14, 2016
@lcawl added the :Distributed/CRUD label and removed the :Bulk label on Feb 13, 2018