Search query fails with large number of indices #4054

Closed
joschi opened this Issue Aug 3, 2017 · 5 comments

@joschi
Contributor

joschi commented Aug 3, 2017

Graylog (or rather Jest) sends HTTP requests with a very long initial line (URI path and query string) to the Elasticsearch HTTP API when a search query covers a large number of indices (e.g. when searching in "All messages").

Related topic: https://community.graylog.org/t/unable-to-search-in-all-messages/1922

Expected Behavior

Search queries covering a lot of indices should work.

Current Behavior

Search queries covering a lot of indices fail with an internal server error (HTTP 500) and produce an error message in the Elasticsearch logs:

[WARN ][http.netty               ] [ElasticsearchNodeName] Caught exception while handling client http traffic, closing connection [id: 0xecc07e39, /10.1.2.3:54321 => /10.1.2.3:9200]
org.jboss.netty.handler.codec.frame.TooLongFrameException: An HTTP line is larger than 4096 bytes.

Possible Solution

Patch Jest to send index names in the POST body.

Steps to Reproduce (for bugs)

  1. Create lots of indices (so that the comma-separated list of index names is longer than 4 KB)
  2. Run a search query covering all indices
  3. ???
  4. Profit!
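
To get a feeling for how many indices it takes to hit the limit, here is a rough Python sketch that estimates the size of the HTTP initial line. The helper name, the `graylog_<n>` naming scheme, and the `/_search` path layout are assumptions for illustration; the query string that Jest may append is ignored, so the real line is at least this long.

```python
# Hypothetical helper: estimates the size of the HTTP initial line
# ("POST /<indices>/_search HTTP/1.1") for a given list of index names.
MAX_INITIAL_LINE_BYTES = 4096  # limit named in the Elasticsearch log message


def initial_line_length(indices, method="POST", endpoint="_search"):
    """Return the byte length of the request line for a multi-index search."""
    path = "/" + ",".join(indices) + "/" + endpoint
    return len(f"{method} {path} HTTP/1.1".encode("utf-8"))


# Assumed naming scheme "graylog_<n>"; 500 such names already exceed the limit.
indices = [f"graylog_{n}" for n in range(500)]
too_long = initial_line_length(indices) > MAX_INITIAL_LINE_BYTES
print(too_long)
```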

Your Environment

  • Graylog Version: 2.3.0
  • Elasticsearch Version: 2.x, 5.x

@joschi joschi added this to the 2.3.1 milestone Aug 3, 2017

@joschi joschi self-assigned this Aug 3, 2017

@bernd

Member

bernd commented Aug 3, 2017

I think we should check the length of the URL and use _all instead of the list of indices if it's too long.

The only drawback which comes to mind is that we might touch more indices than needed. This might be an issue with older Elasticsearch versions because of fielddata loading, but shouldn't be an issue with Elasticsearch 5. (AFAIK)

Increasing an Elasticsearch setting, which requires an Elasticsearch restart, is okay as a workaround until we have a proper fix, but it is not really a solution in my opinion. Since this is HTTP, there might be proxies between Graylog and ES that enforce their own limit on the request line length. (mentioned by @kroepke)
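
The URL length check suggested in this comment could look roughly like the following Python sketch. It is not Graylog code: the function name and the 4000-byte threshold are assumptions, chosen only to leave some headroom below the 4096-byte limit for the rest of the request line.

```python
def search_target(indices, max_path_bytes=4000):
    """Return the index part of the search URL.

    Falls back to "_all" when the comma-separated index list would be
    too long. The threshold is an assumed value, not one from Graylog.
    """
    joined = ",".join(indices)
    if len(joined.encode("utf-8")) > max_path_bytes:
        return "_all"
    return joined
```

As noted in the comment, the fallback trades precision for safety: `_all` may touch more indices than the time range actually requires.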

@joschi

Contributor

joschi commented Aug 3, 2017

Patch Jest to send index names in the POST body.

Elasticsearch doesn't support providing index names in the HTTP request body when using HTTP POST requests via Request Body Search.

@hc4

Contributor

hc4 commented Aug 3, 2017

Actually, you could split such a request into multiple requests with shorter index lists (internally, Elasticsearch will do one request for each index anyway).
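
A minimal Python sketch of the splitting idea, assuming a byte budget per request (the function name and the 4000-byte default are made up for illustration). Each returned chunk would be sent as its own search request, and the caller would still have to merge the results.

```python
def chunk_indices(indices, max_bytes=4000):
    """Split index names into chunks whose comma-joined form fits max_bytes.

    A single name longer than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for name in indices:
        name_len = len(name.encode("utf-8"))
        extra = name_len + (1 if current else 0)  # +1 for the comma
        if current and size + extra > max_bytes:
            chunks.append(current)
            current, size = [], 0
            extra = name_len
        current.append(name)
        size += extra
    if current:
        chunks.append(current)
    return chunks
```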

@joschi

Contributor

joschi commented Aug 3, 2017

@hc4 True, but we'd like to avoid the overhead of multiple HTTP requests.

@hc4

Contributor

hc4 commented Aug 3, 2017

I have 2 more ideas, but they are both hacky :)

  1. Create a temporary index alias and search against it
  2. Compact the index list using wildcards: assume we want to search in indices 19 to 31; the index list could then be compacted to: index_19,index_2*,index_30,index_31
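
The second idea can be sketched in Python as follows. This is only an illustration with a hypothetical function name. It assumes the full list of existing index names is known, because a wildcard such as `index_2*` is only safe if every existing index it matches is actually wanted (it would also match an `index_2` or `index_200` if those exist).

```python
def compact_indices(wanted, existing):
    """Replace runs of wanted indices by "<prefix>*" where that is safe.

    The prefix is the index name without its last character. A wildcard
    is emitted only if all existing indices with that prefix are wanted.
    """
    wanted_set = set(wanted)
    result, covered = [], set()
    for name in wanted:
        if name in covered:
            continue
        prefix = name[:-1]
        matches = {e for e in existing if e.startswith(prefix)}
        if len(matches) > 1 and matches <= wanted_set:
            result.append(prefix + "*")
            covered |= matches
        else:
            result.append(name)
            covered.add(name)
    return result


# Assumed setup: indices 10..35 exist, and 19..31 should be searched.
existing = [f"index_{n}" for n in range(10, 36)]
wanted = [f"index_{n}" for n in range(19, 32)]
print(",".join(compact_indices(wanted, existing)))
```

With these assumed inputs the sketch yields the same list as in the comment. If `index_2` still existed and was not part of the search, the wildcard would be skipped and the individual names kept.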

@wafflebot wafflebot bot added the in progress label Aug 7, 2017

joschi added a commit that referenced this issue Aug 7, 2017

Return write index alias instead of individual indices in Searches#determineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than
`Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective
index sets instead of the individual index names.

Fixes #4054

@wafflebot wafflebot bot assigned bernd Aug 9, 2017

@bernd bernd closed this in #4062 Aug 11, 2017

bernd added a commit that referenced this issue Aug 11, 2017

Return index set wildcard instead of individual indices if index list is too long (#4062)

* Return write index alias instead of individual indices in Searches#determineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than
`Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective
index sets instead of the individual index names.

Fixes #4054

* Use index wildcard instead of write index alias (duh!)

* Use correct indices in Searches#search(SearchesConfig)

* Reduce number of indices in Searches#fieldStats() if necessary
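
The fallback described in this commit message can be illustrated with a short Python sketch. It is not the actual `Searches` code: the value of `MAX_INDICES_PER_QUERY` is not given in this thread, so the number below is a placeholder, and deriving the index set prefix from the text before the last underscore is an assumption about the naming scheme.

```python
MAX_INDICES_PER_QUERY = 100  # placeholder; the real value is not shown here


def determine_affected_indices(indices, max_indices=MAX_INDICES_PER_QUERY):
    """Return the index names, or per-index-set wildcards if there are too many."""
    if len(indices) <= max_indices:
        return sorted(indices)
    # Assumed naming scheme "<index set prefix>_<number>".
    prefixes = {name.rsplit("_", 1)[0] for name in indices}
    return sorted(prefix + "_*" for prefix in prefixes)
```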

@wafflebot wafflebot bot removed the in progress label Aug 11, 2017

joschi added a commit that referenced this issue Aug 11, 2017

Return index set wildcard instead of individual indices if index list is too long (#4062)

* Return write index alias instead of individual indices in Searches#determineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than
`Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective
index sets instead of the individual index names.

Fixes #4054

* Use index wildcard instead of write index alias (duh!)

* Use correct indices in Searches#search(SearchesConfig)

* Reduce number of indices in Searches#fieldStats() if necessary

(cherry picked from commit 3dc4ca1)

bernd added a commit that referenced this issue Aug 11, 2017

Return index set wildcard instead of individual indices if index list is too long (#4062) (#4078)

* Return write index alias instead of individual indices in Searches#determineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than
`Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective
index sets instead of the individual index names.

Fixes #4054

* Use index wildcard instead of write index alias (duh!)

* Use correct indices in Searches#search(SearchesConfig)

* Reduce number of indices in Searches#fieldStats() if necessary

(cherry picked from commit 3dc4ca1)

joschi added a commit that referenced this issue Sep 14, 2017

Use Multi Search in Searches to prevent large HTTP request headers

Instead of artificially restricting the number of indices in a search request and
resorting to using index wildcards if the list of indices threatens to hit the default
HTTP header size limit of Elasticsearch (4 KB), the `Searches` class is now using the
Multi Search API of Elasticsearch which allows specifying the list of indices in the
request body instead of the URI path.

Refs https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-multi-search.html
Refs elastic/elasticsearch#26360
Refs #4054
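
For illustration, a Multi Search request body is newline-delimited JSON in which each search consists of a header line followed by a body line, and the index list goes into the header. The Python sketch below builds such a body for a single search; the function name is hypothetical, and the payload would be POSTed to `/_msearch`, so the URI path no longer grows with the number of indices.

```python
import json


def build_msearch_body(indices, query):
    """Build a one-search Multi Search payload (header line + body line).

    Every line, including the last one, must end with a newline.
    """
    header = {"index": list(indices)}
    return json.dumps(header) + "\n" + json.dumps(query) + "\n"


body = build_msearch_body(
    ["graylog_0", "graylog_1"],
    {"query": {"match_all": {}}},
)
print(body)
```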

joschi added a commit that referenced this issue Sep 19, 2017

Use Multi Search in Searches to prevent large HTTP request headers

bernd added a commit that referenced this issue Sep 20, 2017

Use Multi Search in Searches to prevent large HTTP request headers (#4149)

Instead of artificially restricting the number of indices in a search request and
resorting to using index wildcards if the list of indices threatens to hit the default
HTTP header size limit of Elasticsearch (4 KB), the `Searches` class is now using the
Multi Search API of Elasticsearch which allows specifying the list of indices in the
request body instead of the URI path.

Refs https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-multi-search.html
Refs elastic/elasticsearch#26360
Refs #4054