Search query fails with large number of indices #4054

Closed
joschi opened this issue Aug 3, 2017 · 5 comments

@joschi joschi commented Aug 3, 2017

Graylog (or rather Jest) sends HTTP requests with a very long initial line (URI path and query string) to the Elasticsearch HTTP API when a search query includes a large number of indices (e.g. when searching in "All messages").

Related topic: https://community.graylog.org/t/unable-to-search-in-all-messages/1922

Expected Behavior

Search queries covering a lot of indices should work.

Current Behavior

Search queries covering a lot of indices fail with an internal server error (HTTP 500) and produce an error message in the Elasticsearch logs:

[WARN ][http.netty               ] [ElasticsearchNodeName] Caught exception while handling client http traffic, closing connection [id: 0xecc07e39, /10.1.2.3:54321 => /10.1.2.3:9200]
org.jboss.netty.handler.codec.frame.TooLongFrameException: An HTTP line is larger than 4096 bytes.

Possible Solution

Patch Jest to send index names in the POST body.

Steps to Reproduce (for bugs)

  1. Create lots of indices (so that the list of index names is longer than 4 KB)
  2. Run search query covering all indices
  3. ???
  4. Profit!
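
To see how quickly the limit from the log message is reached, here is a rough sketch that builds a simplified request line with the index names in the URI path and compares its size to 4096 bytes. The `graylog_<n>` index names and the helper names are assumptions for illustration; the request Jest actually sends may contain additional path segments and query parameters.

```python
# Limit quoted in the TooLongFrameException above.
MAX_INITIAL_LINE_BYTES = 4096


def request_line(indices, method="POST", endpoint="_search"):
    """Build a simplified HTTP request line with the indices in the URI path."""
    return f"{method} /{','.join(indices)}/{endpoint} HTTP/1.1"


def exceeds_limit(indices, limit=MAX_INITIAL_LINE_BYTES):
    """Return True if the request line would be longer than `limit` bytes."""
    return len(request_line(indices).encode("utf-8")) > limit


# Hypothetical index names; a real deployment may use other prefixes.
few = [f"graylog_{i}" for i in range(10)]
many = [f"graylog_{i}" for i in range(500)]

print(len(request_line(few)), exceeds_limit(few))
print(len(request_line(many)), exceeds_limit(many))
```

With names of this shape, a few hundred indices are enough to push the request line past the limit.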

Your Environment

  • Graylog Version: 2.3.0
  • Elasticsearch Version: 2.x, 5.x
@joschi joschi added this to the 2.3.1 milestone Aug 3, 2017
@joschi joschi self-assigned this Aug 3, 2017

@bernd bernd commented Aug 3, 2017

I think we should check the length of the URL and use _all instead of the list of indices if it's too long.

The only drawback which comes to mind is that we might touch more indices than needed. This might be an issue with older Elasticsearch versions because of fielddata loading, but shouldn't be an issue with Elasticsearch 5. (AFAIK)

Increasing an Elasticsearch setting, which requires an Elasticsearch restart, as a workaround until we have a proper fix is okay, but this is not really a solution in my opinion. Since this is HTTP, there might be proxies in between Graylog and ES which might not support this limit. (mentioned by @kroepke)
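
The length check suggested in this comment could look roughly like the sketch below. It is not Graylog code: the function name and the 4000-byte budget (chosen to leave some room for the rest of the request line) are assumptions.

```python
def index_expression(indices, max_path_bytes=4000):
    """Return the comma-separated index list, or "_all" if it is too long.

    max_path_bytes is a hypothetical budget below the 4096-byte limit.
    """
    joined = ",".join(indices)
    if len(joined.encode("utf-8")) > max_path_bytes:
        # Falls back to searching every index, which may touch more
        # indices than the query needs.
        return "_all"
    return joined


print(index_expression(["graylog_0", "graylog_1"]))
print(index_expression([f"graylog_{i}" for i in range(500)]))
```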

@joschi joschi commented Aug 3, 2017

> Patch Jest to send index names in the POST body.

Elasticsearch doesn't support providing index names in the HTTP request body when using HTTP POST requests via Request Body Search.

@hc4 hc4 commented Aug 3, 2017

Actually, you could split such a request into multiple requests, each with a shorter list of indices (internally, Elasticsearch will do one request per index anyway).
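
A minimal sketch of that splitting idea, assuming a hypothetical per-request byte budget for the comma-separated index list. Each returned batch would then be sent as its own search request and the results merged by the caller.

```python
def split_indices(indices, max_bytes=4000):
    """Split index names into batches whose joined length fits in max_bytes."""
    batches, current, current_len = [], [], 0
    for name in indices:
        name_len = len(name.encode("utf-8"))
        # A name appended to a non-empty batch also costs one comma.
        extra = name_len + 1 if current else name_len
        if current and current_len + extra > max_bytes:
            batches.append(current)
            current, current_len = [], 0
            extra = name_len
        current.append(name)
        current_len += extra
    if current:
        batches.append(current)
    return batches


# Hypothetical index names and a small budget to force several batches.
batches = split_indices([f"graylog_{i}" for i in range(500)], max_bytes=1000)
print(len(batches), [len(",".join(b)) for b in batches])
```

A single name longer than the budget still ends up in a batch of its own, so the caller would need to handle that case separately.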

@joschi joschi commented Aug 3, 2017

@hc4 True, but we'd like to avoid the overhead of multiple HTTP requests.

@hc4 hc4 commented Aug 3, 2017

I have 2 more ideas, but they're both hacky :)

  1. Create a temporary index alias and search against it
  2. Compact the index list using wildcards: assume we want to search indices 19 to 31; the index list could then be compacted to: index_19,index_2*,index_30,index_31
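
One way to sanity-check a compacted expression like the one in the second idea is to expand it against the list of existing indices and compare the result with the indices that were actually wanted. The sketch below approximates Elasticsearch's `*` wildcard with Python's `fnmatch`; the set of existing indices (`index_0` to `index_39`) is made up.

```python
from fnmatch import fnmatchcase


def expand(expression, existing):
    """Return the existing index names matched by a comma-separated expression."""
    patterns = expression.split(",")
    return [name for name in existing
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


# Hypothetical cluster state: index_0 .. index_39 exist.
existing = [f"index_{i}" for i in range(40)]
wanted = [f"index_{i}" for i in range(19, 32)]

matched = expand("index_19,index_2*,index_30,index_31", existing)
extra = sorted(set(matched) - set(wanted))
missing = sorted(set(wanted) - set(matched))
print("extra:", extra, "missing:", missing)
```

In this made-up setup the expression covers everything that was wanted, but `index_2*` also matches `index_2`, so a compaction step would need to guard against such over-matching.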
@ghost ghost added the in progress label Aug 7, 2017
joschi pushed a commit that referenced this issue Aug 7, 2017
Jochen Schalanda
…termineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than
`Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective
index sets instead of the individual index names.

Fixes #4054
@ghost ghost assigned bernd Aug 9, 2017
@bernd bernd closed this in #4062 Aug 11, 2017
bernd added a commit that referenced this issue Aug 11, 2017
… is too long (#4062)

* Return write index alias instead of individual indices in Searches#determineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than
`Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective
index sets instead of the individual index names.

Fixes #4054

* Use index wildcard instead of write index alias (duh!)

* Use correct indices in Searches#search(SearchesConfig)

* Reduce number of indices in Searches#fieldStats() if necessary
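
The fallback described in this commit message can be sketched as follows. This is not the Java code from `Searches`: the value of `MAX_INDICES_PER_QUERY`, the `<prefix>_*` wildcard format, and the input shape (a mapping from index-set prefix to selected indices) are assumptions for illustration.

```python
# Hypothetical value; the real constant is not stated in this thread.
MAX_INDICES_PER_QUERY = 100


def affected_indices(indices_by_prefix, max_indices=MAX_INDICES_PER_QUERY):
    """Return individual index names, or one wildcard per index set if too many.

    indices_by_prefix maps an index-set prefix to the indices selected from it.
    """
    all_indices = [name for names in indices_by_prefix.values() for name in names]
    if len(all_indices) <= max_indices:
        return sorted(all_indices)
    # Too many names for the URI: cover each index set with a wildcard.
    return sorted(f"{prefix}_*" for prefix in indices_by_prefix)


small = {"graylog": ["graylog_0", "graylog_1"]}
large = {
    "graylog": [f"graylog_{i}" for i in range(200)],
    "audit": [f"audit_{i}" for i in range(50)],
}
print(affected_indices(small))
print(affected_indices(large))
```

The trade-off is the one mentioned earlier in the thread: the wildcard may touch more indices than the time range strictly requires.
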
@ghost ghost removed the in progress label Aug 11, 2017
joschi added a commit that referenced this issue Aug 11, 2017
… is too long (#4062)

(cherry picked from commit 3dc4ca1)
bernd added a commit that referenced this issue Aug 11, 2017
… is too long (#4062) (#4078)

(cherry picked from commit 3dc4ca1)
joschi pushed a commit that referenced this issue Sep 14, 2017
Jochen Schalanda
Instead of artificially restricting the number of indices in a search request and
resorting to using index wildcards if the list of indices threatens to hit the default
HTTP header size limit of Elasticsearch (4 KB), the `Searches` class is now using the
Multi Search API of Elasticsearch which allows specifying the list of indices in the
request body instead of the URI path.

Refs https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-multi-search.html
Refs elastic/elasticsearch#26360
Refs #4054
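
For readers unfamiliar with the Multi Search API mentioned above, here is a minimal sketch of how its newline-delimited request body carries the index list, so that the URI path stays short no matter how many indices are searched. The index names and the query are placeholders, and this is not the code Graylog uses.

```python
import json


def msearch_body(indices, query):
    """Build a Multi Search body with one search: a header line and a body line."""
    header = {"index": list(indices)}
    # Each JSON document sits on its own line and the body ends with a newline.
    return json.dumps(header) + "\n" + json.dumps(query) + "\n"


# Hypothetical index names and a placeholder query.
indices = [f"graylog_{i}" for i in range(500)]
body = msearch_body(indices, {"query": {"match_all": {}}, "size": 10})

# The request line no longer depends on the number of indices.
request_line = "POST /_msearch HTTP/1.1"
print(len(request_line), len(body))
```

The response contains one entry per header/body pair, so a single search sent this way has to be unwrapped from a one-element list of responses.
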
joschi pushed a commit that referenced this issue Sep 19, 2017
Jochen Schalanda
bernd added a commit that referenced this issue Sep 20, 2017
…4149)
