Loading recent message for extractor fails with warning about too many shards #4510

Closed
lennartkoopmann opened this Issue Jan 24, 2018 · 14 comments

@lennartkoopmann
Member

lennartkoopmann commented Jan 24, 2018

screenshot from 2018-01-23 20-06-49

Loading a message to create an extractor fails with the following error:

Unable to perform search query Trying to query 2280 shards, which is over the limit of 1000. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive. It is usually a better idea to have a smaller number of larger shards. Update [action.search.shard_count.limit] to a greater value if you really want to query that many shards at the same time.

Judging by the URL, no time range is applied to the query. I think it should be bound to search only the last 1 or 2 hours or so.

Your Environment

  • Graylog Version: 2.4.0
  • Elasticsearch Version: 5.6.4

@lennartkoopmann lennartkoopmann added the bug label Jan 24, 2018

@lennartkoopmann lennartkoopmann added this to the 2.4.2 milestone Jan 24, 2018

@joschi

Contributor

joschi commented Jan 24, 2018

The RecentMessageLoader should already be restricted to the last hour (as the help text mentions), also see #3367.

const promise = UniversalSearchStore.search('relative', `gl2_source_input:${inputId} OR gl2_source_radio_input:${inputId}`,
{ range: 3600 }, undefined, 1, undefined, undefined, undefined, false);

@lennartkoopmann Are the index ranges in your setup up-to-date and do the indices in your setup contain messages which are older than 1 hour?
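
The index range question matters because a time-bounded search can only skip an index if the recorded time span of that index is known to lie outside the search window. A minimal sketch of that selection, assuming a hypothetical list of index range records (the field names `index`, `begin`, `end` and the function `indicesForRelativeRange` are illustrative and not taken from Graylog's code):

```javascript
// Hypothetical index range records: each index covers [begin, end] in epoch milliseconds.
// This is an illustration of the idea, not Graylog's implementation.
function indicesForRelativeRange(indexRanges, rangeSeconds, now = Date.now()) {
  // Assumption for this sketch: a range of 0 means "no lower bound", so every index is searched.
  if (rangeSeconds === 0) {
    return indexRanges.map((r) => r.index);
  }
  const from = now - rangeSeconds * 1000;
  // Keep only indices whose recorded span overlaps [from, now].
  return indexRanges
    .filter((r) => r.end >= from && r.begin <= now)
    .map((r) => r.index);
}

const now = Date.UTC(2018, 0, 24, 12, 0, 0);
const hour = 3600 * 1000;
const exampleRanges = [
  { index: 'graylog_0', begin: now - 48 * hour, end: now - 24 * hour },
  { index: 'graylog_1', begin: now - 24 * hour, end: now - 2 * hour },
  { index: 'graylog_2', begin: now - 2 * hour, end: now },
];

// Only graylog_2 overlaps the last hour.
console.log(indicesForRelativeRange(exampleRanges, 3600, now));
// With range 0 all three indices are selected.
console.log(indicesForRelativeRange(exampleRanges, 0, now));
```

If the recorded spans are stale, or if no range reaches the backend at all, more indices (and therefore more shards) end up in the query, which would be consistent with the shard limit error reported above.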

@dennisoelkers dennisoelkers self-assigned this Jan 24, 2018

@bernd bernd modified the milestones: 2.4.2, 2.4.3 Jan 24, 2018

dennisoelkers added a commit that referenced this issue Jan 24, 2018

Fixing relative time range used by RecentMessageLoader.
The RecentMessageLoader component tries to limit the search request it
uses to get the last received message of an input to the last hour. For
this, it uses the `range` parameter. This is the correct parameter name
for the backend, but the web interface uses a small conversion helper
that expects the parameter to be named `relative` for relative time
ranges, following the naming in the old web interface.

To keep the fix as small as possible for a point release, this change
just uses the `relative` parameter. For `master` and subsequent releases
we should either switch over to the `range` parameter or use it in case
the `relative` parameter is not present. This way we do not discourage
consumers of the `UniversalSearchStore` from using the actually correct
parameter naming.

Fixes #4510.

@wafflebot wafflebot bot added the in progress label Jan 24, 2018

dennisoelkers added a commit that referenced this issue Jan 24, 2018

Fixing relative time range used by RecentMessageLoader.
The RecentMessageLoader component tries to limit the search request it
uses to get the last received message of an input to the last hour. For
this, it uses the `range` parameter. This is the correct parameter name
for the backend, but the web interface uses a small conversion helper
that expects the parameter to be named `relative` for relative time
ranges, following the naming in the old web interface.

This fix changes the conversion helper to also take the `range`
parameter into account and use it if present; otherwise it uses the
`relative` parameter.

Fixes #4510.
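
The fallback described in this commit message can be sketched as follows. This is a minimal illustration under assumptions: the helper name `relativeRangeSeconds` and the shape of the `timerange` object are hypothetical, and the actual conversion helper in the web interface may look different.

```javascript
// Hypothetical conversion helper; the name and object shape are illustrative only.
function relativeRangeSeconds(timerange = {}) {
  // Prefer `range`, the parameter name the backend uses, when it is present.
  if (timerange.range !== undefined && timerange.range !== null) {
    return Number(timerange.range);
  }
  // Otherwise fall back to the legacy `relative` name from the old web interface.
  if (timerange.relative !== undefined && timerange.relative !== null) {
    return Number(timerange.relative);
  }
  // Neither name was supplied, so no range can be added to the request.
  return undefined;
}

console.log(relativeRangeSeconds({ range: 3600 }));    // 3600
console.log(relativeRangeSeconds({ relative: 300 }));  // 300
console.log(relativeRangeSeconds({}));                 // undefined
```

A helper that only reads `relative` would return `undefined` for `{ range: 3600 }`, which is the situation the commit message describes: the caller passes a one-hour range, but it never reaches the request.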

bernd added a commit that referenced this issue Jan 24, 2018

Fixing relative time range used by RecentMessageLoader. (#4513)
Fixes #4510.
@bernd

Member

bernd commented Jan 24, 2018

Fixed in 2.4 via #4514. The fix for master is not done yet, so I am leaving the issue open but moving it into the 3.0 milestone.

@bernd bernd modified the milestones: 2.4.3, 3.0.0 Jan 24, 2018

@kmerz kmerz closed this in #4514 Jan 25, 2018

kmerz added a commit that referenced this issue Jan 25, 2018

Fixing relative time range used by RecentMessageLoader. (#4514)
Fixes #4510.

@wafflebot wafflebot bot removed the in progress label Jan 25, 2018

@TheJCamping

TheJCamping commented Jan 25, 2018

I noticed that in 2.4.3, editing GROK extractors and showing recent messages for inputs still run queries without a range limit, so my Elasticsearch cluster chokes trying to look up the messages. Can those searches also be modified to limit the amount of data requested from ES?

@edmundoa

Member

edmundoa commented Jan 26, 2018

@TheJCamping we would really appreciate it if you could provide some more details about the issue you are facing, namely:

  1. In which page and part of the application exactly are you seeing the issue? If it's hard to describe you can upload a screenshot.
  2. Are there any errors in your browser's developer console and/or server logs when the issue occurs?

Thank you in advance.

@TheJCamping

TheJCamping commented Jan 26, 2018

@edmundoa

On the /system/inputs page selecting "show recent messages" for an input will result in a page that never loads or an error message:
Error Message: Unable to perform search query Details: Search status code: 500 Search response: cannot GET http://192.168.13.37:12900/search/universal/relative?query=gl2_source_input%3A54863813a78e39c792b058d1&range=0&limit=150&sort=timestamp%3Adesc (500)

In Chrome's developer tools, the network tab shows that the request to http://graylog-01:12900/search/universal/relative?query=gl2_source_input%3A54863813a78e39c792b058d1&range=0&limit=150&sort=timestamp%3Adesc times out. If I change the range to 3000, the request loads.
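
The manual experiment above (resending the same request with a different `range`) can be reproduced with a small sketch. The function name `withRange` is hypothetical; it only uses the standard `URL` API and does not send anything.

```javascript
// Hypothetical helper: return a copy of a search URL with its `range` parameter replaced.
function withRange(requestUrl, rangeSeconds) {
  const url = new URL(requestUrl);
  // `set` overwrites an existing `range` value or appends one if it is missing.
  url.searchParams.set('range', String(rangeSeconds));
  return url.toString();
}

const reported =
  'http://graylog-01:12900/search/universal/relative' +
  '?query=gl2_source_input%3A54863813a78e39c792b058d1&range=0&limit=150&sort=timestamp%3Adesc';

// Same request, but limited to the last 3000 seconds as in the experiment above.
console.log(withRange(reported, 3000));
```

The resulting URL can then be pasted into the browser or requested with any HTTP client to compare response times for different ranges.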

A similar thing happens when I go to edit an extractor and it tries to load an example message: it shows a spinning wheel that says "Loading...". Example of what shows in the network tab:
[Screenshot: 2018-01-26 10_27_43-2018-01-26 10_25_52-graylog2]

In both instances, the Elasticsearch cluster gets bogged down because it has to search everything without a range limit. I know this is partially due to a lack of resources for the Elasticsearch cluster: I have recent (30 days) indices on fast storage and older ones on slower storage.

Cluster Info:
101 indices with ~5 billion messages using 9.6 TB on Elasticsearch 5.6.6 and Graylog 2.4.3

@edmundoa

Member

edmundoa commented Jan 26, 2018

@TheJCamping Thank you for the detailed report!

The first issue you mentioned is only partially related to this one, so I opened #4533 and I kindly ask you to continue the conversation there.

The second issue in your previous comment seems to be the same one reported here, so I will reopen the issue so that we can take another look.

First of all, could you please try editing the extractor in another browser? I know it sounds silly, but I just want to see if for some reason your web interface is running some cached version of the code.

@edmundoa edmundoa reopened this Jan 26, 2018

@TheJCamping

TheJCamping commented Jan 26, 2018

@edmundoa

I just tried it in Firefox as well as Chrome on another system. This is also happening with all extractors on all inputs.

Thank you

@jalogisch jalogisch added the triaged label Jan 29, 2018

@dennisoelkers

Member

dennisoelkers commented Jan 29, 2018

@TheJCamping We fixed this issue (loading a message before creating an extractor) for 2.4.3. Did you restart your server/reload your web interface before trying again? I have just verified that it works reliably in 2.4.3, so I am closing this issue for now. If you are sure that your server and web interface are up to date (including a full page refresh of the web interface), then reopen the issue please.

@TheJCamping

TheJCamping commented Jan 29, 2018

@dennisoelkers

I am sure that the server and web interface are up to date; I have restarted the server again just to be sure. All of my nodes are at 2.4.3. I have tried on 3 separate systems with 2 different browsers and in incognito mode as well.

I think the error message I posted earlier shows that RecentMessageLoader is using relative, not range, which is the change that was made in #4513, right?

Please let me know if there is any other info I can provide.

@edmundoa

Member

edmundoa commented Jan 29, 2018

@TheJCamping that's not exactly the change. The URL in the information you provided goes to the right endpoint (it should use /search/universal/relative), but it should also contain a range=3600 query parameter, which is missing in your case and which caused the initial issue as well.

When you updated your Graylog setup, did you also update all plugins bundled with it? It would be really helpful if you could share the exact version of each plugin in your system (an ls in the plugins directory should be enough for that).

Thank you in advance!

@edmundoa edmundoa reopened this Jan 29, 2018

@TheJCamping

TheJCamping commented Jan 29, 2018

@edmundoa

Thank you for the explanation.

I am using the yum packages to update Graylog.

Here is the result from the ls:
graylog-plugin-beats-2.4.3.jar
graylog-plugin-collector-2.4.3.jar
graylog-plugin-map-widget-2.4.3.jar
graylog-plugin-pipeline-processor-2.4.3.jar
graylog-plugin-aws-2.4.3.jar
graylog-plugin-cef-2.4.3.jar
graylog-plugin-enterprise-integration-2.4.3.jar
graylog-plugin-netflow-2.4.3.jar
graylog-plugin-threatintel-2.4.3.jar

@edmundoa

Member

edmundoa commented Jan 31, 2018

@TheJCamping

To be sure I'm executing the same code as you do, I downloaded the OVA image for Graylog 2.4.3 and I could not reproduce the issue. I have tried in a couple of browsers and when I load a recent message from an input in the extractors page, the URL to ask for the message looks right:

Request URL: http://graylog:9000/api/search/universal/relative?query=gl2_source_input%3A5a719c11d73f9505ef1832d6%20OR%20gl2_source_radio_input%3A5a719c11d73f9505ef1832d6&range=3600&limit=1&decorate=false

As you can see it contains the range=3600 query parameter which will limit the search to the last hour.

Could you please provide some more details about the setups where you see this issue? I'm especially interested in the number of nodes and whether there is any load balancer or proxy in between.

@TheJCamping

TheJCamping commented Feb 1, 2018

@edmundoa

I just downloaded the OVA onto my system to verify. I created a new TCP Syslog input and started sending traffic to it. I created a GROK extractor and then went to edit the extractor. It worked since the Elasticsearch cluster only has a few thousand messages, but the request URL was this:

'http://192.168.15.46:9000/api/search/universal/relative?query=gl2_source_input%3A5a7285c3e7a1d606088725f7%20OR%20gl2_source_radio_input%3A5a7285c3e7a1d606088725f7&limit=1'

Still no range in the request.
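
The difference between the two request URLs quoted in this thread can be checked mechanically. A minimal sketch, assuming that a missing `range` or a `range` of 0 means the search is not limited in time (that reading is inferred from this thread, and the function name `isTimeBounded` is hypothetical):

```javascript
// Hypothetical check: does a relative search URL carry a positive `range` parameter?
function isTimeBounded(requestUrl) {
  const raw = new URL(requestUrl).searchParams.get('range');
  // `get` returns null when the parameter is absent.
  if (raw === null) {
    return false;
  }
  const seconds = Number(raw);
  // Assumption: 0 (or anything non-numeric) does not restrict the search window.
  return Number.isFinite(seconds) && seconds > 0;
}

const withRangeParam =
  'http://graylog:9000/api/search/universal/relative' +
  '?query=gl2_source_input%3A5a719c11d73f9505ef1832d6%20OR%20gl2_source_radio_input%3A5a719c11d73f9505ef1832d6' +
  '&range=3600&limit=1&decorate=false';
const withoutRangeParam =
  'http://192.168.15.46:9000/api/search/universal/relative' +
  '?query=gl2_source_input%3A5a7285c3e7a1d606088725f7%20OR%20gl2_source_radio_input%3A5a7285c3e7a1d606088725f7' +
  '&limit=1';

console.log(isTimeBounded(withRangeParam));    // true
console.log(isTimeBounded(withoutRangeParam)); // false
```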

Were you editing a GROK extractor?

To answer your previous question, I have 3 nodes in my Graylog cluster and no load balancer or proxy in front of the web interfaces.

@edmundoa

Member

edmundoa commented Feb 1, 2018

@TheJCamping That makes the issue clear. I could reproduce it now; it's related to this one but slightly different, which is why we didn't see it until now. I opened another issue for it to avoid confusion, feel free to add any comments there: #4553. I'll also close this issue now.

Thank you!
