Elasticsearch cluster information currently unavailable (too slow) #953

Closed
razvanphp opened this issue Feb 6, 2015 · 3 comments

@razvanphp
Contributor

Hi,

On some days, the system page of our graylog2 (v0.92.3) shows a red message for the ES cluster status saying "Cluster information currently unavailable". The sidebar link for detailed indices (/system/indices) always returns an nginx timeout.

I tried some debugging, and here are the API response times from Graylog (GL) and Elasticsearch (ES):

rgrigore@glog-production-master1:~$ time curl -XGET 'http://glog-production-es1:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "glog-production-es",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 204,
  "active_shards" : 420,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

real    0m0.023s
user    0m0.012s
sys     0m0.004s
rgrigore@glog-production-master1:~$ time curl http://user:pass@127.0.0.1:12900/system/indexer/cluster/health?pretty=true
{
  "shards" : {
    "initializing" : 0,
    "unassigned" : 0,
    "active" : 384,
    "relocating" : 0
  },
  "status" : "green"
}
real    1m1.248s
user    0m0.012s
sys     0m0.004s
rgrigore@glog-production-master1:~$

For some reason, Graylog takes too much time to get this information. Hot threads during this time do not show any significant load on Graylog (only 3%), and some ES nodes are running "Lucene Merge Thread" tasks. It seems to me that ES still responds immediately, so Graylog must be the problem here.
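Since the slowdown only shows up on some days, it may help to record both response times at regular intervals. The following is a minimal sketch, not a tested monitoring tool: `time_get` and `classify` are hypothetical helper names, the 5-second threshold and 60-second polling interval are arbitrary assumptions, and the two URLs are expected in the `ES_URL` and `GL_URL` environment variables. It relies only on curl's `-w '%{time_total}'` write-out variable.

```shell
#!/bin/sh
# Sketch only: helper names, threshold and interval are assumptions,
# not part of Graylog or Elasticsearch.

# Print the total time (in seconds) curl needed for one GET request.
time_get() {
  curl -s -o /dev/null --max-time "${MAX_TIME:-120}" -w '%{time_total}' "$1"
}

# Print "slow" if duration $1 exceeds threshold $2 (both in seconds), else "ok".
classify() {
  awk -v t="$1" -v limit="$2" \
    'BEGIN { if (t + 0 > limit + 0) print "slow"; else print "ok" }'
}

# Poll both endpoints only when both URLs are provided.
if [ -n "${ES_URL:-}" ] && [ -n "${GL_URL:-}" ]; then
  while :; do
    es=$(time_get "$ES_URL")
    gl=$(time_get "$GL_URL")
    printf '%s es=%ss (%s) graylog=%ss (%s)\n' \
      "$(date '+%Y-%m-%dT%H:%M:%S')" \
      "$es" "$(classify "$es" 5)" \
      "$gl" "$(classify "$gl" 5)"
    sleep "${INTERVAL:-60}"
  done
fi
```

Saved under a name such as `probe.sh` (hypothetical), it could be started with `ES_URL='http://glog-production-es1:9200/_cluster/health' GL_URL='http://user:pass@127.0.0.1:12900/system/indexer/cluster/health' sh probe.sh >> probe.log`. The resulting log would show whether the Graylog call becomes slow while the ES call stays fast, and at what time that starts.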

I can provide more debugging messages if necessary.

Thank you!

@edmundoa
Contributor

edmundoa commented Feb 6, 2015

Hi,

Thank you for reporting this.

Could you please include some logs from the Graylog server when the issue appears? That may help us to better understand the reason for it.
The nginx timeout you see when the cluster information is not available and you access /system/indices is most likely the issue we fixed in graylog-labs/graylog2-web-interface#1070.

@razvanphp
Contributor Author

Thank you for your response.
In the server logs I can't see anything interesting, just a bunch of messages that failed to parse:

2015-02-06T14:32:30.144+01:00 ERROR [GelfCodec] Could not parse JSON!
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

The issue seems to appear on some days; it usually gets back to normal when the index is deflected. We have 30 indices with 20 million entries each.

@henrikjohansen

We are seeing similar behaviour in the latest RC: the Elasticsearch cluster information fluctuates between "Cluster information currently unavailable" and "Elasticsearch cluster is green" for no apparent reason.

kroepke added the bug label Feb 11, 2015
kroepke self-assigned this Feb 11, 2015
kroepke added this to the 1.0.0 milestone Feb 11, 2015
kroepke added a commit that referenced this issue Feb 11, 2015
… out

it makes no sense to offer individual data about the health because those values are fetched together.
remove the methods that can lead to programming errors and inline them

issue #953
dennisoelkers pushed a commit that referenced this issue Aug 25, 2023
Updated GreyNoiseCommunityIpLookupAdapter to exclude 'message' field,…