Elasticsearch cluster information currently unavailable (too slow) #953

Closed
razvanphp opened this Issue Feb 6, 2015 · 3 comments

@razvanphp
Contributor

razvanphp commented Feb 6, 2015

Hi,

On some days, the system page of our graylog2 (v0.92.3) shows a red message for the ES cluster status saying "Cluster information currently unavailable". The sidebar link for detailed indices (/system/indices) always returns an nginx timeout.

I tried some debugging, and here are the API response times from GL and ES:

rgrigore@glog-production-master1:~$ time curl -XGET 'http://glog-production-es1:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "glog-production-es",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 204,
  "active_shards" : 420,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

real    0m0.023s
user    0m0.012s
sys     0m0.004s
rgrigore@glog-production-master1:~$ time curl http://user:pass@127.0.0.1:12900/system/indexer/cluster/health?pretty=true
{
  "shards" : {
    "initializing" : 0,
    "unassigned" : 0,
    "active" : 384,
    "relocating" : 0
  },
  "status" : "green"
}
real    1m1.248s
user    0m0.012s
sys     0m0.004s
rgrigore@glog-production-master1:~$

For some reason graylog takes too much time to get this information. Hot threads during this time do not show any big load on graylog (only 3%), and some ES nodes are running "Lucene Merge Thread" tasks. It seems to me that ES still responds immediately, so graylog must be the problem here.
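To catch the slowdown on the days it appears, the two curl timings above can be repeated in a loop. This is only a minimal sketch: the helper names `timed` and `http_get`, the sample count, the pause between samples, and the 120 second timeout are assumptions chosen for illustration; the URLs and the placeholder credentials are the ones from the transcript.

```python
import base64
import time
import urllib.request


def timed(fetch):
    """Run fetch() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fetch()
    return result, time.perf_counter() - start


def http_get(url, user=None, password=None, timeout=120):
    """GET a URL, optionally with HTTP basic auth, and return the raw body."""
    request = urllib.request.Request(url)
    if user is not None:
        # urllib does not accept user:pass@host URLs, so set the header directly.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    es_url = "http://glog-production-es1:9200/_cluster/health"
    gl_url = "http://127.0.0.1:12900/system/indexer/cluster/health"
    for _ in range(5):  # hypothetical sample count
        _, es_seconds = timed(lambda: http_get(es_url))
        _, gl_seconds = timed(lambda: http_get(gl_url, "user", "pass"))
        print(f"ES {es_seconds:.3f}s  Graylog {gl_seconds:.3f}s")
        time.sleep(10)  # hypothetical pause between samples
```

If the Graylog column is consistently much larger than the ES column, that supports the reading that the delay is added on the Graylog side rather than in Elasticsearch.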

I can provide more debugging messages if necessary.

Thank you!

@edmundoa


Member

edmundoa commented Feb 6, 2015

Hi,

Thank you for reporting this.

Could you please include some logs from the Graylog server when the issue appears? That may help us to better understand the reason for it.
The nginx timeout you see when the cluster information is not available and you access /system/indices is most likely the issue we fixed in graylog-labs/graylog2-web-interface#1070.

@razvanphp


Contributor

razvanphp commented Feb 6, 2015

Thank you for your response.
In the server logs I can't see anything interesting, just a bunch of messages that failed to parse:

2015-02-06T14:32:30.144+01:00 ERROR [GelfCodec] Could not parse JSON!
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

The issue seems to appear on some days; it usually goes back to normal when the index is deflected. We have 30 indices with 20 million entries each.

@henrikjohansen


henrikjohansen commented Feb 11, 2015

We are seeing similar behaviour in the latest RC: the Elasticsearch cluster information fluctuates between "Cluster information currently unavailable" and "Elasticsearch cluster is green" for no apparent reason.

kroepke added the bug label Feb 11, 2015

kroepke self-assigned this Feb 11, 2015

kroepke added this to the 1.0.0 milestone Feb 11, 2015

kroepke added a commit that referenced this issue Feb 11, 2015

only do the cluster health request once per REST call to avoid timing out

it makes no sense to offer individual data about the health because those values are fetched together.
remove the methods that can lead to programming errors and inline them

issue #953
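The idea in the commit message can be illustrated with a small sketch. This is not the Graylog code (Graylog is written in Java); `ClusterHealth`, `health_summary`, and `fetch_health` are hypothetical names, and the assumption is that the slow path came from issuing more than one health request while building a single response. Here the health is fetched once and every field of the response is read from that one result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterHealth:
    """Snapshot of one cluster health request (hypothetical type)."""
    status: str
    active: int
    initializing: int
    relocating: int
    unassigned: int


def health_summary(fetch_health):
    """Build the REST response from a single health request."""
    health = fetch_health()  # the only round trip to the cluster
    return {
        "shards": {
            "initializing": health.initializing,
            "unassigned": health.unassigned,
            "active": health.active,
            "relocating": health.relocating,
        },
        "status": health.status,
    }
```

Because no per-field accessor exists, a caller cannot accidentally trigger one request per value, which is the kind of programming error the commit message mentions.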

edmundoa closed this in b56b5e6 Feb 12, 2015
