Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning: "Message not fully read (request)" @ transport.netty #5178

Closed
bluesmanshoes opened this issue Feb 19, 2014 · 13 comments
Closed

Warning: "Message not fully read (request)" @ transport.netty #5178

bluesmanshoes opened this issue Feb 19, 2014 · 13 comments

Comments

@bluesmanshoes
Copy link

Hi there,
Im running an elasticsearch cluster with 2 nodes containing about 6 million events. Today both nodes started to throw the following warning:

[2014-02-19 13:45:07,633][WARN ][transport.netty ] [Node1-TI] Message not fully read (request) for [1920] and action [], resetting
[2014-02-19 13:45:07,633][WARN ][transport.netty ] [Node2-TI] Message not fully read (request) for [1921] and action [], resetting

{
"cluster_name" : "Cluster-TI",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 20,
"active_shards" : 40,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

elasticsearch version: 1.0
java version: jdk1.7.0_51 / JRE 1.7.0_51-b13 on both nodes
OS: RHEL6.5 x64
config: default, except unicast between nodes, loglevel=debug and heap size configuration
ES_HEAP_SIZE="8g"
JAVA_OPTS="-server -d64 $JAVA_OPTS"

Any suggestions what produces the mentioned warning message? Apart from that, cluster is running fine.

@spinscale
Copy link
Contributor

Hey,

did you see any exceptions in the logs or only these messages? Anything more which might help? Did any network outage happen or was there an exception (like a OutOfMemoryException) happening anytime before that event?

Can you paste the output of curl 'localhost:9200/_nodes/jvm?pretty' please?

For testing, you could also remove the JAVA_OPTS, to make sure they dont have any effect (those settings might be redundant anyway, see http://docs.oracle.com/javase/7/docs/technotes/guides/vm/server-class.html)

@bluesmanshoes
Copy link
Author

Hi, and thanks for answering. I will post an extract of the log file later. Here is the output from the curl command:

{
  "cluster_name" : "Cluster-TI",
  "nodes" : {
    "Sgfj_-XCSbG6jaSGIQiH2A" : {
      "name" : "Node2-TI",
      "transport_address" : "inet[/xx.xx.xx.36:9300]",
      "host" : "xxxxxxxxxx",
      "ip" : "xx.xx.xx.36",
      "version" : "1.0.0",
      "build" : "a46900e",
      "http_address" : "inet[/xx.xx.xx.36:9200]",
      "attributes" : {
        "master" : "true"
      },
      "jvm" : {
        "pid" : 21491,
        "version" : "1.7.0_51",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "24.51-b03",
        "vm_vendor" : "Oracle Corporation",
        "start_time" : 1392901349365,
        "mem" : {
          "heap_init_in_bytes" : 8589934592,
          "heap_max_in_bytes" : 8555069440,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 136314880,
          "direct_max_in_bytes" : 8555069440
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space",                                                                                         "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "KNvD27sBQFOGvvS3nZ_FTA" : {
      "name" : "Node1-TI",
      "transport_address" : "inet[/xx.xx.xx.35:9300]",
      "host" : "xxxxxxxxxxx",
      "ip" : "xx.xx.xx.35",
      "version" : "1.0.0",
      "build" : "a46900e",
      "http_address" : "inet[/xx.xx.xx.35:9200]",
      "attributes" : {
        "master" : "true"
      },
      "jvm" : {
        "pid" : 22672,
        "version" : "1.7.0_51",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "24.51-b03",
        "vm_vendor" : "Oracle Corporation",
        "start_time" : 1392901447100,
        "mem" : {
          "heap_init_in_bytes" : 8589934592,
          "heap_max_in_bytes" : 8555069440,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 136314880,
          "direct_max_in_bytes" : 8555069440
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space",                                                                                         "CMS Old Gen", "CMS Perm Gen" ]
      }
    }
  }
}

@bluesmanshoes
Copy link
Author

The first Time it starts after an gc (see below), stopped after restarting the process, while gc where ongoing.

[2014-02-17 08:13:15,203][WARN ][monitor.jvm              ] [SPortalCN1-TI] [gc][young][64545][79] duration [1.2s], collections [1]/[1.9s], total [1.2s]/[1.1m], memory [2.9gb]->[3gb]/[7.9gb], all_pools {[young] [166mb]->[6.7mb]/[266.2mb]}{[survivor] [33.2mb]->[33.2mb]/[33.2mb]}{[old] [2.8gb]->[3gb]/[7.6gb]}

Second time it starts after the following parse failure, but restart didn't solved the problem this time.

org.elasticsearch.search.SearchParseException: [events][1]: from[90],size[10]: Parse Failure [Failed to parse source [{"from":90,"size":10,"sort":[{"lastResponse":{"order":"desc"}}]}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:586)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:489)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:474)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:467)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:239)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
        at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.SearchParseException: [events][1]: from[90],size[10]: Parse Failure [No mapping found for [lastResponse] in order to sort on]
        at org.elasticsearch.search.sort.SortParseElement.addSortField(SortParseElement.java:198)
        at org.elasticsearch.search.sort.SortParseElement.addCompoundSortField(SortParseElement.java:172)
        at org.elasticsearch.search.sort.SortParseElement.parse(SortParseElement.java:80)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:574)
        ... 11 more

@bluesmanshoes
Copy link
Author

The warnings now stopped for no particular reason (no restart, nothing)...

@spinscale
Copy link
Contributor

do you have rolling indices and queries against those (so that one specific index is not queried anymore)?

@meconlin
Copy link

I just updated my cluster to 1.0.0 and am having this same issue, message comes up every 5 seconds.

2014-03-10 17:35:57,894[WARN ][transport.netty          ] [vel-hfs-1-1] Message not fully read (request) for [31614711] and action [], resetting

All of my nodes were upgraded to 1.0.0.
All JVMs are running on Java 1.7.

Any thoughts?

@clintongormley
Copy link

Please post the output of:

curl -XGET "http://localhost:9200/_nodes/jvm?pretty"

@meconlin
Copy link

sure : GIST HERE

You will notice all 54 nodes are using 1.7.0_09 and all are 1.0.0.
More than 12 hours later and one restart and still getting netty WARNs in the long. Somebody is chatting some nonsense on this cluster!

@clintongormley
Copy link

Do you have any Java transport clients which are querying the cluster? Are they the same version?

Could you post some of the lines from the log file with a bit more context?

ta

@meconlin
Copy link

@clintongormley, Thanks, you were right! Found it with a little tcpdump, had an old rexster process on another dev box hammering away on my cluster.

@clintongormley
Copy link

Splendid

@rahst12
Copy link

rahst12 commented May 15, 2014

I'm getting exactly the same warning out. I don't have tcpdump installed on my cluster.. Any suggestions for figuring this?

@rahst12
Copy link

rahst12 commented May 15, 2014

Ha.. Well no help to anyone else, but I found a rogue server in the cluster, so I turned it off. By rogue it was decommissioned a couple elasticsearch upgrades ago, and by mistake got rebooted, and thus the service started by default and it kept trying to join.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants