Warning: "Message not fully read (request)" @ transport.netty #5178

bluesmanshoes · 2014-02-19T12:59:36Z

Hi there,
I'm running an Elasticsearch cluster with 2 nodes containing about 6 million events. Today both nodes started throwing the following warning:

[2014-02-19 13:45:07,633][WARN ][transport.netty ] [Node1-TI] Message not fully read (request) for [1920] and action [], resetting
[2014-02-19 13:45:07,633][WARN ][transport.netty ] [Node2-TI] Message not fully read (request) for [1921] and action [], resetting
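To see which nodes log this warning and how often, a small log scan can help. The sketch below is a hypothetical helper (`parse_not_fully_read` and `count_by_node` are not part of Elasticsearch). It assumes the warning text looks like the two lines above and ignores the timestamp prefix, because that prefix is bracketed differently in some excerpts later in this thread.

```python
import re
from collections import Counter

# Matches the transport.netty warning as it appears in the log lines above.
# The timestamp prefix is deliberately not part of the pattern.
WARN_RE = re.compile(
    r"\[(?P<node>[^\]]+)\] Message not fully read "
    r"\((?P<kind>request|response)\) for \[(?P<req_id>\d+)\] "
    r"and action \[(?P<action>[^\]]*)\]"
)


def parse_not_fully_read(lines):
    """Yield (node, kind, request_id, action) for each matching log line."""
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            yield m["node"], m["kind"], int(m["req_id"]), m["action"]


def count_by_node(lines):
    """Count 'Message not fully read' warnings per node name."""
    return Counter(node for node, _, _, _ in parse_not_fully_read(lines))


if __name__ == "__main__":
    sample = [
        "[2014-02-19 13:45:07,633][WARN ][transport.netty ] [Node1-TI] "
        "Message not fully read (request) for [1920] and action [], resetting",
        "[2014-02-19 13:45:07,633][WARN ][transport.netty ] [Node2-TI] "
        "Message not fully read (request) for [1921] and action [], resetting",
    ]
    print(count_by_node(sample))
```

An empty action (`[]`), as in every example here, suggests the receiving node could not decode the request at all, so the per-node counts are mainly useful for spotting when and where the traffic arrives.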

{
  "cluster_name" : "Cluster-TI",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 20,
  "active_shards" : 40,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

elasticsearch version: 1.0
java version: jdk1.7.0_51 / JRE 1.7.0_51-b13 on both nodes
OS: RHEL6.5 x64
config: default, except unicast between nodes, loglevel=debug and heap size configuration
ES_HEAP_SIZE="8g"
JAVA_OPTS="-server -d64 $JAVA_OPTS"

Any suggestions as to what produces this warning? Apart from that, the cluster is running fine.

spinscale · 2014-02-20T08:09:31Z

Hey,

Did you see any exceptions in the logs, or only these messages? Is there anything else that might help? Did a network outage happen, or was there an exception (such as an OutOfMemoryError) at any point before that event?

Can you paste the output of curl 'localhost:9200/_nodes/jvm?pretty' please?

For testing, you could also remove the JAVA_OPTS to make sure they don't have any effect (those settings might be redundant anyway, see http://docs.oracle.com/javase/7/docs/technotes/guides/vm/server-class.html).

bluesmanshoes · 2014-02-20T17:13:11Z

Hi, and thanks for answering. I will post an extract of the log file later. Here is the output from the curl command:

{
  "cluster_name" : "Cluster-TI",
  "nodes" : {
    "Sgfj_-XCSbG6jaSGIQiH2A" : {
      "name" : "Node2-TI",
      "transport_address" : "inet[/xx.xx.xx.36:9300]",
      "host" : "xxxxxxxxxx",
      "ip" : "xx.xx.xx.36",
      "version" : "1.0.0",
      "build" : "a46900e",
      "http_address" : "inet[/xx.xx.xx.36:9200]",
      "attributes" : {
        "master" : "true"
      },
      "jvm" : {
        "pid" : 21491,
        "version" : "1.7.0_51",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "24.51-b03",
        "vm_vendor" : "Oracle Corporation",
        "start_time" : 1392901349365,
        "mem" : {
          "heap_init_in_bytes" : 8589934592,
          "heap_max_in_bytes" : 8555069440,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 136314880,
          "direct_max_in_bytes" : 8555069440
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "KNvD27sBQFOGvvS3nZ_FTA" : {
      "name" : "Node1-TI",
      "transport_address" : "inet[/xx.xx.xx.35:9300]",
      "host" : "xxxxxxxxxxx",
      "ip" : "xx.xx.xx.35",
      "version" : "1.0.0",
      "build" : "a46900e",
      "http_address" : "inet[/xx.xx.xx.35:9200]",
      "attributes" : {
        "master" : "true"
      },
      "jvm" : {
        "pid" : 22672,
        "version" : "1.7.0_51",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "24.51-b03",
        "vm_vendor" : "Oracle Corporation",
        "start_time" : 1392901447100,
        "mem" : {
          "heap_init_in_bytes" : 8589934592,
          "heap_max_in_bytes" : 8555069440,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 136314880,
          "direct_max_in_bytes" : 8555069440
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    }
  }
}
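The replies in this thread compare Elasticsearch and JVM versions across nodes by eye. A sketch of doing the same check programmatically, assuming the `_nodes/jvm` response has been saved as JSON with the structure shown above; `summarize_nodes` is a hypothetical helper. It also reports `heap_init_in_bytes`, which should correspond to `ES_HEAP_SIZE="8g"`: 8 × 1024³ = 8,589,934,592 bytes.

```python
import json

GIB = 1024 ** 3


def summarize_nodes(nodes_response):
    """Collect distinct ES versions, distinct JVM versions and initial heap per node."""
    es_versions, jvm_versions, heaps = set(), set(), {}
    for node in nodes_response["nodes"].values():
        es_versions.add(node["version"])
        jvm_versions.add(node["jvm"]["version"])
        heaps[node["name"]] = node["jvm"]["mem"]["heap_init_in_bytes"]
    return es_versions, jvm_versions, heaps


if __name__ == "__main__":
    # Trimmed copy of the response above (only the fields used here).
    raw = """{"nodes": {
      "Sgfj_-XCSbG6jaSGIQiH2A": {"name": "Node2-TI", "version": "1.0.0",
        "jvm": {"version": "1.7.0_51", "mem": {"heap_init_in_bytes": 8589934592}}},
      "KNvD27sBQFOGvvS3nZ_FTA": {"name": "Node1-TI", "version": "1.0.0",
        "jvm": {"version": "1.7.0_51", "mem": {"heap_init_in_bytes": 8589934592}}}}}"""
    es, jvm, heaps = summarize_nodes(json.loads(raw))
    if len(es) > 1 or len(jvm) > 1:
        print("Version mismatch:", es, jvm)
    for name, init in sorted(heaps.items()):
        print(name, init / GIB, "GiB initial heap")
```

This only covers processes that joined the cluster. A client that never joins (for example a Java transport client) should not appear in `_nodes` at all, so a clean result here does not rule out a stray client sending mismatched requests.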

bluesmanshoes · 2014-02-20T17:19:41Z

The first time, the warnings started after a GC (see below) and stopped after I restarted the process while GC was still ongoing.

[2014-02-17 08:13:15,203][WARN ][monitor.jvm              ] [SPortalCN1-TI] [gc][young][64545][79] duration [1.2s], collections [1]/[1.9s], total [1.2s]/[1.1m], memory [2.9gb]->[3gb]/[7.9gb], all_pools {[young] [166mb]->[6.7mb]/[266.2mb]}{[survivor] [33.2mb]->[33.2mb]/[33.2mb]}{[old] [2.8gb]->[3gb]/[7.6gb]}
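To check whether the warnings line up with garbage collections, the `monitor.jvm` lines can be scanned for slow collections and their timestamps compared with the transport warnings. A minimal sketch, assuming the line format shown above; `parse_gc_warning`, `slow_collections` and the 1-second threshold are illustrative choices, not Elasticsearch settings.

```python
import re

# Matches "[gc][young][64545][79] duration [1.2s]" style fragments.
GC_RE = re.compile(
    r"\[gc\]\[(?P<gen>\w+)\]\[\d+\]\[\d+\] duration "
    r"\[(?P<value>[\d.]+)(?P<unit>ms|s|m)\]"
)

UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_gc_warning(line):
    """Return (generation, duration_in_seconds), or None if the line is not a GC warning."""
    m = GC_RE.search(line)
    if not m:
        return None
    return m["gen"], float(m["value"]) * UNIT_SECONDS[m["unit"]]


def slow_collections(lines, threshold_s=1.0):
    """Yield (line, generation, seconds) for collections at or above threshold_s."""
    for line in lines:
        parsed = parse_gc_warning(line)
        if parsed and parsed[1] >= threshold_s:
            yield (line, *parsed)
```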

The second time, the warnings started after the following parse failure, but a restart didn't solve the problem this time.

org.elasticsearch.search.SearchParseException: [events][1]: from[90],size[10]: Parse Failure [Failed to parse source [{"from":90,"size":10,"sort":[{"lastResponse":{"order":"desc"}}]}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:586)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:489)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:474)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:467)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:239)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
        at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.SearchParseException: [events][1]: from[90],size[10]: Parse Failure [No mapping found for [lastResponse] in order to sort on]
        at org.elasticsearch.search.sort.SortParseElement.addSortField(SortParseElement.java:198)
        at org.elasticsearch.search.sort.SortParseElement.addCompoundSortField(SortParseElement.java:172)
        at org.elasticsearch.search.sort.SortParseElement.parse(SortParseElement.java:80)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:574)
        ... 11 more
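The nested cause says that shard `[events][1]` has no mapping for `lastResponse`, so the sort cannot be parsed there. My understanding of the Elasticsearch 1.x sort documentation is that a sort field accepts an `ignore_unmapped` option for exactly this case (later versions replace it with `unmapped_type`); treat that as an assumption to verify against the docs for your version. The sketch below only builds the request body of the failing query with that option added; `build_search_body` is a hypothetical helper and sends nothing to the cluster.

```python
import json


def build_search_body(from_, size, field, order="desc", ignore_unmapped=True):
    """Rebuild the failing search body, optionally tolerating an unmapped sort field."""
    sort_opts = {"order": order}
    if ignore_unmapped:
        # Assumed ES 1.x sort option: skip sorting on shards where `field` has no mapping.
        sort_opts["ignore_unmapped"] = True
    return {"from": from_, "size": size, "sort": [{field: sort_opts}]}


if __name__ == "__main__":
    print(json.dumps(build_search_body(90, 10, "lastResponse")))
```

With `ignore_unmapped=False` the body is identical to the one in the exception above, which makes it easy to compare the two requests side by side.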

bluesmanshoes · 2014-02-24T10:57:15Z

The warnings have now stopped for no apparent reason (no restart, nothing)...

spinscale · 2014-02-24T11:02:10Z

Do you have rolling indices and queries against them (so that a specific index is no longer queried)?

meconlin · 2014-03-10T21:37:25Z

I just upgraded my cluster to 1.0.0 and am having the same issue; the message comes up every 5 seconds.

2014-03-10 17:35:57,894[WARN ][transport.netty          ] [vel-hfs-1-1] Message not fully read (request) for [31614711] and action [], resetting

All of my nodes were upgraded to 1.0.0.
All JVMs are running on Java 1.7.

Any thoughts?

clintongormley · 2014-03-11T10:05:38Z

Please post the output of:

curl -XGET "http://localhost:9200/_nodes/jvm?pretty"

meconlin · 2014-03-11T12:44:22Z

Sure: GIST HERE

You will notice that all 54 nodes are running Java 1.7.0_09 and Elasticsearch 1.0.0.
More than 12 hours and one restart later, I'm still getting netty WARNs in the log. Somebody is chatting nonsense to this cluster!

clintongormley · 2014-03-11T13:44:28Z

Do you have any Java transport clients which are querying the cluster? Are they the same version?

Could you post some of the lines from the log file with a bit more context?

ta

meconlin · 2014-03-11T14:48:20Z

@clintongormley, thanks, you were right! I found it with a bit of tcpdump: an old rexster process on another dev box was hammering away at my cluster.
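The culprit here was found by watching traffic on the transport port. A minimal sketch of the same idea: given the text output of a capture such as `tcpdump -n port 9300` and the node IPs reported by `_nodes`, list the source addresses that are not cluster members. It assumes tcpdump's usual IPv4 `IP src.port > dst.port:` layout; `unknown_transport_peers` and the sample addresses in the usage below are made up for illustration.

```python
import re

# IPv4 packet summary as printed by `tcpdump -n`: "IP a.b.c.d.sport > e.f.g.h.dport:"
PKT_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):"
)


def unknown_transport_peers(tcpdump_lines, known_ips, transport_port=9300):
    """Return source IPs sending to transport_port that are not known cluster nodes."""
    unknown = set()
    for line in tcpdump_lines:
        m = PKT_RE.search(line)
        if m and int(m["dport"]) == transport_port and m["src"] not in known_ips:
            unknown.add(m["src"])
    return unknown


if __name__ == "__main__":
    capture = [
        "12:00:00.000001 IP 10.0.0.99.53412 > 10.0.0.36.9300: Flags [P.], length 120",
        "12:00:00.000002 IP 10.0.0.35.40000 > 10.0.0.36.9300: Flags [.], length 0",
    ]
    print(unknown_transport_peers(capture, {"10.0.0.35", "10.0.0.36"}))
```

The `known_ips` set can be filled from the `ip` fields of the `_nodes` output; any address left over is worth tracking down, whether it turns out to be a stale client like the rexster process here or a forgotten node.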

clintongormley · 2014-03-11T15:07:10Z

Splendid

rahst12 · 2014-05-15T00:55:40Z

I'm getting exactly the same warning. I don't have tcpdump installed on my cluster. Any suggestions for figuring this out?

rahst12 · 2014-05-15T01:02:00Z

Ha. Well, this is no help to anyone else, but I found a rogue server in the cluster and turned it off. By rogue I mean it was decommissioned a couple of Elasticsearch upgrades ago but got rebooted by mistake, so the service started by default and kept trying to join the cluster.

bluesmanshoes closed this as completed Feb 20, 2014

bluesmanshoes reopened this Feb 20, 2014

clintongormley closed this as completed Mar 11, 2014

marcelog mentioned this issue Aug 18, 2014

Aggregations: DateHistogram with negative 'pre_offset' or 'post_offset' value ends with "Message not fully read (response) for" #7312

Closed
