
NullPointerException in TransportShardBulkAction #4224

Closed
spinscale opened this issue Nov 21, 2013 · 7 comments
@spinscale
Contributor

This happened on elasticsearch 0.90.6 with JVM 1.7.0_25

I found this in the logs and cannot tell what triggered it. All I know is that there were lots of index/search operations running and there seems to have been some cluster instability.

[2013-XX-YY 08:45:22,518][DEBUG][action.bulk              ] [myNode] [logstash-2013.XX.YY][3], node[vXE6ojncQUG-foPsMjVY_w], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@28c7fa4f]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:138)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:75)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:610)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:557)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
@avleen

avleen commented Nov 26, 2013

This just bit us tonight, too, somewhat out of the blue.

Immediately before this I see:

[2013-11-26 02:41:52,659][INFO ][discovery.zen            ] [myDataNode] master_left [[logstash01.ny4.etsy.com][swmFFvkEQHaBDyYtBSeSbA][inet[/ip.add.re.ss:9300]]{tag=archive, data=false, master=true}], reason [do not exists on master, act as master failure]

The master, and other nodes in the cluster, were just fine.

About 8 seconds later, it found the master again.

@DenisUspenskiy

Hello,
We have the same problem. Below is the exception stack trace:

[2013-12-24 14:56:53,082][DEBUG][action.bulk] [Trinity] [agentsmith][10], node[gYnN_-hxQly-2bctN2IVkg], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@37daa067]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:138)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:75)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:610)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:557)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

@spinscale
Contributor Author

@DenisUspenskiy did you have some cluster instability as well? Were there master re-elections around that time? Can you reproduce it?

@spinscale
Contributor Author

Clinton managed to reproduce it in #4693:

This is reproducible by deleting an index, not waiting for the response, and then trying to bulk index into that index (i.e. the two requests run in parallel).
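To make the race concrete, here is a minimal, illustrative Python analogue (not Elasticsearch code): the bulk path looks up index metadata that a concurrent delete has already removed from the cluster state, gets `None` back, and then dereferences it, failing the same way `TransportShardBulkAction.shards()` did with an NPE in Java.

```python
# Illustrative analogue of the race behind this NPE. All names here are
# made up for the sketch; only the shape of the bug mirrors the Java code.

cluster_state = {"logstash-2013.11.21": {"shards": 3}}

def delete_index(name):
    # The DELETE request removes the index metadata from the cluster state.
    cluster_state.pop(name, None)

def shards(index_name):
    # Mirrors the unchecked lookup: fetch the index metadata and use the
    # result without a null check.
    index_meta = cluster_state.get(index_name)  # None once the index is gone
    return index_meta["shards"]                 # raises TypeError (Java: NPE)

delete_index("logstash-2013.11.21")   # the delete wins the race
try:
    shards("logstash-2013.11.21")     # the in-flight bulk request then fails
    failed = False
except TypeError:
    failed = True

print(failed)
```

The fix on the server side amounts to checking that the index still exists in the cluster state before iterating its shards.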

@ashpynov

Same call stack here.
We saw this backtrace on a data node while bulk indexing during a master restart. Afterwards, some shards of the affected index on that data node became and remained unassigned while the others were OK (the index is allocated only on that data node, no replicas, 10 shards). Restarting the data node did not help; only dropping the index did.
Version is 0.90.7

@cdmicacc

I think we're seeing this as well. We get it when we close an index (using Elasticsearch 1.1.1). I have a process that reindexes into a new index using the bulk API; while that is happening, my live system is still writing to the old index. Eventually the reindex completes and the alias is changed so that writes are directed to the new index. Just after that, I close the old index. Elasticsearch's logs then fill with the following for a short time (presumably while the thread pools drain):

[2014-06-17 17:28:30,986][INFO ][cluster.metadata         ] [es11] closing indices [[idx-2014-06-12-11]]
[2014-06-17 17:29:02,154][DEBUG][action.bulk              ] [es11] [idx-2014-06-12-11][4], node[1r5Z85J8TM2e1Tp3KO3NAA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@7e39e699]
java.lang.NullPointerException
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:139)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:76)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:610)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:557)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
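A client-side mitigation for the scenario above is to close the old index only after in-flight bulk writes have drained, rather than immediately after the alias swap. The sketch below is a hedged, generic pattern, not Elasticsearch API code: `swap_alias` and `close_index` are hypothetical stand-ins, and the tracker just counts in-flight writes.

```python
import threading

# Hedged sketch: drain in-flight bulk writes before closing the old index.
# The event names (swap_alias, close_index) are illustrative placeholders.

class InFlightTracker:
    """Counts in-flight writes; lets a closer wait until the count drains."""
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self):
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, *exc):
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def wait_drained(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)

tracker = InFlightTracker()
events = []

def bulk_write(doc):
    with tracker:                  # register the write while it is in flight
        events.append(("write", doc))

def cutover():
    events.append(("swap_alias", "idx-old", "idx-new"))
    tracker.wait_drained()         # let pending writes to the old index finish
    events.append(("close_index", "idx-old"))

bulk_write("doc-1")
cutover()
print(events[-1])
```

With this ordering the close happens strictly after the last tracked write, so no bulk request races the closed index.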

@clintongormley

This NPE has been fixed in recent versions. The bulk API can still fail briefly with an index-does-not-exist exception, but this should be fixed by #6790.
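For clients on versions where the brief index-does-not-exist failure can still surface, a short retry loop is a reasonable workaround. This is a hedged sketch: `send_bulk` and `IndexMissingError` are hypothetical stand-ins for your client's bulk call and its exception type, not real library names.

```python
import time

class IndexMissingError(Exception):
    """Stand-in for the client exception raised when the index is missing."""

def bulk_with_retry(send_bulk, attempts=3, delay=0.0):
    """Retry a bulk call a few times if the target index is briefly missing."""
    for attempt in range(attempts):
        try:
            return send_bulk()
        except IndexMissingError:
            if attempt == attempts - 1:
                raise              # out of retries; surface the error
            time.sleep(delay)      # back off before retrying

# Usage: a fake bulk call that fails once, then succeeds.
calls = {"n": 0}
def flaky_bulk():
    calls["n"] += 1
    if calls["n"] == 1:
        raise IndexMissingError("index briefly missing")
    return "ok"

result = bulk_with_retry(flaky_bulk, attempts=3, delay=0.0)
print(result)
```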

Closing
