
NullPointerException in TransportShardBulkAction #4224

Closed
spinscale opened this issue Nov 21, 2013 · 7 comments
@spinscale
Contributor

This happened on elasticsearch 0.90.6 with JVM 1.7.0_25

I found this in the logs and cannot tell what triggered it. All I know is that there were lots of index/search operations running and there seems to have been some cluster instability.

[2013-XX-YY 08:45:22,518][DEBUG][action.bulk              ] [myNode] [logstash-2013.XX.YY][3], node[vXE6ojncQUG-foPsMjVY_w], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@28c7fa4f]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:138)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:75)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:610)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:557)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
@avleen

avleen commented Nov 26, 2013

This just bit us tonight, too, somewhat out of the blue.

Immediately before this I see:

[2013-11-26 02:41:52,659][INFO ][discovery.zen            ] [myDataNode] master_left [[logstash01.ny4.etsy.com][swmFFvkEQHaBDyYtBSeSbA][inet[/ip.add.re.ss:9300]]{tag=archive, data=false, master=true}], reason [do not exists on master, act as master failure]

The master, and other nodes in the cluster, were just fine.

About 8 seconds later, it found the master again.

@DenisUspenskiy

Hello,
We have the same problem. Below is the exception stack trace:

[2013-12-24 14:56:53,082][DEBUG][action.bulk] [Trinity] [agentsmith][10], node[gYnN_-hxQly-2bctN2IVkg], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@37daa067]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:138)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:75)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:610)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:557)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

@spinscale
Contributor Author

@DenisUspenskiy did you have some cluster instability as well? Were there master re-elections around that time? Can you reproduce it?

@spinscale
Contributor Author

Clinton managed to reproduce it in #4693:

This is reproducible by deleting an index, not waiting for the response, and then trying to bulk index into that index (i.e. the two requests run in parallel).
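To make the race concrete, here is a minimal, illustrative Python analogue (not Elasticsearch code): the bulk path looks up index metadata that a concurrent delete has already removed from the cluster state, gets `None` back, and then dereferences it, failing the same way `TransportShardBulkAction.shards()` did with an NPE in Java.

```python
# Illustrative analogue of the race behind this NPE. All names here are
# made up for the sketch; only the shape of the bug mirrors the Java code.

cluster_state = {"logstash-2013.11.21": {"shards": 3}}

def delete_index(name):
    # The DELETE request removes the index metadata from the cluster state.
    cluster_state.pop(name, None)

def shards(index_name):
    # Mirrors the unchecked lookup: fetch the index metadata and use the
    # result without a null check.
    index_meta = cluster_state.get(index_name)  # None once the index is gone
    return index_meta["shards"]                 # raises TypeError (Java: NPE)

delete_index("logstash-2013.11.21")   # the delete wins the race
try:
    shards("logstash-2013.11.21")     # the in-flight bulk request then fails
    failed = False
except TypeError:
    failed = True

print(failed)
```

The fix on the server side amounts to checking that the index still exists in the cluster state before iterating its shards.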

@ashpynov

Same call stack here.
We saw this backtrace on a data node while bulk indexing during a master restart. Afterwards, some shards of the affected index on that data node became and remained unassigned while the others were OK (the index is allocated only on that data node, no replicas, 10 shards). Restarting the data node did not help; only dropping the index did.
Version is 0.90.7

@cdmicacc

I think we're seeing this as well. We get it when we close an index (using Elasticsearch 1.1.1). I have a process that reindexes into a new index using the bulk API; while that is happening, my live system is still writing to the old index. Eventually the reindex completes and the alias is changed so that writes are directed to the new index. Just after that, I close the old index. Elasticsearch's logs then fill with the following for a short time (presumably while the thread pools drain):

[2014-06-17 17:28:30,986][INFO ][cluster.metadata         ] [es11] closing indices [[idx-2014-06-12-11]]
[2014-06-17 17:29:02,154][DEBUG][action.bulk              ] [es11] [idx-2014-06-12-11][4], node[1r5Z85J8TM2e1Tp3KO3NAA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@7e39e699]
java.lang.NullPointerException
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:139)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shards(TransportShardBulkAction.java:76)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:610)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:557)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
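A client-side mitigation for the scenario above is to close the old index only after in-flight bulk writes have drained, rather than immediately after the alias swap. The sketch below is a hedged, generic pattern, not Elasticsearch API code: `swap_alias` and `close_index` are hypothetical stand-ins, and the tracker just counts in-flight writes.

```python
import threading

# Hedged sketch: drain in-flight bulk writes before closing the old index.
# The event names (swap_alias, close_index) are illustrative placeholders.

class InFlightTracker:
    """Counts in-flight writes; lets a closer wait until the count drains."""
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self):
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, *exc):
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def wait_drained(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)

tracker = InFlightTracker()
events = []

def bulk_write(doc):
    with tracker:                  # register the write while it is in flight
        events.append(("write", doc))

def cutover():
    events.append(("swap_alias", "idx-old", "idx-new"))
    tracker.wait_drained()         # let pending writes to the old index finish
    events.append(("close_index", "idx-old"))

bulk_write("doc-1")
cutover()
print(events[-1])
```

With this ordering the close happens strictly after the last tracked write, so no bulk request races the closed index.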

@clintongormley

This NPE has been fixed in recent versions. The bulk API can still fail briefly with an index-does-not-exist exception, but this should be fixed by #6790.
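For clients on versions where the brief index-does-not-exist failure can still surface, a short retry loop is a reasonable workaround. This is a hedged sketch: `send_bulk` and `IndexMissingError` are hypothetical stand-ins for your client's bulk call and its exception type, not real library names.

```python
import time

class IndexMissingError(Exception):
    """Stand-in for the client exception raised when the index is missing."""

def bulk_with_retry(send_bulk, attempts=3, delay=0.0):
    """Retry a bulk call a few times if the target index is briefly missing."""
    for attempt in range(attempts):
        try:
            return send_bulk()
        except IndexMissingError:
            if attempt == attempts - 1:
                raise              # out of retries; surface the error
            time.sleep(delay)      # back off before retrying

# Usage: a fake bulk call that fails once, then succeeds.
calls = {"n": 0}
def flaky_bulk():
    calls["n"] += 1
    if calls["n"] == 1:
        raise IndexMissingError("index briefly missing")
    return "ok"

result = bulk_with_retry(flaky_bulk, attempts=3, delay=0.0)
print(result)
```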

Closing
