NullPointerException failed to dynamically refresh the mapping in cluster_state from shard #2633

Filirom1 · 2013-02-08T10:29:30Z

For a maintenance operation we disabled allocation and restart one node.

$ curl -XPUT localhost:9023/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":"true"}}
# stop the node tpsmdt03s
# start the node tpsmdt03s
$ curl -XPUT localhost:9023/_cluster/settings -d '{"transient": "cluster.routing.allocation.disable_allocation":"false"}}

Unfortunately, something went wrong with ES:

[2012-11-28 11:02:02,324][INFO ][cluster.routing.allocation.decider] [tpsmdt01s] updating [cluster.routing.allocation.disable_allocation] from [false] to [true]
[2012-11-28 11:08:13,170][INFO ][discovery.zen            ] [tpsmdt01s] master_left [[tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]]], reason [transport disconnected (with verified connect)]
[2012-11-28 11:08:13,177][INFO ][cluster.service          ] [tpsmdt01s] master {new [tpsmdt01s][JIjtYHpkQBa8_GnvcLM4Nw][inet[/10.26.165.10:9027]], previous [tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]]}, removed {[tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]],}, reason: zen-disco-master_failed ([tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]])
...
[2012-11-28 11:10:02,449][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] update_mapping [50b5e2f946b88e97370000e5] (dynamic)
[2012-11-28 11:17:12,867][INFO ][cluster.service          ] [tpsmdt01s] added {[tpsmdt03s][m8ckMTZlQ1Kuvd8P7RJtfg][inet[/10.26.165.12:9027]],}, reason: zen-disco-receive(join from node[[tpsmdt03s][m8ckMTZlQ1Kuvd8P7RJtfg][inet[/10.26.165.12:9027]]])
[2012-11-28 11:18:23,379][INFO ][cluster.routing.allocation.decider] [tpsmdt01s] updating [cluster.routing.allocation.disable_allocation] from [true] to [false]
...
2012-11-28 14:40:02,615][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] update_mapping [50b6143246b88e9737000172] (dynamic)
[2012-11-28 14:40:04,885][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [50b6143246b88e9737000172]
[2012-11-28 14:45:02,056][INFO ][cluster.metadata         ] [tpsmdt01s] 
...
[4f072ea3a623374256000001] update_mapping [50b6155d46b88e9737000185] (dynamic)
[2012-11-28 14:45:04,359][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [50b6155d46b88e9737000185]
[2012-11-28 14:45:04,482][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$1.execute(MetaDataMappingService.java:130)
        at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:208)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

And now, I have a lot of WARN

[2013-02-08 10:45:04,324][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [5114c91de5c0e0950f0000d2]
[2013-02-08 10:45:04,459][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [5114c91d511a3418550000d4]
[2013-02-08 10:45:06,599][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException
[2013-02-08 10:45:06,964][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException

When I say a lot, I mean it ;)

$ grep 'failed to dynamically refresh the mapping in cluster_state from shard' search-0.19* | wc -l
58051

The text was updated successfully, but these errors were encountered:

Filirom1 · 2013-02-08T11:02:20Z

Deleting the index is not the solution:

$ curl -X DELETE http://tpsmdt03s:9023/4f072ea3a623374256000001
[2013-02-08 11:55:29,855][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] deleting index
{"ok":true,"acknowledged":false}

$ curl -XPOST http://localhost:9023/4f072ea3a623374256000001/user2/ -d '{
    "name2" : "Shay Banon"
}'
[2013-02-08 12:01:24,479][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] creating index, cause [auto(index api)], shards [6]/[1], mappings []
[2013-02-08 12:01:25,057][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException

Filirom1 · 2013-02-08T14:19:08Z

What I understand is that the cluster state is in a poor state.
It can not be updated because a NullPointerException is thrown at MetaDataMappingService.java#L130

Finally the Exception is catched at MetaDataMappingService.java#L145 and the state is rolled back to the previsous one (the bad one).
It will be impossible for us to go away from this Error, because the cluster state is saved on disk and will be restored after a cold cluster restart.

Index creation and deletion are very slow:

$ time curl -XPOST http://localhost:9023/test2/user3/ -d '{
    "name" : "Shay Banon"
}'
{"ok":true,"_index":"test2","_type":"user3","_id":"OHCEGwWsRgKfQi-DMalzPA","_version":1}
real    0m43.143s
user    0m0.003s
sys 0m0.002s

$ time curl -XDELETE http://localhost:9023/test2/
{"ok":true,"acknowledged":false}
real    0m10.027s
user    0m0.001s
sys 0m0.004s

But once created, indexation are fast:

$ time curl -XPOST http://localhost:9023/test2/user3/ -d '{
    "name" : "Shay Banon"
}'
{"ok":true,"_index":"test2","_type":"user3","_id":"qGgGgHxNR3297PkdNA94Lg","_version":1}
real    0m0.010s
user    0m0.002s
sys 0m0.003s

And querying works:

$ time curl http://localhost:9023/test2/_search?q=name:Shay
{"took":2,"timed_out":false,"_shards":{"total":6,"successful":6,"failed":0},"hits":{"total":1,"max_score":0.19178301,"hits":[{"_index":"test2","_type":"user3","_id":"0jNFPBz-RieGCQ-vctYOoA","_score":0.19178301, "_source" : {"name":"Shay Banon"}}]}}
real    0m0.008s
user    0m0.002s
sys 0m0.002s

Filirom1 · 2013-02-13T08:52:05Z

Ok now, I have a better understanding of what happens on the cluster.

The long index creation time is not related to the NullPointerException : https://groups.google.com/d/msg/elasticsearch/hHniXBx8su8/VfEArmcpqQgJ

I am now able to reproduce the NullPointerException:
Two indexations in parallel, and one deletion on the same index/type.

curl -XPOST http://tpsmdt11s:9023/twitter/user -d '{
    "name" : "Shay Banon"
}' &
curl -XPOST http://tpsmdt11s:9023/twitter/user -d '{
    "name" : "Hello You"
}' &
curl -XDELETE http://tpsmdt11s:9023/twitter/user &

[2013-02-13 09:48:21,369][INFO ][cluster.metadata         ] [tpsmdt11s] [twitter] update_mapping [user] (dynamic)
[2013-02-13 09:48:21,386][INFO ][cluster.metadata         ] [tpsmdt11s] [twitter] update_mapping [user] (dynamic)
[2013-02-13 09:48:21,401][INFO ][cluster.metadata         ] [tpsmdt11s] [[twitter]] remove_mapping [user]
[2013-02-13 09:48:21,479][WARN ][cluster.metadata         ] [tpsmdt11s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException

The same logs in DEBUG : https://gist.github.com/Filirom1/4943162.

I use a 4 nodes cluster in 0.19.11

imotov · 2013-02-14T21:27:12Z

Can you reproduce the NullPointerException issue with 0.20.5 or master?

spinscale · 2013-08-09T12:26:34Z

Closing this as it is quite stale. Happy to reopen, if this still happens with a current 0.90.3 release. Please tell us!

lindstromhenrik · 2013-11-07T14:32:43Z

I can't reproduce this locally but I can confirm that we see this error in 0.90.5:
[2013-11-07 15:26:39,574][WARN ][cluster.action.index ] [NODE] failed to dynamically update the mapping in cluster_state from shard
java.lang.NullPointerException

spinscale · 2013-11-07T21:32:14Z

do you have a stack trace as you could paste here?

javanna · 2013-11-07T22:23:46Z

I think the NPE here might be related to ClusterState#toString method, probably solved with this commit: 123bc98

lindstromhenrik · 2013-11-08T07:05:47Z

I'm working on getting the stack but I haven't been able to reproduce this in an environment that allows me to do this.

Regarding the "toString()" fix we don't use templates so I don't think that is the problem.

My working hypothesis at the moment after looking at the mappings for indices that fails is that they all contain mappings where we have the same property name at different levels and that this also makes the merging of mappings to fail, causing a refresh-mapping giving the same NPE.

spinscale closed this as completed Aug 9, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NullPointerException failed to dynamically refresh the mapping in cluster_state from shard #2633

NullPointerException failed to dynamically refresh the mapping in cluster_state from shard #2633

Filirom1 commented Feb 8, 2013

Filirom1 commented Feb 8, 2013

Filirom1 commented Feb 8, 2013

Filirom1 commented Feb 13, 2013

imotov commented Feb 14, 2013

spinscale commented Aug 9, 2013

lindstromhenrik commented Nov 7, 2013

spinscale commented Nov 7, 2013

javanna commented Nov 7, 2013

lindstromhenrik commented Nov 8, 2013

NullPointerException failed to dynamically refresh the mapping in cluster_state from shard #2633

NullPointerException failed to dynamically refresh the mapping in cluster_state from shard #2633

Comments

Filirom1 commented Feb 8, 2013

Filirom1 commented Feb 8, 2013

Filirom1 commented Feb 8, 2013

Filirom1 commented Feb 13, 2013

imotov commented Feb 14, 2013

spinscale commented Aug 9, 2013

lindstromhenrik commented Nov 7, 2013

spinscale commented Nov 7, 2013

javanna commented Nov 7, 2013

lindstromhenrik commented Nov 8, 2013