Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullPointerException failed to dynamically refresh the mapping in cluster_state from shard #2633

Closed
Filirom1 opened this issue Feb 8, 2013 · 9 comments

Comments

@Filirom1
Copy link

Filirom1 commented Feb 8, 2013

For a maintenance operation we disabled allocation and restart one node.

$ curl -XPUT localhost:9023/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":"true"}}
# stop the node tpsmdt03s
# start the node tpsmdt03s
$ curl -XPUT localhost:9023/_cluster/settings -d '{"transient": "cluster.routing.allocation.disable_allocation":"false"}}

Unfortunately, something went wrong with ES:

[2012-11-28 11:02:02,324][INFO ][cluster.routing.allocation.decider] [tpsmdt01s] updating [cluster.routing.allocation.disable_allocation] from [false] to [true]
[2012-11-28 11:08:13,170][INFO ][discovery.zen            ] [tpsmdt01s] master_left [[tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]]], reason [transport disconnected (with verified connect)]
[2012-11-28 11:08:13,177][INFO ][cluster.service          ] [tpsmdt01s] master {new [tpsmdt01s][JIjtYHpkQBa8_GnvcLM4Nw][inet[/10.26.165.10:9027]], previous [tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]]}, removed {[tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]],}, reason: zen-disco-master_failed ([tpsmdt03s][6Sd72NOiTOayO8nPtlH9dg][inet[/10.26.165.12:9027]])
...
[2012-11-28 11:10:02,449][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] update_mapping [50b5e2f946b88e97370000e5] (dynamic)
[2012-11-28 11:17:12,867][INFO ][cluster.service          ] [tpsmdt01s] added {[tpsmdt03s][m8ckMTZlQ1Kuvd8P7RJtfg][inet[/10.26.165.12:9027]],}, reason: zen-disco-receive(join from node[[tpsmdt03s][m8ckMTZlQ1Kuvd8P7RJtfg][inet[/10.26.165.12:9027]]])
[2012-11-28 11:18:23,379][INFO ][cluster.routing.allocation.decider] [tpsmdt01s] updating [cluster.routing.allocation.disable_allocation] from [true] to [false]
...
2012-11-28 14:40:02,615][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] update_mapping [50b6143246b88e9737000172] (dynamic)
[2012-11-28 14:40:04,885][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [50b6143246b88e9737000172]
[2012-11-28 14:45:02,056][INFO ][cluster.metadata         ] [tpsmdt01s] 
...
[4f072ea3a623374256000001] update_mapping [50b6155d46b88e9737000185] (dynamic)
[2012-11-28 14:45:04,359][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [50b6155d46b88e9737000185]
[2012-11-28 14:45:04,482][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$1.execute(MetaDataMappingService.java:130)
        at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:208)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

And now, I have a lot of WARN

[2013-02-08 10:45:04,324][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [5114c91de5c0e0950f0000d2]
[2013-02-08 10:45:04,459][INFO ][cluster.metadata         ] [tpsmdt01s] [[4f072ea3a623374256000001]] remove_mapping [5114c91d511a3418550000d4]
[2013-02-08 10:45:06,599][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException
[2013-02-08 10:45:06,964][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException

When I say a lot, I mean it ;)

$ grep 'failed to dynamically refresh the mapping in cluster_state from shard' search-0.19* | wc -l
58051
@Filirom1
Copy link
Author

Filirom1 commented Feb 8, 2013

Deleting the index is not the solution:

$ curl -X DELETE http://tpsmdt03s:9023/4f072ea3a623374256000001
[2013-02-08 11:55:29,855][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] deleting index
{"ok":true,"acknowledged":false}
$ curl -XPOST http://localhost:9023/4f072ea3a623374256000001/user2/ -d '{
    "name2" : "Shay Banon"
}'
[2013-02-08 12:01:24,479][INFO ][cluster.metadata         ] [tpsmdt01s] [4f072ea3a623374256000001] creating index, cause [auto(index api)], shards [6]/[1], mappings []
[2013-02-08 12:01:25,057][WARN ][cluster.metadata         ] [tpsmdt01s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException

@Filirom1
Copy link
Author

Filirom1 commented Feb 8, 2013

What I understand is that the cluster state is in a poor state.
It can not be updated because a NullPointerException is thrown at MetaDataMappingService.java#L130

Finally the Exception is catched at MetaDataMappingService.java#L145 and the state is rolled back to the previsous one (the bad one).
It will be impossible for us to go away from this Error, because the cluster state is saved on disk and will be restored after a cold cluster restart.

Index creation and deletion are very slow:

$ time curl -XPOST http://localhost:9023/test2/user3/ -d '{
    "name" : "Shay Banon"
}'
{"ok":true,"_index":"test2","_type":"user3","_id":"OHCEGwWsRgKfQi-DMalzPA","_version":1}
real    0m43.143s
user    0m0.003s
sys 0m0.002s

$ time curl -XDELETE http://localhost:9023/test2/
{"ok":true,"acknowledged":false}
real    0m10.027s
user    0m0.001s
sys 0m0.004s

But once created, indexation are fast:

$ time curl -XPOST http://localhost:9023/test2/user3/ -d '{
    "name" : "Shay Banon"
}'
{"ok":true,"_index":"test2","_type":"user3","_id":"qGgGgHxNR3297PkdNA94Lg","_version":1}
real    0m0.010s
user    0m0.002s
sys 0m0.003s

And querying works:

$ time curl http://localhost:9023/test2/_search?q=name:Shay
{"took":2,"timed_out":false,"_shards":{"total":6,"successful":6,"failed":0},"hits":{"total":1,"max_score":0.19178301,"hits":[{"_index":"test2","_type":"user3","_id":"0jNFPBz-RieGCQ-vctYOoA","_score":0.19178301, "_source" : {"name":"Shay Banon"}}]}}
real    0m0.008s
user    0m0.002s
sys 0m0.002s

@Filirom1
Copy link
Author

Ok now, I have a better understanding of what happens on the cluster.

The long index creation time is not related to the NullPointerException : https://groups.google.com/d/msg/elasticsearch/hHniXBx8su8/VfEArmcpqQgJ

I am now able to reproduce the NullPointerException:
Two indexations in parallel, and one deletion on the same index/type.

curl -XPOST http://tpsmdt11s:9023/twitter/user -d '{
    "name" : "Shay Banon"
}' &
curl -XPOST http://tpsmdt11s:9023/twitter/user -d '{
    "name" : "Hello You"
}' &
curl -XDELETE http://tpsmdt11s:9023/twitter/user &
[2013-02-13 09:48:21,369][INFO ][cluster.metadata         ] [tpsmdt11s] [twitter] update_mapping [user] (dynamic)
[2013-02-13 09:48:21,386][INFO ][cluster.metadata         ] [tpsmdt11s] [twitter] update_mapping [user] (dynamic)
[2013-02-13 09:48:21,401][INFO ][cluster.metadata         ] [tpsmdt11s] [[twitter]] remove_mapping [user]
[2013-02-13 09:48:21,479][WARN ][cluster.metadata         ] [tpsmdt11s] failed to dynamically refresh the mapping in cluster_state from shard
java.lang.NullPointerException

The same logs in DEBUG : https://gist.github.com/Filirom1/4943162.

I use a 4 nodes cluster in 0.19.11

@imotov
Copy link
Contributor

imotov commented Feb 14, 2013

Can you reproduce the NullPointerException issue with 0.20.5 or master?

@spinscale
Copy link
Contributor

Closing this as it is quite stale. Happy to reopen, if this still happens with a current 0.90.3 release. Please tell us!

@lindstromhenrik
Copy link
Contributor

I can't reproduce this locally but I can confirm that we see this error in 0.90.5:
[2013-11-07 15:26:39,574][WARN ][cluster.action.index ] [NODE] failed to dynamically update the mapping in cluster_state from shard
java.lang.NullPointerException

@spinscale
Copy link
Contributor

do you have a stack trace as you could paste here?

@javanna
Copy link
Member

javanna commented Nov 7, 2013

I think the NPE here might be related to ClusterState#toString method, probably solved with this commit: 123bc98

@lindstromhenrik
Copy link
Contributor

I'm working on getting the stack but I haven't been able to reproduce this in an environment that allows me to do this.

Regarding the "toString()" fix we don't use templates so I don't think that is the problem.

My working hypothesis at the moment after looking at the mappings for indices that fails is that they all contain mappings where we have the same property name at different levels and that this also makes the merging of mappings to fail, causing a refresh-mapping giving the same NPE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants