Internal: Upgrade caused shard data to stay on nodes #7386
Comments
Could this be related to #6692? Did you upgrade all nodes to 1.3, or do you still have nodes < 1.3.0 in the cluster?
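For reference, a quick way to check which versions are currently in the cluster is the _cat nodes API; this is a minimal sketch assuming the REST port is reachable on localhost:9200 and that the version column is what you want to inspect.

```sh
# List every node with its Elasticsearch version to spot nodes still below 1.3.0.
curl -s 'localhost:9200/_cat/nodes?v&h=host,name,version'
```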
Only about 1/3 of the nodes before we got warnings about disk space.
I guess it's not freeing the space unless an upgraded node holds a copy of the shard. That is new in 1.3, and I'm still trying to remember what the background was. Can you check if that assumption is true: are the shards that are not deleted allocated on old nodes?
Well, this is almost certainly the cause:

```java
// If all nodes have been upgraded to >= 1.3.0 at some point we get back here and have the chance to
// run this api. (when cluster state is then updated)
if (node.getVersion().before(Version.V_1_3_0)) {
    logger.debug("Skip deleting deleting shard instance [{}], a node holding a shard instance is < 1.3.0", shardRouting);
    return false;
}
```

1.3 won't delete stuff from the disks until the whole cluster is on 1.3. That's ugly. I run with disks 50% full and the upgrade process almost filled them just with shuffling. Side note: if the shards are still in the routing table it'd be nice to see them. Right now they seem to be invisible to the _cat API.
@nik9000 this was a temporary thing to add extra safety. The amount of leftover data will shrink as you upgrade more nodes. I agree we could expose some more info here if stuff is still on disk.
This gave me quite a scare! I was running this upgrade over night with a script with extra sleeping to keep the cluster balanced. It woke me up with 99% disk utilization on one of the nodes. I'll keep pushing the upgrade through carefully.
For posterity: if you nuke the contents of your node's data directory after stopping Elasticsearch 1.2 but before starting Elasticsearch 1.3, then you won't end up with too much data that can't be cleared. The more nodes you upgrade, the more shards you'll be able to delete anyway, like @s1monw said.
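A minimal sketch of that per-node sequence, assuming a Debian-style service name and the data path mentioned later in this issue; both are assumptions, so verify them before deleting anything.

```sh
# Stop the 1.2 node, wipe its local shard data, upgrade the package, start 1.3.
# The cluster re-replicates this node's shards once it rejoins.
sudo service elasticsearch stop
sudo rm -rf /var/lib/elasticsearch/*/nodes/0/indices/*   # assumed data path - double-check first
# ... install the Elasticsearch 1.3.x package here ...
sudo service elasticsearch start
```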
Just to clarify a bit more: we added some safety in 1.3 that required a new API, and we can only call this API if we know that we are allocated on another 1.3-or-newer node. That is why we keep the data around longer. Thanks for opening this, nik!
The unused shard copies only get deleted if all of a shard's active copies can be verified. Maybe the shard to be cleaned up had copies on this not-yet-upgraded node? Unused shard copies should get cleaned up now; if that isn't the case then that is bad. If you enable trace logging for the
@martijnvg - I'll see what happens once the whole cluster goes green after the last upgrade - that'll be in under an hour. Did we do anything to allow changing log levels on the fly? I remember seeing something about it but #6416 is still open.
And by we I mean you, I guess :)
:) Well, this has been in for a while: #2517, which allows changing the log settings via the cluster update settings API.
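A minimal sketch of bumping a logger through that API, assuming the indices.store logger (the one behind the shard deletion decisions discussed above) is the one you want to watch:

```sh
# Raise the indices.store logger to TRACE on the fly, without restarting any node.
curl -s -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "logger.indices.store": "TRACE" }
}'
```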
OK! Here is something: https://gist.github.com/nik9000/89013550ec78da5808e4
That is getting spit out constantly.
Looks like it is on every node as well.
Cluster is now green and lots of old data still sitting around.
@nik9000 this is very odd. The line points at a null clusterName. Are all the nodes continuously logging this? Can I ask you to enable debug logging for the root logger and share the log? I hope to get more context on when this can happen.
I see that cluster name is something that was introduced in 1.1.1. Maybe a coincidence - but I haven't performed a full cluster restart since upgrading to 1.1.0.
Let me see about that debug logging - seems like that'll be a ton of data. Also - looks like this is the only thing that doesn't check if the cluster name is non-null. Probably just a coincidence because it's supposed to be non-null since 1.1.1, I guess.....
@nik9000 I'm not sure I follow what you mean by
I was referring to this line: https://github.com/elasticsearch/elasticsearch/blob/v1.3.2/src/main/java/org/elasticsearch/indices/store/IndicesStore.java#L418
@bleskes - sorry, yeah. I was looking at other code that looks at the cluster name, and it's pretty careful about the cluster name potentially being null. I guess what I'm saying is that if the cluster state never picked up the name somehow, this looks like the only thing that would break.
Tried setting the logger to debug and didn't get anything super interesting. Here is some of it: https://gist.github.com/nik9000/b9c40805abb4bcbb5b61
Thx Nik. I have a theory. Indeed, the cluster name as part of the cluster state was introduced in 1.1.1. When a node of version >= 1.1.1 reads the cluster state from an older node, that field will be populated with null. During the upgrade from 1.1.0 this happened, and the cluster state in memory has its name set to null. Since you never restarted the complete cluster since then, all nodes have kept communicating it, keeping it alive. This trips this new code. A full cluster restart should fix it, but that's obviously totally not desirable. I'm still trying to come up with a potential workaround...
@nik9000 do you use dedicated master nodes? It doesn't look like it from the logs, but I want to double check.
@bleskes no dedicated master nodes.
@bleskes that's what I was thinking - I was digging through the places where the cluster state is built from a name, and they are pretty rare. Still, it'd take me some time to validate that they never get saved.
…ter state who misses it ClusterState has a reference to the cluster name since version 1.1.0 (df7474b). However, if the state was sent from a master of an older version, this name can be set to null. This is unexpected and can cause bugs. The bad part is that it will never correct itself until a full cluster restart, where the cluster state is rebuilt using the code of the latest version. This commit changes the default to the node's cluster name. Relates to elastic#7386
More posterity: this broke for me because when I started the cluster I was using 1.1.0 and I haven't done a full restart since - only rolling restarts. If you are in that boat - do not upgrade to 1.3 until 1.3.3 is released.
I'm going to close this as it is fixed by my change in #7414
Thanks!
Ran into the same issue when upgrading v1.2.2 to v1.3.2. Could you please help by answering -
The error I had was caused by not doing a full restart since 1.0.1 or so. You can also fix it by applying the patch for this issue directly to
Thanks Nik. Yes, we have been doing rolling upgrades since v1.0.x, and the issue showed up with the last upgrade from v1.2.2. Really curious what the impact is of leaving it at v1.3.2. So far I only see error traces, but no search/index/alert failures. Also, I am not sure how we can make an upgraded node master - is there an option for that? ---- Edit 8:49 PM GMT Time ----
Yup, that sounds like this issue then.
The errors at trace log level you can ignore. The trouble will be that the disks
That isn't super easy. I can't explain on mobile so it'll have to wait
Yeah, don't worry about explaining how to make a node master if it's not a straightforward option. As I said in a later edit, I did a full cluster restart and the issue went away. Thanks again!
Cool! I'm glad it worked for you. @bleskes I've seen a few people with this issue over the past month - maybe 4. I wonder if it is worth thinking of cutting a 1.3.3 soonish to pick this up?
In case anyone else comes across this: I've encountered the exact same behavior following the rolling upgrade docs here, going from 5.4 to 5.6.
Upgrade caused shard data to stay on nodes even after it isn't useful any more.
This comes from https://groups.google.com/forum/#!topic/elasticsearch/Mn1N0xmjsL8
What I did:
Started upgrading from Elasticsearch 1.2.1 to Elasticsearch 1.3.2. For each of the 6 nodes I updated:
What happened:
The new version of Elasticsearch came up but didn't remove the shard data it can no longer use. This picture from Whatson shows the problem pretty well:
https://wikitech.wikimedia.org/wiki/File:Whatson_out_of_disk.png
The nodes on the left were upgraded and blue means disk usage by Elasticsearch and brown is "other" disk usage.
When I dig around on the filesystem, all the space usage is in the shard storage directory (/var/lib/elasticsearch/production-search-eqiad/nodes/0/indices), but when I compare the list of open files to the list of files on the file system with this, I see that whole directories are just sitting around, unused. Hitting
/_cat/shards/<directory_name>
corroborates that the shard in the directory isn't on the node. Oddly, if we keep poking around we find open files in directories representing shards that we don't expect to be on the node either....
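A rough sketch of that cross-check, assuming the data path above, index-named shard directories (as in 1.x), and node names that match hostnames; all of those are assumptions to adjust for your setup.

```sh
# For each index directory still on disk, ask the cluster whether this node is
# reported as holding any shard of that index; flag directories it should not have.
DATA=/var/lib/elasticsearch/production-search-eqiad/nodes/0/indices
for dir in "$DATA"/*; do
  index=$(basename "$dir")
  if ! curl -s "localhost:9200/_cat/shards/$index" | grep -q "$(hostname)"; then
    echo "$index: $(du -sh "$dir" | cut -f1) on disk but no shard reported on this node"
  fi
done
```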
What we're doing now:
We're going to try restarting the upgrade and blasting the data directory on the node as we upgrade it.
Reproduction steps:
No idea. And I'm a bit afraid to keep pushing things on our cluster with it in the state that it is in.