_upgrade doesn't upgrade index directories with no shards #16044
With a 40-node cluster, I presume you use dedicated master nodes? If so, I suspect that the state-0.st files are on the master nodes while the data is stored on the data nodes. This is expected behaviour for 1.x (changing in 2.0). I also understand that you still have the cluster running under 1.7.3? Can you post the output of Also, do you see a message in the data node logs starting with "Not updating settings for the index... because upgraded of some primary shards failed..."?
I have a single master node, although it also holds data (so it isn't dedicated). The master node was NOT the node with the issue in this case, although it had its own issue for other indexes. However, this cluster has been running "forever", and at one point (years ago) I didn't have a single master; I was just using the default where all hosts can be a master, so it is very possible that this node was a master at some previous time. Reading between the lines: can I just go delete _state from top-level index directories on nodes that are NOT the master node, and that will fix everything? (I did try renaming _state on a node, and that node could start up, so I'm guessing yes.)
OK, I think what you are saying is that these state files are from the days when the node used to be a master node but now it's a data node only (i.e.,
I have no clue if that is where the state files are from; that is my guess, and I was hoping you could confirm :) I do have node.master: false on those nodes NOW; my point was that in the past it was true. I do think this is a bug though, either:
Double checking - did you have these when you ran the
Agreed. The problem is that 1.x didn't write them on data nodes, and thus the _upgrade API didn't take them into account. With 2.x this has changed and we write them (and upgrade them) on all nodes.
They must have been there before I ran _upgrade, since the last timestamp is over a month ago. OK, I'll just delete them manually. Can I always delete index-level _state directories on non-master nodes, or only if that data node has no shards?
You can delete them on all nodes. If it makes things simpler, just do so on the nodes that fail to start; they will have no effect otherwise. Note that 2.x will write those files right back (and keep maintaining them correctly).
Closing this as I presume we diagnosed it correctly and it's a non-issue. Please reopen if this turns out to be wrong.
Yes, I was able to delete them everywhere and it worked. I still think Elasticsearch should have just ignored them or deleted them for me.
OK, I'm having a similar issue with replicated shards. This seems like a bug in _upgrade, where maybe it doesn't check the version of all the replicas but only the primary?
So I went through all the nodes and deleted the files that matched find . -name "state-*" -ls | grep 201[234] So unlike the original bug report, these shards had index data but were replicas and didn't get upgraded.
It's the same issue - we check the index-level state files on node startup, regardless of whether the data folder has shards or not. Stale files from when the node was a master node will cause this issue. Do note (and everyone reading this in the future) that the command you posted can be very dangerous - if you're not careful, it will also delete the shard-level state files, rendering the data useless and causing data loss.
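To make the distinction concrete, here is a sketch (the data directory and index names below are made up for the demo; the real layout depends on your path.data setting) of a find pattern that matches only index-level state files, leaving the shard-level ones alone:

```shell
# Illustrative 1.x-style layout (hypothetical paths):
#   .../indices/<index>/_state/state-N.st          <- index-level, safe to remove
#   .../indices/<index>/<shard>/_state/state-N.st  <- shard-level, NEVER delete
mkdir -p /tmp/es_demo/nodes/0/indices/files_v2/_state
mkdir -p /tmp/es_demo/nodes/0/indices/files_v2/0/_state
touch /tmp/es_demo/nodes/0/indices/files_v2/_state/state-0.st
touch /tmp/es_demo/nodes/0/indices/files_v2/0/_state/state-12.st

# Exactly one path component (the index name) may sit between indices/ and
# _state/, so shard-level state files never match.
find /tmp/es_demo -regex '.*/indices/[^/]*/_state/state-[0-9]*\.st'
```

Review the matches first, and only delete once you are sure nothing shard-level is listed.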
I guess I'm confused why this isn't considered a bug in 1.7.3? I can understand if there was no shard data and it was left over, but in this case there was shard data (it was just a replica). Yes, definitely look at the output before removing, although the only things I found that were 2+ years old were these files, since in-use state files are within the last 2 years. :)
I agree that this is not the best behavior. It was fixed in 2.0 (we update these files), but the fix was too involved to backport to 1.7.
Maybe just clear up the message to say which exact file is causing the issue? Fixing it in 2.0 doesn't help, because I would still have gotten the error going from 1.7.3 to 2.0, right? I guess there aren't many of us who have been upgrading for 4+ years now. :)
I agree it's rare - but remember the message can also be genuine.
correct
That does make you special. In a very good way :)
I have a 40-node cluster that I've been upgrading from 0.18 over the years; it's currently running 1.7.3. I've run _upgrade a few times, and I'm trying to go to 2.1.1. After restarting the full cluster, certain nodes fail with the dreaded error (although the index with the issue is different on each):
java.lang.IllegalStateException: The index [files_v2] was created before v0.90.0 and wasn't upgraded. This index should be open using a version before 2.0.0 and upgraded using the upgrade API.
However, if I go look at the files_v2 directory on that host, there are no shard directories, only a _state directory containing a single file, state-0.st, which was last changed over a month ago. On the nodes that actually have files_v2 shards, there is no _state directory at the top level; the _state directories are under the shard directories.
Are there any easy commands to clean up these no-shard indexes?
Maybe the startup code should delete them or something?
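For illustration, a cleanup helper for this situation might look something like the sketch below (the directory tree is invented for the demo; the real paths come from your path.data setting). It lists index directories whose only top-level entry is _state, i.e. the no-shard case described above:

```shell
# Build a toy indices/ tree (hypothetical paths):
mkdir -p /tmp/es_idx/indices/files_v2/_state   # stale: _state only, no shards
mkdir -p /tmp/es_idx/indices/logs_v1/_state
mkdir -p /tmp/es_idx/indices/logs_v1/0         # has a shard directory -> keep

# Report index directories whose only entry is _state; everything else is
# left alone.
for idx in /tmp/es_idx/indices/*/; do
  if [ -z "$(ls "$idx" | grep -v '^_state$')" ]; then
    echo "no-shard index dir: $idx"
  fi
done
# prints: no-shard index dir: /tmp/es_idx/indices/files_v2/
```

Print first, inspect, and only then remove; as noted later in the thread, deleting shard-level state files destroys data.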