_upgrade doesn't upgrade index directories with no shards #16044

Closed

awick opened this issue Jan 17, 2016 · 16 comments
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. feedback_needed

Comments

@awick

awick commented Jan 17, 2016

I have a 40-node cluster that I've been upgrading from 0.18 over the years; it is currently running 1.7.3. I've run _upgrade a few times, and I'm now trying to go to 2.1.1. After restarting the full cluster, certain nodes hit the dreaded error (although the index named in the error is different on each):

java.lang.IllegalStateException: The index [files_v2] was created before v0.90.0 and wasn't upgraded. This index should be open using a version before 2.0.0 and upgraded using the upgrade API.

However, if I look at the files_v2 directory on one of those hosts, there are no shard directories, only a _state directory containing a single file, state-0.st, which was last changed over a month ago. On the nodes that actually have files_v2 shards, there is no _state directory at the top level; the _state directories are under the shard directories.

Are there any easy commands to clean up these shard-less index directories?
Maybe the startup code should delete them or something?
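
Roughly, the on-disk layout looks like this (illustrative sketch only; the surrounding path assumes the default 1.x data layout of <path.data>/<cluster_name>/nodes/0/indices/, which may differ on your install):

  # node that throws the error: only a top-level _state, no shard dirs
  indices/files_v2/_state/state-0.st        # last modified over a month ago

  # nodes that actually hold files_v2 shards: no top-level _state,
  # the state files live under each shard directory instead
  indices/files_v2/0/_state/...
  indices/files_v2/0/index/...
  indices/files_v2/0/translog/...
  indices/files_v2/1/_state/...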

@bleskes
Contributor

bleskes commented Jan 18, 2016

With a 40-node cluster, I presume you use dedicated master nodes? If so, I suspect that the state-0.st files are on the master nodes while the data is stored on the data nodes. This is expected behaviour for 1.x (changing in 2.0).

I also understand that you still have the cluster running under 1.7.3? Can you post the output of GET _cluster/state/metadata/index/? I'm interested in the settings and state sections.

Also, do you see a message in the data node logs starting with "Not updating settings for the index... because upgrade of some primary shards failed..."?
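
For example, something like this (assuming curl against the default HTTP port, and using the index name from your error):

  curl -s 'http://localhost:9200/_cluster/state/metadata/files_v2?pretty'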

@clintongormley clintongormley added feedback_needed :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jan 18, 2016
@awick
Author

awick commented Jan 18, 2016

I have a single master node, although it also has data on it (so it isn't dedicated).

The master node was NOT the node with the issue in this case, although it had its own issue with other indices. However, this cluster has been running "forever", and at one point (years ago) I didn't have a single master; I was just using the default where every host can be a master. So it is very possible that this node was a master at some point in the past.

Reading between the lines: can I just delete the top-level _state directories for indices on nodes that are NOT the master node, and that will fix everything? (I did try renaming _state on a node, and that node could start up, so I'm guessing yes.)

  "files_v2" : {
    "state" : "open",
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "version" : {
          "created" : "191299",
          "upgraded" : "1070399",
          "minimum_compatible" : "4.10.3"
        },
        "number_of_replicas" : "2",
        "auto_expand_replicas" : "0-2"
      }
    },

@bleskes
Contributor

bleskes commented Jan 19, 2016

OK, I think what you are saying is that these state files are left over from the days when the node used to be a master node, but it is now a data-only node (i.e., node.master: false is set in elasticsearch.yml). If that's the case then yes, that state file might be confusing the cluster and you can delete it. If you don't have node.master: false on those nodes, deleting the file is dangerous and we should dig further to see what's wrong.
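
To verify on an affected node, something along these lines (paths are illustrative; adjust for where your config and path.data actually live):

  # confirm the node can no longer be elected master
  grep node.master /etc/elasticsearch/elasticsearch.yml    # expect: node.master: false

  # check for a stale top-level _state directory for the index
  ls -l /var/lib/elasticsearch/<cluster_name>/nodes/0/indices/files_v2/_state/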

@awick
Author

awick commented Jan 19, 2016

I have no clue if that is where the state files are from; that is my guess, and I was hoping you could confirm :)

I do have node.master: false on those nodes NOW; my point was that in the past it was true.

I do think this is a bug though, either:

  • _upgrade should have upgraded them everywhere, even if no shards
  • they should just be ignored since node.master: false is set

@bleskes
Contributor

bleskes commented Jan 19, 2016

I do have node.master: false on those nodes NOW

Double checking - did you have node.master: false set when you ran the _upgrade API? If so, you can just delete those index _state files on all data nodes.

_upgrade should have upgraded them everywhere, even if no shards

Agreed. The problem is that 1.x didn't write them on data nodes, and thus the _upgrade API didn't take them into account. With 2.x this has changed: we write them (and upgrade them) on all nodes.
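
For reference, running the upgrade on the 1.7.x cluster and checking its status look something like this (illustrative; assuming the default HTTP port):

  # kick off the upgrade for all indices
  curl -XPOST 'http://localhost:9200/_upgrade'

  # report which indices still contain segments that need upgrading
  curl -s 'http://localhost:9200/_upgrade?pretty'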

@awick
Author

awick commented Jan 19, 2016

They must have been there before I ran _upgrade, since the last timestamp is over a month ago.

OK, I'll just delete them manually. Can I always delete index-level _state directories on non-master nodes, or only if that data node has no shards?

@bleskes
Contributor

bleskes commented Jan 19, 2016

You can delete them on all nodes. If it makes things simpler, just do so on the nodes that fail to start; they have no effect otherwise.

Note that 2.x will write those files right back (and keep maintaining them correctly).
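
Concretely, on a node that fails to start, something like this (illustrative path; stop the node first and adjust for your path.data, cluster name and index):

  # remove only the index-level _state directory; do NOT touch anything under the shard directories
  rm -rf /var/lib/elasticsearch/<cluster_name>/nodes/0/indices/files_v2/_state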

@bleskes
Contributor

bleskes commented Mar 1, 2016

Closing this as I presume we diagnosed it correctly and it's a non-issue. Please reopen if this turns out to be wrong.

@bleskes bleskes closed this as completed Mar 1, 2016
@awick
Author

awick commented Mar 1, 2016

Yes, I was able to delete them everywhere and it worked. I still think Elasticsearch should have just ignored them or deleted them for me.

@awick
Author

awick commented Mar 9, 2016

OK, I'm now having a similar issue with replicated shards. This seems like a bug in _upgrade, where maybe it doesn't check the version of all the replicas but only the primary?

@awick
Author

awick commented Mar 9, 2016

So I went through all the nodes and deleted the files that matched:

  # list state-* files and keep lines containing 2012-2014 (intended to match old modification years)
  find . -name "state-*" -ls | grep 201[234]

Unlike the original bug report, these shards had index data, but they were replicas and didn't get upgraded.

@bleskes
Contributor

bleskes commented Mar 9, 2016

It's the same issue - we check the index-level state files on node startup, regardless of whether the data folder has shards or not. Stale files left over from when the node was a master node will cause this issue.

Do note (and everyone reading this in the future) that the command you posted can be very dangerous - if you're not careful it will also list the shard-level state files, and deleting those renders the data useless and causes data loss.
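
A safer way to enumerate candidates is to limit the search depth so shard-level state files can never appear in the list, e.g. (illustrative; assumes the standard 1.x layout and is run from the node's data directory, <path.data>/<cluster_name>/nodes/0):

  # index-level state sits at indices/<index>/_state/state-N.st (three levels down);
  # shard-level state is one level deeper (indices/<index>/<shard>/_state/...) and is excluded
  find indices -maxdepth 3 -name 'state-*' -ls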

@awick
Author

awick commented Mar 9, 2016

I guess I'm confused about why this isn't considered a bug in 1.7.3? I can understand it if there is no shard data and the file was simply left over, but in this case there was shard data (it was just a replica).

Yes, definitely look at the output before removing anything, although the only things I found that were 2+ years old were these files, since in-use state files have all been touched within the last 2 years. :)

@bleskes
Contributor

bleskes commented Mar 10, 2016

I agree that this is not the best behavior. It was fixed in 2.0 (we update these files), but the fix was too involved to backport to 1.7.

@awick
Author

awick commented Mar 10, 2016

Maybe just improve the message so it says exactly which file is causing the issue? Fixing it in 2.0 doesn't help, because I would still have gotten the error if I had gone from 1.7.3 to 2.0, right? I guess maybe there aren't many of us who have been upgrading for 4+ years now. :)

@bleskes
Contributor

bleskes commented Mar 10, 2016

I agree it's rare - but remember the message can also be genuine.

I would still have gotten the error if I had gone from 1.7.3 to 2.0, right?

correct

I guess maybe there aren't many of us who have been upgrading for 4+ years now. :)

That does make you special. In a very good way :)
