_upgrade doesn't upgrade index directories with no shards #16044

Closed

awick opened this issue Jan 17, 2016 · 16 comments
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. feedback_needed

Comments

@awick

awick commented Jan 17, 2016

I have a 40-node cluster that I've been upgrading from 0.18 over the years; it is currently running 1.7.3. I've run _upgrade a few times, and I'm now trying to go to 2.1.1. After restarting the full cluster, certain nodes hit the dreaded error (although the index named in the error is different on each):

java.lang.IllegalStateException: The index [files_v2] was created before v0.90.0 and wasn't upgraded. This index should be open using a version before 2.0.0 and upgraded using the upgrade API.

However, if I look at the files_v2 directory on one of those hosts, there are no shard directories, only a _state directory containing a single file, state-0.st, which was last changed over a month ago. On the nodes that actually have files_v2 shards, there is no _state directory at the top level; the _state directories are under the shard directories.

Are there any easy commands to clean up these shard-less index directories?
Maybe the startup code should delete them or something?
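
Roughly, the on-disk layout looks like this (illustrative sketch only; the surrounding path assumes the default 1.x data layout of <path.data>/<cluster_name>/nodes/0/indices/, which may differ on your install):

  # node that throws the error: only a top-level _state, no shard dirs
  indices/files_v2/_state/state-0.st        # last modified over a month ago

  # nodes that actually hold files_v2 shards: no top-level _state,
  # the state files live under each shard directory instead
  indices/files_v2/0/_state/...
  indices/files_v2/0/index/...
  indices/files_v2/0/translog/...
  indices/files_v2/1/_state/...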

@bleskes
Contributor

bleskes commented Jan 18, 2016

With a 40-node cluster, I presume you use dedicated master nodes? If so, I suspect that the state-0.st files are on the master nodes while the data is stored on the data nodes. This is expected behaviour for 1.x (changing in 2.0).

I also understand that you still have the cluster running under 1.7.3? Can you post the output of GET _cluster/state/metadata/index/? I'm interested in the settings and state sections.

Also, do you see a message in the data node logs starting with "Not updating settings for the index... because upgrade of some primary shards failed..."?
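
For example, something like this (assuming curl against the default HTTP port, and using the index name from your error):

  curl -s 'http://localhost:9200/_cluster/state/metadata/files_v2?pretty'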

@clintongormley clintongormley added feedback_needed :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jan 18, 2016
@awick
Author

awick commented Jan 18, 2016

I have a single master node, although it also has data on it (so it isn't dedicated).

The master node was NOT the node with the issue in this case, although it had its own issue with other indices. However, this cluster has been running "forever", and at one point (years ago) I didn't have a single master; I was just using the default where every host can be a master. So it is very possible that this node was a master at some point in the past.

Reading between the lines: can I just delete the top-level _state directories for indices on nodes that are NOT the master node, and that will fix everything? (I did try renaming _state on a node, and that node could start up, so I'm guessing yes.)

  "files_v2" : {
    "state" : "open",
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "version" : {
          "created" : "191299",
          "upgraded" : "1070399",
          "minimum_compatible" : "4.10.3"
        },
        "number_of_replicas" : "2",
        "auto_expand_replicas" : "0-2"
      }
    },

@bleskes
Contributor

bleskes commented Jan 19, 2016

OK, I think what you are saying is that these state files are left over from the days when the node used to be a master node, but it is now a data-only node (i.e., node.master: false is set in elasticsearch.yml). If that's the case then yes, that state file might be confusing the cluster and you can delete it. If you don't have node.master: false on those nodes, deleting the file is dangerous and we should dig further to see what's wrong.
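
To verify on an affected node, something along these lines (paths are illustrative; adjust for where your config and path.data actually live):

  # confirm the node can no longer be elected master
  grep node.master /etc/elasticsearch/elasticsearch.yml    # expect: node.master: false

  # check for a stale top-level _state directory for the index
  ls -l /var/lib/elasticsearch/<cluster_name>/nodes/0/indices/files_v2/_state/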

@awick
Author

awick commented Jan 19, 2016

I have no clue if that is where the state files are from; that is my guess, and I was hoping you could confirm :)

I do have node.master: false on those nodes NOW; my point was that in the past it was true.

I do think this is a bug though, either:

  • _upgrade should have upgraded them everywhere, even if no shards
  • they should just be ignored since node.master: false is set

@bleskes
Contributor

bleskes commented Jan 19, 2016

I do have node.master: false on those nodes NOW

Double checking - did you have node.master: false set when you ran the _upgrade API? If so, you can just delete those index _state files on all data nodes.

_upgrade should have upgraded them everywhere, even if no shards

Agreed. The problem is that 1.x didn't write them on data nodes, and thus the _upgrade API didn't take them into account. With 2.x this has changed: we write them (and upgrade them) on all nodes.
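
For reference, running the upgrade on the 1.7.x cluster and checking its status look something like this (illustrative; assuming the default HTTP port):

  # kick off the upgrade for all indices
  curl -XPOST 'http://localhost:9200/_upgrade'

  # report which indices still contain segments that need upgrading
  curl -s 'http://localhost:9200/_upgrade?pretty'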

@awick
Author

awick commented Jan 19, 2016

They must have been there before I ran _upgrade, since the last timestamp is over a month ago.

OK, I'll just delete them manually. Can I always delete index-level _state directories on non-master nodes, or only if that data node has no shards?

@bleskes
Contributor

bleskes commented Jan 19, 2016

You can delete them on all nodes. If it makes things simpler, just do so on the nodes that fail to start; they have no effect otherwise.

Note that 2.x will write those files right back (and keep maintaining them correctly).
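
Concretely, on a node that fails to start, something like this (illustrative path; stop the node first and adjust for your path.data, cluster name and index):

  # remove only the index-level _state directory; do NOT touch anything under the shard directories
  rm -rf /var/lib/elasticsearch/<cluster_name>/nodes/0/indices/files_v2/_state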

@bleskes
Contributor

bleskes commented Mar 1, 2016

Closing this as I presume we diagnosed it correctly and it's a non-issue. Please reopen if this turns out to be wrong.

@bleskes bleskes closed this as completed Mar 1, 2016
@awick
Author

awick commented Mar 1, 2016

Yes, I was able to delete them everywhere and it worked. I still think Elasticsearch should have just ignored them or deleted them for me.

@awick
Author

awick commented Mar 9, 2016

OK, I'm now having a similar issue with replicated shards. This seems like a bug in _upgrade, where maybe it doesn't check the version of all the replicas but only the primary?

@awick
Author

awick commented Mar 9, 2016

So I went through all the nodes and deleted the files that matched:

  # list state-* files and keep lines containing 2012-2014 (intended to match old modification years)
  find . -name "state-*" -ls | grep 201[234]

Unlike the original bug report, these shards had index data, but they were replicas and didn't get upgraded.

@bleskes
Contributor

bleskes commented Mar 9, 2016

It's the same issue - we check the index-level state files on node startup, regardless of whether the data folder has shards or not. Stale files left over from when the node was a master node will cause this issue.

Do note (and everyone reading this in the future) that the command you posted can be very dangerous - if you're not careful it will also list the shard-level state files, and deleting those renders the data useless and causes data loss.
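
A safer way to enumerate candidates is to limit the search depth so shard-level state files can never appear in the list, e.g. (illustrative; assumes the standard 1.x layout and is run from the node's data directory, <path.data>/<cluster_name>/nodes/0):

  # index-level state sits at indices/<index>/_state/state-N.st (three levels down);
  # shard-level state is one level deeper (indices/<index>/<shard>/_state/...) and is excluded
  find indices -maxdepth 3 -name 'state-*' -ls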

@awick
Author

awick commented Mar 9, 2016

I guess I'm confused about why this isn't considered a bug in 1.7.3? I can understand it if there is no shard data and the file was simply left over, but in this case there was shard data (it was just a replica).

Yes, definitely look at the output before removing anything, although the only things I found that were 2+ years old were these files, since in-use state files have all been touched within the last 2 years. :)

@bleskes
Contributor

bleskes commented Mar 10, 2016

I agree that this is not the best behavior. It was fixed in 2.0 (we update these files), but the fix was too involved to backport to 1.7.

@awick
Author

awick commented Mar 10, 2016

Maybe just improve the message so it says exactly which file is causing the issue? Fixing it in 2.0 doesn't help, because I would still have gotten the error if I had gone from 1.7.3 to 2.0, right? I guess maybe there aren't many of us who have been upgrading for 4+ years now. :)

@bleskes
Contributor

bleskes commented Mar 10, 2016

I agree it's rare - but remember the message can also be genuine.

I would still have gotten the error if I had gone from 1.7.3 to 2.0, right?

correct

I guess maybe there aren't many of us who have been upgrading for 4+ years now. :)

That does make you special. In a very good way :)
