default.path.data included as a data path when path.data configured as yaml array #23981

Closed
tylerfontaine opened this issue Apr 7, 2017 · 18 comments

tylerfontaine commented Apr 7, 2017

Elasticsearch version: 5.3.0

OS version: Tried on CentOS 7 and OS X, though I suspect it doesn't matter.

Description of the problem including expected versus actual behavior:
When using multiple data paths, or even when specifying just a single data path as a YAML array, Elasticsearch also includes whatever is passed to it as default.path.data as a data path. This happens when installed via the RPM or DEB packages, since both the systemd unit and the init.d script set this when they execute the Elasticsearch binary:

-Edefault.path.data=${DATA_DIR}

Steps to reproduce:

Set this in elasticsearch.yml:

path.data:
- "/some/datapath1"

Run this:

bin/elasticsearch -Edefault.path.data="/some/datapath2"

Elasticsearch will start configured with both data paths.

Provide logs (if relevant):

[2017-04-07T13:30:49,987][INFO ][o.e.e.NodeEnvironment    ] [jCWIeyS] using [2] data paths, mounts [[/ (/dev/disk1)]], net usable_space [198.8gb], net total_space [464.7gb], spins? [unknown], types [hfs]

This also happens when you use this array format:

path.data: ["/some/datapath1"]

dakrone self-assigned this Apr 7, 2017
dakrone added the >bug label Apr 7, 2017
jasontedor assigned jasontedor and unassigned dakrone Apr 7, 2017
jasontedor (Member) commented Apr 7, 2017

I know what is happening here and I see a fix; I will open a PR soon. This is an incredibly unfortunate consequence of the fact that when a setting is specified as an array, we break it down into new keys path.data.0, path.data.1, ..., path.data.n, yet the default setting is applied as path.data, so none of the keys from the array setting override it the way they do for other default settings.
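
A minimal sketch of the merge described above, using a plain map instead of Elasticsearch's real Settings class (the flattened key names come from the comment above; everything else is a simplified illustration, not the actual implementation):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SettingsMergeSketch {
    public static void main(String[] args) {
        // The YAML array form is flattened into numbered keys: path.data.0, path.data.1, ...
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("path.data.0", "/some/datapath1");

        // The packaging default arrives under the plain key "path.data".
        // "path.data" and "path.data.0" are different keys, so the array values
        // never override the default and both survive the merge.
        settings.putIfAbsent("path.data", "/var/lib/elasticsearch");

        // Collecting every value that looks like a data path (the plain key plus all
        // numbered keys) yields two paths, which is what the node log above reports.
        List<String> dataPaths = settings.entrySet().stream()
                .filter(e -> e.getKey().equals("path.data") || e.getKey().startsWith("path.data."))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());

        System.out.println(dataPaths); // [/some/datapath1, /var/lib/elasticsearch]
    }
}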

@jasontedor (Member)

This is a horrifically bad bug. It appears to have been introduced in 5.3.0, and it means that shards can end up in /var/lib/elasticsearch even if a user did not intend that, because they configured path.data as an array while the packaging specifies default.path.data. If /var/lib/elasticsearch is sitting on the root partition (why not, since the user wants the data elsewhere), the user can easily end up with:

  • shards that are not where they wanted them
  • a root partition that fills up

It gets worse. Let's say that we fix this bug in 5.3.1 and a user migrates from 5.3.0 not aware of the fiasco that is ensuing here. They start up Elasticsearch on 5.3.1, and now their shards are missing! If they're not careful, they are in data loss territory.

nik9000 (Member) commented Apr 7, 2017

😭

jasontedor added the critical, v5.3.0, and :Core/Infra/Settings (Settings infrastructure and APIs) labels Apr 7, 2017
nik9000 (Member) commented Apr 8, 2017

Let's say that we fix this bug in 5.3.1 and a user migrates from 5.3.0 not aware of the fiasco that is ensuing here. They start up Elasticsearch on 5.3.1, and now their shards are missing! If they're not careful, they are in data loss territory.

If we just fix the config issue and someone upgrades one node at a time, they shouldn't lose any data so long as they have replicas. The entire shard will have to recover from the primary, but that is a fairly normal thing. They'd have to manually clean up the shards in the accidental data directory, which isn't horrible, but it isn't great at all.

We could do more to try to copy the data, or build a tool, or something.

Minimally, anyone on 5.3 will want to be able to know whether they are affected by this issue before they upgrade, so they know what they have to do as part of the upgrade.
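
To make that last point concrete, here is a rough sketch of a pre-upgrade check an operator could run on each node; it assumes the packaged default of /var/lib/elasticsearch and the usual nodes/<ordinal>/indices on-disk layout, and it is only an illustration, not an official tool:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class DefaultPathDataCheck {
    public static void main(String[] args) throws IOException {
        // The data paths this node was meant to use, copied from elasticsearch.yml.
        List<Path> configured = List.of(Paths.get("/some/datapath1"));
        // The packaging default passed via -Edefault.path.data (DATA_DIR in the unit file).
        Path defaultPath = Paths.get("/var/lib/elasticsearch");

        if (configured.contains(defaultPath)) {
            // The default is also an explicit data path, so finding data there is expected.
            System.out.println("default.path.data is also a configured data path; nothing to do");
            return;
        }

        // Assuming the usual nodes/<ordinal>/indices on-disk layout.
        boolean hasIndices = false;
        Path nodes = defaultPath.resolve("nodes");
        if (Files.isDirectory(nodes)) {
            try (DirectoryStream<Path> ordinals = Files.newDirectoryStream(nodes)) {
                for (Path ordinal : ordinals) {
                    if (Files.isDirectory(ordinal.resolve("indices"))) {
                        hasIndices = true;
                        break;
                    }
                }
            }
        }

        System.out.println(hasIndices
                ? "WARNING: index data found under unconfigured default path " + defaultPath
                : "no index data under " + defaultPath + "; this node looks unaffected");
    }
}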

@jasontedor (Member)

If we just fix the config issue and someone upgrades one node at a time, they shouldn't lose any data so long as they have replicas.

That's the problem, we do not have complete control over this.

They'd have to manually clean up the shards in the accidental data directory, which isn't horrible, but it isn't great at all.

Yes, if they are aware of the problem and do not panic. Or if they do not do something like practice immutable infrastructure.

We could do more to try to copy the data, or build a tool, or something.

Yes, but it's quite delicate.

Minimally, anyone on 5.3 will want to be able to know whether they are affected by this issue before they upgrade, so they know what they have to do as part of the upgrade.

Indeed, and therein lies the rub.

nik9000 (Member) commented Apr 8, 2017

Yeah, I just wanted to start talking about this. I certainly don't claim to have answers. I guess I'm fairly sure that the person doing the upgrade is going to want to know if they are affected by this. Lots of people won't be, and 5.3.0 will just be a normal version for them. For the affected people, even if we handle the whole thing automatically, they are going to need to know.

jasontedor (Member) commented Apr 8, 2017

Yeah, I just wanted to start talking about this.

Indeed, we need more people thinking about this. I discussed some of the complexities with @dakrone earlier in the evening via another channel, and I've been thinking about it all night. Sorry that I haven't taken the time to write my thoughts down yet.

Another complexity that I thought of is that the fix for this has to live for the rest of the 5.x series lifecycle because someone can upgrade from 5.3.0 to any future release in the 5.x series.

And yet another complexity that I thought of is the situation where someone has their default.path.data included as one of the values in path.data (assuming multiple path.data paths).

For the affected people, even if we handle the whole thing automatically, they are going to need to know.

I completely agree.

tylerfontaine (Author) commented Apr 8, 2017 via email

jasontedor (Member) commented Apr 8, 2017

No, this is absolutely not allowed. You can only rolling-restart upgrade to 6.0.0 from the latest 5.x that is released when 6.0.0 goes GA.

nik9000 (Member) commented Apr 8, 2017

Do we have set limitations yet on rolling upgrades to 6.0? Would 5.3.0 -> 6.0 be allowed? If so, the fix has to carry over there as well.

We'll make sure there is a 5.4, so that doesn't have to happen.

And yet another complexity that I thought of is the situation where someone has their default.path.data included as one of the values in path.data (assuming multiple path.data paths).

So any detection we have will produce a false positive for them, right?

@tylerfontaine (Author)

In fact, full cluster shutdown major version upgrades are even MORE dangerous in this instance, because all of the shards disappear at the same time.

@jasontedor (Member)

So any detection we have will produce a false positive for them, right?

I think it has to be smart enough to not produce a false positive for them.

@tylerfontaine (Author)

Just because a 5.4 exists doesn't mean people will be on it when they upgrade.

jasontedor (Member) commented Apr 8, 2017

In fact, full cluster shutdown major version upgrades are even MORE dangerous in this instance, because all of the shards disappear at the same time.

Indeed, that means that we need a fix in 6.x as well, since someone could full cluster restart upgrade from 5.3.0.

nik9000 (Member) commented Apr 8, 2017

Just because a 5.4 exists doesn't mean people will be on it when they upgrade.

Right, sorry. They won't be able to do a rolling upgrade, just a full cluster restart upgrade.

s1monw (Contributor) commented Apr 10, 2017

I think we should look into removing the default option here. It's too much magic IMO, and we can give people a good error message in that case. We can even advise them how to copy their data over to the actual data directories, OR do it ourselves. We had code for this in 2.x, from when we moved away from rolling our own software RAID, that we can reuse.
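
Purely as an illustration of that idea (the class and method names below are made up, not Elasticsearch code), the guard could take roughly this shape: an explicitly configured path.data wins outright, the packaging default is never merged in, and startup fails with a clear message if data has already accumulated under the unused default:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DefaultPathGuardSketch {
    /**
     * Resolve the effective data paths: an explicitly configured path.data wins
     * outright and the packaging default is ignored rather than appended.
     */
    static List<Path> effectiveDataPaths(List<Path> explicitPathData, Path defaultPathData) {
        if (explicitPathData.isEmpty()) {
            return List.of(defaultPathData);
        }
        // If the unused default directory already holds a node data layout, refuse to
        // start with an actionable message instead of silently ignoring that data.
        if (!explicitPathData.contains(defaultPathData)
                && Files.isDirectory(defaultPathData.resolve("nodes"))) {
            throw new IllegalStateException("path.data is set explicitly but " + defaultPathData
                    + " still contains node data; move or remove it before starting");
        }
        return explicitPathData;
    }
}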

@jasontedor (Member)

I think we should look into removing the default option here.

Today this is used to support the packaging. I think that we can remove it if we instead ship the elasticsearch.yml in the distribution packaging with explicit values for path.data, path.logs, and path.conf. Sadly, I think that as a result of the situation here, we cannot do this until 7.0.0.

@s1monw
Copy link
Contributor

s1monw commented Apr 11, 2017

Sadly, I think that as a result of the situation here, we cannot do this until 7.0.0

That is a critical issue. I think we have to find a way to either fix it reliably or drop stuff, and I would not hesitate to do that even in a minor release if the alternative means data loss or anything along those lines.

nik9000 added commits that referenced this issue Apr 17, 2017
Adds a packaging test that would have detected #23981. The test configures path.data and validates that nothing ends up in default.path.data.