Archived settings prevent updating other settings #28026

Closed
nik9000 opened this issue Dec 29, 2017 · 27 comments
Labels
>bug :Core/Infra/Settings Settings infrastructure and APIs

Comments

@nik9000
Member

nik9000 commented Dec 29, 2017

It looks like if you:

  1. Start 5.x
  2. Add a persistent cluster setting that is unsupported by 6.x
  3. Upgrade to 6.x
  4. Attempt to update another setting

Then you get an error back about the archived setting not being a valid setting. You can clear the archived setting with PUT _cluster/settings { "persistent": { "archived.*": null } } but you must do this before updating any other settings. It feels like you should be able to deal with the archived settings at your leisure.
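
For reference, a minimal sketch of the failure and the cleanup (assuming a node on localhost:9200, with cluster.routing.allocation.enable standing in for whatever unrelated setting you actually want to change):

curl -XPUT -H "Content-Type: application/json" localhost:9200/_cluster/settings -d '{ "persistent": { "cluster.routing.allocation.enable": "all" } }'

The request above fails with an error about the archived setting not being a valid setting. Clearing the archived settings first makes subsequent updates work again:

curl -XPUT -H "Content-Type: application/json" localhost:9200/_cluster/settings -d '{ "persistent": { "archived.*": null } }'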

I put together a test that reproduces this by adding this to FullClusterRestartIT.

@nik9000 nik9000 added :Core/Infra/Settings Settings infrastructure and APIs discuss labels Dec 29, 2017
@jasontedor
Member

We discussed this during Fix-it-Friday and agreed that we should not archive unknown and broken cluster settings. Instead, we should fail to recover the cluster state. The solution for users in an upgrade case would be to rollback to the previous version, address the settings that would be unknown or broken in the next major version, and then proceed with the upgrade.

@otrosien

The solution does not seem to apply to transient settings. I'm getting an acknowledgement from ES, but the invalid setting stays (in my case, indices.store.throttle.type).

@mayya-sharipova
Contributor

@otrosien how were you able to keep transient settings between versions? Did you do a rolling upgrade from 5.6 to 6.x?

@adichad

adichad commented Jan 30, 2018

@otrosien's teammate here. @mayya-sharipova Yes, we did a rolling upgrade of Elasticsearch. After the upgrade, the transient settings remained, but trying to either remove the unsupported setting or change any other setting in the transient set throws this error:

curl -XPUT -H"Content-Type: application/json" -s localhost:9200/_cluster/settings -d '{"transient": { "indices.*":null } }'

> {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[1Mwia6T][172.31.164.55:9300][cluster:admin/settings/update]"}],"type":"illegal_argument_exception","reason":"unknown setting [indices.store.throttle.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},"status":400}

For us the problem is not "archival" of bad settings, but the complete inability to edit transient settings now that they contain one unsupported setting.

We can update any persistent settings because those were empty before the upgrade, but for the settings that exist in our transient set, the transient versions take precedence according to the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/cluster-update-settings.html#_precedence_of_settings
So we cannot effectively change any of those settings now.
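
As a quick way to confirm which block a value actually lives in (a minimal sketch, assuming a node on localhost:9200), dumping both blocks shows what the transient entry is masking:

curl -s 'localhost:9200/_cluster/settings?flat_settings=true&pretty'

Anything listed under "transient" overrides the same key under "persistent".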

We would expect a bugfix release of Elasticsearch that allows this cleanup without requiring a full cluster restart.

At this point, the only option we have is to create a new cluster in parallel, index to it, and change DNS settings. This is extremely expensive, because our cluster is large(ish), with hundreds of data nodes. Service disruption by way of a full-cluster restart is not an option for us.

@bleskes
Contributor

bleskes commented Jan 30, 2018

@adichad which exact version are you using? I'm asking because as far as I can tell from glancing at the code, #27671 should allow you to remove that setting.

@otrosien

@bleskes the masters are on 6.1.1, the data nodes still on 6.1.0. indices.store.throttle.type is still a cluster-wide setting, so from my understanding #27671 doesn't apply.

@mayya-sharipova
Contributor

@otrosien @adichad

The indices.store.throttle.type setting was deprecated in 6.0 [1], so after the upgrade it should have had the archived. prefix added. Did you try to remove the archived version of this setting:

curl -XPUT -H "Content-Type: application/json" -s localhost:9200/_cluster/settings -d '{"transient": { "archived.indices.*":null } }'

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_60_settings_changes.html#_store_throttling_settings

@otrosien

@mayya-sharipova we tried all variations of removing that setting. Apparently it was not moved to archived when we upgraded. Is it somehow possible to trigger this?

@scratchy

scratchy commented Feb 7, 2018

Having the same issue in #28524.

We were unable to roll back, so a force-reset solution would be nice.

Since it's a production cluster, we also don't want to shut it down for this...

@faxm0dem

faxm0dem commented Feb 7, 2018

If the official solution is what @jasontedor said, this should really make it into the documentation on the rolling upgrade procedure.

@scratchy

scratchy commented Feb 7, 2018

This should not be the official solution for this.

We get a whole lot of errors when downgrading/rolling back, ending in:

nested: IllegalStateException[index [products_37_es/mZ1tmbEdTaeNYSpCquAWGA] version not supported: 6.1.3 the node version is: 6.0.0]; ]

org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

@jasontedor
Member

There is a misunderstanding here. The comment that is being referred to as the "official solution" is not a solution. It is a proposal for how we should change Elasticsearch so that users cannot end up in the situation that is causing so many problems here. It requires code changes to implement that solution and a new release carrying it.

@faxm0dem

faxm0dem commented Feb 7, 2018

Thanks @jasontedor for the clarification.
Is there a workaround for @scratchy, who has a cluster with newly created indices and therefore cannot roll back?

@mayya-sharipova
Contributor

@faxm0dem The current workaround is to remove archived settings with PUT _cluster/settings { "persistent": { "archived.*": null } }. But it looks like the deprecated settings have not had the archived. prefix added. We will discuss possible workarounds for this in our next meeting.

@waltrinehart

If you have dedicated master nodes, we were able to work around this by downgrading them to a previous version (5.6.1 in our case), removing the offending settings, and then re-upgrading.

@faxm0dem

Oh very cool thanks! @scratchy can you try this?

@dustin-decker

dustin-decker commented Feb 28, 2018

I work with @waltrinehart. We were able to apply the master-downgrade workaround for transient settings, but not for persistent settings. We cannot upgrade from 6.1.1 to 6.2.2 because of the stuck persistent settings. The only way forward that we see is to downgrade to 5.x and do a full cluster restart to remove the persistent setting, which is not really a viable option for us. In the current state we cannot modify cluster settings at all. This also implies that we cannot disable shard allocation before doing a rolling upgrade.
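
For context, this is the kind of routine call that the stuck setting blocks; a minimal sketch of the usual pre-upgrade allocation step, assuming a node on localhost:9200:

curl -XPUT -H "Content-Type: application/json" localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.enable": "none" } }'

This request does not touch the stuck setting at all, yet it would be rejected with the same "unknown setting" error described above.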

The only real solution that we see right now is a software patch allowing us to remove this setting and move forward.

@dustin-decker

dustin-decker commented Mar 6, 2018

We found that shutting down all of our master nodes simultaneously and starting them back up was sufficient to clear the persistent setting.

The cluster still required initializing all the shards even though the data nodes stayed up. This isn't possible for everyone though, so I think an alternative path without such disruption is still needed.

Following cluster recovery, we saw the setting was properly archived and could be removed. This confirms that it is an issue that crops up during rolling upgrades.

@jasontedor
Member

We integrated a change (#28888) that automatically archives any unknown or invalid settings on any settings update. This prevents their presence from failing the request, and once archived they can be deleted.
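
On a release carrying that change, the cleanup becomes a two-step sketch (assuming a node on localhost:9200; the first request can be any ordinary settings update):

curl -XPUT -H "Content-Type: application/json" localhost:9200/_cluster/settings -d '{ "persistent": { "cluster.routing.allocation.enable": "all" } }'

curl -XPUT -H "Content-Type: application/json" localhost:9200/_cluster/settings -d '{ "persistent": { "archived.*": null } }'

The first update moves the unknown settings under the archived. prefix instead of failing; the second then deletes them.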

@dorony

dorony commented Mar 25, 2018

@jasontedor do you know when this will be released?

@jasontedor
Member

@dorony The change #28888 will be in the next 6.2 patch release (6.2.4), which is not yet released; we do not provide release dates.

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Mar 29, 2018
Currently unknown or invalid cluster settings get archived.

For a better user experience, we stop archiving broken cluster settings.
Instead, we will fail to recover the cluster state.
The solution for users in an upgrade case would be to roll back
to the previous version, address the settings that would be unknown
or invalid in the next major version, and then proceed with the upgrade.

Closes elastic#28026
@colings86 colings86 added the >bug label Apr 24, 2018
@ghost

ghost commented May 4, 2018

I'm no expert, but I'm suffering from this bug/situation right now and, if you're looking for QA feedback, this has put our production deployment in a very precarious state.

@pjanzen

pjanzen commented Nov 23, 2018

I am running ES 6.3.0 and I executed:

curl -H "Content-Type: application/json" -XPUT 'localhost:9200/_cluster/settings' -d '{ "persistent" : { "archived.*":null }}'

and restarted the full cluster. That did it for me.

@mayya-sharipova mayya-sharipova removed their assignment Nov 27, 2018
@DaveCTurner
Contributor

The situation described in the OP is still true today (e.g. for upgrades from snapshots built from 7.x to master) but the other points raised in this thread seem to have been addressed by #28888.

Do we still consider this a bug? We could say that if you upgrade your cluster without addressing all the deprecation warnings first then there is a risk that some things may not work for you. In this case it's PUT _cluster/settings that doesn't work, and it's fixable. If we let a cluster carry on without taking explicit action to remove these broken settings then I expect they'll never get removed. I'm raising this for discussion again.

@DaveCTurner
Contributor

We discussed this today and agreed that we are happy with the behaviour as it stands, so this can be closed.

@chingis-elastic

chingis-elastic commented Jun 26, 2020

Hey team, sorry to dig up an old issue but we just hit this during a cloud-observability upgrade (from 6.8 to 7.8). Some of our clusters have the setting

xpack.notification.slack.account.<account_name>.url

which is apparently not supported in 7.x and hence got the archived.* prefix. I wonder why there isn't an additional check/action in the 7.x upgrade assistant to warn about unsupported settings? Or even check and remove them if they have no effect.

When the upgrade succeeds, those settings leave the cluster basically unusable (at least on Elastic Cloud).

@DaveCTurner
Contributor

@chingis-elastic that this was not caught ahead of the upgrade sounds like it might be a bug somewhere in the deprecation or upgrade assistance areas. Would you open a new issue for it to make sure that gets investigated? Closed issues like this don't normally see any further activity.
