Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Putting index settings in twice causes ES to crash from assertion error #95347

Open
dakrone opened this issue Apr 18, 2023 · 7 comments
Open
Labels
>bug :Core/Infra/Settings Settings infrastructure and APIs Team:Core/Infra Meta label for core/infra team

Comments

@dakrone
Copy link
Member

dakrone commented Apr 18, 2023

Elasticsearch Version

8.8.0-SNAPSHOT

Installed Plugins

No response

Java Version

bundled

OS Version

n/a

Problem Description

It's possible to cause an assert failure by putting the same index settings into an index twice.

Steps to Reproduce

When running on the main branch, with ./gradlew run (so that asserts are enabled), it is possible to easily crash Elasticsearch with the following:

PUT /foo

PUT /foo/_settings
{
  "index.auto_expand_replicas": "0-all",
  "index.number_of_replicas": 1
}

PUT /foo/_settings
{
  "index.auto_expand_replicas": "0-all",
  "index.number_of_replicas": 1
}

Putting the same settings in twice causes this assertion to trip:

[2023-04-18T14:35:46,835][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [runTask-0] fatal error in thread [elasticsearch[runTask-0][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: Index updates are expected as index settings version has changed
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.index.IndexService.updateMetadata(IndexService.java:806)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:539)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:248)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:536)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:522)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:495)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:426)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
	at org.elasticsearch.server@8.8.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Logs (if relevant)

No response

@dakrone dakrone added >bug :Core/Infra/Settings Settings infrastructure and APIs labels Apr 18, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Apr 18, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@thecoop
Copy link
Member

thecoop commented Apr 19, 2023

The reason this assertion is tripping is because ClusterChangedEvent.indexMetadataChanged uses reference equality test - the identity has changed - but the next bit of code to run in IndexService.updateMetadata uses a value equality test - no settings have actually changed. So the index service gets confused as to why it has actually been called, the version has updated, but no settings have changed.

So the best thing to do here might be to just remove these assertions - its ok for the settings version to increment without anything changing. But this is more distributed's area, so pinging them for a second opinion

@thecoop thecoop added the :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. label Apr 19, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Apr 19, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Copy link
Contributor

It has been suggested that we reject attempts to set the number of replicas while auto_expand_replicas is in play: #27835. I think that's the only way you can hit this. I even opened a PR to show how we could deprecate that behaviour in #73230. I'd prefer that we at least ignore updates to number_of_replicas in this situation rather than accepting them and then reverting them, and leave the assertion alone, but I would be ok with weakening the assertion in this situation too. I'd rather not remove it tho.

That said, this is definitely a :Core/Infra/Settings question IMO

@DaveCTurner DaveCTurner removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. Team:Distributed Meta label for distributed team labels Apr 19, 2023
@rjernst
Copy link
Member

rjernst commented Apr 25, 2023

@DaveCTurner This doesn't seem to have to do with the specific settings set in this issue, but in general with how the settings differences are calculated, as @thecoop noted. Since the assertions in question live in IndexService, it seems to me to be more of a question for the Distributed team, though this is probably a gray area. Are you comfortable removing the assertions?

@andreidan
Copy link
Contributor

@rjernst @DaveCTurner Would it help if I add another reproduction case here?

Namely, trying to modify a data stream to replace an existing backing index with a downsample index will run into this problem as well (https://github.com/elastic/elasticsearch/pull/91141/files#diff-3525330b076f1366498e871381f6f4a3fe7e0127b55e863227a3b98cf29d7e6cR128 )

The workaround is not great https://github.com/elastic/elasticsearch/pull/91141/files#diff-3525330b076f1366498e871381f6f4a3fe7e0127b55e863227a3b98cf29d7e6cR110

@dakrone
Copy link
Member Author

dakrone commented Nov 6, 2023

Is there any update on this? It still reproduces on main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Settings Settings infrastructure and APIs Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

6 participants