Elasticsearch version: 2.3.4 / 2.4.0
Plugins installed: kopf
JVM version:
version: "1.8.0_102",
vm_name: "OpenJDK 64-Bit Server VM",
vm_version: "25.102-b14",
vm_vendor: "Oracle Corporation",
OS version:
Debian 8.5
Linux 8e884c0e66a2 4.4.15-moby #1 SMP Thu Jul 28 22:03:07 UTC 2016 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
ES arbitrarily accepts any integer for index.merge.scheduler.max_thread_count, even if it exceeds internal limits (Lucene requires maxThreadCount <= maxMergeCount), which then causes various scheduled tasks to throw exceptions and eventually breaks indexing.
Steps to reproduce:
We experienced this on our production cluster, but I was able to reproduce it using a couple of Docker containers: create an index, update the max thread count, then alternate creating documents and optimizing the index (to force a flush). This will shortly lead to translog failures; a sketch of the request sequence follows.
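A minimal reproduction sketch, assuming the index and type names from the logs (test_index, test_type); the document body is illustrative, and the value 12 matches the settings update logged below:

PUT /test_index

PUT /test_index/_settings
{
  "index.merge.scheduler.max_thread_count": 12
}

POST /test_index/test_type
{
  "field": "some value"
}

POST /test_index/_optimize

In 2.x, _optimize is the deprecated alias of _forcemerge, which is why the failed merge below surfaces under action.admin.indices.forcemerge.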
The invalid max_thread_count causes all sorts of exceptions:
Updating the index settings:
[2016-09-08 03:50:48,676][INFO ][index.shard ] [Abe Brown] [test_index][3] updating [index.merge.scheduler.max_thread_count] from [7] to [12]
[2016-09-08 03:50:48,677][WARN ][index.settings ] [Abe Brown] [test_index] failed to refresh settings for [org.elasticsearch.index.shard.IndexShard$ApplyRefreshSettings@27098528]
java.lang.IllegalArgumentException: maxThreadCount should be <= maxMergeCount (= 7)
at org.apache.lucene.index.ConcurrentMergeScheduler.setMaxMergesAndThreads(ConcurrentMergeScheduler.java:154)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.refreshConfig(ElasticsearchConcurrentMergeScheduler.java:176)
at org.elasticsearch.index.engine.InternalEngine.onSettingsChanged(InternalEngine.java:1268)
at org.elasticsearch.index.shard.IndexShard$ApplyRefreshSettings.onRefreshSettings(IndexShard.java:1325)
at org.elasticsearch.index.settings.IndexSettingsService.refreshSettings(IndexSettingsService.java:54)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applySettings(IndicesClusterStateService.java:341)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:180)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Various errors seen while indexing and optimizing:
[2016-09-08 03:58:16,128][WARN ][index.shard ] [Abe Brown] [test_index][3] Failed to perform scheduled engine refresh
java.lang.IllegalArgumentException: maxThreadCount should be <= maxMergeCount (= 7)
at org.apache.lucene.index.ConcurrentMergeScheduler.setMaxMergesAndThreads(ConcurrentMergeScheduler.java:154)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.refreshConfig(ElasticsearchConcurrentMergeScheduler.java:176)
at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:685)
at org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:661)
at org.elasticsearch.index.shard.IndexShard$EngineRefresher$1.run(IndexShard.java:1343)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-09-08 03:58:36,218][WARN ][indices.memory ] [Abe Brown] failed to set shard [test_index][1] index buffer to [24.7mb]
org.apache.lucene.store.AlreadyClosedException: translog [2] is already closed
at org.elasticsearch.index.translog.TranslogWriter.ensureOpen(TranslogWriter.java:329)
at org.elasticsearch.index.translog.BufferingTranslogWriter.updateBufferSize(BufferingTranslogWriter.java:143)
at org.elasticsearch.index.translog.Translog.updateBuffer(Translog.java:406)
at org.elasticsearch.index.shard.IndexShard.updateBufferSize(IndexShard.java:1164)
at org.elasticsearch.indices.memory.IndexingMemoryController.updateShardBuffers(IndexingMemoryController.java:232)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:286)
at org.elasticsearch.threadpool.ThreadPool$ReschedulingRunnable.doRun(ThreadPool.java:1057)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:668)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-09-08 03:53:05,512][DEBUG][action.admin.indices.forcemerge] [Abe Brown] [indices:admin/optimize] failed to execute operation for shard [[test_index][1], node[6jC3WEUrQWifFvTn-twoBw], [P], v[3], s[STARTED], a[id=Fa-2IEiESai0wk2csElq0Q]]
[test_index][[test_index][1]] FlushFailedEngineException[Flush failed]; nested: IllegalStateException[already committing a translog with generation: 1];
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:762)
at org.elasticsearch.index.engine.InternalEngine.forceMerge(InternalEngine.java:863)
at org.elasticsearch.index.shard.IndexShard.forceMerge(IndexShard.java:793)
at org.elasticsearch.action.admin.indices.forcemerge.TransportForceMergeAction.shardOperation(TransportForceMergeAction.java:79)
at org.elasticsearch.action.admin.indices.forcemerge.TransportForceMergeAction.shardOperation(TransportForceMergeAction.java:46)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:436)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:415)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:402)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: already committing a translog with generation: 1
at org.elasticsearch.index.translog.Translog.prepareCommit(Translog.java:1785)
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:753)
... 14 more
[2016-09-08 03:59:27,335][WARN ][rest.suppressed ] path: //test_index/test_type, params: {index=test_index, type=test_type}
RemoteTransportException[[Abe Brown][172.19.0.2:9300][indices:data/write/index[p]]]; nested: CreateFailedEngineException[Create failed for [test_type#AVcH80pjYp3inMh6kFUK]]; nested: AlreadyClosedException[translog is already closed];
Caused by: [test_index][[test_index][1]] CreateFailedEngineException[Create failed for [test_type#AVcH80pjYp3inMh6kFUK]]; nested: AlreadyClosedException[translog is already closed];
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:351)
at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:545)
at org.elasticsearch.index.engine.Engine$Create.execute(Engine.java:810)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:236)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:157)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:66)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:648)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: translog is already closed
at org.elasticsearch.index.translog.Translog.ensureOpen(Translog.java:1888)
at org.elasticsearch.index.translog.Translog.add(Translog.java:544)
at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:437)
at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:375)
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:346)
... 15 more
Interestingly, closing and reopening the index will restore things to working order with the configured thread count.
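For reference, the close/reopen cycle uses the standard index APIs:

POST /test_index/_close
POST /test_index/_open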
Returning max_thread_count to 1 shows that the thread count had been increased above what was previously set:
PUT /test_index/_settings
{
  "index.merge.scheduler.max_thread_count": 1
}
[2016-09-08 04:14:22,483][INFO ][index.shard ] [Abe Brown] [test_index][1] updating [index.merge.scheduler.max_thread_count] from [17] to [1]
[2016-09-08 04:14:22,484][INFO ][index.shard ] [Abe Brown] [test_index][2] updating [index.merge.scheduler.max_thread_count] from [17] to [1]
[2016-09-08 04:14:22,484][INFO ][index.shard ] [Abe Brown] [test_index][3] updating [index.merge.scheduler.max_thread_count] from [17] to [1]
[2016-09-08 04:14:22,484][INFO ][index.shard ] [Abe Brown] [test_index][0] updating [index.merge.scheduler.max_thread_count] from [17] to [1]
Manual backport of #20383
Since settings are not validated in 2.x, this change catches the exception if the merge settings are not valid and does not apply them to the current CMS.
The downside is that we'll still try to apply the new settings if the index is reopened or if the node is restarted; this will fail, and the only solution for the user is to re-update the merge settings back to valid values (see the example below).
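In that situation the recovery step is the same kind of settings update shown earlier in this issue, restoring a value within Lucene's limit, for example:

PUT /test_index/_settings
{
  "index.merge.scheduler.max_thread_count": 1
}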
Relates #20380