[Bug]: The task submission failed, with no retries after the timeout failure, and it remained blocked: Task has been reset or not yet scheduled #4235

@lichaohao

What happened?

Occasionally, some optimizing tasks on a table fail to commit (the affected table is different each time), and after that all operations on the table become stuck. Cancelling the process also fails.
Error message (Spark external optimizer): TaskRuntimeException: Task has been reset or not yet scheduled

Log:
2026-05-23 20:08:41,213 INFO [thrift-server-optimize-manager-25511] [org.apache.amoro.server.DefaultOptimizingService] [] - Ack task OptimizingTaskId(processId:1457797617278976, taskId:51) by optimizer b15a10a8-8edc-4a72-902f-7a159e2b8f87 (threadId 29)
2026-05-23 20:08:41,222 INFO [optimizer-keeper-thread] [org.apache.amoro.server.DefaultOptimizingService] [] - Task OptimizingTaskId(processId:1457797617278976, taskId:52) is suspending, since it's optimizer is expired, put it to retry queue, optimizer b15a10a8-8edc-4a72-902f-7a159e2b8f87:9
2026-05-23 20:08:41,223 INFO [thrift-server-optimize-manager-25511] [org.apache.amoro.server.DefaultOptimizingService] [] - Optimizer b15a10a8-8edc-4a72-902f-7a159e2b8f87 (threadId 0) complete task OptimizingTaskId(processId:1457797617278976, taskId:39) (status: SUCCESS)
2026-05-23 20:08:41,224 INFO [thrift-server-optimize-manager-25472] [org.apache.amoro.server.DefaultOptimizingService] [] - OptimizerThread OptimizerThread{threadId=9, optimizer=OptimizerInstance{token=b15a10a8-8edc-4a72-902f-7a159e2b8f87, startTime=1779066770938, touchTime=1779538111241}} polled task OptimizingTaskId(processId:1457797617278976, taskId:52)
2026-05-23 20:08:41,231 INFO [thrift-server-optimize-manager-25472] [org.apache.amoro.server.DefaultOptimizingService] [] - Ack task OptimizingTaskId(processId:1457797617278976, taskId:52) by optimizer b15a10a8-8edc-4a72-902f-7a159e2b8f87 (threadId 9)
2026-05-23 20:08:41,233 ERROR [thrift-server-optimize-manager-25472] [org.apache.amoro.server.persistence.PersistentBase] [] - failed to commit transaction
org.apache.amoro.exception.TaskRuntimeException: Task has been reset or not yet scheduled, taskId:OptimizingTaskId(processId:1457797617278976, taskId:52)
at org.apache.amoro.server.optimizing.TaskRuntime.validThread(TaskRuntime.java:252) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.TaskRuntime.lambda$ack$3(TaskRuntime.java:138) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_372]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647) ~[?:1.8.0_372]
at org.apache.amoro.server.persistence.PersistentBase.doAsTransaction(PersistentBase.java:91) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.persistence.StatedPersistentBase.invokeConsistency(StatedPersistentBase.java:47) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.TaskRuntime.ack(TaskRuntime.java:136) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.ackTask(OptimizingQueue.java:626) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.access$100(OptimizingQueue.java:493) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.OptimizingQueue.ackTask(OptimizingQueue.java:401) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.DefaultOptimizingService.ackTask(DefaultOptimizingService.java:279) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor207.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_372]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_372]
at org.apache.amoro.server.utils.ThriftServiceProxy.invoke(ThriftServiceProxy.java:56) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at com.sun.proxy.$Proxy48.ackTask(Unknown Source) [?:?]
at org.apache.amoro.api.OptimizingService$Processor$ackTask.getResult(OptimizingService.java:724) [amoro-common-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.api.OptimizingService$Processor$ackTask.getResult(OptimizingService.java:700) [amoro-common-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.shade.thrift.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.server.Invocation.run(Invocation.java:19) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_372]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_372]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_372]
2026-05-23 20:08:41,280 ERROR [thrift-server-optimize-manager-25472] [org.apache.amoro.server.TableManagementService] [] - Thrift service:DefaultOptimizingService.ackTask execute failed
org.apache.amoro.exception.TaskRuntimeException: Task has been reset or not yet scheduled, taskId:OptimizingTaskId(processId:1457797617278976, taskId:52)
at org.apache.amoro.server.optimizing.TaskRuntime.validThread(TaskRuntime.java:252) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.TaskRuntime.lambda$ack$3(TaskRuntime.java:138) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_372]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647) ~[?:1.8.0_372]
at org.apache.amoro.server.persistence.PersistentBase.doAsTransaction(PersistentBase.java:91) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.persistence.StatedPersistentBase.invokeConsistency(StatedPersistentBase.java:47) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.TaskRuntime.ack(TaskRuntime.java:136) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.ackTask(OptimizingQueue.java:626) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.access$100(OptimizingQueue.java:493) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.optimizing.OptimizingQueue.ackTask(OptimizingQueue.java:401) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.server.DefaultOptimizingService.ackTask(DefaultOptimizingService.java:279) ~[amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor207.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_372]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_372]
at org.apache.amoro.server.utils.ThriftServiceProxy.invoke(ThriftServiceProxy.java:56) [amoro-ams-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at com.sun.proxy.$Proxy48.ackTask(Unknown Source) [?:?]
at org.apache.amoro.api.OptimizingService$Processor$ackTask.getResult(OptimizingService.java:724) [amoro-common-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.api.OptimizingService$Processor$ackTask.getResult(OptimizingService.java:700) [amoro-common-0.9-SNAPSHOT.jar:0.9-SNAPSHOT]
at org.apache.amoro.shade.thrift.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at org.apache.amoro.shade.thrift.org.apache.thrift.server.Invocation.run(Invocation.java:19) [amoro-shade-thrift-0.20.0-0.7.0-incubating.jar:0.20.0-0.7.0-incubating]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_372]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_372]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_372]
2026-05-23 20:08:41,300 INFO [thrift-server-optimize-manager-25472] [org.apache.amoro.server.DefaultOptimizingService] [] - Optimizer b15a10a8-8edc-4a72-902f-7a159e2b8f87 (threadId 23) complete task OptimizingTaskId(processId:1457797617278976, taskId:36) (status: SUCCESS)
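
In the log above, task 52 is put back on the retry queue by the optimizer-keeper thread at 20:08:41,222, and the ack from threadId 9 fails at 20:08:41,233. One interleaving that would produce this message is an ack arriving after the task's thread binding has been cleared by a reset. The sketch below only models that ordering. `Task`, `schedule`, `reset`, `ack` and `replay` are hypothetical names for illustration, not Amoro's actual `TaskRuntime` code, and the real root cause may be different.

```java
public class AckAfterResetSketch {

    enum Status { PLANNED, SCHEDULED, ACKED }

    // Hypothetical stand-in for a task's runtime state; not Amoro's TaskRuntime.
    static final class Task {
        private Status status = Status.PLANNED;
        private Integer threadId; // null means the task is not bound to any optimizer thread

        // An optimizer thread polls the task and gets bound to it.
        synchronized void schedule(int thread) {
            status = Status.SCHEDULED;
            threadId = thread;
        }

        // A keeper-style reset: the task goes back to the queue and loses its thread binding.
        synchronized void reset() {
            status = Status.PLANNED;
            threadId = null;
        }

        // The ack is only valid while the task is still bound to the acking thread.
        synchronized void ack(int thread) {
            if (threadId == null) {
                throw new IllegalStateException("Task has been reset or not yet scheduled");
            }
            if (threadId != thread) {
                throw new IllegalStateException("Task is bound to another thread");
            }
            status = Status.ACKED;
        }

        synchronized Status status() {
            return status;
        }
    }

    // Replays the assumed ordering: poll, then reset, then ack from the original thread.
    static String replay() {
        Task task = new Task();
        task.schedule(9);   // threadId 9 polls the task
        task.reset();       // the task is reset before the ack is processed
        try {
            task.ack(9);    // the late ack no longer matches any binding
            return "acked";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

In this model the task ends up back in `PLANNED` with no owner after the failed ack. If nothing polls it again, the process would stay blocked, which matches the symptom described, but that part is an assumption and not something the log confirms.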

Affects Versions

master/0.9.0

What table formats are you seeing the problem on?

No response

What engines are you seeing the problem on?

No response

How to reproduce

No response

Relevant log output

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

type:bug (Something isn't working)
