Version
4.0/4.1 (any version with enable_parallel_publish_version enabled; default is true).
What's Wrong?
TransactionState.publishVersionTasks is declared as a Map and initialized as a plain HashMap:
private Map<Long, List<PublishVersionTask>> publishVersionTasks;
this.publishVersionTasks = Maps.newHashMap();
Before 4.0, publish ran entirely on the single-threaded MasterDaemon, so a non-thread-safe map was fine. Starting in 4.0, PublishVersionDaemon runs publish in parallel through per-db pools (dbExecutors: Config.publish_thread_pool_num pools, default 128, each with corePoolSize = 1). When parallel publish is enabled, tryFinishOneTxn hands tryFinishTxnSync off to a worker while the master loop keeps going. The same map is then touched concurrently by:
- Master daemon thread: iterates via forEach / reads keySet() in PublishVersionDaemon.tryFinishOneTxn.
- PUBLISH_VERSION_EXEC worker for that txn's db (routed by dbId % publish_thread_pool_num to a single-thread pool): iterates values().forEach in PublishVersionDaemon.tryFinishTxnSync and calls clear() in TransactionState.pruneAfterVisible after the txn becomes VISIBLE.
(addPublishVersionTask is also called by the master daemon, but it runs once per txn during the initial dispatch in traverseReadyTxnAndDispatchPublishVersionTask, guarded by hasSendTask, strictly before any worker iteration — so it does not participate in this race.)
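To make the routing concrete, here is a minimal sketch of per-db dispatch onto single-threaded executors, as described above. It is not the Doris implementation: the class DbExecutorRoutingSketch and the methods slotFor and publishOn are hypothetical names, and only the dbId % pool-count rule and the one-thread-per-pool shape come from this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; names do not come from the Doris code base.
class DbExecutorRoutingSketch {
    // One single-threaded executor per slot (mirrors "corePoolSize = 1").
    private final List<ExecutorService> dbExecutors = new ArrayList<>();

    DbExecutorRoutingSketch(int poolNum) {
        for (int i = 0; i < poolNum; i++) {
            dbExecutors.add(Executors.newSingleThreadExecutor());
        }
    }

    // Assumes dbId is non-negative, so plain % never yields a negative slot.
    int slotFor(long dbId) {
        return (int) (dbId % dbExecutors.size());
    }

    // Runs a no-op "publish" for dbId and returns the name of the worker thread that ran it.
    String publishOn(long dbId) {
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        try {
            return dbExecutors.get(slotFor(dbId)).submit(whoAmI).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    void shutdown() {
        dbExecutors.forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) {
        DbExecutorRoutingSketch router = new DbExecutorRoutingSketch(128);
        try {
            // 10005 % 128 and 10133 % 128 are both 21, so the same worker serves both dbs.
            System.out.println(router.publishOn(10005L).equals(router.publishOn(10133L)));
        } finally {
            router.shutdown();
        }
    }
}
```

Because every txn of a given db lands on one worker thread, workers never race with each other on that txn's map; the only other party is the master daemon thread, which keeps looping while the worker runs.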
The race: the master daemon iterates one txn's map while the worker (from the previous round) runs pruneAfterVisible() -> clear() on the same map. The HashMap fail-fast iterator detects the modCount change and throws ConcurrentModificationException.
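The fail-fast mechanism itself can be shown deterministically in a single thread. The sketch below is an illustration only: the class FailFastDemo, the Long keys, and the String payloads are made up. It calls clear() in the middle of an entrySet() iteration, which is what the worker's pruneAfterVisible() effectively does to the master daemon's iterator when the timing lines up.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the publishVersionTasks map; not Doris code.
class FailFastDemo {
    static boolean clearDuringIterationThrows() {
        Map<Long, List<String>> tasks = new HashMap<>();
        tasks.put(10001L, new ArrayList<>());
        tasks.put(10002L, new ArrayList<>());
        tasks.put(10003L, new ArrayList<>());
        try {
            for (Map.Entry<Long, List<String>> entry : tasks.entrySet()) {
                // Simulates clear() landing while the iteration is still in progress.
                // clear() bumps modCount, so the iterator's next call to next() fails.
                tasks.clear();
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + clearDuringIterationThrows());
    }
}
```

In the real race the clear() comes from another thread, so the failure is timing-dependent rather than guaranteed, and HashMap makes no promise that it will always be detected.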
The CME is caught at an outer layer, so FE does not crash, but that publish round aborts and the txn stays in COMMITTED until a later daemon round re-publishes it successfully. Recurring CMEs increase publish latency; combined with other factors (e.g. table-lock contention or executor saturation), this can grow into a larger publish backlog, but those secondary effects are out of scope for this issue.
Sample stack from a production FE:
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
at java.util.HashMap$EntryIterator.next(HashMap.java:1630)
at org.apache.doris.transaction.PublishVersionDaemon.tryFinishOneTxn(PublishVersionDaemon.java:191)
What You Expected?
Publish should not abort the daemon round due to CME. publishVersionTasks must be safe for concurrent access by the master daemon and the per-db publish worker.
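One possible direction, offered as a sketch rather than as the fix the maintainers will choose, is to back the map with a ConcurrentHashMap, whose iterators are weakly consistent and do not throw ConcurrentModificationException. The class ConcurrentMapSketch and its data are hypothetical. The sketch covers only map-level safety: the List values would need their own protection if they were mutated concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a thread-safe replacement map; not Doris code.
class ConcurrentMapSketch {
    // Returns how many entries the iteration visited despite clear() running mid-iteration.
    static int iterateWhileClearing() {
        Map<Long, List<String>> tasks = new ConcurrentHashMap<>();
        tasks.put(10001L, new ArrayList<>());
        tasks.put(10002L, new ArrayList<>());
        tasks.put(10003L, new ArrayList<>());
        int visited = 0;
        for (Map.Entry<Long, List<String>> entry : tasks.entrySet()) {
            visited++;
            // Same interleaving that breaks HashMap; here the iterator tolerates it.
            tasks.clear();
        }
        if (!tasks.isEmpty()) {
            throw new IllegalStateException("map should be empty after clear()");
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("visited without CME: " + iterateWhileClearing());
    }
}
```

A weakly consistent iterator may or may not observe entries removed after it was created, so callers that iterate the map should tolerate seeing a partially cleared view.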
How to Reproduce?
The race depends on timing and is most easily observed when:
- enable_parallel_publish_version = true (4.0 default);
- heavy publish workload (many concurrent stream loads across many dbs/tables);
- several publish-ready txns per daemon round, so the master is iterating one txn's map at the same moment a worker is inside pruneAfterVisible() on the same txn.
On a busy cluster under steady load, CME typically appears in fe.warn.log within hours.
Anything Else?
No response