[Feature]: Introduce a mechanism for concurrency control between Writing and Optimizing #985
Closed
Labels: type:feature
Feature Requests
Description
We should introduce a mechanism to address the concurrency issues when Writing and Optimizing happen at the same time.
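A minimal Python sketch of such a blocking mechanism follows. Everything in it is an assumption made for illustration, not Arctic's API. The names TableBlockerRegistry, Operation, Blocker and BlockerConflict are hypothetical. The two operations mirror the Optimizing and BatchWrite examples this issue gives for the mechanism it calls TableBlocker. Two blockers on the same table are assumed to conflict when their operation sets overlap. Each blocker is assumed to expire after a time-to-live, so that a crashed holder cannot block a table forever.

```python
import itertools
import threading
import time
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Iterable, List, Optional


class Operation(Enum):
    # Hypothetical blockable operations, mirroring the examples in this issue.
    OPTIMIZE = "optimize"
    BATCH_WRITE = "batch_write"


@dataclass(frozen=True)
class Blocker:
    blocker_id: int
    table: str
    operations: FrozenSet[Operation]
    expires_at: float


class BlockerConflict(Exception):
    """Raised when a requested blocker overlaps a live blocker on the same table."""


class TableBlockerRegistry:
    """Hypothetical in-memory blocker registry; a real service would persist blockers."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._blockers: Dict[int, Blocker] = {}

    def _live(self, table: str, now: float) -> List[Blocker]:
        # Expired blockers are ignored, so a crashed holder cannot block a table forever.
        return [b for b in self._blockers.values()
                if b.table == table and b.expires_at > now]

    def block(self, table: str, operations: Iterable[Operation],
              now: Optional[float] = None) -> Blocker:
        """Block the given operations on a table, or raise if any is already blocked."""
        now = time.monotonic() if now is None else now
        ops = frozenset(operations)
        with self._lock:
            for existing in self._live(table, now):
                overlap = existing.operations & ops
                if overlap:
                    names = sorted(op.value for op in overlap)
                    raise BlockerConflict(f"{table}: {names} already blocked")
            blocker = Blocker(next(self._ids), table, ops, now + self._ttl)
            self._blockers[blocker.blocker_id] = blocker
            return blocker

    def is_blocked(self, table: str, operation: Operation,
                   now: Optional[float] = None) -> bool:
        """Return whether an operation is currently blocked on a table."""
        now = time.monotonic() if now is None else now
        with self._lock:
            return any(operation in b.operations for b in self._live(table, now))

    def release(self, blocker_id: int) -> None:
        """Drop a blocker once its holder has committed or aborted."""
        with self._lock:
            self._blockers.pop(blocker_id, None)


if __name__ == "__main__":
    registry = TableBlockerRegistry(ttl_seconds=60.0)
    # A batch write blocks optimizing and other batch writes before it starts.
    writer = registry.block("db.orders", {Operation.OPTIMIZE, Operation.BATCH_WRITE})
    print(registry.is_blocked("db.orders", Operation.OPTIMIZE))  # True: optimizer must wait
    try:
        registry.block("db.orders", {Operation.BATCH_WRITE})  # a second, concurrent writer
    except BlockerConflict as err:
        print("rejected:", err)
    registry.release(writer.blocker_id)
    print(registry.is_blocked("db.orders", Operation.OPTIMIZE))  # False
```

In this sketch a batch write would call block() before writing and release() after committing or aborting, and the optimizer would check is_blocked() before planning. A long-running writer would also need a way to renew its blocker before the time-to-live lapses, which the sketch omits. Because the registry is in-memory, it only coordinates callers within one process; Spark and Flink jobs would presumably need a shared service to hold blockers.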
Use case/motivation
Concurrency issues arise when table writing and optimizing happen at the same time, for example:
case1:
Spark and Flink write into the ChangeStore while self-optimizing is executing. In this case, the TransactionIds in the ChangeStore may be out of order, which can cause data loss after self-optimizing.
case2:
Spark writes into an Arctic table concurrently without conflict detection, especially when it writes into the ChangeStore. Conflicts are hard to detect there because the delete files in the ChangeStore differ from those of a standard Iceberg table.
Describe the solution
As described in this proposal (https://docs.google.com/document/d/13ftRtwYU9TXWojtA3jyeH9bVVgSfm2cdSjGYNzHEGKI/edit), we should introduce a mechanism that blocks operations on a table, such as Optimizing and BatchWrite, to avoid these concurrency issues. We call this mechanism TableBlocker. With the TableBlocker:
Subtasks
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct