-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ARCTIC-994] Introduce design of TransactionId
generation to resolving data conflicts
#1010
Conversation
When |
core/src/main/java/com/netease/arctic/data/DefaultKeyedFile.java
Outdated
Show resolved
Hide resolved
Adapt new Transaction model
Adapt new Transaction model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangtaohz Thanks for your contribution. I left some comments.
@zstraw Do you have a time to take a look?
flink/v1.12/flink/src/main/java/com/netease/arctic/flink/write/ArcticFileWriter.java
Outdated
Show resolved
Hide resolved
...v1.12/flink/src/main/java/com/netease/arctic/flink/write/ArcticRowDataTaskWriterFactory.java
Outdated
Show resolved
Hide resolved
...v1.12/flink/src/main/java/com/netease/arctic/flink/write/ArcticRowDataTaskWriterFactory.java
Outdated
Show resolved
Hide resolved
flink/v1.12/flink/src/main/java/com/netease/arctic/flink/write/ArcticFileWriter.java
Outdated
Show resolved
Hide resolved
2.fix unit test of tracer in core
This PR also fix #1045 |
try (TaskWriter<Record> writer = GenericTaskWriters.builderFor(table) | ||
.withTransactionId(txId) | ||
.withChangeAction(action) | ||
.buildChangeWriter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove transactionId here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, it writes some data into ChangeStore, there is no need to begin a transaction here, so I remove the transactionId
.
2.support generate hive sub dir without txId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ing data conflicts (#1010) * generate transaction from change table sequence number * remove gap * add operationId into OutputFileFactory * remove allocateTransactionId from TablePropertyUtil for spark * remove useless import * flink not begin transaction * remove begin transaction when write into change store * add UpdateTool to be compatible with old transactionId * fix unit test * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * fix checkstyle * fix compile error * 1.fix flink incremental pull in 1.12 2.fix unit test in 1.12 * 1.fix flink incremental pull in 1.14 2.fix unit test in 1.14 * 1.fix flink incremental pull in 1.15 2.fix unit test in 1.15 * remove useless import * Adapt new Transaction model * Adapt new Transaction model * fix ams unit test * fix hive commit target files serialization * fix compile error * add summary to empty snapshot for Transaction begin and add some comment * Adapt new Transaction model * fix parse file for tracer and refactor some methods of FileNameHandle" * remove useless code in flink * 1. add more comment for FileNameHandle 2.fix unit test of tracer in core * add comment * remove useless import * refactor FileNameHandle to FileNameGenerator * add unit test for #1045 * 1.spark remove txId for operations on UnkeyedTable 2.support generate hive sub dir without txId * remove FileName transaction > 0 check --------- Co-authored-by: shidayang <530847445@qq.com>
…ing data conflicts (apache#1010) * generate transaction from change table sequence number * remove gap * add operationId into OutputFileFactory * remove allocateTransactionId from TablePropertyUtil for spark * remove useless import * flink not begin transaction * remove begin transaction when write into change store * add UpdateTool to be compatible with old transactionId * fix unit test * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * Adapt new Transaction model * fix checkstyle * fix compile error * 1.fix flink incremental pull in 1.12 2.fix unit test in 1.12 * 1.fix flink incremental pull in 1.14 2.fix unit test in 1.14 * 1.fix flink incremental pull in 1.15 2.fix unit test in 1.15 * remove useless import * Adapt new Transaction model * Adapt new Transaction model * fix ams unit test * fix hive commit target files serialization * fix compile error * add summary to empty snapshot for Transaction begin and add some comment * Adapt new Transaction model * fix parse file for tracer and refactor some methods of FileNameHandle" * remove useless code in flink * 1. add more comment for FileNameHandle 2.fix unit test of tracer in core * add comment * remove useless import * refactor FileNameHandle to FileNameGenerator * add unit test for apache#1045 * 1.spark remove txId for operations on UnkeyedTable 2.support generate hive sub dir without txId * remove FileName transaction > 0 check --------- Co-authored-by: shidayang <530847445@qq.com>
Why are the changes needed?
fix #994
Brief change log
TransactionId
from change snapshot sequenceTransactionId
TransactionId
when committing. TheTxId
in fileName is set to be 0, and the correctTransactionId
should get from iceberg metadata in this caseTableEntriesScan
instead of the icebergTableScan
, to get the sequence number from iceberg metadata whenTxId
in fileName is 0TxId
= 0, inCommonOutputFileFactory
andAdaptHiveOutputFileFactory
com.netease.arctic.io.FileNameHandle
to resolve file name.com.netease.arctic.data.file.ContentFileWithSequence
to wrap ContentFile.com.netease.arctic.scan.BaseChangeTableIncrementalScan
can return SequenceNumber.TransactionId
be compatible with oldTransactionId
(make sure snapshot sequence >= old transactionId) when ams start, inUpdateTool
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request
Documentation