[improve](txn insert) txn insert support write to one table many times #32980
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
run buildall
TPC-H: Total hot run time: 37510 ms
TPC-DS: Total hot run time: 181883 ms
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
run buildall
TPC-H: Total hot run time: 37824 ms
TPC-DS: Total hot run time: 182424 ms
ClickBench: Total hot run time: 28.9 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
fe/fe-core/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java
one comment.
run buildall
run buildall
TPC-H: Total hot run time: 38520 ms
TPC-DS: Total hot run time: 181653 ms
ClickBench: Total hot run time: 30.1 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
run buildall
TPC-H: Total hot run time: 38735 ms
TPC-DS: Total hot run time: 181749 ms
ClickBench: Total hot run time: 31.14 s
fe/fe-core/src/main/java/org/apache/doris/transaction/TransactionEntry.java
fe/fe-core/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java
LGTM
run buildall
TPC-H: Total hot run time: 41325 ms
TPC-DS: Total hot run time: 186442 ms
run p0
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
LGTM
run buildall
run cloud_p1
LGTM
## Proposed changes

### Purpose

The user doc: https://doris.apache.org/zh-CN/docs/dev/data-operate/import/transaction-load-manual

We have supported `insert into select` (#31666), `update` (#33034) and `delete` (#33100) in transaction load. #32980 allows one txn to write more than one rowset to one partition. This PR implements #32980 in cloud mode.

### Implementation

#### sub_txn_id

See #32980.

#### Meta service supports commit txn

This process is generally the same as `commit_txn`; the difference is that, with multiple sub txns, a partition's version is incremented by 1 for each sub txn that writes to it.

One example: suppose the table, partition, tablet and version info is:

```
-----------------------------------------
| table | partition | tablet  | version |
-----------------------------------------
| t1    | t1_p1     | t1_p1.1 | 1       |
| t1    | t1_p1     | t1_p1.2 | 1       |
| t1    | t1_p2     | t1_p2.1 | 2       |
| t2    | t2_p3     | t2_p3.1 | 3       |
| t2    | t2_p4     | t2_p4.1 | 4       |
-----------------------------------------
```

Now we commit a txn with 3 sub txns, and the tablets are:

* sub_txn1: t1_p1.1, t1_p1.2, t1_p2.1
* sub_txn2: t2_p3.1
* sub_txn3: t1_p1.1, t1_p1.2

When committing, the partition versions will be:

* sub_txn1: t1_p1(1 -> 2), t1_p2(2 -> 3)
* sub_txn2: t2_p3(3 -> 4)
* sub_txn3: t1_p1(2 -> 3)

After commit, the partition versions will be:

* t1: t1_p1(3), t1_p2(3)
* t2: t2_p3(4), t2_p4(4)

#### Meta service supports generating sub_txn_id by `begin_sub_txn`
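A minimal, single-threaded sketch of what `begin_sub_txn` needs to provide under our assumptions: each later load in a txn gets a fresh sub_txn_id that never collides with any other txn_id or sub_txn_id, and the ids are recorded under the parent txn. It assumes, as in the #32980 description, that the first sub txn reuses the txn_id (`sub_txn_id1 = txn_id`), and additionally assumes that sub_txn_ids are drawn from the same counter as txn_ids. `SubTxnIdAllocator` and its methods are hypothetical and say nothing about how the meta service actually generates or persists ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, single-threaded sketch of sub_txn_id allocation; not the meta-service code. */
final class SubTxnIdAllocator {
    private long nextId;
    private final Map<Long, List<Long>> subTxnIdsByTxn = new HashMap<>();

    SubTxnIdAllocator(long firstId) {
        this.nextId = firstId;
    }

    /** Begins a txn; its first sub txn reuses the txn_id (sub_txn_id1 = txn_id). */
    long beginTxn() {
        long txnId = nextId++;
        List<Long> subTxnIds = new ArrayList<>();
        subTxnIds.add(txnId);
        subTxnIdsByTxn.put(txnId, subTxnIds);
        return txnId;
    }

    /** Allocates a new sub_txn_id for a later load in the same txn. */
    long beginSubTxn(long txnId) {
        List<Long> subTxnIds = subTxnIdsByTxn.get(txnId);
        if (subTxnIds == null) {
            throw new IllegalStateException("unknown txn " + txnId);
        }
        // Assumption: same counter as txn_ids, so a sub_txn_id never equals another id.
        long subTxnId = nextId++;
        subTxnIds.add(subTxnId);
        return subTxnId;
    }

    /** All sub_txn_ids of a txn, in allocation order (empty if the txn is unknown). */
    List<Long> subTxnIds(long txnId) {
        return List.copyOf(subTxnIdsByTxn.getOrDefault(txnId, List.of()));
    }
}
```

With an allocator starting at 100, `beginTxn()` returns 100 with sub txn list `[100]`; each later `beginSubTxn(100)` appends the next free id, even if other txns were started in between. Sharing one counter is just the simplest way to get global uniqueness in a sketch; the real meta service may allocate ids differently.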
Proposed changes
Purpose
We have supported `insert into select` (#31666), `update` (#33034) and `delete` (#33100) in transaction load. However, one problem remains: a partition can only be written once in a transaction, because the current transaction mechanism only supports publishing one version per partition. This PR solves this problem.
In other words, this PR supports writing to the same table multiple times in one transaction.
Current implementation
* In Doris, one transaction is associated with one txn_id.
* BE uses this txn_id to record the load info of partition_id, tablet, DeltaWriter... in `txn_manager`. If one partition is written twice in one txn, this info in BE may be overwritten.
* When FE commits the txn, it calculates one new version per partition. A version corresponds to one Rowset, but multiple loads in one txn generate multiple Rowsets.
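A minimal sketch of the overwrite problem: if BE-side load state is keyed only by txn_id (plus the tablet), the second load of the same partition in one txn lands on the same key and replaces the first load's Rowset. `TxnKeyedLoadState` is a hypothetical, simplified Java model written for illustration; it is not the real `txn_manager`, which lives in BE and tracks more than one Rowset id per entry.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of BE load bookkeeping keyed by (txn_id, tablet_id).
 * Not the real txn_manager; it only shows why a second load in the same txn collides.
 */
final class TxnKeyedLoadState {
    private record Key(long txnId, long tabletId) {}

    private final Map<Key, String> rowsetByKey = new HashMap<>();

    /** Records the Rowset a load produced; returns the Rowset it replaced, or null if none. */
    String recordLoad(long txnId, long tabletId, String rowsetId) {
        return rowsetByKey.put(new Key(txnId, tabletId), rowsetId);
    }

    /** Number of Rowsets still tracked; a replaced Rowset is no longer counted. */
    int trackedRowsets() {
        return rowsetByKey.size();
    }
}
```

Calling `recordLoad(1000, 7, "rowset_a")` and then `recordLoad(1000, 7, "rowset_b")` returns `"rowset_a"` from the second call and leaves only one tracked Rowset: the first load's result has been silently replaced.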
New implementation
Introduction of sub_txn_id
To solve the above problem, the basic idea is to separate the txn_id used in FE from the one used in BE. For multiple loads in one txn, we use a sub_txn_id to distinguish each load in BE.
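Continuing the simplified model, keying BE-side state by sub_txn_id instead of txn_id removes the collision: each load of the txn carries its own sub_txn_id, and the txn as a whole is just the list of its sub_txn_ids. `SubTxnLoadState` is a hypothetical illustration, not Doris code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model: BE bookkeeping keyed by (sub_txn_id, tablet_id) instead of txn_id.
 * FE still sees one txn_id; it only needs to know which sub_txn_ids belong to it.
 */
final class SubTxnLoadState {
    private record Key(long subTxnId, long tabletId) {}

    private final Map<Key, String> rowsetByKey = new HashMap<>();

    /** Each load carries its own sub_txn_id, so two loads of one txn no longer collide. */
    String recordLoad(long subTxnId, long tabletId, String rowsetId) {
        return rowsetByKey.put(new Key(subTxnId, tabletId), rowsetId);
    }

    /** Rowsets written to one tablet by the given sub txns, in sub txn order. */
    List<String> rowsetsOf(List<Long> subTxnIds, long tabletId) {
        List<String> result = new ArrayList<>();
        for (long subTxnId : subTxnIds) {
            String rowsetId = rowsetByKey.get(new Key(subTxnId, tabletId));
            if (rowsetId != null) {
                result.add(rowsetId);
            }
        }
        return result;
    }
}
```

Two loads on tablet 7 under sub_txn_ids 1000 and 1001 now both survive, and `rowsetsOf(List.of(1000L, 1001L), 7)` returns them in order, one Rowset per sub txn, which is what a per-sub-txn version needs.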
One example: suppose table t has 2 partitions, p1 and p2. The current version of p1 is 3, and that of p2 is 4.
2. sub_txn_id1 = txn_id;
When the txn is committed, the partition versions assigned to each sub txn are:
* sub_txn_id1: p1(4), p2(5)
* sub_txn_id2: p1(5)
* sub_txn_id3: p1(6), p2(6)
publish_task:
* use sub_txn_id to submit publish version tasks to BE
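The version assignment and publish step above can be sketched as follows. This is a simplified model, not the FE implementation: `SubTxnPublishPlanner` and the `PublishTask` record (carrying only sub_txn_id, partition and version) are hypothetical, and real publish version tasks carry more information. Sub txns are processed in commit order, each one bumps every partition it writes by exactly 1, and BE receives one entry per (sub txn, partition) so it can locate the Rowset that load produced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of per-sub-txn version assignment at commit and of the publish
 * tasks derived from it; not the actual FE code.
 */
final class SubTxnPublishPlanner {
    /** One load in the txn and the (distinct) partitions it wrote. */
    record SubTxn(long subTxnId, List<String> partitions) {}

    /** Tells BE: the Rowset written under subTxnId in this partition becomes this version. */
    record PublishTask(long subTxnId, String partition, long version) {}

    static List<PublishTask> plan(Map<String, Long> currentVersions, List<SubTxn> subTxns) {
        Map<String, Long> versions = new HashMap<>(currentVersions); // leave the input untouched
        List<PublishTask> tasks = new ArrayList<>();
        for (SubTxn subTxn : subTxns) { // sub txns are processed in commit order
            for (String partition : subTxn.partitions()) {
                // Every sub txn that writes a partition bumps its version by exactly 1.
                long version = versions.merge(partition, 1L, Long::sum);
                tasks.add(new PublishTask(subTxn.subTxnId(), partition, version));
            }
        }
        return tasks;
    }
}
```

Starting from p1 = 3 and p2 = 4 with sub txns writing {p1, p2}, {p1} and {p1, p2}, `plan` yields p1(4), p2(5) for the first sub txn, p1(5) for the second and p1(6), p2(6) for the third, matching the example.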
FE Meta
In addition, this PR changes the storage of `TransactionState` in bdbje to JSON format to make it compatible.
Isolation Level
Doris provides the `READ COMMITTED` isolation level. Please note the following:
* In a transaction, each statement reads the data that was committed at the time the statement began executing.
* In a transaction, each statement cannot read the modifications made by other statements within the same transaction.

Please notice: the delete command has two implementations, one based on a delete condition and one based on insert. If the delete condition is committed after the insert, the delete also takes effect on the data written by that insert.
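A toy model of this ordering rule, under an assumption of ours made for illustration: a delete condition committed at version v filters only rows written at lower versions of the same partition, and each sub txn commits as the next version. `VersionedPartition` is hypothetical and is not the Doris storage engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical toy model of one partition's versioned data; not Doris code.
 * Assumption: a delete condition committed at version v only filters rows written below v.
 */
final class VersionedPartition {
    private record Batch(long version, List<Integer> rows) {}
    private record DeleteCondition(long version, IntPredicate predicate) {}

    private final List<Batch> batches = new ArrayList<>();
    private final List<DeleteCondition> deletes = new ArrayList<>();
    private long version;

    VersionedPartition(long initialVersion) {
        this.version = initialVersion;
    }

    /** Commits an insert (e.g. one sub txn) as the next version. */
    void commitInsert(List<Integer> rows) {
        batches.add(new Batch(++version, List.copyOf(rows)));
    }

    /** Commits a delete condition as the next version. */
    void commitDeleteCondition(IntPredicate predicate) {
        deletes.add(new DeleteCondition(++version, predicate));
    }

    /** Visible rows: a row is hidden if a higher-versioned delete condition matches it. */
    List<Integer> read() {
        List<Integer> visible = new ArrayList<>();
        for (Batch batch : batches) {
            for (int row : batch.rows()) {
                boolean deleted = false;
                for (DeleteCondition d : deletes) {
                    if (d.version() > batch.version() && d.predicate().test(row)) {
                        deleted = true;
                        break;
                    }
                }
                if (!deleted) {
                    visible.add(row);
                }
            }
        }
        return visible;
    }
}
```

Starting at version 3, inserting `[1, 2, 3]` (version 4) and then committing the condition `v >= 2` (version 5) leaves `[1]`; committing the same condition first (version 4) and the insert afterwards (version 5) leaves `[1, 2, 3]`, because in this model the delete condition only reaches lower versions.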
User doc
apache/doris-website#604