
[fix](merge-on-write) fix that the query result has duplicate keys when load with sequence column #16587

Merged
zhannngchen merged 1 commit into apache:master from liaoxin01:fix_dup_seq
Feb 10, 2023

@liaoxin01 liaoxin01 commented Feb 10, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

The delete bitmap is calculated at two stages: memtable flush and publish. The two stages may see different versions.
When the table has a sequence column, rows of the rowset currently being loaded may be marked for deletion at memtable flush or at publish, because their sequence column value is smaller than that of the same key in a previous rowset.
Finally, the delete bitmap is updated to the real version. Because this update uses a set operation, the delete bitmap already stored for a certain version is overwritten and lost, so queries return duplicate keys.
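
To illustrate why an overwriting update can drop delete marks, here is a minimal toy model. It is not the Doris implementation: `ToyDeleteBitmap`, `set_rows`, `merge_rows`, and `marks_after_two_stages` are hypothetical names, a `std::set` stands in for the real bitmap type, and the row ids are made up. It also assumes that the intended behavior is a union of the marks from both stages, which the description implies but does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>

// Toy stand-in for a delete bitmap:
// (rowset id, segment id, version) -> ids of rows marked as deleted.
// Hypothetical types for illustration only.
using BitmapKey = std::tuple<int, int, std::int64_t>;
using RowIds = std::set<std::uint32_t>;
using ToyDeleteBitmap = std::map<BitmapKey, RowIds>;

// "set" semantics: replace whatever is already stored under the key.
void set_rows(ToyDeleteBitmap& bm, const BitmapKey& key, const RowIds& rows) {
    bm[key] = rows;
}

// "merge" semantics: union the new row ids into the existing entry.
void merge_rows(ToyDeleteBitmap& bm, const BitmapKey& key, const RowIds& rows) {
    bm[key].insert(rows.begin(), rows.end());
}

// Both stages write marks under the same key; returns how many marks survive.
std::size_t marks_after_two_stages(bool use_merge) {
    ToyDeleteBitmap bm;
    const BitmapKey key{1, 0, 5};    // rowset 1, segment 0, version 5 (made up)
    const RowIds from_flush{3, 7};   // rows marked at memtable flush
    const RowIds from_publish{9};    // rows marked at publish

    auto update = use_merge ? merge_rows : set_rows;
    update(bm, key, from_flush);
    update(bm, key, from_publish);
    return bm[key].size();
}
```

In this model, `marks_after_two_stages(false)` leaves only the publish-stage mark, because the second write replaces the first, while `marks_after_two_stages(true)` keeps all three. A row whose mark is lost this way would remain visible next to the newer row with the same key.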

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.85 seconds
stream load tsv: 469 seconds loaded 74807831229 Bytes, about 152 MB/s
stream load json: 38 seconds loaded 2358488459 Bytes, about 59 MB/s
stream load orc: 68 seconds loaded 1101869774 Bytes, about 15 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230210050925_clickbench_pr_93549.html


@zhannngchen zhannngchen left a comment

LGTM

@zhannngchen zhannngchen merged commit c3110f8 into apache:master Feb 10, 2023
morningman pushed a commit that referenced this pull request Feb 10, 2023
YangShaw pushed a commit to YangShaw/doris that referenced this pull request Feb 17, 2023
@liaoxin01 liaoxin01 deleted the fix_dup_seq branch February 6, 2024 12:27