do not read seq column when reading a compacted rowset #10344

dataroaring · 2022-06-22T11:38:38Z

Proposed changes

Issue Number: close #10337

Problem Summary:

SEQ_COL is used on unique-key tables to order data within one transaction (rowset). When there is only one rowset and it has been compacted, its rows are already sorted and rows with the same key have already been resolved by compaction. In that case the scanner sets direct_mode so the read iterator can skip sorting and aggregation, and the iterators do not need SEQ_COL. However, init_return_columns still adds SEQ_COL to return_columns, which is passed to SegmentIterator. SegmentIterator is then called via get_next with a block that does not contain SEQ_COL, and it creates every column that is in return_columns but not in the block. SEQ_COL is nullable, SegmentIterator does not handle creating a nullable column here, and the result is a core dump.
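To make the failure path concrete, here is a simplified, hypothetical C++ model of it. `ColumnDesc`, `Block` and `fill_missing_columns` are illustrative names, not Doris types or functions. The model assumes the segment iterator walks return_columns, creates any column the block lacks, and has no path for nullable columns. The real code crashed at that point, so the sketch throws an exception instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified column descriptor (not the Doris API).
struct ColumnDesc {
    int id;
    std::string name;
    bool nullable;
};

// Hypothetical block: only tracks which column ids the caller already created.
struct Block {
    std::vector<int> column_ids;
};

// Models a segment iterator that creates every column listed in
// return_columns but missing from the block. Like the bug described above,
// it cannot create nullable columns; the sketch throws where the real code
// crashed. Returns how many columns it had to create.
inline int fill_missing_columns(const std::vector<ColumnDesc>& return_columns,
                                Block& block) {
    int created = 0;
    for (const auto& col : return_columns) {
        bool present = false;
        for (int id : block.column_ids) {
            if (id == col.id) {
                present = true;
                break;
            }
        }
        if (present) {
            continue;
        }
        if (col.nullable) {
            throw std::logic_error("cannot create nullable column " + col.name);
        }
        block.column_ids.push_back(col.id);
        ++created;
    }
    return created;
}
```

With return_columns containing a nullable sequence column that the block does not have, the call throws. Without that column, nothing needs to be created.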

In fact, in the direct-mode case above, SegmentIterator does not need to read SEQ_COL at all. When SEQ_COL is really needed, the iterators create the SEQ_COL column in the block themselves, so SegmentIterator never needs to create SEQ_COL.
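The idea behind the fix can be sketched as follows: only request the sequence column from the segment iterator when rows still have to be merged. This is a hypothetical illustration, not the actual patch. `build_return_columns`, its parameters, and the convention that `seq_col_id` is -1 for tables without a sequence column are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not the Doris API): builds the column ids handed to
// the segment iterator.
//   query_columns: columns the query itself needs.
//   seq_col_id:    id of the sequence column, or -1 if the table has none.
//   direct_mode:   true when a single compacted rowset is read without
//                  sorting or aggregation.
inline std::vector<int> build_return_columns(const std::vector<int>& query_columns,
                                             int seq_col_id, bool direct_mode) {
    std::vector<int> cols = query_columns;
    // The sequence column is only needed to pick a winner among rows with the
    // same key. In direct mode compaction has already done that, so it is
    // not added (unless the query itself selected it).
    bool need_seq = seq_col_id >= 0 && !direct_mode;
    if (need_seq) {
        bool already_present = false;
        for (int id : cols) {
            if (id == seq_col_id) {
                already_present = true;
            }
        }
        if (!already_present) {
            cols.push_back(seq_col_id);
        }
    }
    return cols;
}
```

In this sketch, a direct-mode read of columns `{0, 1}` with sequence column `2` returns just `{0, 1}`, so the segment iterator is never asked to materialize the nullable sequence column.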

Checklist (Required)

Does it affect the original behavior: (Yes/No/I Don't know)
Have unit tests been added: (Yes/No/No Need)
Has documentation been added or modified: (Yes/No/No Need)
Does it need to update dependencies: (Yes/No)
Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

morningman

LGTM

github-actions · 2022-06-22T12:35:05Z

PR approved by at least one committer and no changes requested.

github-actions · 2022-06-22T12:35:08Z

PR approved by anyone and no changes requested.

morningman

LGTM

github-actions bot added the area/vectorization label Jun 22, 2022

dataroaring added the dev/1.0.1-deprecated label (should be merged into dev-1.0.1 branch) Jun 22, 2022

dataroaring force-pushed the opt_single_version branch from c6818b2 to 39181ea June 22, 2022 12:14

do not read seq column when reading a compacted rowset

8f53059

dataroaring force-pushed the opt_single_version branch from 39181ea to 8f53059 June 22, 2022 12:21

morningman approved these changes Jun 22, 2022

github-actions bot added the approved label (indicates a PR has been approved by one committer) Jun 22, 2022

github-actions bot added the reviewed label Jun 22, 2022

morningman approved these changes Jun 22, 2022

morningman merged commit 274a0f2 into apache:master Jun 23, 2022

morningman added the dev/merged-1.0.1-deprecated label (PR has been merged into dev-1.0.1) and removed the dev/1.0.1-deprecated label (should be merged into dev-1.0.1 branch) Jun 23, 2022
