Skip to content

Conversation

@dataroaring
Copy link
Contributor

@dataroaring dataroaring commented Jun 22, 2022

Proposed changes

Issue Number: close #10337

Problem Summary:

SEQ_COL is used on tables with unique key to order data in one transaction(rowset), when there is only one rowset and the rowset is compacted, rows in the rowset is sorted and rows with same keys are resolved by compaction, so a scanner sets direct_mode to optimize read iterator to avoid sorting and aggregating, and iterators does not need SEQ_COL. However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator. Then segment Iterator would be called via get_next with a block without SEQ_COL, segment iterator creates columns included in return_columns but not in the block. SEQ_COL is nullable, segment Iterator does not handle it, so a core dump happen.

Actually, in the above case, segment iterator does not need to read SEQ_COL. When SEQ_COL is really needed, iterators creates SEQ_COL column in block, so segment Iterator does not need do create SEQ_COL at all.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@dataroaring dataroaring added the dev/1.0.1-deprecated should be merged into dev-1.0.1 branch label Jun 22, 2022
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 22, 2022
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit 274a0f2 into apache:master Jun 23, 2022
@morningman morningman added dev/merged-1.0.1-deprecated PR has been merged into dev-1.0.1 and removed dev/1.0.1-deprecated should be merged into dev-1.0.1 branch labels Jun 23, 2022
morningman pushed a commit that referenced this pull request Jun 23, 2022
SEQ_COL is used on tables with unique key to order data in one transaction(rowset),
when there is only one rowset and the rowset is compacted, rows in the rowset is sorted
and rows with same keys are resolved by compaction, so a scanner sets direct_mode to
optimize read iterator to avoid sorting and aggregating, and iterators does not need SEQ_COL.
However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator.
Then segment Iterator would be called via get_next with a block without SEQ_COL, segment iterator
creates columns included in return_columns but not in the block. SEQ_COL is nullable, segment Iterator
does not handle it, so a core dump happen.

Actually, in the above case, segment iterator does not need to read SEQ_COL.
When SEQ_COL is really needed, iterators creates SEQ_COL column in block,
so segment Iterator does not need do create SEQ_COL at all.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/vectorization dev/merged-1.0.1-deprecated PR has been merged into dev-1.0.1 reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Fix read seq column cores

2 participants