[fix](spark-load) no need to filter row group when doing spark load #13116

Merged: morningman merged 4 commits into apache:master from morningman:fix_spark_load_parquet on Oct 5, 2022
morningman commented Oct 4, 2022

Proposed changes

Issue Number: close #13115

Problem summary

  1. Fix issue #13115 ([Bug] Spark load cause BE crash).
  2. Modify the `get_next_block` method of `GenericReader` to return `read_rows` explicitly.
    Some columns in a block may not be filled by the reader. If the first column is not filled, `block->rows()` cannot return the real number of rows.
  3. Add more checks for broker load test cases.
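
To illustrate item 2, here is a minimal sketch of why an explicit `read_rows` out-parameter is more reliable than asking the block for its row count. It is a toy model, not the Doris implementation: `ToyBlock`, `ToyReader`, and all of their members are hypothetical names, and the only assumption carried over from the description above is that the row count is derived from the first column.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a columnar block: each column is a vector of ints.
// ToyBlock is hypothetical and is not a Doris class.
struct ToyBlock {
    std::vector<std::vector<int>> columns;

    explicit ToyBlock(std::size_t num_columns) : columns(num_columns) {}

    // Assumption taken from the PR description: the row count is
    // derived from the first column only.
    std::size_t rows() const {
        return columns.empty() ? 0 : columns.front().size();
    }
};

// Toy reader that fills only the columns it was asked to materialize.
// ToyReader is hypothetical and is not a Doris class.
class ToyReader {
public:
    ToyReader(std::vector<std::size_t> filled_columns,
              std::size_t total_rows, std::size_t batch_size)
            : filled_columns_(std::move(filled_columns)),
              remaining_(total_rows),
              batch_size_(batch_size) {
        assert(batch_size_ > 0);
    }

    // Reports the number of rows read through an out-parameter instead of
    // leaving the caller to infer it from block->rows().
    // Returns false if any pointer argument is null.
    bool get_next_block(ToyBlock* block, std::size_t* read_rows, bool* eof) {
        if (block == nullptr || read_rows == nullptr || eof == nullptr) {
            return false;
        }
        const std::size_t n = remaining_ < batch_size_ ? remaining_ : batch_size_;
        for (std::size_t idx : filled_columns_) {
            if (idx < block->columns.size()) {
                // Append n placeholder values to each materialized column.
                block->columns[idx].insert(block->columns[idx].end(), n, 0);
            }
        }
        remaining_ -= n;
        *read_rows = n;
        *eof = (remaining_ == 0);
        return true;
    }

private:
    std::vector<std::size_t> filled_columns_;
    std::size_t remaining_;
    std::size_t batch_size_;
};
```

In this sketch, a reader constructed as `ToyReader({1, 2}, 5, 4)` over a three-column `ToyBlock` never fills column 0, so `block.rows()` stays at 0 while `read_rows` reports 4 on the first call and 1 on the second.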

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

morningman added the kind/fix and area/spark-load labels Oct 4, 2022
morningman force-pushed the fix_spark_load_parquet branch from bbf7c61 to 2b824bf on October 5, 2022 09:12

@wsjz wsjz left a comment

LGTM

@hf200012 hf200012 left a comment

LGTM

@morningman morningman merged commit d286aa7 into apache:master Oct 5, 2022
FreeOnePlus pushed a commit to FreeOnePlus/doris that referenced this pull request Oct 8, 2022
…pache#13116)

1. Fix issue apache#13115 
2. Modify the `get_next_block` method of `GenericReader` to return `read_rows` explicitly.
    Some columns in a block may not be filled by the reader. If the first column is not filled, `block->rows()` cannot return the real number of rows.
3. Add more checks for broker load test cases.
Labels

area/spark-load, area/vectorization, kind/fix, kind/test

Development

Successfully merging this pull request may close these issues.

[Bug] Spark load cause BE crash

3 participants