@AshinGau AshinGau commented Nov 14, 2022

Proposed changes

Support partition and missing columns in Parquet lazy read.

Problem summary

PR #13917 added lazy read for non-predicate columns in ParquetReader, but lazy read is not triggered when the predicate columns are partition columns or missing columns. This PR supports that case and fills partition and missing columns in FileReader.
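
For readers unfamiliar with lazy read, the idea can be sketched as follows. This is a minimal illustration, not the code in this PR: `Batch`, `FakeFile`, `lazy_read`, and the column names `id`, `payload`, and `dt` are all hypothetical, and the sketch assumes a partition column is one whose value is constant for the whole file and comes from outside the file. It covers only the partition case, not missing columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical output batch; not Doris's real Block type.
struct Batch {
    std::vector<int64_t> id;          // predicate column stored in the file
    std::vector<std::string> payload; // lazy (non-predicate) column stored in the file
    std::vector<std::string> dt;      // partition column, not stored in the file
};

// Hypothetical file contents plus the partition value attached to the file.
struct FakeFile {
    std::vector<int64_t> id;
    std::vector<std::string> payload;
    std::string partition_dt;
    size_t payload_values_decoded; // counts how much lazy work was done
};

// Lazy read sketch: evaluate predicates first, then materialize the
// remaining columns only for the rows that survive.
Batch lazy_read(FakeFile& file, int64_t min_id, const std::string& wanted_dt) {
    assert(file.id.size() == file.payload.size());
    Batch out;
    // Step 1: the partition value is constant for the whole file, so a
    // partition predicate can reject every row without decoding anything.
    if (file.partition_dt != wanted_dt) {
        return out;
    }
    // Step 2: read only the predicate column and collect surviving row indices.
    std::vector<size_t> selected;
    for (size_t i = 0; i < file.id.size(); ++i) {
        if (file.id[i] >= min_id) {
            selected.push_back(i);
        }
    }
    // Step 3: decode the lazy column and fill the partition column
    // for the selected rows only.
    for (size_t i : selected) {
        out.id.push_back(file.id[i]);
        out.payload.push_back(file.payload[i]);
        ++file.payload_values_decoded;
        out.dt.push_back(file.partition_dt);
    }
    return out;
}
```

In this sketch a predicate on `dt` that does not match costs no decoding at all, and a predicate on `id` limits how many `payload` values are decoded.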

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

  bool write_vec_column(const SlotDescriptor* slot_desc, vectorized::IColumn* nullable_col_ptr,
-                       const char* data, size_t len, bool copy_string, bool need_escape);
+                       const char* data, size_t len, bool copy_string, bool need_escape,
+                       size_t rows = 1);

Add a comment in the code to explain this magic number.

  } else {
-     nullable_column->get_null_map_data().push_back(0);
+     auto& null_map = nullable_column->get_null_map_data();
+     null_map.resize_fill(null_map.size() + rows, 0);

You need to modify the if code block in L333
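
The change from `push_back(0)` to `resize_fill(null_map.size() + rows, 0)` in the hunk above can be illustrated with a small stand-alone sketch. It uses `std::vector<uint8_t>` as a stand-in for the null map and `std::vector::resize(n, value)` in place of `resize_fill`, on the assumption that `resize_fill` grows the container and fills the new slots with the given value. The helper names `append_not_null_one` and `append_not_null_rows` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the null map: one byte per row, 0 = not null, 1 = null.
using NullMap = std::vector<uint8_t>;

// Old behaviour: mark exactly one row as not null.
void append_not_null_one(NullMap& null_map) {
    null_map.push_back(0);
}

// New behaviour: mark `rows` rows as not null in a single call.
// The default of 1 keeps existing single-row call sites unchanged.
void append_not_null_rows(NullMap& null_map, size_t rows = 1) {
    null_map.resize(null_map.size() + rows, 0);
}
```

Calling `append_not_null_rows(m, n)` leaves the map in the same state as calling `append_not_null_one(m)` `n` times, which is why the same value can be written for many rows at once without a per-row loop.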

}
virtual ~GenericReader() = default;

bool fill_all_columns() const { return _fill_all_columns; }

Add a comment for this interface.

hello-stephen commented Nov 14, 2022

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.83 seconds
load time: 433 seconds
storage size: 17152206963 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221115133249_clickbench_pr_46565.html

@AshinGau AshinGau force-pushed the lazy-pm branch 2 times, most recently from a61c89a to ff6685a Compare November 15, 2022 11:38
@morningman morningman left a comment

LGTM

@morningman morningman merged commit 20634ab into apache:master Nov 16, 2022
@AshinGau AshinGau deleted the lazy-pm branch December 20, 2022 07:29