[fix](multi-catalog) get dictionary-encode from parquet metadata#15524

Merged
morningman merged 1 commit into apache:branch-1.2-lts from AshinGau:parquet_dict on Dec 30, 2022

Conversation

@AshinGau
Member

Proposed changes

Issue Number: close #xxx

Problem summary

Determine whether a Parquet column is dictionary-encoded from the file metadata instead of reading the first block. Some versions of the Parquet writer produce non-standard files: the column is dictionary-encoded, but that information cannot be read from the file metadata. In that case the column is not detected as dictionary-encoded, and the performance of reading delete files degrades.
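To illustrate the idea, here is a minimal sketch of a metadata-only check. It is not the code from this PR: `ColumnMetaData`, `PageEncodingStats` and `is_dictionary_encoded` are hypothetical stand-ins that only mirror the `encodings` and `encoding_stats` fields of the Parquet column chunk metadata, and the fallback rule is an assumption chosen to be conservative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Encoding(Enum):
    # Subset of the Parquet Encoding enum (values follow parquet.thrift).
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    BIT_PACKED = 4
    RLE_DICTIONARY = 8


class PageType(Enum):
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


@dataclass
class PageEncodingStats:
    page_type: PageType
    encoding: Encoding
    count: int


@dataclass
class ColumnMetaData:
    # Hypothetical stand-in for the column chunk metadata in the file footer.
    encodings: List[Encoding]
    encoding_stats: Optional[List[PageEncodingStats]] = None


DICT_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}
LEVEL_ENCODINGS = {Encoding.RLE, Encoding.BIT_PACKED}
DATA_PAGE_TYPES = {PageType.DATA_PAGE, PageType.DATA_PAGE_V2}


def is_dictionary_encoded(meta: ColumnMetaData) -> bool:
    """Return True only if the metadata proves every data page uses a dictionary."""
    if meta.encoding_stats:
        # Preferred path: per-page statistics say exactly how data pages are encoded.
        data_pages = [
            s for s in meta.encoding_stats
            if s.page_type in DATA_PAGE_TYPES and s.count > 0
        ]
        return bool(data_pages) and all(
            s.encoding in DICT_ENCODINGS for s in data_pages
        )
    # Fallback (assumption): without statistics, accept the column only if the
    # encodings list holds a dictionary encoding plus, at most, level encodings.
    encodings = set(meta.encodings)
    if not (encodings & DICT_ENCODINGS):
        return False
    return encodings <= (DICT_ENCODINGS | LEVEL_ENCODINGS)
```

When the metadata is missing or ambiguous, the sketch returns `False`, which matches the degraded case described above: the column may really be dictionary-encoded, but the reader cannot prove it from the metadata and has to take the slower path.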

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

Contributor

@morningman left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Dec 30, 2022
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 63a11f9 into apache:branch-1.2-lts Dec 30, 2022
@AshinGau AshinGau deleted the parquet_dict branch August 10, 2023 08:47

Labels

approved (Indicates a PR has been approved by one committer.), area/vectorization, dev/1.2.1-merged, reviewed

2 participants