[config](load) enable new load scan node by default #14808

Merged: 13 commits, Dec 16, 2022

morningman (Contributor) commented Dec 5, 2022

Proposed changes

Issue Number: close #xxx

Problem summary

Set the FE config enable_new_load_scan_node to true by default, so that all load tasks (broker load, stream load, routine load, insert into) use FileScanNode instead of BrokerScanNode to read data.
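As a rough illustration of the behavior change, the sketch below models how the flag selects the scan node for a load task. It is not Doris code: `FeConfig`, `LOAD_TYPES`, and `scan_node_for_load` are hypothetical names introduced here, and only the flag name and the two scan node names come from this PR.

```python
from dataclasses import dataclass

# Load types listed in this PR's description.
LOAD_TYPES = ("broker load", "stream load", "routine load", "insert into")


@dataclass
class FeConfig:
    """Hypothetical stand-in for the FE configuration."""
    # New default after this PR; it was false before.
    enable_new_load_scan_node: bool = True


def scan_node_for_load(load_type: str, config: FeConfig) -> str:
    """Return the name of the scan node a load task would use (illustrative)."""
    if load_type not in LOAD_TYPES:
        raise ValueError(f"unknown load type: {load_type}")
    if config.enable_new_load_scan_node:
        return "FileScanNode"
    return "BrokerScanNode"


default_choice = scan_node_for_load("stream load", FeConfig())
legacy_choice = scan_node_for_load(
    "stream load", FeConfig(enable_new_load_scan_node=False)
)
```

With the new default every listed load type resolves to FileScanNode; setting the flag back to false would restore the BrokerScanNode path in this model.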

  1. Support loading Parquet files in stream load with the new load scan node.
  2. Fix a bug where the new Parquet reader cannot read a column that has no logical or converted type.
  3. Change the jsonb parser function to "jsonb_parse_error_to_null",
    so that if the input string is not a valid JSON string, the load task writes null to the jsonb column.
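The null-on-error semantics described in item 3 can be sketched as follows. This is only a model of the described behavior using Python's standard `json` module, not the Doris implementation; the Python function merely borrows the name of the Doris function, and the sample rows are made up for illustration.

```python
import json
from typing import Any, Optional


def jsonb_parse_error_to_null(text: Optional[str]) -> Optional[Any]:
    """Parse a JSON string; return None instead of failing on invalid input.

    Illustrative model only: mirrors the behavior described in this PR,
    where an invalid JSON string yields null for the jsonb column.
    """
    if text is None:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Invalid JSON: the row is kept and the column becomes null.
        return None


# Hypothetical input rows: two valid JSON strings, two invalid ones.
rows = ['{"k": 1}', "[1, 2, 3]", '{"k": ', "not json"]
parsed = [jsonb_parse_error_to_null(r) for r in rows]
```

The point of this design is that one malformed value does not fail the whole load task; the trade-off is that an invalid string becomes indistinguishable from a real null in the loaded column.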

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

morningman added the area/config and kind/behavior-changed labels Dec 5, 2022

hello-stephen (Contributor) commented Dec 5, 2022

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 36.5 seconds
load time: 702 seconds
storage size: 17123547050 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221215073329_clickbench_pr_63752.html

github-actions bot added the area/load, area/planner, and area/vectorization labels Dec 5, 2022

github-actions bot (Contributor) commented Dec 5, 2022

clang-tidy review says "All clean, LGTM! 👍"

AshinGau (Member) commented Dec 6, 2022

LGTM

dataroaring (Contributor) left a comment

LGTM

github-actions bot added the approved label Dec 16, 2022

github-actions bot (Contributor) commented

PR approved by at least one committer and no changes requested.

github-actions bot (Contributor) commented

PR approved by anyone and no changes requested.

morningman merged commit 0e1e5a8 into apache:master on Dec 16, 2022
morningman added a commit that referenced this pull request Dec 19, 2022
Labels: approved, area/config, area/load, area/planner, area/vectorization, dev/1.2.1-merged, kind/behavior-changed, kind/test, reviewed