[fix](csv-reader) fix new csv reader's performance issue #15581

Merged
yiguolei merged 3 commits into apache:master from morningman:slow_csv
Jan 4, 2023

morningman (Contributor) commented Jan 3, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

#14604 fixed the bug but introduced a performance regression.
This PR fixes that regression when loading CSV files.

ClickBench load time drops from 600s to 470s.

TODO:
The new reader is still slower than the original broker scan node, which takes about 420s.
This will be improved in a later PR.
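
To put these load times in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: the variable and function names are illustrative, and the inputs are the approximate figures quoted in this description, not new measurements.

```python
# Load times quoted in this PR description, in seconds (approximate).
before_fix = 600   # new CSV reader before this PR
after_fix = 470    # new CSV reader with this PR
broker_scan = 420  # original broker scan node


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


improvement = percent_change(before_fix, after_fix)
remaining_gap = percent_change(broker_scan, after_fix)

print(f"change vs. before this PR: {improvement:+.1f}%")
print(f"gap vs. broker scan node:  {remaining_gap:+.1f}%")
```

With these inputs the load is roughly 22% faster than before this PR, and still roughly 12% slower than the broker scan node.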

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

github-actions bot (Contributor) commented Jan 3, 2023

clang-tidy review says "All clean, LGTM! 👍"

hello-stephen (Contributor) commented Jan 3, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.97 seconds
load time: 476 seconds
storage size: 17122316622 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230104083851_clickbench_pr_73320.html

github-actions bot (Contributor) commented Jan 4, 2023

clang-tidy review says "All clean, LGTM! 👍"

morningman changed the title from "[Draft](csv-reader) improve new csv reader's performance" to "[fix](csv-reader) improve new csv reader's performance" on Jan 4, 2023
morningman changed the title from "[fix](csv-reader) improve new csv reader's performance" to "[fix](csv-reader) fix new csv reader's performance issue" on Jan 4, 2023
yiguolei (Contributor) left a comment

LGTM

yiguolei merged commit 4075e3a into apache:master on Jan 4, 2023
morningman mentioned this pull request on Feb 6, 2023
luwei16 pushed a commit to luwei16/Doris that referenced this pull request Apr 7, 2023
…che#1342)

commit 4075e3a
Author: Mingyu Chen <morningman@163.com>
Date:   Wed Jan 4 18:25:08 2023 +0800

    [fix](csv-reader) fix new csv reader's performance issue (apache#15581)