
[Refactor](multi catalog) BE assignment data structure. #18157

Closed
Jibing-Li wants to merge 1 commit into apache:master from Jibing-Li:be


@Jibing-Li (Contributor) commented Mar 28, 2023

A scan operation consists of two steps:

  1. Generate scan range list. Each scan range is a basic scheduling unit. For OlapScanNode, a scan range is a tablet. For ExternalFileScanNode, it is one or several blocks of a file.
  2. Assign a BE to execute each scan range.
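
To make step 1 concrete for the file case, here is a rough sketch of cutting one file into block-sized scan ranges. It is only an illustration and not the code in this PR: `FileSplitSketch`, `FileRange`, `splitFile`, and the fixed `blockSize` parameter are hypothetical names and assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not the classes introduced by this PR.
final class FileSplitSketch {
    /** One scheduling unit: the byte range [offset, offset + length) of a file. */
    static final class FileRange {
        final String path;
        final long offset;
        final long length;

        FileRange(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Cuts a file of fileSize bytes into ranges of at most blockSize bytes each. */
    static List<FileRange> splitFile(String path, long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<FileRange> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // The last range may be shorter than blockSize.
            long length = Math.min(blockSize, fileSize - offset);
            ranges.add(new FileRange(path, offset, length));
        }
        return ranges;
    }
}
```

With these assumptions, a 250-byte file and a 100-byte block size give three ranges of 100, 100, and 50 bytes, and an empty file gives no ranges.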

This PR does two things:

  1. Define the data structure for ScanNode scan range (ScanRangeList) and the interface in ScanNode to get it.
  2. Implement the scan range generation logic and the BE assignment logic for ExternalFileScanNode. (ExternalFileScanNode is used by Hive, Iceberg, TVF queries, broker load, and stream load.)
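
As a rough idea of what assigning a BE to each scan range can mean, the sketch below spreads a list of scan ranges over backends in round-robin order. This is an assumption for illustration, not the assignment policy or the ScanRangeList API implemented in this PR; `BeAssignmentSketch`, `assignRoundRobin`, and the use of `Long` backend IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the ScanRangeList or assignment code in this PR.
final class BeAssignmentSketch {
    /** Maps each backend ID to the scan ranges it should execute, in round-robin order. */
    static <R> Map<Long, List<R>> assignRoundRobin(List<R> scanRanges, List<Long> backendIds) {
        if (backendIds.isEmpty()) {
            throw new IllegalArgumentException("no backend available");
        }
        // LinkedHashMap keeps the backends in the order they were given.
        Map<Long, List<R>> assignment = new LinkedHashMap<>();
        for (Long backendId : backendIds) {
            assignment.put(backendId, new ArrayList<>());
        }
        for (int i = 0; i < scanRanges.size(); i++) {
            Long backendId = backendIds.get(i % backendIds.size());
            assignment.get(backendId).add(scanRanges.get(i));
        }
        return assignment;
    }
}
```

Round-robin ignores data locality and range size; a real scheduler would likely weigh both, so treat this only as the simplest possible shape of the range-to-BE mapping.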

Checklist(Required)

  • Does it affect the original behavior
  • Have unit tests been added
  • Has documentation been added or modified
  • Does it need to update dependencies
  • Does this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@Jibing-Li marked this pull request as draft March 28, 2023 02:24
@github-actions (bot) added the area/planner label (Issues or PRs related to the query planner) Mar 28, 2023
Comment thread on fe/fe-core/src/main/java/org/apache/doris/planner/ScanInfo.java (outdated)
@Jibing-Li force-pushed the be branch 7 times, most recently from bab9343 to 0a5e803, March 29, 2023 08:07
@Jibing-Li marked this pull request as ready for review March 29, 2023 08:08
@Jibing-Li force-pushed the be branch 6 times, most recently from 6bf9f52 to 64951e7, March 30, 2023 05:36
@Jibing-Li (Contributor, Author) commented:

run buildall

@hello-stephen (Contributor) commented:

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.74 seconds
stream load tsv: 450 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 24 seconds loaded 2358488459 Bytes, about 93 MB/s
stream load orc: 73 seconds loaded 1101869774 Bytes, about 14 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230330085029_clickbench_pr_122829.html

@morningman closed this Apr 28, 2023
@Jibing-Li deleted the be branch June 15, 2023 02:46
Labels

area/planner Issues or PRs related to the query planner

3 participants