[FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader #313

FangYongs · 2022-10-10T03:23:24Z

Currently, SortMergeReader compares and sorts its readers after reading each batch from them, to ensure that the output order is correct. The readers are created from a list of SortedRuns, and their key ranges may be disjoint. We can compare the minKey and maxKey of each file in the SortedRun list and divide the files into multiple regions. When a region contains only one reader, it can read data directly without comparing or sorting.

So the main changes are as follows:

Add the SortedRegionDataRecordReader class, which creates a reader carrying the minKey and maxKey of each file in a SortedRun
Add the RecordReaderSubRegion class, which holds a list of SortedRegionDataRecordReaders and is created from one SortedRun
Add RecordReaderRegionManager to divide the RecordReaderSubRegion list into multiple RecordReaderRegions; each RecordReaderRegion manages its own RecordReaderSubRegion list, and the key ranges of different RecordReaderRegions are disjoint
Create a SortMergeReader for each RecordReaderRegion, so that no comparisons are made across different RecordReaderRegions. If a RecordReaderRegion has only one reader, use that reader directly
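
The region division described above can be illustrated with a minimal sketch. This is not the code in this PR: `RegionSketch`, `KeyRange`, `divide` and `needsMerge` are hypothetical names, keys are simplified to `int`, and each file is reduced to an inclusive `[minKey, maxKey]` interval. It only shows the idea of grouping overlapping key ranges so that a region with a single member can skip the merge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RegionSketch {

    /** Hypothetical stand-in for a file reader with an inclusive key range. */
    public static final class KeyRange {
        public final String name;
        public final int minKey;
        public final int maxKey;

        public KeyRange(String name, int minKey, int maxKey) {
            this.name = name;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /** Groups ranges into regions; key intervals of different regions never overlap. */
    public static List<List<KeyRange>> divide(List<KeyRange> input) {
        List<KeyRange> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt((KeyRange r) -> r.minKey));

        List<List<KeyRange>> regions = new ArrayList<>();
        List<KeyRange> current = new ArrayList<>();
        int currentMax = 0; // only meaningful while 'current' is non-empty

        for (KeyRange r : sorted) {
            // A range that starts after everything seen so far opens a new region.
            // Equal keys count as overlapping, because the same key may need merging.
            if (!current.isEmpty() && r.minKey > currentMax) {
                regions.add(current);
                current = new ArrayList<>();
            }
            currentMax = current.isEmpty() ? r.maxKey : Math.max(currentMax, r.maxKey);
            current.add(r);
        }
        if (!current.isEmpty()) {
            regions.add(current);
        }
        return regions;
    }

    /** A region with one member can be read directly, without sort-merge. */
    public static boolean needsMerge(List<KeyRange> region) {
        return region.size() > 1;
    }
}
```

With files covering keys [1, 10], [5, 20] and [30, 40], the first two overlap and form one region that still needs a merge, while the third forms its own region and can be read directly.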

Test cases RecordReaderRegionTest and RecordReaderRegionManagerTest are added for the new classes; SortMergeReader and related classes are tested in MergeTreeTest.

FangYongs · 2022-10-10T03:27:06Z

Hi @JingsongLi, I tried to fix FLINK-27958; the main changes are described above. Could you help review the implementation and code when you're free? Thanks.

JingsongLi · 2022-10-12T04:25:52Z

CC: @tsreaper

JingsongLi · 2022-10-12T04:26:33Z

Hi @zjureel, can you run some benchmarks to verify the improvement?

FangYongs · 2022-10-13T03:05:48Z

> Hi @zjureel, can you run some benchmarks to verify the improvement?

Hi @JingsongLi, that's a good idea. I see there is a flink-table-store-benchmark project in flink-table-store that sets up a Flink cluster, runs a query in the cluster and collects some metrics. I propose adding a new micro-benchmark project, flink-table-store-micro-benchmarks, to flink-table-store, and then adding micro benchmarks for the core operations, such as the throughput of read, write and compaction. We can create a view for the micro benchmarks; the flink-table-store-micro-benchmarks project would play a role similar to the one flink-benchmarks plays for Flink. What do you think? Hope to hear from you, thanks.

JingsongLi · 2022-10-13T11:11:43Z

+1 to flink-table-store-micro-benchmarks

fangyong and others added 2 commits October 28, 2022 16:02

[FLINK-27958] Compare batch maxKey to reduce comparisons in SortMerge…

b9a8f1f

…Reader

[FLINK-27958] Rebase from master and update config in micro benchmark

bd050f3

FangYongs force-pushed the FLINK_27958_batch_maxKey_in_SortMergeReader branch from 3a03dea to bd050f3 on October 31, 2022 07:37

FangYongs closed this Feb 11, 2023
