[INLONG-8125][Sort] Optimizing the speed of transitioning from snapshot to binlog #8131

Merged
dockerzhang merged 9 commits into apache:master from e-mhui:INLONG-8125
Jun 6, 2023

Conversation

Contributor

e-mhui commented May 31, 2023

Prepare a Pull Request

[INLONG-8125][Sort] Optimizing the speed of transitioning from snapshot to binlog

Motivation

When the reader transitions from the snapshot phase to the binlog phase, the shouldEmit() method is called for every binlog record. It traverses all finished snapshot splits of the record's table and checks whether the record's key falls within a split's key range and its position lies after that split's high watermark. This linear scan costs O(n) per record, where n is the number of finished splits of the table. Since finishedSplitsInfo is locally ordered, we can use binary search to find the split that contains the current binlog record's key, reducing the time complexity from O(n) to O(log n).

  // Current linear scan: every finished split of the table is checked for each binlog record.
  for (FinishedSnapshotSplitInfo splitInfo : finishedSplitsInfo.get(tableId)) {
      if (RecordUtils.splitKeyRangeContains(
              key, splitInfo.getSplitStart(), splitInfo.getSplitEnd())
              && position.isAfter(splitInfo.getHighWatermark())) {
          return true;
      }
  }

Modifications

  1. When the MySqlSourceReader has received all the FinishedSnapshotSplitInfos, it sorts them by chunkId.
  2. Use binary search to find the split that contains a given key.
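
The two modifications above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code merged in this PR: the `Split` class is a hypothetical stand-in for FinishedSnapshotSplitInfo, keys are plain ints, the binlog position and high watermark are plain longs, and it assumes that ordering by chunkId matches ordering by key range with non-overlapping ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitLookupSketch {

    /** Hypothetical, simplified stand-in for FinishedSnapshotSplitInfo. */
    static final class Split {
        final int chunkId;
        final Integer start;      // inclusive lower bound; null means unbounded
        final Integer end;        // exclusive upper bound; null means unbounded
        final long highWatermark; // simplified binlog offset

        Split(int chunkId, Integer start, Integer end, long highWatermark) {
            this.chunkId = chunkId;
            this.start = start;
            this.end = end;
            this.highWatermark = highWatermark;
        }

        boolean contains(int key) {
            return (start == null || key >= start) && (end == null || key < end);
        }
    }

    /** Step 1: sort by chunk id (assumed here to follow key-range order). */
    static List<Split> sortByChunkId(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingInt(s -> s.chunkId));
        return sorted;
    }

    /** Step 2: binary search for the split whose range contains the key, or null. */
    static Split findSplit(List<Split> sorted, int key) {
        int lo = 0;
        int hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Split s = sorted.get(mid);
            if (s.contains(key)) {
                return s;
            }
            if (s.start != null && key < s.start) {
                hi = mid - 1; // key lies before this split
            } else {
                lo = mid + 1; // key lies at or after this split's end
            }
        }
        return null;
    }

    /** Emit only if the key's split exists and the record is after its high watermark. */
    static boolean shouldEmit(List<Split> sorted, int key, long position) {
        Split s = findSplit(sorted, key);
        return s != null && position > s.highWatermark;
    }

    public static void main(String[] args) {
        // Splits arrive out of order, as they might from different readers.
        List<Split> sorted = sortByChunkId(Arrays.asList(
                new Split(2, 20, null, 300L),
                new Split(0, null, 10, 100L),
                new Split(1, 10, 20, 200L)));
        System.out.println(shouldEmit(sorted, 15, 250L)); // after chunk 1's high watermark
        System.out.println(shouldEmit(sorted, 15, 150L)); // before chunk 1's high watermark
    }
}
```

The sort is paid once, when all split infos have been received, while the lookup runs once per binlog record, so the per-record cost drops from a scan over all splits to a logarithmic number of range checks.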

e-mhui marked this pull request as draft May 31, 2023 07:59
e-mhui marked this pull request as ready for review June 3, 2023 01:35
dockerzhang requested a review from gong June 5, 2023 08:22
dockerzhang merged commit 48f3636 into apache:master Jun 6, 2023
Development

Successfully merging this pull request may close these issues.

[Improve][Sort] Optimizing the speed of transitioning from snapshot to binlog
