refactor(mergetree): introduce InMemorySortBuffer and MergedKeyValueRecordReader #253

Merged
lxy-9602 merged 9 commits into alibaba:main from zjw1111:spill-pr1
Apr 28, 2026

Collaborator

@zjw1111 zjw1111 commented Apr 27, 2026

Purpose

Linked issue: #149

This PR refactors the merge-tree write buffer to be built on top of a new SortBuffer abstraction, and introduces a MergedKeyValueRecordReader that performs merge-on-read over multiple key-value record readers.

Main changes:

  • Introduce the SortBuffer interface and its InMemorySortBuffer implementation.
  • Refactor WriteBuffer to delegate sorting/iteration to the new SortBuffer implementations, removing the previous duplicated sort logic.
  • Refactor KeyValueInMemoryRecordReader. Previously it both sorted records and merged duplicate keys. It is now responsible only for sorting, and MergedKeyValueRecordReader merges the duplicate keys.
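
To illustrate the split between sorting and merging described above, here is a minimal sketch. It is not the code from this PR: the `KeyValue` struct, `SortRecords`, `MergeSorted`, and `KeepLatest` are hypothetical names, and "last write wins" is only one possible merge function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record type; the real key-value type in the codebase is richer.
struct KeyValue {
    std::string key;
    int64_t sequence;
    std::string value;
};

// Stage 1 (sorting only): order by key, then by sequence number.
// Duplicate keys are kept as-is.
std::vector<KeyValue> SortRecords(std::vector<KeyValue> records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const KeyValue& a, const KeyValue& b) {
                         if (a.key != b.key) return a.key < b.key;
                         return a.sequence < b.sequence;
                     });
    return records;
}

using MergeFunction = std::function<KeyValue(const KeyValue&, const KeyValue&)>;

// Stage 2 (merging only): collapse runs of equal keys in already-sorted input.
std::vector<KeyValue> MergeSorted(const std::vector<KeyValue>& sorted,
                                  const MergeFunction& merge) {
    std::vector<KeyValue> out;
    for (const KeyValue& kv : sorted) {
        // Precondition: the input is sorted by key.
        assert(out.empty() || out.back().key <= kv.key);
        if (!out.empty() && out.back().key == kv.key) {
            out.back() = merge(out.back(), kv);
        } else {
            out.push_back(kv);
        }
    }
    return out;
}

// Example merge function (an assumption): keep the record with the higher sequence.
KeyValue KeepLatest(const KeyValue& accumulated, const KeyValue& next) {
    return next.sequence >= accumulated.sequence ? next : accumulated;
}
```

Because the merge stage only assumes sorted input, the same wrapper could in principle sit on top of any reader that yields sorted records, not only an in-memory one.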

Tests

New / updated unit tests:

  • core/io/merged_key_value_record_reader_test.cpp (new)
  • core/io/key_value_in_memory_record_reader_test.cpp (updated)
  • core/mergetree/write_buffer_test.cpp (updated)

API and Format

No.

Documentation

No new user-facing feature; no documentation change required.

Generative AI tooling

Generated-by: Aone Copilot (Claude-4.7-Opus) and GitHub Copilot (GPT-5.4)

…ValueRecordReader

Refactor write_buffer to use the new SortBuffer abstraction (BinaryInMemorySortBuffer / BinaryExternalSortBuffer) and introduce MergedKeyValueRecordReader for merged key-value reading. Also adapt key_value_in_memory_record_reader, merge_tree_writer, spill reader/writer and core_options accordingly.
Copilot AI review requested due to automatic review settings April 27, 2026 03:24

Copilot AI left a comment


Pull request overview

This PR refactors merge-tree buffering/iteration by introducing a SortBuffer abstraction (with a new in-memory implementation) and adds MergedKeyValueRecordReader to merge duplicate keys on read, updating writer/read paths and unit tests accordingly.

Changes:

  • Add SortBuffer interface and BinaryInMemorySortBuffer implementation, moving in-memory buffering/sorting logic out of WriteBuffer.
  • Refactor WriteBuffer / MergeTreeWriter to build readers via SortBuffer and to clear buffer state explicitly.
  • Introduce MergedKeyValueRecordReader (+ tests) and update KeyValueInMemoryRecordReader semantics/tests.
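
The delegation summarized in this list can be sketched roughly as follows. This is an illustration under assumptions, not the interface added by the PR: the method names (`Write`, `SortedRecords`, `Size`, `Clear`), the `Record` struct, and the idea that the write buffer assigns sequence numbers are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type used only for this sketch.
struct Record {
    std::string key;
    int64_t sequence;
    std::string value;
};

// Hypothetical abstraction: something that accepts records and returns them sorted.
class SortBuffer {
 public:
    virtual ~SortBuffer() = default;
    virtual void Write(Record record) = 0;
    virtual std::vector<Record> SortedRecords() = 0;
    virtual std::size_t Size() const = 0;
    virtual void Clear() = 0;
};

// In-memory implementation: keeps everything in a vector and sorts on demand.
class InMemorySortBuffer : public SortBuffer {
 public:
    void Write(Record record) override { records_.push_back(std::move(record)); }

    std::vector<Record> SortedRecords() override {
        std::stable_sort(records_.begin(), records_.end(),
                         [](const Record& a, const Record& b) {
                             if (a.key != b.key) return a.key < b.key;
                             return a.sequence < b.sequence;
                         });
        return records_;
    }

    std::size_t Size() const override { return records_.size(); }
    void Clear() override { records_.clear(); }

 private:
    std::vector<Record> records_;
};

// The write buffer holds no sort logic of its own; it forwards to the SortBuffer.
class WriteBuffer {
 public:
    explicit WriteBuffer(std::unique_ptr<SortBuffer> buffer) : buffer_(std::move(buffer)) {
        assert(buffer_ != nullptr);
    }

    void Write(std::string key, std::string value) {
        // Assumption: the buffer assigns a monotonically increasing sequence number.
        buffer_->Write(Record{std::move(key), next_sequence_++, std::move(value)});
    }

    std::vector<Record> Sorted() { return buffer_->SortedRecords(); }
    std::size_t Size() const { return buffer_->Size(); }

    // Clearing is an explicit step, separate from reading.
    void Clear() { buffer_->Clear(); }

 private:
    std::unique_ptr<SortBuffer> buffer_;
    int64_t next_sequence_ = 0;
};
```

With this shape, a different buffer (for example one that spills to disk) could be swapped in by passing another `SortBuffer` implementation to the constructor, without changing `WriteBuffer`.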

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/paimon/core/utils/batch_writer.h | Extends BatchWriter with memory usage reporting and FlushMemory() API. |
| src/paimon/core/postpone/postpone_bucket_writer.h | Implements FlushMemory() via existing flush path. |
| src/paimon/core/append/append_only_writer.h | Implements FlushMemory() via non-blocking flush. |
| src/paimon/core/mergetree/write_buffer.h | Refactors WriteBuffer to delegate to SortBuffer and expose reader creation/clearing. |
| src/paimon/core/mergetree/write_buffer.cpp | Builds BinaryInMemorySortBuffer and wraps produced readers with MergedKeyValueRecordReader. |
| src/paimon/core/mergetree/write_buffer_test.cpp | Updates tests to new WriteBuffer API + memory estimation location. |
| src/paimon/core/mergetree/sort_buffer.h | Adds the new SortBuffer interface. |
| src/paimon/core/mergetree/binary_in_memory_sort_buffer.h / .cpp | Adds in-memory sort buffer implementation and memory estimation helper. |
| src/paimon/core/io/key_value_in_memory_record_reader.h / .cpp | Adjusts in-memory reader to emit sorted raw KVs (no merge), adds sequence field sort direction. |
| src/paimon/core/io/key_value_in_memory_record_reader_test.cpp | Updates expected behavior and adds descending sequence-fields coverage. |
| src/paimon/core/io/merged_key_value_record_reader.h / .cpp | Adds reader wrapper that merges duplicate keys using merge function. |
| src/paimon/core/io/merged_key_value_record_reader_test.cpp | New tests validating merge behavior across underlying batches. |
| src/paimon/core/mergetree/spill_writer.cpp | Minor type annotation tweak for compression type. |
| src/paimon/core/mergetree/spill_reader.cpp | Minor type annotation tweak for file status + level constant usage. |
| src/paimon/core/mergetree/merge_tree_writer.h / .cpp | Removes writer-held sequence tracking and adapts flush path to new WriteBuffer API. |
| src/paimon/CMakeLists.txt | Registers new sources/tests in the build. |
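
The summary mentions a reader wrapper that merges duplicate keys and tests that cover merging across underlying batches. A minimal sketch of why batch boundaries matter is shown here. All names (`BatchReader`, `MergingReader`, `NextBatch`) are hypothetical, and the merge rule is assumed to be "the later record for a key wins"; the actual reader may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct KeyValue {
    std::string key;
    int64_t sequence;
    std::string value;
};
using Batch = std::vector<KeyValue>;

// Hypothetical underlying reader: hands out pre-sorted batches one at a time.
class BatchReader {
 public:
    explicit BatchReader(std::vector<Batch> batches) : batches_(std::move(batches)) {}

    std::optional<Batch> NextBatch() {
        if (next_ >= batches_.size()) return std::nullopt;
        return batches_[next_++];
    }

 private:
    std::vector<Batch> batches_;
    std::size_t next_ = 0;
};

// Wrapper that merges duplicate keys, including runs that span two batches.
class MergingReader {
 public:
    explicit MergingReader(BatchReader reader) : reader_(std::move(reader)) {}

    // Returns the next non-empty batch of merged records, or nullopt when exhausted.
    std::optional<Batch> NextBatch() {
        while (true) {
            std::optional<Batch> in = reader_.NextBatch();
            if (!in) {
                // Input exhausted: flush the record still being accumulated, if any.
                if (!pending_) return std::nullopt;
                Batch last;
                last.push_back(std::move(*pending_));
                pending_.reset();
                return last;
            }
            Batch out;
            for (KeyValue& kv : *in) {
                // Precondition: keys arrive in non-decreasing order across batches.
                assert(!pending_ || pending_->key <= kv.key);
                if (pending_ && pending_->key == kv.key) {
                    *pending_ = std::move(kv);  // same key: the later record wins
                } else {
                    if (pending_) out.push_back(std::move(*pending_));
                    pending_ = std::move(kv);   // start accumulating a new key
                }
            }
            // The last key of this batch stays pending because the next batch
            // may continue the same key. If nothing is complete yet, read on.
            if (!out.empty()) return out;
        }
    }

 private:
    BatchReader reader_;
    std::optional<KeyValue> pending_;
};
```

The key design point in this sketch is that the final record of each input batch is held back until the next batch (or end of input) shows whether its key continues.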

Comment thread src/paimon/core/mergetree/write_buffer.cpp Outdated
Comment thread src/paimon/core/mergetree/write_buffer.cpp
Comment thread src/paimon/core/io/merged_key_value_record_reader.cpp
Comment thread src/paimon/core/mergetree/sort_buffer.h
Comment thread src/paimon/core/mergetree/sort_buffer.h
Comment thread src/paimon/core/io/key_value_in_memory_record_reader.cpp
Comment thread src/paimon/core/io/merged_key_value_record_reader.cpp
Comment thread src/paimon/core/io/key_value_in_memory_record_reader_test.cpp
Comment thread src/paimon/core/io/merged_key_value_record_reader.cpp
Comment thread src/paimon/core/utils/batch_writer.h
Comment thread src/paimon/core/mergetree/write_buffer.cpp
Comment thread src/paimon/core/append/append_only_writer.h
Comment thread src/paimon/core/mergetree/binary_in_memory_sort_buffer.h Outdated
Comment thread src/paimon/core/mergetree/binary_in_memory_sort_buffer.cpp Outdated
Collaborator

@lxy-9602 lxy-9602 left a comment

+1

@lxy-9602 lxy-9602 merged commit c7492e6 into alibaba:main Apr 28, 2026
9 of 10 checks passed
@zjw1111 zjw1111 changed the title from "refactor(mergetree): introduce BinaryInMemorySortBuffer and MergedKeyValueRecordReader" to "refactor(mergetree): introduce InMemorySortBuffer and MergedKeyValueRecordReader" Apr 28, 2026
4 participants