[feat](paimon) support paimon insert by jni&paimon-cpp #62689

Open
FayneBupt wants to merge 42 commits into apache:branch-4.1 from FayneBupt:feature/support_paimon_insert_4.1
@FayneBupt

What problem does this PR solve?

Issue Number: close #56005

Related PR: #xxx

Problem Summary:

This PR implements Doris write-back support for Paimon external tables (INSERT INTO ...), completing the end-to-end write path across FE and BE, including planning, sink serialization, execution, and FE-side commit.

1) Core capability

  • Support writing data from Doris into Paimon external tables.
  • Support both writer paths:
    • Native Writer (paimon-cpp): High-performance C++ native writer path (Default).
    • JNI Writer: Java-based writer path for complex scenarios and compatibility.

2) FE/BE write pipeline integration

  • FE:
    • Add Paimon insert executor flow.
    • Build and send Paimon sink metadata (table identity/location/options/column names/shuffle info).
    • Collect commit payload from BE and commit on FE side.
  • BE:
    • Execute Paimon sink writing.
    • Build commit payload and report back to FE.
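
The FE/BE split above can be pictured with a small simulation: each BE writes its share of the rows and reports a commit payload, and the FE collects every payload before committing once. This is only a sketch under assumptions. `CommitPayload`, `FrontendCommitter`, and `backend_write` are hypothetical names, and the payload bytes are a placeholder, since the real serialized format is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitPayload:
    """Hypothetical stand-in for what one BE reports back to the FE."""
    backend_id: int
    payload: bytes  # placeholder bytes; the real commit message format is not shown in this PR text


@dataclass
class FrontendCommitter:
    """Hypothetical FE-side collector: gathers payloads, then commits once."""
    collected: List[CommitPayload] = field(default_factory=list)
    committed: bool = False

    def collect(self, payload: CommitPayload) -> None:
        # Reject payloads that arrive after the single commit has happened.
        if self.committed:
            raise RuntimeError("payload arrived after commit")
        self.collected.append(payload)

    def commit(self) -> int:
        """Mark all collected payloads as committed and return their count."""
        self.committed = True
        return len(self.collected)


def backend_write(backend_id: int, rows: List[Tuple]) -> CommitPayload:
    """Pretend to write rows on one BE and build its commit payload."""
    summary = f"backend={backend_id};rows={len(rows)}".encode()
    return CommitPayload(backend_id=backend_id, payload=summary)


committer = FrontendCommitter()
for be_id, rows in enumerate([[(1, "a"), (2, "b")], [(3, "c")]]):
    committer.collect(backend_write(be_id, rows))
committed_count = committer.commit()
```

The point of the shape is that writers never commit on their own: only the FE-side object decides when the write becomes visible.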

3) Bucket/partition write behavior

  • Add distributed bucket shuffle support for bucketed Paimon tables.
  • Add compatible fallback behavior (e.g. gather/random) when required bucket keys are not resolvable.
  • Keep partition-related write behavior aligned with existing sink framework.
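
As a rough illustration of the fallback rule above, the sketch below picks a shuffle mode from the table's bucket settings. The function name, its parameters, and the single `"fallback"` label (standing in for gather/random) are assumptions made for this example. The actual planner rules, including which fallback is chosen in which case, are not spelled out in this description.

```python
from typing import Optional, Sequence


def choose_shuffle_mode(
    bucket_count: int,
    bucket_keys: Optional[Sequence[str]],
    output_columns: Sequence[str],
    distributed_bucket_shuffle_enabled: bool = True,
) -> str:
    """Return "bucket_shuffle" or "fallback" (hypothetical labels)."""
    # Mirrors enable_paimon_distributed_bucket_shuffle being switched off.
    if not distributed_bucket_shuffle_enabled:
        return "fallback"
    # Assumption: only fixed-bucket tables (bucket > 0) are shuffled by bucket.
    if bucket_count <= 0:
        return "fallback"
    # Bucket keys must be known and present among the columns being written.
    if not bucket_keys:
        return "fallback"
    if any(key not in output_columns for key in bucket_keys):
        return "fallback"
    return "bucket_shuffle"


mode_ok = choose_shuffle_mode(4, ["id"], ["id", "name"])
mode_missing_key = choose_shuffle_mode(4, ["id"], ["name"])
```

Here `mode_ok` takes the bucket shuffle path, while `mode_missing_key` falls back because the bucket key `id` cannot be resolved from the written columns.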

4) Limitations of Native Writer (paimon-cpp)

The high-performance Native Writer is currently the default path, but it lacks some features of the JNI Writer. In the following scenarios, users must fall back to the JNI Writer (set enable_paimon_jni_writer = true):

  • Dynamic Bucket (bucket = -1): Not supported. Native Writer currently only supports fixed bucket tables.
  • Inline Compaction: Not supported. Native Writer operates in write-only mode without compaction capabilities during the write process.
  • Memory Spill: Not supported. Native Writer lacks disk spill logic when memory buffers are exhausted.
  • File Formats: Strictly limited to parquet for both data files and manifest files.
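
The four limitations above amount to a checklist. A minimal sketch of that checklist follows, assuming a hypothetical `WriteRequirements` record. It is not how Doris decides internally, only a way to see when `enable_paimon_jni_writer = true` is needed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteRequirements:
    """Hypothetical description of what a write job needs."""
    bucket: int
    data_file_format: str = "parquet"
    manifest_file_format: str = "parquet"
    needs_inline_compaction: bool = False
    needs_spill: bool = False


def native_writer_blockers(req: WriteRequirements) -> List[str]:
    """List the reasons (from the limitations above) the Native Writer cannot be used."""
    blockers = []
    if req.bucket == -1:
        blockers.append("dynamic bucket (bucket = -1)")
    if req.needs_inline_compaction:
        blockers.append("inline compaction")
    if req.needs_spill:
        blockers.append("memory spill")
    formats = (req.data_file_format.lower(), req.manifest_file_format.lower())
    if any(fmt != "parquet" for fmt in formats):
        blockers.append("non-parquet data or manifest format")
    return blockers


def must_use_jni_writer(req: WriteRequirements) -> bool:
    """True when at least one limitation applies."""
    return bool(native_writer_blockers(req))


fixed_bucket_job = WriteRequirements(bucket=8)
dynamic_bucket_job = WriteRequirements(bucket=-1, needs_spill=True)
```

With these inputs, `fixed_bucket_job` can stay on the Native Writer, while `dynamic_bucket_job` reports two blockers and needs the JNI Writer.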

5) Session variables and write tunables

This PR introduces or reuses Paimon-related session variables that control write behavior and performance tuning.

Execution Path & Shuffle Control

  • enable_paimon_jni_writer (Default: false): Switch to JNI writer path instead of native paimon-cpp.
  • enable_paimon_distributed_bucket_shuffle (Default: true): Enable distributed bucket shuffle for Paimon tables.

Memory & Buffer Tuning

  • paimon_global_memory_pool_size (Default: 1073741824L / 1GB): Global memory pool size for Paimon writer.
  • paimon_write_buffer_size (Default: 268435456L / 256MB): Memory buffer size per writer.
  • enable_paimon_adaptive_buffer_size (Default: false): Enable adaptive adjustment of buffer sizes.
  • paimon_writer_queue_size (Default: 3): Queue size for writer tasks.
  • paimon_target_file_size (Default: 268435456L / 256MB): Target size for generated data files.
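
To make the byte values above easier to read, the sketch below checks them against their MB/GB labels and works out how many full write buffers would fit into the global pool. It assumes that per-writer buffers are charged against the global pool, which this description does not confirm. The helper name is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Defaults copied from the list above (the trailing "L" is the Java long suffix).
PAIMON_GLOBAL_MEMORY_POOL_SIZE = 1073741824
PAIMON_WRITE_BUFFER_SIZE = 268435456
PAIMON_TARGET_FILE_SIZE = 268435456


def full_buffers_that_fit(pool_bytes: int, buffer_bytes: int) -> int:
    """How many completely filled buffers fit in the pool (integer division)."""
    if buffer_bytes <= 0:
        raise ValueError("buffer size must be positive")
    return pool_bytes // buffer_bytes


pool_is_one_gib = PAIMON_GLOBAL_MEMORY_POOL_SIZE == 1 * GIB
buffer_is_256_mib = PAIMON_WRITE_BUFFER_SIZE == 256 * MIB
max_full_buffers = full_buffers_that_fit(
    PAIMON_GLOBAL_MEMORY_POOL_SIZE, PAIMON_WRITE_BUFFER_SIZE
)  # 4 with the defaults
```

Under that assumption, raising paimon_write_buffer_size without also raising paimon_global_memory_pool_size reduces the number of writers that can hold a full buffer at once.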

JNI Spill & Compaction Control (Only applicable when enable_paimon_jni_writer=true)

  • enable_paimon_jni_compact (Default: true): Enable inline compaction during JNI writing.
  • enable_paimon_jni_spill (Default: false): Enable memory spilling to disk.
  • paimon_spill_max_disk_size (Default: 53687091200L / 50GB): Maximum disk space allowed for spilling.
  • paimon_spill_sort_buffer_size (Default: 67108864L / 64MB): Buffer size for sorting during spill.
  • paimon_spill_sort_threshold (Default: 10): Threshold triggering sort operation.
  • paimon_spill_compression (Default: "zstd"): Compression codec used for spilled files.
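
Because these options apply only when enable_paimon_jni_writer=true, one way to reason about them is as a group that is resolved only on the JNI path. The sketch below models that reading. Returning an empty dict on the native path is an assumption about the intended behavior, and `effective_jni_options` is a hypothetical helper, not a Doris function.

```python
from typing import Any, Dict

# Defaults copied from the list above.
JNI_ONLY_DEFAULTS: Dict[str, Any] = {
    "enable_paimon_jni_compact": True,
    "enable_paimon_jni_spill": False,
    "paimon_spill_max_disk_size": 53687091200,   # 50 GB
    "paimon_spill_sort_buffer_size": 67108864,   # 64 MB
    "paimon_spill_sort_threshold": 10,
    "paimon_spill_compression": "zstd",
}


def effective_jni_options(session: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve JNI-only options, or nothing when the native path is active."""
    if not session.get("enable_paimon_jni_writer", False):
        return {}
    return {
        name: session.get(name, default)
        for name, default in JNI_ONLY_DEFAULTS.items()
    }


native_path = effective_jni_options({"enable_paimon_jni_spill": True})
jni_path = effective_jni_options(
    {"enable_paimon_jni_writer": True, "enable_paimon_jni_spill": True}
)
```

In `native_path` the spill flag has no effect because the JNI writer is off. In `jni_path` it overrides the default while the other options keep their defaults.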

6) Compatibility/alignment work

  • Align merged code with current community branch conventions and dependencies.
  • Resolve compile/checkstyle conflicts introduced by branch differences.
  • Ensure the FE build passes with Maven >= 3.9.

7) Current recommendation for production usage

  • Prefer JNI Writer in production for now: it is currently the safer path for large-scale write workloads.
  • Enable spill for large data volumes: for big batches, set enable_paimon_jni_spill = true to reduce memory pressure.
  • Native Writer roadmap dependency: although Native Writer is the default path, its production readiness for heavy workloads still depends on paimon-cpp upstream support for compaction and spill.
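
The recommendation above translates into two session settings. The sketch below renders them as SET statements that could be pasted into a SQL client. It assumes the variable names land exactly as listed in this PR, and `render_set_statements` is a hypothetical helper written for this example.

```python
from typing import Any, Dict, List


def render_set_statements(settings: Dict[str, Any]) -> List[str]:
    """Render session settings as SQL SET statements, one per variable."""

    def literal(value: Any) -> str:
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return str(value)
        # Quote strings and escape embedded single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    return [f"SET {name} = {literal(value)};" for name, value in settings.items()]


# Settings taken from the production recommendation above.
production_settings = {
    "enable_paimon_jni_writer": True,
    "enable_paimon_jni_spill": True,
}
statements = render_set_statements(production_settings)
```

Running the statements in the same session before the INSERT INTO keeps the choice scoped to that session rather than changing the default for everyone.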

@FayneBupt FayneBupt requested a review from yiguolei as a code owner April 22, 2026 02:00
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@FayneBupt FayneBupt changed the title from "Feature/support paimon insert 4.1" to "[feat](paimon) support paimon insert by jni&paimon-cpp" Apr 22, 2026
@FayneBupt
Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 35.81% (207/578) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 6.54% (136/2081) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.25% (26299/36912)
Line Coverage 54.24% (278859/514120)
Region Coverage 51.58% (231697/449183)
Branch Coverage 52.90% (100101/189218)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 3.57% (45/1261) 🎉
Increment coverage report
Complete coverage report

@FayneBupt
Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.05% (1842/2360)
Line Coverage 64.80% (33007/50936)
Region Coverage 65.30% (16370/25070)
Branch Coverage 55.90% (8743/15640)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 5.67% (118/2081) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.08% (20034/37744)
Line Coverage 36.57% (188869/516509)
Region Coverage 32.95% (146914/445857)
Branch Coverage 33.99% (64240/188997)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 6.54% (136/2081) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.30% (26347/36950)
Line Coverage 54.31% (279582/514769)
Region Coverage 51.71% (232570/449761)
Branch Coverage 52.99% (100411/189503)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 7.76% (45/580) 🎉
Increment coverage report
Complete coverage report

@FayneBupt
Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 51.38% (297/578) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 53.29% (308/578) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 25.25% (531/2103) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.23% (20097/37752)
Line Coverage 36.68% (189493/516668)
Region Coverage 33.02% (147294/446039)
Branch Coverage 34.09% (64450/189073)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 26.11% (549/2103) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.46% (26412/36958)
Line Coverage 54.44% (280314/514928)
Region Coverage 51.68% (232539/449943)
Branch Coverage 53.13% (100722/189579)

@FayneBupt
Author

run buildall

@morningman
Contributor

run buildall

@Gabriel39
Contributor

/review

Contributor

@github-actions github-actions Bot left a comment

I found several blocking issues in the PR. Critical checkpoints: build compatibility is not satisfied because Paimon sink symbols are referenced when ENABLE_PAIMON_CPP is disabled and the rawlog compat library is linked unconditionally; runtime correctness has risks in native writer format handling and HDFS overwrite semantics; memory-safety under allocation failure is not satisfied in the Arrow memory pool adapter. Existing review context had no prior inline threads, and there was no additional user-provided review focus beyond a complete PR review.

Comment thread be/src/exec/CMakeLists.txt Outdated
Comment thread be/src/service/CMakeLists.txt
Comment thread be/src/exec/sink/writer/paimon/vpaimon_partition_writer.cpp
Comment thread be/src/vec/sink/writer/paimon/paimon_doris_hdfs_file_system.cpp Outdated
Comment thread be/src/vec/sink/vpaimon_table_writer.cpp Outdated
Comment thread be/src/exec/sink/writer/paimon/vpaimon_table_writer.cpp
@FayneBupt FayneBupt force-pushed the feature/support_paimon_insert_4.1 branch from 53063d8 to 2487ebb Compare May 7, 2026 09:38
@FayneBupt
Author

run buildall

@FayneBupt
Author

run p0

@FayneBupt
Author

run NonConcurrent

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 27.51% (532/1934) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.38% (20228/37896)
Line Coverage 36.92% (191002/517407)
Region Coverage 33.22% (148471/446995)
Branch Coverage 34.30% (64997/189520)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 28.02% (542/1934) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (26494/37079)
Line Coverage 54.50% (280946/515523)
Region Coverage 51.88% (233888/450863)
Branch Coverage 53.13% (100956/190014)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (361/560) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 28.02% (542/1934) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.44% (26491/37079)
Line Coverage 54.49% (280931/515523)
Region Coverage 51.87% (233865/450863)
Branch Coverage 53.13% (100953/190014)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (361/560) 🎉
Increment coverage report
Complete coverage report

@FayneBupt
Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.11% (1848/2366)
Line Coverage 64.85% (33223/51232)
Region Coverage 65.32% (16438/25167)
Branch Coverage 55.95% (8790/15710)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.27% (278/553) 🎉
Increment coverage report
Complete coverage report

@FayneBupt FayneBupt force-pushed the feature/support_paimon_insert_4.1 branch from 48970b4 to f1152d9 Compare May 13, 2026 07:12
@FayneBupt
Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.11% (1848/2366)
Line Coverage 64.85% (33225/51232)
Region Coverage 65.34% (16445/25167)
Branch Coverage 55.88% (8779/15710)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.27% (278/553) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 7.04% (39/554) 🎉
Increment coverage report
Complete coverage report

@FayneBupt
Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.27% (278/553) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 8.32% (46/553) 🎉
Increment coverage report
Complete coverage report

@FayneBupt FayneBupt force-pushed the feature/support_paimon_insert_4.1 branch from ebc64c5 to 217a176 Compare May 14, 2026 02:18
@FayneBupt
Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.11% (1848/2366)
Line Coverage 64.91% (33254/51232)
Region Coverage 65.40% (16459/25167)
Branch Coverage 55.96% (8792/15710)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.27% (278/553) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 8.32% (46/553) 🎉
Increment coverage report
Complete coverage report

@FayneBupt FayneBupt force-pushed the feature/support_paimon_insert_4.1 branch from 217a176 to 9484591 Compare May 14, 2026 06:49
@FayneBupt
Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.11% (1848/2366)
Line Coverage 64.86% (33229/51232)
Region Coverage 65.36% (16449/25167)
Branch Coverage 55.94% (8788/15710)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 65.10% (360/553) 🎉
Increment coverage report
Complete coverage report

@FayneBupt
Author

run p0

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 65.10% (360/553) 🎉
Increment coverage report
Complete coverage report

6 participants