[feat](paimon) support paimon insert by jni&paimon-cpp #62689
FayneBupt wants to merge 42 commits into
I found several blocking issues in the PR. Critical checkpoints:

- Build compatibility: not satisfied. Paimon sink symbols are referenced when `ENABLE_PAIMON_CPP` is disabled, and the rawlog compat library is linked unconditionally.
- Runtime correctness: at risk in native writer format handling and in HDFS overwrite semantics.
- Memory safety under allocation failure: not satisfied in the Arrow memory pool adapter.

Existing review context had no prior inline threads, and no review focus was requested beyond a complete PR review.
What problem does this PR solve?
Issue Number: close #56005
Related PR: #xxx
Problem Summary:
This PR implements Doris write support for Paimon external tables (`INSERT INTO ...`). It completes the end-to-end write path across FE and BE, including planning, sink serialization, execution, and FE-side commit.

1) Core capability
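For illustration, a write through this path is an ordinary Doris `INSERT` against a table in a Paimon catalog. This is a minimal sketch, assuming the Paimon catalog is already created in Doris. The names `paimon_catalog`, `db1`, `orders_pk`, `staging_orders` and the column names are hypothetical placeholders.

```sql
-- Hypothetical names: paimon_catalog.db1.orders_pk is a placeholder Paimon table.
-- Append literal rows through the Doris -> Paimon write path.
INSERT INTO paimon_catalog.db1.orders_pk VALUES
    (1, 'alice', 10.5),
    (2, 'bob', 20.0);

-- Write the result of a query; the source is a hypothetical table
-- in Doris's internal catalog.
INSERT INTO paimon_catalog.db1.orders_pk
SELECT order_id, customer, amount
FROM internal.db1.staging_orders;
```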
2) FE/BE write pipeline integration
3) Bucket/partition write behavior
4) Limitations of Native Writer (paimon-cpp)
Currently, the high-performance Native Writer is the default path, but it has some functional limitations compared to the JNI Writer. In the following scenarios, users must fall back to the JNI Writer (`set enable_paimon_jni_writer = true`):

- Dynamic bucket tables (`bucket = -1`): Not supported. Native Writer currently only supports fixed bucket tables.
- Non-Parquet file formats: Native Writer currently only supports `parquet` for both data files and manifest files.

5) Session variables and write tunables
This PR introduces or uses the following Paimon-related session variables to control write behavior and tune performance.
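The sketch below shows how these variables can be inspected and overridden for the current session with Doris's MySQL-compatible `SHOW VARIABLES` and `SET` statements. It walks through one fallback case from the limitations above: writing to a dynamic-bucket table (`bucket = -1`), which the Native Writer does not support. The table names are hypothetical placeholders.

```sql
-- List the Paimon-related session variables and their current values.
SHOW VARIABLES LIKE '%paimon%';

-- Fall back to the JNI writer for this session only,
-- e.g. for a dynamic bucket table (bucket = -1).
SET enable_paimon_jni_writer = true;

-- Hypothetical dynamic-bucket target table and source table.
INSERT INTO paimon_catalog.db1.events_dynamic_bucket
SELECT * FROM internal.db1.staging_events;

-- Switch back to the default native paimon-cpp writer.
SET enable_paimon_jni_writer = false;
```

Because plain `SET` is session-scoped, the fallback does not affect other sessions that keep using the native writer.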
Execution Path & Shuffle Control
- `enable_paimon_jni_writer` (default: `false`): Use the JNI writer path instead of the native paimon-cpp writer.
- `enable_paimon_distributed_bucket_shuffle` (default: `true`): Enable distributed bucket shuffle for Paimon tables.

Memory & Buffer Tuning
- `paimon_global_memory_pool_size` (default: `1073741824L` / 1 GB): Global memory pool size for the Paimon writer.
- `paimon_write_buffer_size` (default: `268435456L` / 256 MB): Memory buffer size per writer.
- `enable_paimon_adaptive_buffer_size` (default: `false`): Enable adaptive adjustment of buffer sizes.
- `paimon_writer_queue_size` (default: `3`): Queue size for writer tasks.
- `paimon_target_file_size` (default: `268435456L` / 256 MB): Target size of generated data files.

JNI Spill & Compaction Control (only applicable when `enable_paimon_jni_writer=true`)

- `enable_paimon_jni_compact` (default: `true`): Enable inline compaction during JNI writing.
- `enable_paimon_jni_spill` (default: `false`): Enable spilling memory to disk.
- `paimon_spill_max_disk_size` (default: `53687091200L` / 50 GB): Maximum disk space allowed for spilling.
- `paimon_spill_sort_buffer_size` (default: `67108864L` / 64 MB): Buffer size for sorting during spill.
- `paimon_spill_sort_threshold` (default: `10`): Threshold that triggers the sort operation.
- `paimon_spill_compression` (default: `"zstd"`): Compression codec used for spilled files.

6) Compatibility/alignment work
7) Current recommendation for production usage
Set `enable_paimon_jni_spill = true` (JNI writer only) to reduce memory pressure and avoid excessive memory usage.
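Following this recommendation, a JNI-writer session with spilling enabled might be configured as in the sketch below. Only the variable names and their defaults come from this PR. The overridden values (a 20 GB spill cap and a 128 MB sort buffer) are illustrative assumptions, not tested recommendations.

```sql
-- Spill settings only take effect on the JNI writer path.
SET enable_paimon_jni_writer = true;

-- Let the JNI writer spill to disk to reduce memory pressure.
SET enable_paimon_jni_spill = true;

-- Illustrative overrides (defaults: 50 GB and 64 MB respectively).
SET paimon_spill_max_disk_size = 21474836480;   -- 20 GB
SET paimon_spill_sort_buffer_size = 134217728;  -- 128 MB
SET paimon_spill_compression = 'zstd';          -- same as the default
```

These statements apply to the current session. Doris also accepts `SET GLOBAL` if the same settings should become the default for new sessions.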