branch-4.1: [feat](paimon) integrate paimon-cpp reader #60676 #60795 #61379
xylaaaaa wants to merge 2 commits into apache:branch-4.1
Issue Number: apache#56005
Co-authored-by: morningman <yunyou@selectdb.com>
…der (apache#60795)

## Problem

Follow-up to apache#60676. When FE does not pass the full table options in scan ranges, paimon-cpp may default `manifest.format` to avro. In environments without avro support, `PaimonCppReader` initialization can then fail with: `Could not find a FileFormatFactory implementation class for format avro.`

## Solution

In `PaimonCppReader::_build_options`, if a split-level `file_format` exists and the table options are missing or empty:

- set `file.format` from the split `file_format`
- set `manifest.format` from the split `file_format`

This keeps paimon-cpp format resolution consistent with the actual split format and avoids an unintended avro fallback.

## Verification

- Incremental BE build succeeded for the `doris_be` target.
- Change scope is limited to `be/src/vec/exec/format/table/paimon_cpp_reader.cpp`.
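A minimal sketch of the fallback described above, assuming the options are held in a `std::map<std::string, std::string>`. The helper name `fill_format_options` is hypothetical and is not the actual signature of `PaimonCppReader::_build_options`; only the option keys `file.format` and `manifest.format` come from the description.

```cpp
#include <cassert>
#include <map>
#include <string>

using Options = std::map<std::string, std::string>;

// Hypothetical helper mirroring the described rule: when FE sent no table
// options, derive file.format and manifest.format from the split's file_format
// so paimon-cpp does not fall back to its default (avro) manifest format.
Options fill_format_options(const Options& table_options, const std::string& split_file_format) {
    Options options = table_options;
    if (!split_file_format.empty() && table_options.empty()) {
        options["file.format"] = split_file_format;
        options["manifest.format"] = split_file_format;
    }
    return options;
}
```

In this sketch, options that FE did supply are returned unchanged, so the inference only applies to the missing/empty case the PR targets.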
run buildall
Pull request overview
Cherry-pick to branch-4.1 that integrates the paimon-cpp reader path (plus a follow-up fix that infers the manifest format from the split file format), so BE can read Paimon splits natively in C++ instead of through the JNI scanner. The new path is controlled by a new session/query option.
Changes:
- Add the `enable_paimon_cpp_reader` session variable and thrift `TQueryOptions` plumbing to toggle the BE reader selection.
- Implement the BE-side paimon-cpp scan path: `PaimonCppReader`, a Doris-backed paimon file system, and predicate pushdown conversion into paimon-cpp predicates.
- Add coverage via a new BE unit test and a regression test comparing JNI vs paimon-cpp results.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| regression-test/suites/external_table_p0/paimon/test_paimon_cpp_reader.groovy | Regression test comparing results between JNI and paimon-cpp paths. |
| gensrc/thrift/PaloInternalService.thrift | Adds enable_paimon_cpp_reader query option to thrift. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds session variable + thrift serialization + ignore split type update. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonSource.java | Exposes table location for BE paimon-cpp reader. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java | Encodes DataSplit for paimon-cpp and passes table location in scan range. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/PaimonUtil.java | Adds native DataSplit serialization (standard Base64) for BE. |
| fe/be-java-extensions/paimon-scanner/.../PaimonUtils.java | Adds Base64 decoding fallback for split deserialization. |
| be/src/vec/exec/scan/file_scanner.cpp | Selects paimon-cpp reader when enabled; wires predicate pushdown and safer JNI cast. |
| be/src/vec/exec/format/table/paimon_cpp_reader.{h,cpp} | New paimon-cpp reader implementation. |
| be/src/vec/exec/format/table/paimon_predicate_converter.{h,cpp} | Converts Doris conjuncts into paimon-cpp predicates. |
| be/src/vec/exec/format/table/paimon_doris_file_system.{h,cpp} | Registers a doris-backed filesystem for paimon-cpp. |
| be/test/vec/exec/format/table/paimon_cpp_reader_test.cpp | New unit tests for count-pushdown and init validation. |
| be/cmake/thirdparty.cmake | Adds paimon-cpp static libs under ENABLE_PAIMON_CPP. |
| be/CMakeLists.txt | Introduces ENABLE_PAIMON_CPP option and paimon/arrow linkage wiring. |
Comment on lines +217 to +221

```cpp
if (expr->op() == TExprOpcode::EQ_FOR_NULL) {
    return paimon::PredicateBuilder::IsNull(
            field_meta->index, field_meta->slot_desc->col_name(), field_meta->field_type);
}
```
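`EQ_FOR_NULL` is the null-safe equality operator (`<=>`): `a <=> b` is true when both sides are NULL, or both are non-NULL and equal, and it never yields NULL. It is therefore equivalent to `IS NULL` only when the compared literal is NULL; the excerpt above does not show whether the literal was checked before this branch. The sketch below models both predicates with `std::optional<int>` (the helper names are hypothetical) to make that distinction concrete.

```cpp
#include <cassert>
#include <optional>

// Null-safe equality (SQL `<=>`): true if both are NULL, or both are non-NULL
// and equal; the result itself is never NULL.
bool null_safe_equal(const std::optional<int>& a, const std::optional<int>& b) {
    if (!a.has_value() || !b.has_value()) {
        return !a.has_value() && !b.has_value();
    }
    return *a == *b;
}

// `IS NULL` on a single value.
bool is_null(const std::optional<int>& a) {
    return !a.has_value();
}
```

With a NULL literal, `col <=> NULL` and `col IS NULL` agree for every row; with a non-NULL literal such as `5`, they disagree (a row holding `5` matches the former but not the latter).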
Comment on lines +289 to +299

```cpp
const std::string& pattern = *pattern_opt;
if (!pattern.empty() && pattern.front() == '%') {
    return nullptr;
}
if (pattern.empty() || pattern.back() != '%') {
    return nullptr;
}

std::string prefix = pattern.substr(0, pattern.size() - 1);
paimon::Literal lower_literal(paimon::FieldType::STRING, prefix.data(), prefix.size());
auto lower_pred = paimon::PredicateBuilder::GreaterOrEqual(
```
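The excerpt rewrites `LIKE 'prefix%'` into a range predicate that starts at `prefix` (the `GreaterOrEqual` call). A common way to complete such a rewrite is an exclusive upper bound obtained by dropping trailing `0xFF` bytes and incrementing the last remaining byte. The sketch below shows that technique, assuming byte-wise (binary) string ordering and a backslash escape character; it does not claim to be how the PR builds its upper bound, and `like_prefix_range` is a hypothetical name. Unlike the excerpt, which only checks the leading and trailing `%`, it also rejects prefixes containing further wildcards or escapes, since those are not pure prefix matches.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Range [lower, upper) covering every string that starts with the prefix.
// An empty `upper` means there is no upper bound (prefix was all 0xFF bytes).
struct PrefixRange {
    std::string lower;
    std::optional<std::string> upper;
};

// Hypothetical helper: convert `LIKE 'abc%'` into a byte-wise range, or return
// std::nullopt when the pattern is not a plain prefix match.
std::optional<PrefixRange> like_prefix_range(const std::string& pattern) {
    if (pattern.size() < 2 || pattern.back() != '%') {
        return std::nullopt;  // need a non-empty prefix followed by a trailing '%'
    }
    std::string prefix = pattern.substr(0, pattern.size() - 1);
    // Any other wildcard or escape (assumed '\\') makes this not a pure prefix.
    if (prefix.find_first_of("%_\\") != std::string::npos) {
        return std::nullopt;
    }
    // Upper bound: strip trailing 0xFF bytes, then increment the last remaining byte.
    std::string upper = prefix;
    while (!upper.empty() && static_cast<unsigned char>(upper.back()) == 0xFF) {
        upper.pop_back();
    }
    if (upper.empty()) {
        return PrefixRange{prefix, std::nullopt};
    }
    upper.back() = static_cast<char>(static_cast<unsigned char>(upper.back()) + 1);
    return PrefixRange{prefix, upper};
}
```

For example, `'ab%'` maps to `col >= 'ab' AND col < 'ac'`, while `'%ab%'` and `'a_c%'` are left for the scanner to evaluate.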
Comment on lines +67 to +98

```cpp
std::shared_ptr<paimon::Predicate> PaimonPredicateConverter::build(
        const VExprContextSPtrs& conjuncts) {
    std::vector<std::shared_ptr<paimon::Predicate>> predicates;
    predicates.reserve(conjuncts.size());
    for (const auto& conjunct : conjuncts) {
        if (!conjunct || !conjunct->root()) {
            continue;
        }
        auto root = conjunct->root();
        if (root->is_rf_wrapper()) {
            if (auto impl = root->get_impl()) {
                root = impl;
            }
        }
        auto predicate = _convert_expr(root);
        if (predicate) {
            predicates.emplace_back(std::move(predicate));
        }
    }

    if (predicates.empty()) {
        return nullptr;
    }
    if (predicates.size() == 1) {
        return predicates.front();
    }
    auto and_result = paimon::PredicateBuilder::And(predicates);
    if (!and_result.ok()) {
        return nullptr;
    }
    return std::move(and_result).value();
}
```
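`build` converts each conjunct independently, skips the ones that cannot be converted, and ANDs the rest. Skipping a conjunct only weakens the pushed-down filter: it may let extra rows through but never drops a matching row. That is safe provided the scanner still evaluates the original conjuncts on the returned rows, which this sketch assumes because the excerpt does not show it. The same reasoning covers the `nullptr` return when `And` fails (no pushdown at all). The `Pred` alias below is a hypothetical stand-in for `paimon::Predicate`.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for paimon::Predicate: a row filter over an int column.
using Pred = std::function<bool(int)>;

// AND the convertible conjuncts; nullptr entries model conjuncts that could not
// be converted and are skipped. Returns nullptr (no pushdown) when nothing was
// convertible, mirroring the structure of build() above.
std::shared_ptr<Pred> combine_conjuncts(const std::vector<std::shared_ptr<Pred>>& converted) {
    std::vector<std::shared_ptr<Pred>> kept;
    for (const auto& p : converted) {
        if (p) {
            kept.push_back(p);
        }
    }
    if (kept.empty()) {
        return nullptr;
    }
    if (kept.size() == 1) {
        return kept.front();
    }
    return std::make_shared<Pred>([kept](int v) {
        for (const auto& p : kept) {
            if (!(*p)(v)) {
                return false;
            }
        }
        return true;
    });
}
```

If `v < 8` were unconvertible, a pushed-down `v > 2` would still pass `9`; re-evaluating the full conjuncts afterwards is what removes it.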
Comment on lines +149 to +155

```cmake
option(ENABLE_PAIMON_CPP "Enable Paimon C++ integration" ON)
set(PAIMON_HOME "" CACHE PATH "Paimon install prefix")

# Allow env to override when reconfiguring (avoid picking /usr/local).
if (DEFINED ENV{ENABLE_PAIMON_CPP})
    set(ENABLE_PAIMON_CPP "$ENV{ENABLE_PAIMON_CPP}" CACHE BOOL "" FORCE)
endif()
```
Comment on lines +183 to +196

```cmake
if (ENABLE_PAIMON_CPP)
    add_thirdparty(paimon LIB64)
    add_thirdparty(paimon_parquet_file_format LIB64)
    add_thirdparty(paimon_orc_file_format LIB64)
    add_thirdparty(paimon_blob_file_format LIB64)
    add_thirdparty(paimon_local_file_system LIB64)
    add_thirdparty(paimon_file_index LIB64)
    add_thirdparty(paimon_global_index LIB64)

    add_thirdparty(roaring_bitmap_paimon LIB64)
    add_thirdparty(xxhash_paimon LIB64)
    add_thirdparty(fmt_paimon LIB64)
    add_thirdparty(tbb_paimon LIB64)
endif()
```
Comment on lines +72 to +74

```cpp
#include "vec/exec/format/table/paimon_cpp_reader.h"
#include "vec/exec/format/table/paimon_jni_reader.h"
#include "vec/exec/format/table/paimon_predicate_converter.h"
```
Comment on lines +1000 to +1003

```cpp
if (_state->query_options().__isset.enable_paimon_cpp_reader &&
    _state->query_options().enable_paimon_cpp_reader) {
    auto cpp_reader = PaimonCppReader::create_unique(_file_slot_descs, _state,
                                                     _profile, range, _params);
```
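The gate reads the Thrift optional field only when its `__isset` flag is set, so a query whose options do not carry `enable_paimon_cpp_reader` (for example, one planned by an FE that predates the option, which is an assumption about the motivation) keeps using the JNI reader. The sketch below reproduces that selection logic with a hand-written struct imitating the shape of Thrift-generated optional fields; `QueryOptionsLike`, `ReaderKind`, and `select_paimon_reader` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical imitation of a Thrift-generated struct with one optional field.
// Thrift-generated code names the flags member `__isset`; `isset` is used here
// to avoid a reserved identifier.
struct QueryOptionsLike {
    struct IssetFlags {
        bool enable_paimon_cpp_reader = false;
    };
    bool enable_paimon_cpp_reader = false;
    IssetFlags isset;
};

enum class ReaderKind { kJni, kPaimonCpp };

// Pick the native reader only when the option is both present and true;
// anything else keeps the existing JNI path.
ReaderKind select_paimon_reader(const QueryOptionsLike& opts) {
    if (opts.isset.enable_paimon_cpp_reader && opts.enable_paimon_cpp_reader) {
        return ReaderKind::kPaimonCpp;
    }
    return ReaderKind::kJni;
}
```

Checking the flag first makes "option absent" and "option explicitly false" behave the same, which keeps the JNI scanner as the default.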
Summary
Conflict Resolution
`gensrc/thrift/PaloInternalService.thrift`: kept the new fields from both branch-4.1 and the PR (`200: enable_adjust_conjunct_order_by_cost`, `201: enable_paimon_cpp_reader`, `202: single_backend_query`).