branch-4.1: [feat](paimon) integrate paimon-cpp reader #60676 #60795 #61379

Open
xylaaaaa wants to merge 2 commits into apache:branch-4.1 from xylaaaaa:auto-pick-60676-branch-4.1

Conversation

@xylaaaaa
Contributor

Summary

Conflict Resolution

  • gensrc/thrift/PaloInternalService.thrift: kept both new fields from branch-4.1 and the PR (200: enable_adjust_conjunct_order_by_cost, 201: enable_paimon_cpp_reader, 202: single_backend_query)

xylaaaaa and others added 2 commits March 16, 2026 16:09
Issue Number: apache#56005

Co-authored-by: morningman <yunyou@selectdb.com>
…der (apache#60795)

## Problem
Followup apache#60676

When FE does not pass full table options in scan ranges, paimon-cpp may
default manifest.format to avro.
For non-avro environments, this can fail in PaimonCppReader
initialization with:
Could not find a FileFormatFactory implementation class for format avro.

## Solution
In PaimonCppReader::_build_options, if split-level file_format exists
and table options are missing/empty:
- set file.format from split file_format
- set manifest.format from split file_format

This keeps paimon-cpp format resolution consistent with the actual split
format and avoids unintended avro fallback.
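
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in paimon_cpp_reader.cpp: the `Options` alias and the `apply_split_format_fallback` helper are hypothetical names, and treating "missing/empty" as a per-key check is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the string options handed to paimon-cpp.
using Options = std::map<std::string, std::string>;

// Fill file.format and manifest.format from the split-level format when the
// table options did not supply them (key missing or value empty).
// Values that the table options already set are left untouched.
inline void apply_split_format_fallback(Options& options,
                                        const std::string& split_file_format) {
    if (split_file_format.empty()) {
        return;  // no split-level format, nothing to infer
    }
    for (const char* key : {"file.format", "manifest.format"}) {
        auto it = options.find(key);
        if (it == options.end() || it->second.empty()) {
            options[key] = split_file_format;
        }
    }
}
```

With empty options and a split format of `parquet`, both keys end up as `parquet`, so the manifest format no longer falls back to avro.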

## Verification
- Incremental BE build succeeded for doris_be target.
- Change scope is limited to
be/src/vec/exec/format/table/paimon_cpp_reader.cpp.
@xylaaaaa xylaaaaa requested a review from yiguolei as a code owner March 16, 2026 08:10
Copilot AI review requested due to automatic review settings March 16, 2026 08:10
@Thearas
Contributor

Thearas commented Mar 16, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@xylaaaaa
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

Cherry-pick to branch-4.1 to integrate the paimon-cpp reader path (plus follow-up fix to infer manifest format from split file format) so BE can read Paimon splits via native C++ instead of the JNI scanner, controlled by a new session/query option.

Changes:

  • Add enable_paimon_cpp_reader session variable + thrift TQueryOptions plumbing to toggle the BE reader selection.
  • Implement BE-side paimon-cpp scan path: PaimonCppReader, Doris-backed paimon file system, and predicate pushdown conversion into paimon-cpp predicates.
  • Add coverage via a new BE unit test and a regression test comparing JNI vs paimon-cpp results.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| regression-test/suites/external_table_p0/paimon/test_paimon_cpp_reader.groovy | Regression test comparing results between JNI and paimon-cpp paths. |
| gensrc/thrift/PaloInternalService.thrift | Adds enable_paimon_cpp_reader query option to thrift. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds session variable + thrift serialization + ignore split type update. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonSource.java | Exposes table location for BE paimon-cpp reader. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java | Encodes DataSplit for paimon-cpp and passes table location in scan range. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/PaimonUtil.java | Adds native DataSplit serialization (standard Base64) for BE. |
| fe/be-java-extensions/paimon-scanner/.../PaimonUtils.java | Adds Base64 decoding fallback for split deserialization. |
| be/src/vec/exec/scan/file_scanner.cpp | Selects paimon-cpp reader when enabled; wires predicate pushdown and safer JNI cast. |
| be/src/vec/exec/format/table/paimon_cpp_reader.{h,cpp} | New paimon-cpp reader implementation. |
| be/src/vec/exec/format/table/paimon_predicate_converter.{h,cpp} | Converts Doris conjuncts into paimon-cpp predicates. |
| be/src/vec/exec/format/table/paimon_doris_file_system.{h,cpp} | Registers a doris-backed filesystem for paimon-cpp. |
| be/test/vec/exec/format/table/paimon_cpp_reader_test.cpp | New unit tests for count-pushdown and init validation. |
| be/cmake/thirdparty.cmake | Adds paimon-cpp static libs under ENABLE_PAIMON_CPP. |
| be/CMakeLists.txt | Introduces ENABLE_PAIMON_CPP option and paimon/arrow linkage wiring. |

Comment on lines +217 to +221
if (expr->op() == TExprOpcode::EQ_FOR_NULL) {
    return paimon::PredicateBuilder::IsNull(
            field_meta->index, field_meta->slot_desc->col_name(), field_meta->field_type);
}

Comment on lines +289 to +299
const std::string& pattern = *pattern_opt;
if (!pattern.empty() && pattern.front() == '%') {
    return nullptr;
}
if (pattern.empty() || pattern.back() != '%') {
    return nullptr;
}

std::string prefix = pattern.substr(0, pattern.size() - 1);
paimon::Literal lower_literal(paimon::FieldType::STRING, prefix.data(), prefix.size());
auto lower_pred = paimon::PredicateBuilder::GreaterOrEqual(
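
For context, the snippet above turns a `LIKE 'prefix%'` pattern into a range predicate. The sketch below shows one way to derive such bounds in isolation. It is not the PR's implementation: `PrefixRange` and `like_prefix_to_range` are hypothetical names, byte-wise string ordering is assumed, and rejecting `_`, interior `%`, and backslash escapes is an extra conservative assumption that the quoted lines do not show.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical result type: lower <= value, and value < upper when present.
struct PrefixRange {
    std::string lower;                 // inclusive lower bound
    std::optional<std::string> upper;  // exclusive upper bound, absent if unbounded
};

// Convert "prefix%" into range bounds; return nullopt when the pattern is not
// a plain prefix match and therefore should not be pushed down.
inline std::optional<PrefixRange> like_prefix_to_range(const std::string& pattern) {
    if (pattern.size() < 2 || pattern.back() != '%') {
        return std::nullopt;  // needs a non-empty prefix followed by '%'
    }
    std::string prefix = pattern.substr(0, pattern.size() - 1);
    // Assumption: any other wildcard or escape character disables pushdown.
    if (prefix.find_first_of("%_\\") != std::string::npos) {
        return std::nullopt;
    }
    PrefixRange range{prefix, std::nullopt};
    // Upper bound: drop trailing 0xFF bytes, then increment the last byte.
    std::string upper = prefix;
    while (!upper.empty() && static_cast<unsigned char>(upper.back()) == 0xFF) {
        upper.pop_back();
    }
    if (!upper.empty()) {
        upper.back() = static_cast<char>(static_cast<unsigned char>(upper.back()) + 1);
        range.upper = std::move(upper);
    }
    return range;
}
```

Under these assumptions, `abc%` maps to the half-open range from `abc` (inclusive) to `abd` (exclusive), while `%abc`, `abc`, and `a_c%` are not converted.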
Comment on lines +67 to +98
std::shared_ptr<paimon::Predicate> PaimonPredicateConverter::build(
        const VExprContextSPtrs& conjuncts) {
    std::vector<std::shared_ptr<paimon::Predicate>> predicates;
    predicates.reserve(conjuncts.size());
    for (const auto& conjunct : conjuncts) {
        if (!conjunct || !conjunct->root()) {
            continue;
        }
        auto root = conjunct->root();
        if (root->is_rf_wrapper()) {
            if (auto impl = root->get_impl()) {
                root = impl;
            }
        }
        auto predicate = _convert_expr(root);
        if (predicate) {
            predicates.emplace_back(std::move(predicate));
        }
    }

    if (predicates.empty()) {
        return nullptr;
    }
    if (predicates.size() == 1) {
        return predicates.front();
    }
    auto and_result = paimon::PredicateBuilder::And(predicates);
    if (!and_result.ok()) {
        return nullptr;
    }
    return std::move(and_result).value();
}
Comment on lines +149 to +155
option(ENABLE_PAIMON_CPP "Enable Paimon C++ integration" ON)
set(PAIMON_HOME "" CACHE PATH "Paimon install prefix")

# Allow env to override when reconfiguring (avoid picking /usr/local).
if (DEFINED ENV{ENABLE_PAIMON_CPP})
    set(ENABLE_PAIMON_CPP "$ENV{ENABLE_PAIMON_CPP}" CACHE BOOL "" FORCE)
endif()
Comment on lines +183 to +196
if (ENABLE_PAIMON_CPP)
    add_thirdparty(paimon LIB64)
    add_thirdparty(paimon_parquet_file_format LIB64)
    add_thirdparty(paimon_orc_file_format LIB64)
    add_thirdparty(paimon_blob_file_format LIB64)
    add_thirdparty(paimon_local_file_system LIB64)
    add_thirdparty(paimon_file_index LIB64)
    add_thirdparty(paimon_global_index LIB64)

    add_thirdparty(roaring_bitmap_paimon LIB64)
    add_thirdparty(xxhash_paimon LIB64)
    add_thirdparty(fmt_paimon LIB64)
    add_thirdparty(tbb_paimon LIB64)
endif()
Comment on lines +72 to +74
#include "vec/exec/format/table/paimon_cpp_reader.h"
#include "vec/exec/format/table/paimon_jni_reader.h"
#include "vec/exec/format/table/paimon_predicate_converter.h"
Comment on lines +1000 to +1003
if (_state->query_options().__isset.enable_paimon_cpp_reader &&
    _state->query_options().enable_paimon_cpp_reader) {
    auto cpp_reader = PaimonCppReader::create_unique(_file_slot_descs, _state,
                                                     _profile, range, _params);
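
The condition above checks both that the optional Thrift field was set and that it is true. A self-contained sketch of that selection logic follows, assuming the JNI reader is what gets used whenever the check fails. `QueryOptionsSketch`, `PaimonReaderKind`, and `select_paimon_reader` are hypothetical names for illustration, not Doris types.

```cpp
// Hypothetical mirror of a Thrift struct with one optional bool field.
// Thrift-generated C++ calls the presence struct `__isset`; it is named
// `isset` here because double-underscore identifiers are reserved.
struct QueryOptionsSketch {
    struct IsSet {
        bool enable_paimon_cpp_reader = false;
    };
    bool enable_paimon_cpp_reader = false;
    IsSet isset;
};

enum class PaimonReaderKind { Jni, Cpp };

// Pick the C++ reader only when the field was both sent and true.
// Assumption: every other case falls back to the JNI reader.
inline PaimonReaderKind select_paimon_reader(const QueryOptionsSketch& opts) {
    const bool use_cpp =
            opts.isset.enable_paimon_cpp_reader && opts.enable_paimon_cpp_reader;
    return use_cpp ? PaimonReaderKind::Cpp : PaimonReaderKind::Jni;
}
```

Checking the presence flag first means a sender that never sets the field still gets the JNI path under this sketch.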
@doris-robot
Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.15% (1788/2259) |
| Line Coverage | 64.53% (31962/49530) |
| Region Coverage | 65.37% (15994/24468) |
| Branch Coverage | 55.92% (8507/15214) |
