[feat](paimon) backport paimon-cpp reader integration chain for 2.1.11-rc01 #61125

Merged
morningman merged 3 commits into apache:tmp-branch-2.1.11-rc01-paimon-cpp from xylaaaaa:backport/paimon-cppreader-2.1.11-rc01
Mar 10, 2026

@xylaaaaa commented Mar 8, 2026

Backport and integrate PR chain into tmp-branch-2.1.11-rc01-paimon-cpp:

- apache#60296

- apache#60676

- apache#60711

- apache#60730

- apache#60795

- apache#60946

Not included: apache#60876

Copilot AI review requested due to automatic review settings March 8, 2026 13:24
@xylaaaaa requested a review from yiguolei as a code owner March 8, 2026 13:24

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

Backports the paimon-cpp reader integration chain into the release branch, including thirdparty build integration, FE/BE wiring to select the C++ reader, and accompanying test coverage.

Changes:

  • Integrate paimon-cpp (git-based) + required thirdparty updates (RapidJSON, pugixml) and Arrow 17 patches into the thirdparty toolchain.
  • Add FE session/query option (enable_paimon_cpp_reader) and propagate Paimon table location + split serialization suitable for paimon-cpp.
  • Add BE paimon-cpp reader, predicate pushdown converter, Doris-backed paimon file system factory, plus regression/unit tests.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| thirdparty/vars.sh | Adds paimon-cpp + pugixml thirdparty entries; updates RapidJSON source to match paimon-cpp needs. |
| thirdparty/patches/paimon-cpp-buildutils-static-deps.patch | Patches paimon-cpp build to better support static-only deps and optionally reuse external Arrow. |
| thirdparty/patches/apache-arrow-17.0.0-paimon.patch | Applies Arrow 17 parquet/thrift-related patches needed by paimon-cpp integration. |
| thirdparty/paimon-cpp-cache.cmake | CMake initial cache to reuse Doris thirdparty libs (notably Arrow) when building paimon-cpp. |
| thirdparty/download-thirdparty.sh | Adds git-repo cloning flow for git-based thirdparty packages and patches paimon-cpp + Arrow 17. |
| thirdparty/build-thirdparty.sh | Builds Arrow with additional modules; adds pugixml + paimon-cpp build/install steps; adds post-build source cleanup. |
| regression-test/suites/external_table_p0/paimon/test_paimon_cpp_reader.groovy | Regression validating JNI vs paimon-cpp reader result parity. |
| gensrc/thrift/PaloInternalService.thrift | Adds thrift query option field for enabling paimon-cpp reader. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds enable_paimon_cpp_reader session var and forwards it into TQueryOptions. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonSource.java | Adds table location resolution for paimon-cpp. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java | Encodes DataSplit using Paimon native serialization when paimon-cpp is enabled; passes table location. |
| fe/be-java-extensions/paimon-scanner/src/main/java/org/apache/doris/paimon/PaimonUtils.java | Adds Base64 decoding fallback (URL-safe → standard) for split deserialization compatibility. |
| be/test/vec/exec/format/table/paimon_cpp_reader_test.cpp | Unit tests for count pushdown path and missing split error handling. |
| be/src/vec/exec/scan/vfile_scanner.cpp | Selects paimon-cpp reader under query option and wires predicate conversion for pushdown conjuncts. |
| be/src/vec/exec/format/table/paimon_predicate_converter.h | Declares converter from Doris VExpr conjuncts to paimon::Predicate. |
| be/src/vec/exec/format/table/paimon_predicate_converter.cpp | Implements predicate conversion (binary comparisons, IN, IS NULL, LIKE-prefix). |
| be/src/vec/exec/format/table/paimon_doris_file_system.h | Declares a force-link registration hook for paimon-cpp FS factory. |
| be/src/vec/exec/format/table/paimon_doris_file_system.cpp | Implements Doris-backed paimon::FileSystem using Doris FileFactory/FSProperties. |
| be/src/vec/exec/format/table/paimon_cpp_reader.h | Declares paimon-cpp GenericReader implementation. |
| be/src/vec/exec/format/table/paimon_cpp_reader.cpp | Implements split decode, option mapping, Arrow RecordBatch import, and COUNT pushdown via table-level row count. |
| be/cmake/thirdparty.cmake | Adds pugixml + paimon-cpp libs to BE thirdparty linkage when enabled. |
| be/CMakeLists.txt | Adds ENABLE_PAIMON_CPP + Arrow library selection logic and whole-archive linkage for paimon factory libs. |
| .github/workflows/build-thirdparty.yml | Adjusts thirdparty CI workflow (disables space maximization; lowers build parallelism). |

Comment on lines +51 to +52
std::ranges::transform(value, value.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
Copilot AI Mar 8, 2026

`paimon_doris_file_system.cpp` uses C++20 APIs (`std::ranges::transform`), but Doris BE is built with C++17 in this branch (e.g., other thirdparty builds explicitly set `CPP_STANDARD=17`). This will fail to compile on C++17 toolchains. Replace with C++17 equivalents (e.g., `std::transform` with iterators).

Suggested change
Current:
std::ranges::transform(value, value.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
Proposed:
std::transform(value.begin(), value.end(), value.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
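
For reference, a minimal standalone sketch of the iterator-based form, assuming the branch really does need to build as C++17 as the comment states. The helper name `to_lower_copy` is hypothetical and does not appear in the PR; only `std::transform` and `std::tolower` from the standard library are used.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the PR): returns a lower-cased copy of
// `value` using only facilities available in C++17.
inline std::string to_lower_copy(std::string value) {
    // Going through unsigned char avoids undefined behaviour when
    // std::tolower receives a negative char value.
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return value;
}
```

The lambda is unchanged from the original line; only the range overload is swapped for the begin/end iterator overload.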

}

std::string normalize_local_path(const std::string& path) {
if (!path.starts_with("file:")) {
Copilot AI Mar 8, 2026

C++20-only APIs are used here (e.g., `std::string::starts_with`). If BE is compiled as C++17 on this branch, this will not compile. Please switch to a C++17-compatible prefix check (e.g., `rfind("file:", 0) == 0` / `compare(0, ...) == 0`).

Suggested change
Current:
if (!path.starts_with("file:")) {
Proposed:
if (path.compare(0, 5, "file:") != 0) {
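
If several call sites need the same check, one option is a small C++17 helper rather than repeating the literal length at each site. The sketch below is an illustration only; `has_prefix` is a hypothetical name, not something defined in the PR or in Doris.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not from the PR): C++17 stand-in for the C++20
// member function std::string::starts_with.
inline bool has_prefix(std::string_view text, std::string_view prefix) {
    // The size check short-circuits inputs shorter than the prefix;
    // compare() then looks only at the first prefix.size() characters.
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}
```

With such a helper the condition would read `if (!has_prefix(path, "file:"))`, which avoids hard-coding the prefix length `5`.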

Comment on lines +610 to +611
if (type == doris::TFileType::FILE_HTTP && !options_.contains("uri") &&
!uri.scheme.empty()) {
Copilot AI Mar 8, 2026

C++20-only API `std::map::contains` is used in this condition. If the BE toolchain is C++17, this will not compile; use `options_.find("uri") == options_.end()` instead.

Suggested change
Current:
if (type == doris::TFileType::FILE_HTTP && !options_.contains("uri") &&
!uri.scheme.empty()) {
Proposed:
if (type == doris::TFileType::FILE_HTTP &&
options_.find("uri") == options_.end() && !uri.scheme.empty()) {
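
The `find`/`end` idiom can also be wrapped once so that call sites stay readable. This is a sketch under the assumption that `options_` is a standard associative container keyed by `std::string`; the name `contains_key` is hypothetical and not part of the PR.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not from the PR): C++17 stand-in for the C++20
// member function contains() on associative containers.
template <typename Map, typename Key>
bool contains_key(const Map& m, const Key& key) {
    // find() returns end() when the key is absent.
    return m.find(key) != m.end();
}
```

The condition would then read `!contains_key(options_, "uri")`, which keeps the intent of the original `!options_.contains("uri")`.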

Comment on lines +145 to 147
#./build-thirdparty.sh -j "$(nproc)"
./build-thirdparty.sh -j 2

Copilot AI Mar 8, 2026

The workflow now hard-codes the thirdparty build to `./build-thirdparty.sh -j 2` (and comments out the `-j "$(nproc)"` line). This can greatly slow CI and may lead to job timeouts compared to the prior configuration; consider restoring `$(nproc)` or making the reduced parallelism conditional (e.g., only for troubleshooting).

Suggested change
Current:
#./build-thirdparty.sh -j "$(nproc)"
./build-thirdparty.sh -j 2
Proposed:
./build-thirdparty.sh -j "$(nproc)"

Comment on lines +67 to +77
#- name: Maximize build space
# uses: ./maximize-build-space
# with:
# root-reserve-mb: 4096
# temp-reserve-mb: 4096
# swap-size-mb: 8192
# remove-dotnet: 'true'
# remove-android: 'true'
# remove-haskell: 'true'
# remove-codeql: 'true'
# remove-docker-images: 'true'
Copilot AI Mar 8, 2026

The disk-space maximization step has been commented out. If thirdparty builds still approach runner disk limits, this may cause intermittent failures; consider keeping it enabled or replacing it with an alternative space management step.

Suggested change
Current:
#- name: Maximize build space
# uses: ./maximize-build-space
# with:
# root-reserve-mb: 4096
# temp-reserve-mb: 4096
# swap-size-mb: 8192
# remove-dotnet: 'true'
# remove-android: 'true'
# remove-haskell: 'true'
# remove-codeql: 'true'
# remove-docker-images: 'true'
Proposed:
- name: Maximize build space
  uses: ./maximize-build-space
  with:
    root-reserve-mb: 4096
    temp-reserve-mb: 4096
    swap-size-mb: 8192
    remove-dotnet: 'true'
    remove-android: 'true'
    remove-haskell: 'true'
    remove-codeql: 'true'
    remove-docker-images: 'true'

@morningman merged commit aa776d0 into apache:tmp-branch-2.1.11-rc01-paimon-cpp Mar 10, 2026
2 of 3 checks passed