Refact reader branch #64922

Merged
Gabriel39 merged 2 commits into apache:refact_reader_branch from Gabriel39:refact_reader_branch on Jun 29, 2026
@Gabriel39

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: FileScannerV2 could not read Remote Doris Arrow Flight splits, because FORMAT_ARROW was not routed to a v2 table reader and no v2-native Remote Doris reader existed. This change adds a Remote Doris TableReader/FileReader implementation for FileScannerV2. The new reader opens Arrow Flight streams directly and builds the file-local schema from the planned file slots. It materializes Arrow RecordBatch data by column name into the v2 file-local block, applies localized filters through the v2 materialized-reader helper, validates protocol mismatches, and closes Flight resources. FORMAT_ARROW is enabled in FileScannerV2 only when table_format_type=remote_doris, so ordinary Arrow stream files stay on the existing path.
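
The "materialize by column name" step can be pictured with a small stand-alone sketch. It is not the code in this PR: `Batch`, `Column`, and `materialize_by_name` are hypothetical stand-ins for an Arrow RecordBatch, a Doris column, and the reader's copy loop, and the columns are assumed to hold plain 64-bit integers. It only shows the idea of looking up each file-local schema column by name and rejecting a batch that does not match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: the real reader works on arrow::RecordBatch and a Doris block.
using Column = std::vector<int64_t>;

struct Batch {
    std::vector<std::string> names; // column names as sent by the remote side
    std::vector<Column> columns;    // one column per name, in the same order
};

// Copies batch columns into schema order, matching by name.
// Returns std::nullopt and fills *error when the batch does not match the schema.
std::optional<std::vector<Column>> materialize_by_name(
        const Batch& batch, const std::vector<std::string>& file_schema, std::string* error) {
    if (batch.names.size() != batch.columns.size()) {
        *error = "name/column count mismatch";
        return std::nullopt;
    }
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < batch.names.size(); ++i) {
        index.emplace(batch.names[i], i);
    }
    const std::size_t rows = batch.columns.empty() ? 0 : batch.columns[0].size();
    std::vector<Column> block;
    block.reserve(file_schema.size());
    for (const std::string& name : file_schema) {
        auto it = index.find(name);
        if (it == index.end()) {
            *error = "missing column: " + name;
            return std::nullopt;
        }
        const Column& src = batch.columns[it->second];
        if (src.size() != rows) {
            *error = "row count mismatch in column: " + name;
            return std::nullopt;
        }
        block.push_back(src);
    }
    return block;
}
```

Matching by name rather than by position means the remote side may return columns in any order, while a column missing from the batch surfaces as an explicit error instead of silently misaligned data.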

### Release note

Support Remote Doris scans when FileScannerV2 is enabled.

### Check List (For Author)

- Test: Manual test
    - BE unit test: attempted PARALLEL=1 ./run-be-ut.sh --run --filter='FileScannerV2Test.*:RemoteDorisV2ReaderTest.*', but the sandbox could not update .git/modules/contrib/datasketches-cpp and network fallback to github.com was unavailable; escalated retries timed out in approval review.
    - Manual test: python3 build-support/run_clang_format.py --clang-format-executable /usr/local/opt/llvm@16/bin/clang-format --style file --inplace false --extensions c,h,C,H,cpp,hpp,cc,hh,c++,h++,cxx,hxx --exclude none <modified files>
- Behavior changed: Yes. Remote Doris FORMAT_ARROW scan ranges can be routed to FileScannerV2.
- Does this need documentation: No
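
The behavior change listed above, where only Remote Doris FORMAT_ARROW ranges move to FileScannerV2, amounts to a two-part condition. The sketch below is an illustration under assumptions, not the PR's code: `ArrowPath` and `route_arrow_range` are hypothetical names, and the only value taken from the description is the `remote_doris` table format type.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: which path handles a FORMAT_ARROW scan range.
enum class ArrowPath {
    kExistingPath,       // ordinary Arrow stream files
    kRemoteDorisV2Reader // Arrow Flight splits from a remote Doris cluster
};

// Picks the v2 reader only when v2 is enabled and the range is a Remote Doris range.
ArrowPath route_arrow_range(bool file_scanner_v2_enabled, const std::string& table_format_type) {
    if (file_scanner_v2_enabled && table_format_type == "remote_doris") {
        return ArrowPath::kRemoteDorisV2Reader;
    }
    return ArrowPath::kExistingPath;
}
```

Keeping the check this narrow is what leaves ordinary Arrow stream files on the existing path.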

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: The Remote Doris reader for file scanner v2 exposed complex slot types as top-level file columns without structural children. TableColumnMapper validates complex file schemas before building mappings, so ARRAY/MAP/STRUCT columns failed with malformed-schema errors, for example an ARRAY column that expected one child but had zero. This change synthesizes semantic children from the Doris slot type for Remote Doris file schema entries: element for ARRAY, key/value for MAP, and the field names for STRUCT. It also adds unit coverage for array, map, and struct schema generation.
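
The child-synthesis rule can be sketched as a small recursive function. This is a simplified illustration rather than the PR's implementation: `Kind`, `SlotType`, `SchemaNode`, and `build_schema_node` are hypothetical types and names. Only the child names (`element`, `key`, `value`, and the STRUCT field names) come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a Doris slot type and a file schema entry.
enum class Kind { kScalar, kArray, kMap, kStruct };

struct SlotType {
    Kind kind = Kind::kScalar;
    std::vector<SlotType> children;       // ARRAY: 1 child, MAP: 2 children, STRUCT: n children
    std::vector<std::string> field_names; // STRUCT only, parallel to children
};

struct SchemaNode {
    std::string name;
    Kind kind = Kind::kScalar;
    std::vector<SchemaNode> children;
};

// Builds a schema entry whose children mirror the structure of the slot type.
// at() throws std::out_of_range if the slot type itself is malformed.
SchemaNode build_schema_node(const std::string& name, const SlotType& type) {
    SchemaNode node;
    node.name = name;
    node.kind = type.kind;
    switch (type.kind) {
    case Kind::kArray:
        node.children.push_back(build_schema_node("element", type.children.at(0)));
        break;
    case Kind::kMap:
        node.children.push_back(build_schema_node("key", type.children.at(0)));
        node.children.push_back(build_schema_node("value", type.children.at(1)));
        break;
    case Kind::kStruct:
        for (std::size_t i = 0; i < type.children.size(); ++i) {
            node.children.push_back(build_schema_node(type.field_names.at(i), type.children[i]));
        }
        break;
    case Kind::kScalar:
        break; // scalars have no structural children
    }
    return node;
}
```

Because the function recurses, nested types such as an ARRAY of STRUCT get children at every level, which is what a validator that counts children per complex node would need.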

### Release note

None

### Check List (For Author)

- Test: Unit Test / Manual test
    - Added RemoteDorisV2ReaderTest.BuildsComplexSchemaChildrenFromSlots
    - Ran git diff --check for the modified files
    - Attempted ./run-be-ut.sh --run --filter='RemoteDorisV2ReaderTest.*' with PARALLEL=4, but the standard script could not run in the sandbox because submodule config writes were denied and the required elevated run approval timed out twice
- Behavior changed: No
- Does this need documentation: No
@Gabriel39 Gabriel39 merged commit 278e2e9 into apache:refact_reader_branch Jun 29, 2026
32 of 42 checks passed