[feature](paimon) implement INSERT INTO for Paimon external tables #61463

Open: yx-keith wants to merge 4 commits into apache:master from yx-keith:implement-paimon-insert

@yx-keith (Contributor) commented Mar 18, 2026

What problem does this PR solve?

Issue Number: #56005

Related PR: #xxx

Problem Summary:

This PR implements write support for Apache Paimon external tables in Doris, enabling INSERT INTO and INSERT OVERWRITE from the SQL layer.

Implementation ideas

BE evaluates output expressions, writes ORC/Parquet files directly to Paimon table storage (S3/HDFS/MinIO), and reports file metadata (TPaimonCommitData: file path, row count, file size, partition values) back to FE via the existing fragment status report mechanism.

FE collects TPaimonCommitData from all BE nodes, constructs Paimon CommitMessage objects (using DataFileMeta.forAppend), and commits them via the Paimon Java API (BatchTableCommit.commit()), creating a proper Paimon snapshot.
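The report-then-commit flow described above can be modelled with a small toy sketch. This is not the Doris or Paimon code: `CommitData` only mirrors the four fields listed for TPaimonCommitData, and `SketchTransaction`, its method names, and the dictionary used as a "snapshot" are hypothetical stand-ins chosen for illustration. The overwrite branch assumes that only the partitions touched by the write are replaced, which may differ from the behavior configured in a real table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CommitData:
    """Toy stand-in for TPaimonCommitData: one record per data file written."""
    file_path: str
    row_count: int
    file_size: int
    partition_values: Tuple[str, ...] = ()


@dataclass
class SketchTransaction:
    """Hypothetical coordinator-side collector; not the real PaimonTransaction."""
    overwrite: bool = False
    pending: List[CommitData] = field(default_factory=list)
    committed: bool = False

    def update_commit_data(self, reports: List[CommitData]) -> None:
        # Called once per backend status report; files stay invisible until commit.
        if self.committed:
            raise RuntimeError("transaction already committed")
        self.pending.extend(reports)

    def get_update_count(self) -> int:
        # Affected rows are the sum of row counts over all reported files.
        return sum(r.row_count for r in self.pending)

    def commit(self, snapshot: Dict[Tuple[str, ...], List[str]]) -> None:
        # Group reported files by partition, then publish them in one step.
        by_partition: Dict[Tuple[str, ...], List[str]] = {}
        for r in self.pending:
            by_partition.setdefault(r.partition_values, []).append(r.file_path)
        for partition, files in by_partition.items():
            if self.overwrite:
                # Assumption: overwrite replaces only the partitions written to.
                snapshot[partition] = list(files)
            else:
                snapshot.setdefault(partition, []).extend(files)
        self.committed = True

    def rollback(self) -> None:
        # Drop uncommitted reports; a real system would also clean up the files.
        self.pending.clear()
```

The point of the split is that data files written by the backends have no effect on readers until the single commit step publishes them, so a failed or rolled-back insert leaves the visible table state unchanged.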

Changes
Thrift / Protocol

DataSinks.thrift: add TPaimonTableSink, TPaimonCommitData
FrontendService.thrift: add paimon_commit_datas field to TReportExecStatusParams

FE — Nereids planner

UnboundPaimonTableSink → LogicalPaimonTableSink → PhysicalPaimonTableSink plan nodes
BindSink: bind rule for Paimon sink
LogicalPaimonTableSinkToPhysicalPaimonTableSink: implementation rule
PhysicalPlanTranslator.visitPhysicalPaimonTableSink: translates to legacy PlanFragment with correct outputExprs
InsertUtils: recognize UnboundPaimonTableSink in getTableName()
InsertIntoTableCommand: route Paimon tables to PaimonInsertExecutor

FE — execution / transaction

PaimonTableSink: legacy DataSink that builds TPaimonTableSink (output path, file format, compression, column descriptors, hadoop config)
PaimonTransaction: collects TPaimonCommitData from BEs, builds CommitMessage list, commits via Paimon Java API; supports overwrite via BatchWriteBuilder.withOverwrite()
PaimonTransactionManager / PaimonInsertExecutor / PaimonInsertCommandContext
Coordinator: process paimon_commit_datas from BE fragment reports

BE — pipeline

PaimonTableSinkOperatorX / PaimonTableSinkLocalState: pipeline sink operator
VPaimonTableWriter: dispatches rows to per-partition writers using _vec_output_expr_ctxs
VPaimonPartitionWriter: writes ORC/Parquet via VFileWriterWrapper, reports TPaimonCommitData on close
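As a rough illustration of the per-partition dispatch performed on the BE side, the sketch below groups rows by their partition column values and produces one summary per partition when the writers are closed. It is a simplified Python model, not the C++ implementation: `dispatch_by_partition`, `summarize_on_close`, and the row representation are hypothetical, and no files are actually written.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]
PartitionKey = Tuple[object, ...]


def dispatch_by_partition(
    rows: Iterable[Row], partition_columns: Sequence[str]
) -> Dict[PartitionKey, List[Row]]:
    """Route each row to the bucket (writer) for its partition values."""
    buckets: Dict[PartitionKey, List[Row]] = defaultdict(list)
    for row in rows:
        # A non-partitioned table has no partition columns, so every row
        # maps to the empty key () and a single writer receives all rows.
        key = tuple(row[c] for c in partition_columns)
        buckets[key].append(row)
    return dict(buckets)


def summarize_on_close(buckets: Dict[PartitionKey, List[Row]]) -> List[dict]:
    """Build one report per partition, loosely analogous to commit data."""
    return [
        {"partition_values": key, "row_count": len(rows)}
        for key, rows in buckets.items()
    ]
```

With an empty `partition_columns` list the same code path handles non-partitioned tables, which is one reason to route everything through a single dispatcher.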

Supported

INSERT INTO ... VALUES
INSERT INTO ... SELECT
INSERT OVERWRITE TABLE
Non-partitioned and partitioned tables
File formats: Parquet (default), ORC
Storage backends: S3, HDFS, MinIO

Tests

PaimonTableSinkTest: unit tests for bindDataSink() thrift generation (format, compression, partition column tagging, overwrite flag, hadoop config)
PaimonTransactionTest: unit tests for transaction lifecycle — updatePaimonCommitData, getUpdateCnt, commit, rollback, overwrite context, error handling
test_paimon_write_insert.groovy: regression test covering basic insert, partitioned insert, INSERT OVERWRITE, all scalar types, partial column list, and both Parquet/ORC formats

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yx-keith changed the title from "implement insert to paimon table" to "[feature](paimon) implement insert to paimon table" on Mar 18, 2026
@yx-keith changed the title from "[feature](paimon) implement insert to paimon table" to "[feature](paimon) implement INSERT INTO for Paimon external tables" on Mar 18, 2026
@morningman self-assigned this on Mar 23, 2026
@morningman (Contributor)

Hi @yx-keith, thanks for the contribution.
But we need to discuss it because another team is also working on this.

@yx-keith (Contributor, Author)

> Hi @yx-keith, thanks for the contribution. But we need to discuss it because another team is also working on this.

OK
