[feature](paimon) implement INSERT INTO for Paimon external tables #61463
Open
yx-keith wants to merge 4 commits into apache:master
Conversation
Contributor

Thank you for your contribution to Apache Doris. Please clearly describe your PR:

Contributor

Hi @yx-keith, thanks for the contribution.

Contributor (Author)

OK
What problem does this PR solve?
Issue Number: #56005
Related PR: #xxx
Problem Summary:
This PR implements write support for Apache Paimon external tables in Doris, enabling INSERT INTO and INSERT OVERWRITE from the SQL layer.
Implementation ideas
BE evaluates output expressions, writes ORC/Parquet files directly to Paimon table storage (S3/HDFS/MinIO), and reports file metadata (TPaimonCommitData: file path, row count, file size, partition values) back to FE via the existing fragment status report mechanism.
FE collects TPaimonCommitData from all BE nodes, constructs Paimon CommitMessage objects (using DataFileMeta.forAppend), and commits them via the Paimon Java API (BatchTableCommit.commit()), creating a proper Paimon snapshot.
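The FE-side half of this two-phase flow can be pictured with a small plain-Java model. This is only a sketch: `BeCommitReport` and `buildCommitMessages` are hypothetical stand-ins for the Thrift `TPaimonCommitData` and the Paimon `CommitMessage`/`DataFileMeta.forAppend` machinery, not the actual PR code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TPaimonCommitData reported by one BE writer.
class BeCommitReport {
    final String filePath;
    final long rowCount;
    final long fileSize;

    BeCommitReport(String filePath, long rowCount, long fileSize) {
        this.filePath = filePath;
        this.rowCount = rowCount;
        this.fileSize = fileSize;
    }
}

public class PaimonCommitSketch {
    // FE side: aggregate the file metadata reported by every BE fragment,
    // then turn each data file into one commit message so the whole insert
    // lands in a single Paimon snapshot.
    static List<String> buildCommitMessages(List<BeCommitReport> reports) {
        List<String> messages = new ArrayList<>();
        long totalRows = 0;
        for (BeCommitReport r : reports) {
            totalRows += r.rowCount;
            // The real code wraps DataFileMeta.forAppend(...) in a Paimon
            // CommitMessage here; a string placeholder keeps this runnable.
            messages.add(r.filePath + ":" + r.rowCount);
        }
        // The real code would now call BatchTableCommit.commit(messages).
        System.out.println("committing " + messages.size()
                + " files, " + totalRows + " rows");
        return messages;
    }

    public static void main(String[] args) {
        List<BeCommitReport> reports = new ArrayList<>();
        reports.add(new BeCommitReport("s3://bucket/tbl/f1.parquet", 100, 4096));
        reports.add(new BeCommitReport("s3://bucket/tbl/f2.parquet", 50, 2048));
        buildCommitMessages(reports);
    }
}
```

The design point this illustrates: BEs never talk to the Paimon catalog; only the FE holds the commit, so a failed fragment simply never contributes files and the snapshot stays atomic.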
Changes
Thrift / Protocol
DataSinks.thrift: add TPaimonTableSink, TPaimonCommitData
FrontendService.thrift: add paimon_commit_datas field to TReportExecStatusParams
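Based on the fields named in the summary above (file path, row count, file size, partition values), the new struct plausibly looks something like the following. This is an illustrative sketch only; field names and ids are guesses, so consult DataSinks.thrift in the PR diff for the real definition.

```thrift
// Hypothetical reconstruction of the commit metadata struct.
struct TPaimonCommitData {
  1: optional string file_path
  2: optional i64 row_count
  3: optional i64 file_size
  4: optional list<string> partition_values
}
```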
FE — Nereids planner
UnboundPaimonTableSink → LogicalPaimonTableSink → PhysicalPaimonTableSink plan nodes
BindSink: bind rule for Paimon sink
LogicalPaimonTableSinkToPhysicalPaimonTableSink: implementation rule
PhysicalPlanTranslator.visitPhysicalPaimonTableSink: translates to legacy PlanFragment with correct outputExprs
InsertUtils: recognize UnboundPaimonTableSink in getTableName()
InsertIntoTableCommand: route Paimon tables to PaimonInsertExecutor
FE — execution / transaction
PaimonTableSink: legacy DataSink that builds TPaimonTableSink (output path, file format, compression, column descriptors, hadoop config)
PaimonTransaction: collects TPaimonCommitData from BEs, builds CommitMessage list, commits via Paimon Java API; supports overwrite via BatchWriteBuilder.withOverwrite()
PaimonTransactionManager / PaimonInsertExecutor / PaimonInsertCommandContext
Coordinator: process paimon_commit_datas from BE fragment reports
BE — pipeline
PaimonTableSinkOperatorX / PaimonTableSinkLocalState: pipeline sink operator
VPaimonTableWriter: dispatches rows to per-partition writers using _vec_output_expr_ctxs
VPaimonPartitionWriter: writes ORC/Parquet via VFileWriterWrapper, reports TPaimonCommitData on close
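The per-partition dispatch in VPaimonTableWriter can be modeled with this simplified sketch (in Java rather than the BE's actual C++; `dispatch` and its shapes are hypothetical): each incoming row is routed to a writer keyed by its evaluated partition values, and each writer accounts for the rows it will report on close.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionDispatchSketch {
    // One writer per distinct partition value; in the BE this is a map from
    // the partition columns (evaluated via output expr contexts) to a
    // per-partition file writer. Here we only track row counts per writer,
    // which is what each writer reports back in its commit metadata on close.
    static Map<String, Long> dispatch(String[][] rows, int partitionCol) {
        Map<String, Long> rowsPerPartition = new HashMap<>();
        for (String[] row : rows) {
            String partition = row[partitionCol];
            rowsPerPartition.merge(partition, 1L, Long::sum);
        }
        return rowsPerPartition;
    }

    public static void main(String[] args) {
        String[][] rows = { {"a", "p1"}, {"b", "p1"}, {"c", "p2"} };
        // Two rows land in partition p1's writer, one in p2's.
        System.out.println(dispatch(rows, 1));
    }
}
```

A non-partitioned table is just the degenerate case of this map: a single writer keyed by the empty partition.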
Supported
INSERT INTO ... VALUES
INSERT INTO ... SELECT
INSERT OVERWRITE TABLE
Non-partitioned and partitioned tables
File formats: Parquet (default), ORC
Storage backends: S3, HDFS, MinIO
Tests
PaimonTableSinkTest: unit tests for bindDataSink() thrift generation (format, compression, partition column tagging, overwrite flag, hadoop config)
PaimonTransactionTest: unit tests for transaction lifecycle — updatePaimonCommitData, getUpdateCnt, commit, rollback, overwrite context, error handling
test_paimon_write_insert.groovy: regression test covering basic insert, partitioned insert, INSERT OVERWRITE, all scalar types, partial column list, and both Parquet/ORC formats
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)