branch-4.1: [feat](tvf) Support INSERT INTO TVF to export query results to local/HDFS/S3 files #61340

Merged
morningman merged 4 commits into apache:branch-4.1 from morningman:pick/branch-4.1/60719
Mar 16, 2026

@morningman
Contributor

Cherry-pick from #60719

@morningman morningman requested a review from yiguolei as a code owner March 14, 2026 18:11
@Thearas
Contributor

Thearas commented Mar 14, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yiguolei
Contributor

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.15% (1788/2259)
Line Coverage 64.47% (31930/49530)
Region Coverage 65.31% (15980/24468)
Branch Coverage 55.88% (8501/15214)

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/198) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.94% (19219/36301)
Line Coverage 36.14% (179079/495485)
Region Coverage 32.74% (138703/423612)
Branch Coverage 33.75% (60310/178695)

@morningman morningman force-pushed the pick/branch-4.1/60719 branch from 49f7f1b to 7c23934 Compare March 16, 2026 03:52
@morningman
Contributor Author

run buildall

1 similar comment
@morningman
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.15% (1788/2259)
Line Coverage 64.49% (31943/49530)
Region Coverage 65.32% (15983/24468)
Branch Coverage 55.87% (8500/15214)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 3.78% (13/344) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 5.58% (12/215) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.95% (19223/36301)
Line Coverage 36.16% (179177/495486)
Region Coverage 32.74% (138699/423612)
Branch Coverage 33.76% (60336/178695)

@yiguolei
Contributor

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.15% (1788/2259)
Line Coverage 64.50% (31945/49530)
Region Coverage 65.33% (15986/24468)
Branch Coverage 55.90% (8505/15214)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 5.58% (12/215) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.87% (19230/36371)
Line Coverage 36.15% (179225/495764)
Region Coverage 32.71% (138799/424351)
Branch Coverage 33.75% (60349/178805)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 8.37% (18/215) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.38% (25417/35606)
Line Coverage 54.04% (267428/494878)
Region Coverage 51.63% (221300/428625)
Branch Coverage 53.10% (95284/179448)

morningman and others added 3 commits March 16, 2026 09:52
…HDFS/S3 files (apache#60719)

`OUTFILE` is a MySQL-specific syntax.
We should standardize all data access patterns: use `SELECT` for reading
and `INSERT` for writing.

Since a TVF is treated as a table, it should support being written to
via `INSERT`.

Functionally, `INSERT INTO tvf` is currently similar to `OUTFILE`.
We still need to support it for the sake of conceptual consistency.

Add support for INSERT INTO TVF (Table-Valued Function) syntax, allowing
users to directly export query results into external file systems
(local, HDFS, S3) in CSV, Parquet, and ORC formats.

- FE: Add ANTLR grammar rule for INSERT INTO TVF syntax, implement
  UnboundTVFTableSink, LogicalTVFTableSink, PhysicalTVFTableSink plan nodes,
  and InsertIntoTVFCommand for query planning and execution.
- BE: Add TVFTableSinkOperator for pipeline execution, VTVFTableWriter for
  async file writing with auto-split support, and VFileFormatTransformerFactory
  for creating CSV/Parquet/ORC format transformers.
- Support CSV options: column_separator, line_delimiter, compression
  (gz/zstd/lz4/snappy).
- Support append mode (default) with file-prefix naming
  ({prefix}{query_id}_{idx}.{ext}).
- Add error handling for missing required params, unsupported formats,
  wildcards in file_path, and delete_existing_files on local TVF.
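
The file-prefix naming pattern above, `{prefix}{query_id}_{idx}.{ext}`, can be illustrated with a minimal Python sketch. This is not the BE implementation: the helper name `export_file_name`, the query id value, and the assumption that `idx` starts at 0 are all hypothetical and used only to show how the pieces of the pattern are concatenated.

```python
def export_file_name(prefix: str, query_id: str, idx: int, ext: str) -> str:
    """Build one output file name following {prefix}{query_id}_{idx}.{ext}.

    Hypothetical helper for illustration; the real writer lives in the BE.
    """
    return f"{prefix}{query_id}_{idx}.{ext}"


# "file_path" from the local TVF example acts as the prefix.
# The query id below is made up; real ids are assigned at execution time.
prefix = "/tmp/export/basic_csv_"
query_id = "a1b2c3"

# With auto-split, one export may produce several files that differ only in idx
# (starting idx at 0 is an assumption of this sketch).
names = [export_file_name(prefix, query_id, idx, "csv") for idx in range(2)]
for name in names:
    print(name)
```

Because the query id is part of every name, repeated exports with the same prefix would produce distinct files under this pattern, which is consistent with append mode being the default.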

Example SQL:

```
-- Export query results to local BE node as CSV
INSERT INTO local(
    "file_path" = "/tmp/export/basic_csv_",
    "backend_id" = "10001",
    "format" = "csv"
) SELECT * FROM my_table ORDER BY id;

-- Export as Parquet to HDFS
INSERT INTO hdfs(
    "file_path" = "/tmp/test_insert_into_hdfs_tvf/complex_parquet/data_",
    "format" = "parquet",
    "hadoop.username" = "doris",
    "fs.defaultFS" = "hdfs://127.0.0.1:8020",
    "delete_existing_files" = "true"
) SELECT * FROM insert_tvf_complex_src ORDER BY c_int;

-- Export as ORC to S3
INSERT INTO s3(
    "uri" = "https://bucket/insert_tvf_test/basic_orc/*",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "format" = "orc",
    "region" = "region"
) SELECT c_int, c_varchar, c_string FROM my_table WHERE c_int IS NOT NULL ORDER BY c_int;
```
@morningman morningman force-pushed the pick/branch-4.1/60719 branch from ef5d206 to ca8e952 Compare March 16, 2026 16:52
@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.15% (1788/2259)
Line Coverage 64.49% (31943/49530)
Region Coverage 65.35% (15991/24468)
Branch Coverage 55.94% (8510/15214)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/198) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.89% (19288/36471)
Line Coverage 36.21% (180099/497441)
Region Coverage 32.72% (139274/425665)
Branch Coverage 33.77% (60615/179499)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 1.52% (3/198) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.42% (25501/35706)
Line Coverage 54.15% (268889/496557)
Region Coverage 51.57% (221734/429940)
Branch Coverage 53.14% (95729/180144)

@morningman morningman merged commit a07ed8b into apache:branch-4.1 Mar 16, 2026
22 of 25 checks passed

5 participants