branch-4.1: [feat](tvf) Support INSERT INTO TVF to export query results to local/HDFS/S3 files #61340
Conversation
…HDFS/S3 files (apache#60719)

`OUTFILE` itself is a MySQL-specific syntax. We should standardize all data access patterns: use `SELECT` for reading and `INSERT` for writing. Since a TVF is treated as a table, it should support being written to via `INSERT`. Functionally, `INSERT INTO tvf` is currently similar to `OUTFILE`, but for conceptual consistency we need to support `INSERT INTO tvf`.

This PR adds support for the INSERT INTO TVF (Table-Valued Function) syntax, allowing users to export query results directly to external file systems (local, HDFS, S3) in CSV, Parquet, and ORC formats.

- FE: Add an ANTLR grammar rule for the INSERT INTO TVF syntax; implement the UnboundTVFTableSink, LogicalTVFTableSink, and PhysicalTVFTableSink plan nodes, and InsertIntoTVFCommand for query planning and execution.
- BE: Add TVFTableSinkOperator for pipeline execution, VTVFTableWriter for async file writing with auto-split support, and VFileFormatTransformerFactory for creating CSV/Parquet/ORC format transformers.
- Support CSV options: column_separator, line_delimiter, compression (gz/zstd/lz4/snappy).
- Support append mode (default) with file-prefix naming ({prefix}{query_id}_{idx}.{ext}).
- Add error handling for missing required parameters, unsupported formats, wildcards in file_path, and delete_existing_files on the local TVF.
Example SQL:

```sql
-- Export query results to a local BE node as CSV
INSERT INTO local(
    "file_path" = "/tmp/export/basic_csv_",
    "backend_id" = "10001",
    "format" = "csv"
)
SELECT * FROM my_table ORDER BY id;

-- Export as Parquet to HDFS
INSERT INTO hdfs(
    "file_path" = "/tmp/test_insert_into_hdfs_tvf/complex_parquet/data_",
    "format" = "parquet",
    "hadoop.username" = "doris",
    "fs.defaultFS" = "hdfs://127.0.0.1:8020",
    "delete_existing_files" = "true"
)
SELECT * FROM insert_tvf_complex_src ORDER BY c_int;

-- Export ORC to S3
INSERT INTO s3(
    "uri" = "https://bucket/insert_tvf_test/basic_orc/*",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "format" = "orc",
    "region" = "region"
)
SELECT c_int, c_varchar, c_string FROM my_table WHERE c_int IS NOT NULL ORDER BY c_int;
```
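To illustrate the CSV options and the file-naming behavior described above, here is a hedged sketch combining column_separator, line_delimiter, and compression. The path, backend_id, and table name are illustrative assumptions, not values from the PR's tests:

```sql
-- Illustrative only: export as gz-compressed CSV with custom separators.
-- Under the append-mode naming scheme {prefix}{query_id}_{idx}.{ext},
-- output files should land on the chosen BE as something like
-- /tmp/export/tab_csv_<query_id>_0.csv.gz (exact name depends on the query id).
INSERT INTO local(
    "file_path" = "/tmp/export/tab_csv_",   -- assumed prefix path
    "backend_id" = "10001",                 -- assumed BE id
    "format" = "csv",
    "column_separator" = "\t",
    "line_delimiter" = "\n",
    "compression" = "gz"                    -- one of gz/zstd/lz4/snappy
)
SELECT * FROM my_table;                     -- my_table is a placeholder
```

Note that omitting a required parameter (e.g. file_path), using an unsupported format, or setting delete_existing_files on the local TVF should be rejected by the error handling this PR adds.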
Cherry-pick from #60719