From 2ca23ec9290cb46fd97b30a7106bc42b1d4ed420 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 18 Nov 2025 06:36:42 +0000
Subject: [PATCH 01/11] Initial plan

From 48d39448632a7d766f5c4947a503bfbd12fcb240 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 18 Nov 2025 06:46:25 +0000
Subject: [PATCH 02/11] Add compression option documentation for CSV/JSON exports

Co-authored-by: WenyXu <32535939+WenyXu@users.noreply.github.com>
---
 docs/reference/sql/copy.md        | 20 +++++++++++++++++
 .../current/reference/sql/copy.md | 22 ++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index d0cdfa6f6..7526353af 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -31,6 +31,15 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
+To export data to a compressed CSV or JSON file:
+
+```sql
+COPY tbl TO '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  COMPRESSION = 'gzip'
+);
+```
+
 #### `WITH` Option
 
 `WITH` adds options such as the file `FORMAT` which specifies the format of the exported file. In this example, the format is Parquet; it is a columnar storage format used for big data processing. Parquet efficiently compresses and encodes columnar data for big data analytics.
@@ -39,6 +48,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
 | `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
+| `COMPRESSION` | Compression algorithm for the exported file. Supported value: `gzip`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |
@@ -158,6 +168,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | The SQL SELECT statement to execute | **Required** |
 | `PATH` | The file path where the output will be written | **Required** |
 | `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
+| `COMPRESSION` | Compression algorithm for the exported file. Supported value: `gzip`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
@@ -168,6 +179,15 @@ For example, the following statement exports query results to a CSV file:
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
+You can also export query results to a compressed file:
+
+```sql
+COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
+  FORMAT = 'json',
+  COMPRESSION = 'gzip'
+);
+```
+
 You can also specify custom date and time formats when exporting to CSV:
 
 ```sql
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index bdae6e7a5..710641af3 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -17,7 +17,7 @@ COPY tbl TO '/xxx/xxx/output.parquet' WITH (FORMAT = 'parquet');
 命令以 `COPY` 关键字开始,后面跟着要导出数据的表名(本例中为 `tbl`)。
 `TO` 指定导出数据的文件路径和名称(本例中为 `/xxx/xxx/output.parquet`)。
 
-例如,可以使用自定义时间戳和日期格式导出数据到 CSV 文件:
+例如,可以使用自定义时间戳和日期格式导出数据到 CSV 文件:
 
 ```sql
 COPY tbl TO '/path/to/file.csv' WITH (
@@ -27,6 +27,15 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
+导出数据到压缩的 CSV 或 JSON 文件:
+
+```sql
+COPY tbl TO '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  COMPRESSION = 'gzip'
+);
+```
+
 #### `WITH` 选项
 
 `WITH` 可以添加一些选项,比如文件的 `FORMAT` 用来指定导出文件的格式。本例中的格式为 Parquet,它是一种用于大数据处理的列式存储格式。Parquet 为大数据分析高效地压缩和编码列式数据。
@@ -35,6 +44,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet | **是** |
 | `START_TIME`/`END_TIME`| 需要导出数据的时间范围,时间范围为左闭右开 | 可选 |
+| `COMPRESSION` | 导出文件的压缩算法。支持的值:`gzip`。仅支持 CSV 和 JSON 格式。 | 可选 |
 | `TIMESTAMP_FORMAT` | 导出 CSV 格式时自定义时间戳列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%Y-%m-%d %H:%M:%S'`)。仅支持 CSV 格式。 | 可选 |
 | `DATE_FORMAT` | 导出 CSV 格式时自定义日期列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%Y-%m-%d'`)。仅支持 CSV 格式。 | 可选 |
 | `TIME_FORMAT` | 导出 CSV 格式时自定义时间列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%H:%M:%S'`)。仅支持 CSV 格式。 | 可选 |
@@ -151,6 +161,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | 要执行的 SQL SELECT 语句 | **是** |
 | `PATH` | 输出文件的路径 | **是** |
 | `FORMAT` | 输出文件格式:'CSV'、'JSON' 或 'PARQUET' | **是** |
+| `COMPRESSION` | 导出文件的压缩算法。支持的值:`gzip`。仅支持 CSV 和 JSON 格式。 | 可选 |
 | `TIMESTAMP_FORMAT` | 导出 CSV 格式时自定义时间戳列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
 | `DATE_FORMAT` | 导出 CSV 格式时自定义日期列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
 | `TIME_FORMAT` | 导出 CSV 格式时自定义时间列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
@@ -161,6 +172,15 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
+也可以将查询结果导出到压缩文件:
+
+```sql
+COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
+  FORMAT = 'json',
+  COMPRESSION = 'gzip'
+);
+```
+
 也可以在导出到 CSV 时指定自定义日期和时间格式:
 
 ```sql

From a7447432b941e7c800bb3a5ca634608465007237 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 18 Nov 2025 06:58:20 +0000
Subject: [PATCH 03/11] Update compression types to include all supported formats (gzip, zstd, bzip2, xz)

Co-authored-by: WenyXu <32535939+WenyXu@users.noreply.github.com>
---
 docs/reference/sql/copy.md        | 8 ++++----
 .../current/reference/sql/copy.md | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index 7526353af..377e395b2 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -36,7 +36,7 @@ To export data to a compressed CSV or JSON file:
 ```sql
 COPY tbl TO '/path/to/file.csv.gz' WITH (
   FORMAT = 'csv',
-  COMPRESSION = 'gzip'
+  compression_type = 'gzip'
 );
 ```
 
@@ -48,7 +48,7 @@ COPY tbl TO '/path/to/file.csv.gz' WITH (
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
 | `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
-| `COMPRESSION` | Compression algorithm for the exported file. Supported value: `gzip`. Only supported for CSV and JSON formats. | Optional |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |
@@ -168,7 +168,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | The SQL SELECT statement to execute | **Required** |
 | `PATH` | The file path where the output will be written | **Required** |
 | `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
-| `COMPRESSION` | Compression algorithm for the exported file. Supported value: `gzip`. Only supported for CSV and JSON formats. | Optional |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
@@ -184,7 +184,7 @@ You can also export query results to a compressed file:
 ```sql
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
   FORMAT = 'json',
-  COMPRESSION = 'gzip'
+  compression_type = 'gzip'
 );
 ```
 
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 710641af3..9a9f89c53 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -32,7 +32,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 ```sql
 COPY tbl TO '/path/to/file.csv.gz' WITH (
   FORMAT = 'csv',
-  COMPRESSION = 'gzip'
+  compression_type = 'gzip'
 );
 ```
 
@@ -44,7 +44,7 @@ COPY tbl TO '/path/to/file.csv.gz' WITH (
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet | **是** |
 | `START_TIME`/`END_TIME`| 需要导出数据的时间范围,时间范围为左闭右开 | 可选 |
-| `COMPRESSION` | 导出文件的压缩算法。支持的值:`gzip`。仅支持 CSV 和 JSON 格式。 | 可选 |
+| `compression_type` | 导出文件的压缩算法。支持的值:`gzip`、`zstd`、`bzip2`、`xz`。仅支持 CSV 和 JSON 格式。 | 可选 |
 | `TIMESTAMP_FORMAT` | 导出 CSV 格式时自定义时间戳列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%Y-%m-%d %H:%M:%S'`)。仅支持 CSV 格式。 | 可选 |
 | `DATE_FORMAT` | 导出 CSV 格式时自定义日期列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%Y-%m-%d'`)。仅支持 CSV 格式。 | 可选 |
 | `TIME_FORMAT` | 导出 CSV 格式时自定义时间列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%H:%M:%S'`)。仅支持 CSV 格式。 | 可选 |
@@ -161,7 +161,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | 要执行的 SQL SELECT 语句 | **是** |
 | `PATH` | 输出文件的路径 | **是** |
 | `FORMAT` | 输出文件格式:'CSV'、'JSON' 或 'PARQUET' | **是** |
-| `COMPRESSION` | 导出文件的压缩算法。支持的值:`gzip`。仅支持 CSV 和 JSON 格式。 | 可选 |
+| `compression_type` | 导出文件的压缩算法。支持的值:`gzip`、`zstd`、`bzip2`、`xz`。仅支持 CSV 和 JSON 格式。 | 可选 |
 | `TIMESTAMP_FORMAT` | 导出 CSV 格式时自定义时间戳列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
 | `DATE_FORMAT` | 导出 CSV 格式时自定义日期列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
 | `TIME_FORMAT` | 导出 CSV 格式时自定义时间列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
@@ -177,7 +177,7 @@ COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMA
 ```sql
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
   FORMAT = 'json',
-  COMPRESSION = 'gzip'
+  compression_type = 'gzip'
 );
 ```
 

From 79518e62ade2caf9a587b608ca48b9cbf6e7036e Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 18 Nov 2025 07:17:38 +0000
Subject: [PATCH 04/11] Add compression_type support for COPY FROM command

Co-authored-by: WenyXu <32535939+WenyXu@users.noreply.github.com>
---
 docs/reference/sql/copy.md        | 10 ++++++++++
 .../current/reference/sql/copy.md | 10 ++++++++++
 2 files changed, 20 insertions(+)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index 377e395b2..653a9a2e8 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -95,10 +95,20 @@ Specifically, if you only have one file to import, you can use the following syn
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
+To import data from a compressed CSV or JSON file:
+
+```sql
+COPY tbl FROM '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
 | Option | Description | Required |
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet, ORC | **Required** |
 | `PATTERN` | Use regex to match files. e.g., `*_today.parquet` | Optional |
+| `compression_type` | Compression algorithm for the imported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 
 :::tip NOTE
 The CSV file must have a header row to be imported correctly. The header row should contain the column names of the table.
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 9a9f89c53..11d5c2aea 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -88,10 +88,20 @@ COPY tbl FROM '/path/to/folder/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet.
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
+从压缩的 CSV 或 JSON 文件导入数据:
+
+```sql
+COPY tbl FROM '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
 | 选项 | 描述 | 是否必需 |
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet, ORC | **是** |
 | `PATTERN` | 使用正则匹配文件,例如 `*_today.parquet` | 可选 |
+| `compression_type` | 导入文件的压缩算法。支持的值:`gzip`、`zstd`、`bzip2`、`xz`。仅支持 CSV 和 JSON 格式。 | 可选 |
 
 :::tip NOTE
 CSV 文件必须带有 header,包含表的列名。

From 87fea8dc87e61f42320066feed020b6d5addc9b2 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 18 Nov 2025 07:41:37 +0000
Subject: [PATCH 05/11] Polish documentation text for better readability

Co-authored-by: WenyXu <32535939+WenyXu@users.noreply.github.com>
---
 docs/reference/sql/copy.md        | 4 ++--
 .../current/reference/sql/copy.md | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index 653a9a2e8..e3b0fa67b 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -31,7 +31,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
-To export data to a compressed CSV or JSON file:
+You can also export data to a compressed CSV or JSON file:
 
 ```sql
 COPY tbl TO '/path/to/file.csv.gz' WITH (
@@ -95,7 +95,7 @@ Specifically, if you only have one file to import, you can use the following syn
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
-To import data from a compressed CSV or JSON file:
+You can also import data from a compressed CSV or JSON file:
 
 ```sql
 COPY tbl FROM '/path/to/file.csv.gz' WITH (
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 11d5c2aea..30a255e54 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -27,7 +27,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
-导出数据到压缩的 CSV 或 JSON 文件:
+也可以将数据导出为压缩的 CSV 或 JSON 文件:
 
 ```sql
 COPY tbl TO '/path/to/file.csv.gz' WITH (
@@ -88,7 +88,7 @@ COPY tbl FROM '/path/to/folder/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet.
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
-从压缩的 CSV 或 JSON 文件导入数据:
+也可以从压缩的 CSV 或 JSON 文件导入数据:
 
 ```sql
 COPY tbl FROM '/path/to/file.csv.gz' WITH (
@@ -182,7 +182,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
-也可以将查询结果导出到压缩文件:
+也可以将查询结果导出为压缩文件:
 
 ```sql
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (

From d603d8645d711ce9185e89efa2563966ff252bc3 Mon Sep 17 00:00:00 2001
From: Weny Xu
Date: Tue, 18 Nov 2025 15:55:31 +0800
Subject: [PATCH 06/11] Apply suggestions from code review

Co-authored-by: jeremyhi
---
 .../current/reference/sql/copy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 30a255e54..6eca9cb81 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -17,7 +17,7 @@ COPY tbl TO '/xxx/xxx/output.parquet' WITH (FORMAT = 'parquet');
 命令以 `COPY` 关键字开始,后面跟着要导出数据的表名(本例中为 `tbl`)。
 `TO` 指定导出数据的文件路径和名称(本例中为 `/xxx/xxx/output.parquet`)。
 
-例如,可以使用自定义时间戳和日期格式导出数据到 CSV 文件:
+例如,可以使用自定义时间戳和日期格式导出数据到 CSV 文件:
 
 ```sql
 COPY tbl TO '/path/to/file.csv' WITH (

From baa31a34cfbf805fcf77ef6eac6163fa2562abf8 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 18 Nov 2025 08:04:35 +0000
Subject: [PATCH 07/11] Add note about file extension matching compression type

Co-authored-by: WenyXu <32535939+WenyXu@users.noreply.github.com>
---
 docs/reference/sql/copy.md        | 12 ++++++++++++
 .../current/reference/sql/copy.md | 12 ++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index e3b0fa67b..17b1c6bc4 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -40,6 +40,10 @@ COPY tbl TO '/path/to/file.csv.gz' WITH (
 );
 ```
 
+:::tip NOTE
+When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
+:::
+
 #### `WITH` Option
 
 `WITH` adds options such as the file `FORMAT` which specifies the format of the exported file. In this example, the format is Parquet; it is a columnar storage format used for big data processing. Parquet efficiently compresses and encodes columnar data for big data analytics.
@@ -104,6 +108,10 @@ COPY tbl FROM '/path/to/file.csv.gz' WITH (
 );
 ```
 
+:::tip NOTE
+When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
+:::
+
 | Option | Description | Required |
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet, ORC | **Required** |
@@ -198,6 +206,10 @@ COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
 );
 ```
 
+:::tip NOTE
+When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
+:::
+
 You can also specify custom date and time formats when exporting to CSV:
 
 ```sql
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 6eca9cb81..9473671af 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -36,6 +36,10 @@ COPY tbl TO '/path/to/file.csv.gz' WITH (
 );
 ```
 
+:::tip NOTE
+使用压缩时,请确保文件扩展名与压缩类型匹配:gzip 使用 `.gz`,zstd 使用 `.zst`,bzip2 使用 `.bz2`,xz 使用 `.xz`。
+:::
+
 #### `WITH` 选项
 
 `WITH` 可以添加一些选项,比如文件的 `FORMAT` 用来指定导出文件的格式。本例中的格式为 Parquet,它是一种用于大数据处理的列式存储格式。Parquet 为大数据分析高效地压缩和编码列式数据。
@@ -97,6 +101,10 @@ COPY tbl FROM '/path/to/file.csv.gz' WITH (
 );
 ```
 
+:::tip NOTE
+使用压缩时,请确保文件扩展名与压缩类型匹配:gzip 使用 `.gz`,zstd 使用 `.zst`,bzip2 使用 `.bz2`,xz 使用 `.xz`。
+:::
+
 | 选项 | 描述 | 是否必需 |
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet, ORC | **是** |
@@ -191,6 +199,10 @@ COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
 );
 ```
 
+:::tip NOTE
+使用压缩时,请确保文件扩展名与压缩类型匹配:gzip 使用 `.gz`,zstd 使用 `.zst`,bzip2 使用 `.bz2`,xz 使用 `.xz`。
+:::
+
 也可以在导出到 CSV 时指定自定义日期和时间格式:
 
 ```sql

From 4becc1ab2d3d53ad02b624aca926b3d6a13d0414 Mon Sep 17 00:00:00 2001
From: Weny Xu
Date: Tue, 18 Nov 2025 16:09:00 +0800
Subject: [PATCH 08/11] Update docs/reference/sql/copy.md

---
 docs/reference/sql/copy.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index 17b1c6bc4..5b49dac94 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -108,10 +108,6 @@ COPY tbl FROM '/path/to/file.csv.gz' WITH (
 );
 ```
 
-:::tip NOTE
-When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
-:::
-
 | Option | Description | Required |
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet, ORC | **Required** |

From acffa7344919ac0e5804e6afbaa424503fef3fc1 Mon Sep 17 00:00:00 2001
From: Weny Xu
Date: Tue, 18 Nov 2025 16:09:11 +0800
Subject: [PATCH 09/11] Update docs/reference/sql/copy.md

---
 docs/reference/sql/copy.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index 5b49dac94..e8a396493 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -202,10 +202,6 @@ COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
 );
 ```
 
-:::tip NOTE
-When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
-:::
-
 You can also specify custom date and time formats when exporting to CSV:
 
 ```sql

From 2eccafd44321f2c808e46bf4a3a0cf683faaf783 Mon Sep 17 00:00:00 2001
From: Weny Xu
Date: Tue, 18 Nov 2025 16:09:22 +0800
Subject: [PATCH 10/11] Update i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md

---
 .../current/reference/sql/copy.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 9473671af..21cdc5aa0 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -101,10 +101,6 @@ COPY tbl FROM '/path/to/file.csv.gz' WITH (
 );
 ```
 
-:::tip NOTE
-使用压缩时,请确保文件扩展名与压缩类型匹配:gzip 使用 `.gz`,zstd 使用 `.zst`,bzip2 使用 `.bz2`,xz 使用 `.xz`。
-:::
-
 | 选项 | 描述 | 是否必需 |
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet, ORC | **是** |

From b015eaa193653006a3362a1c0c8a68ca282ac66c Mon Sep 17 00:00:00 2001
From: Weny Xu
Date: Tue, 18 Nov 2025 16:09:31 +0800
Subject: [PATCH 11/11] Update i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md

---
 .../current/reference/sql/copy.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index 21cdc5aa0..e5c8b35ce 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -195,10 +195,6 @@ COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
 );
 ```
 
-:::tip NOTE
-使用压缩时,请确保文件扩展名与压缩类型匹配:gzip 使用 `.gz`,zstd 使用 `.zst`,bzip2 使用 `.bz2`,xz 使用 `.xz`。
-:::
-
 也可以在导出到 CSV 时指定自定义日期和时间格式:
 
 ```sql