diff --git a/docs/reference/sql/copy.md b/docs/reference/sql/copy.md
index d0cdfa6f6..e8a396493 100644
--- a/docs/reference/sql/copy.md
+++ b/docs/reference/sql/copy.md
@@ -31,6 +31,19 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
+You can also export data to a compressed CSV or JSON file:
+
+```sql
+COPY tbl TO '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
+:::tip NOTE
+When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
+:::
+
 #### `WITH` Option
 
 `WITH` adds options such as the file `FORMAT` which specifies the format of the exported file. In this example, the format is Parquet; it is a columnar storage format used for big data processing. Parquet efficiently compresses and encodes columnar data for big data analytics.
@@ -39,6 +52,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
 | `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |
@@ -85,10 +99,20 @@ Specifically, if you only have one file to import, you can use the following syn
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
+You can also import data from a compressed CSV or JSON file:
+
+```sql
+COPY tbl FROM '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
 | Option | Description | Required |
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet, ORC | **Required** |
 | `PATTERN` | Use regex to match files. e.g., `*_today.parquet` | Optional |
+| `compression_type` | Compression algorithm for the imported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 
 :::tip NOTE
 The CSV file must have a header row to be imported correctly. The header row should contain the column names of the table.
@@ -158,6 +182,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | The SQL SELECT statement to execute | **Required** |
 | `PATH` | The file path where the output will be written | **Required** |
 | `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
@@ -168,6 +193,15 @@ For example, the following statement exports query results to a CSV file:
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
+You can also export query results to a compressed file:
+
+```sql
+COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
+  FORMAT = 'json',
+  compression_type = 'gzip'
+);
+```
+
 You can also specify custom date and time formats when exporting to CSV:
 
 ```sql
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
index bdae6e7a5..e5c8b35ce 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md
@@ -27,6 +27,19 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
+也可以将数据导出为压缩的 CSV 或 JSON 文件:
+
+```sql
+COPY tbl TO '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
+:::tip NOTE
+使用压缩时,请确保文件扩展名与压缩类型匹配:gzip 使用 `.gz`,zstd 使用 `.zst`,bzip2 使用 `.bz2`,xz 使用 `.xz`。
+:::
+
 #### `WITH` 选项
 
 `WITH` 可以添加一些选项,比如文件的 `FORMAT` 用来指定导出文件的格式。本例中的格式为 Parquet,它是一种用于大数据处理的列式存储格式。Parquet 为大数据分析高效地压缩和编码列式数据。
@@ -35,6 +48,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet | **是** |
 | `START_TIME`/`END_TIME`| 需要导出数据的时间范围,时间范围为左闭右开 | 可选 |
+| `compression_type` | 导出文件的压缩算法。支持的值:`gzip`、`zstd`、`bzip2`、`xz`。仅支持 CSV 和 JSON 格式。 | 可选 |
 | `TIMESTAMP_FORMAT` | 导出 CSV 格式时自定义时间戳列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%Y-%m-%d %H:%M:%S'`)。仅支持 CSV 格式。 | 可选 |
 | `DATE_FORMAT` | 导出 CSV 格式时自定义日期列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%Y-%m-%d'`)。仅支持 CSV 格式。 | 可选 |
 | `TIME_FORMAT` | 导出 CSV 格式时自定义时间列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符(例如 `'%H:%M:%S'`)。仅支持 CSV 格式。 | 可选 |
@@ -78,10 +92,20 @@ COPY tbl FROM '/path/to/folder/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet.
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
+也可以从压缩的 CSV 或 JSON 文件导入数据:
+
+```sql
+COPY tbl FROM '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
 | 选项 | 描述 | 是否必需 |
 |---|---|---|
 | `FORMAT` | 目标文件格式,例如 JSON, CSV, Parquet, ORC | **是** |
 | `PATTERN` | 使用正则匹配文件,例如 `*_today.parquet` | 可选 |
+| `compression_type` | 导入文件的压缩算法。支持的值:`gzip`、`zstd`、`bzip2`、`xz`。仅支持 CSV 和 JSON 格式。 | 可选 |
 
 :::tip NOTE
 CSV 文件必须带有 header,包含表的列名。
@@ -151,6 +175,7 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | 要执行的 SQL SELECT 语句 | **是** |
 | `PATH` | 输出文件的路径 | **是** |
 | `FORMAT` | 输出文件格式:'CSV'、'JSON' 或 'PARQUET' | **是** |
+| `compression_type` | 导出文件的压缩算法。支持的值:`gzip`、`zstd`、`bzip2`、`xz`。仅支持 CSV 和 JSON 格式。 | 可选 |
 | `TIMESTAMP_FORMAT` | 导出 CSV 格式时自定义时间戳列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
 | `DATE_FORMAT` | 导出 CSV 格式时自定义日期列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
 | `TIME_FORMAT` | 导出 CSV 格式时自定义时间列的格式。使用 [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) 格式说明符。仅支持 CSV 格式。 | 可选 |
@@ -161,6 +186,15 @@ COPY () TO '' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
+也可以将查询结果导出为压缩文件:
+
+```sql
+COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
+  FORMAT = 'json',
+  compression_type = 'gzip'
+);
+```
+
 也可以在导出到 CSV 时指定自定义日期和时间格式:
 
 ```sql