34 changes: 34 additions & 0 deletions docs/reference/sql/copy.md
@@ -31,6 +31,19 @@ COPY tbl TO '/path/to/file.csv' WITH (
);
```

You can also export data to a compressed CSV or JSON file:

```sql
COPY tbl TO '/path/to/file.csv.gz' WITH (
  FORMAT = 'csv',
  compression_type = 'gzip'
);
```

:::tip NOTE
When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
:::
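
For instance, a zstd-compressed export pairs `compression_type = 'zstd'` with a `.zst` extension. This is a minimal sketch that reuses the placeholder table `tbl` and path from the examples above:

```sql
-- Export tbl to a zstd-compressed CSV file.
-- The `.zst` suffix matches compression_type = 'zstd'.
COPY tbl TO '/path/to/file.csv.zst' WITH (
  FORMAT = 'csv',
  compression_type = 'zstd'
);
```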

#### `WITH` Option

`WITH` adds options such as `FORMAT`, which specifies the format of the exported file. For example, Parquet is a columnar storage format used for big data processing; it compresses and encodes columnar data efficiently for analytics.
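
Several `WITH` options can be set in one statement. The sketch below assumes that `START_TIME`/`END_TIME` accept timestamp literals of the form `'YYYY-MM-DD HH:MM:SS'` and can be combined with `compression_type`; the dates are illustrative only:

```sql
-- Export one day of data (START_TIME is inclusive, END_TIME is exclusive)
-- to a gzip-compressed CSV file.
-- Assumption: the time range and compression options can be combined.
COPY tbl TO '/path/to/file.csv.gz' WITH (
  FORMAT = 'csv',
  START_TIME = '2024-01-01 00:00:00',
  END_TIME = '2024-01-02 00:00:00',
  compression_type = 'gzip'
);
```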
@@ -39,6 +52,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
|---|---|---|
| `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
| `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
| `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
| `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
| `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |
@@ -85,10 +99,20 @@ Specifically, if you only have one file to import, you can use the following syn
COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
```

You can also import data from a compressed CSV or JSON file:

```sql
COPY tbl FROM '/path/to/file.csv.gz' WITH (
  FORMAT = 'csv',
  compression_type = 'gzip'
);
```
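
The same option applies to JSON input and to the other supported algorithms. A minimal sketch for a zstd-compressed JSON file at a placeholder path:

```sql
-- Import a zstd-compressed JSON file into tbl.
COPY tbl FROM '/path/to/file.json.zst' WITH (
  FORMAT = 'json',
  compression_type = 'zstd'
);
```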

| Option | Description | Required |
|---|---|---|
| `FORMAT` | Format of the source file(s), e.g., JSON, CSV, Parquet, ORC | **Required** |
| `PATTERN` | A regular expression used to match file names, e.g., `.*_today.parquet` | Optional |
| `compression_type` | Compression algorithm for the imported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
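
`PATTERN` and `compression_type` can presumably be combined to import several compressed files from one folder. The following is a sketch under that assumption; the folder and the `_today` naming scheme are hypothetical:

```sql
-- Import every gzip-compressed CSV file in the folder whose name matches the regex.
-- Assumption: PATTERN and compression_type can be used together.
-- Note: each unescaped `.` in the regex matches any single character.
COPY tbl FROM '/path/to/folder/' WITH (
  FORMAT = 'csv',
  PATTERN = '.*_today.csv.gz',
  compression_type = 'gzip'
);
```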

:::tip NOTE
The CSV file must have a header row to be imported correctly. The header row should contain the column names of the table.
@@ -158,6 +182,7 @@ COPY (<QUERY>) TO '<PATH>' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
| `QUERY` | The SQL SELECT statement to execute | **Required** |
| `PATH` | The file path where the output will be written | **Required** |
| `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
| `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
| `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
| `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
@@ -168,6 +193,15 @@ For example, the following statement exports query results to a CSV file:
COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
```

You can also export query results to a compressed CSV or JSON file:

```sql
COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
  FORMAT = 'json',
  compression_type = 'gzip'
);
```
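
Query results can likewise be written as compressed CSV. A minimal sketch using xz, with the file extension chosen to match the algorithm:

```sql
-- Export the filtered rows to an xz-compressed CSV file.
COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv.xz' WITH (
  FORMAT = 'csv',
  compression_type = 'xz'
);
```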

You can also specify custom date and time formats when exporting to CSV:

```sql
@@ -27,6 +27,19 @@ COPY tbl TO '/path/to/file.csv' WITH (
);
```

You can also export data to a compressed CSV or JSON file:

```sql
COPY tbl TO '/path/to/file.csv.gz' WITH (
  FORMAT = 'csv',
  compression_type = 'gzip'
);
```

:::tip NOTE
When using compression, make sure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
:::
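
As an illustration of the extension rule, a bzip2-compressed JSON export pairs `compression_type = 'bzip2'` with a `.bz2` suffix. A minimal sketch reusing the placeholder table `tbl`:

```sql
-- Export tbl to a bzip2-compressed JSON file.
-- The `.bz2` suffix matches compression_type = 'bzip2'.
COPY tbl TO '/path/to/file.json.bz2' WITH (
  FORMAT = 'json',
  compression_type = 'bzip2'
);
```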

#### `WITH` Option

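`WITH` adds options such as `FORMAT`, which specifies the format of the exported file. For example, Parquet is a columnar storage format used for big data processing; it compresses and encodes columnar data efficiently for analytics.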
@@ -35,6 +48,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
|---|---|---|
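| `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
| `START_TIME`/`END_TIME`| The time range within which data should be exported. The range is half-open: `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
| `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
| `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
| `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |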
@@ -78,10 +92,20 @@ COPY tbl FROM '/path/to/folder/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet.
COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
```

You can also import data from a compressed CSV or JSON file:

```sql
COPY tbl FROM '/path/to/file.csv.gz' WITH (
  FORMAT = 'csv',
  compression_type = 'gzip'
);
```
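
Importing works the same way for the other algorithms. A minimal sketch for an xz-compressed CSV file at a placeholder path:

```sql
-- Import an xz-compressed CSV file into tbl.
-- The decompressed CSV presumably still needs a header row with the table's column names.
COPY tbl FROM '/path/to/file.csv.xz' WITH (
  FORMAT = 'csv',
  compression_type = 'xz'
);
```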

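| Option | Description | Required |
|---|---|---|
| `FORMAT` | Format of the source file(s), e.g., JSON, CSV, Parquet, ORC | **Required** |
| `PATTERN` | A regular expression used to match file names, e.g., `.*_today.parquet` | Optional |
| `compression_type` | Compression algorithm for the imported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |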

:::tip NOTE
The CSV file must have a header row containing the column names of the table.
@@ -151,6 +175,7 @@ COPY (<QUERY>) TO '<PATH>' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
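| `QUERY` | The SQL SELECT statement to execute | **Required** |
| `PATH` | The file path where the output will be written | **Required** |
| `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
| `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
| `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
| `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |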
@@ -161,6 +186,15 @@ COPY (<QUERY>) TO '<PATH>' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
```

You can also export query results to a compressed CSV or JSON file:

```sql
COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
  FORMAT = 'json',
  compression_type = 'gzip'
);
```

You can also specify custom date and time formats when exporting to CSV:

```sql