[feature] Enhance COPY INTO with multi-format support and advanced options

### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.



### Motivation

In production environments, bulk data import and export is a common and critical operation — loading datasets from external storage into Paimon tables, or exporting table data to files for downstream consumption. The `COPY INTO` statement provides a declarative SQL interface for these operations without requiring users to write custom ETL pipelines.

The initial `COPY INTO` implementation was recently introduced with basic CSV support. This umbrella issue tracks the effort to extend it with multi-format support and advanced options, aligning with the commonly adopted capabilities in the industry.

### Planned Features

We plan to implement the following 7 capabilities, which are the most commonly adopted features in the industry, as separate PRs:

#### Format Support
- [ ] **JSON format** — Support `FILE_FORMAT = (TYPE = JSON)` for both import and export. Options: `MULTI_LINE`, `COMPRESSION`, `NULL_IF`, `EMPTY_FIELD_AS_NULL`. JSON import uses column-name matching by default (no positional dependency).
- [ ] **Parquet format** — Support `FILE_FORMAT = (TYPE = PARQUET)` for both import and export. Leverages Spark's native schema inference. Options: `COMPRESSION`.

#### Error Handling
- [ ] **ON_ERROR = CONTINUE / SKIP_FILE** — Currently only `ABORT_STATEMENT` is supported. `CONTINUE` skips bad rows and reports error counts per file. `SKIP_FILE` skips entire files that fail and reports per-file LOADED/LOAD_FAILED status.

#### Column Matching
- [ ] **MATCH_BY_COLUMN_NAME** — Support `NONE` (default, positional), `CASE_SENSITIVE`, and `CASE_INSENSITIVE` modes. Only applicable to structured formats (JSON/Parquet), not CSV. Mutually exclusive with explicit column lists.

#### Export Enhancement
- [ ] **FROM (SELECT ...) query export** — Currently export only supports `FROM table_name`. This adds support for arbitrary SQL queries as the data source for `COPY INTO <location>`.

#### Load Management
- [ ] **PURGE** — When `PURGE = TRUE`, automatically delete source files after successful loading. Best-effort deletion (failures are silently ignored). In `SKIP_FILE` mode, only successfully loaded files are purged.
- [ ] **VALIDATION_MODE** — Validate data without actually loading it. Supports `RETURN_<n>_ROWS` (preview n rows), `RETURN_ERRORS` (show first error per file), and `RETURN_ALL_ERRORS` (show all errors across all files).

### Proposed SQL Syntax

#### Import
```sql
COPY INTO table_name [(col1, col2, ...)]
FROM 'source_path'
FILE_FORMAT = (TYPE = CSV | JSON | PARQUET [, option = value, ...])
[PATTERN = 'regex']
[FORCE = TRUE | FALSE]
[ON_ERROR = ABORT_STATEMENT | CONTINUE | SKIP_FILE]
[MATCH_BY_COLUMN_NAME = NONE | CASE_SENSITIVE | CASE_INSENSITIVE]
[PURGE = TRUE | FALSE]
[VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS]
```

#### Export
```sql
COPY INTO 'target_path'
FROM table_name | ('SELECT ...')
FILE_FORMAT = (TYPE = CSV | JSON | PARQUET [, option = value, ...])
[OVERWRITE = TRUE | FALSE]
```

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] Enhance COPY INTO with multi-format support and advanced options #8005

Search before asking

Motivation

Planned Features

Format Support

Error Handling

Column Matching

Export Enhancement

Load Management

Proposed SQL Syntax

Import

Export

Anything else?

Are you willing to submit a PR?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[feature] Enhance COPY INTO with multi-format support and advanced options #8005

Description

Search before asking

Motivation

Planned Features

Format Support

Error Handling

Column Matching

Export Enhancement

Load Management

Proposed SQL Syntax

Import

Export

Anything else?

Are you willing to submit a PR?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions