Support Parquet as export file format for COPY TO
#15955
Labels
need refined description
A maintainer should refine the description and clarify the scope
needs concrete use-case
needs info or feedback
Problem Statement
CrateDB's export functionality is currently limited to the JSON format, which loses type information and is inefficient for large data sets. The COPY TO command cannot export data as Parquet, a standard columnar format that preserves type information and supports efficient data handling. This limits interoperability with data engineering workflows and libraries that rely on Parquet's schema. Adding Parquet support to COPY TO would make it easier to integrate with such tools and frameworks. If read support for Parquet files were also added, either directly or via e.g. third-party FDW implementations, Parquet exports could also serve archival and cold-storage use cases.
Possible Solutions
Support
COPY quotes TO DIRECTORY '/tmp/' with (FORMAT PARQUET);
to export table partitions into Parquet files, similar to the existing JSON export. It should also be possible to set the compression codec (zstd, gzip, snappy) and the size of row groups.
The CrateDB data types should be mapped to appropriate Parquet data types, e.g. (incomplete)
Considered Alternatives
Export to .json, then transform to .parquet