My current requirement is to build the following data pipeline:
PostgreSQL (Source)
Airbyte
MinIO - S3-compatible storage (Destination)
Apache Spark configured with MinIO and Delta Lake, since Spark on its own doesn't support ACID transactions.
The goal is to have Airbyte move data from PostgreSQL (Source) to MinIO storage (Destination), saved in Delta format. Spark will then read the data from S3, expecting it to be in Delta format.
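For reference, the Spark side of this setup usually comes down to a handful of settings. Below is a minimal sketch, assuming Spark reaches MinIO through the Hadoop `s3a` connector and that the `delta-spark` and `hadoop-aws` jars are supplied separately (for example via `spark.jars.packages`, or delta-spark's `configure_spark_with_delta_pip`). The function names, app name, endpoint and credentials are hypothetical placeholders.

```python
def delta_minio_spark_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Build Spark config pairs for reading/writing Delta tables on MinIO via s3a.

    Endpoint and credentials are placeholders supplied by the caller; the
    Delta Lake and hadoop-aws jars are NOT added here.
    """
    return {
        # Enable Delta Lake's SQL extension and catalog
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # Route s3a:// paths to MinIO instead of AWS
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is typically addressed as http://host:port/bucket (path style)
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def build_spark_session(conf, app_name="postgres-airbyte-delta"):
    # Imported lazily so the config helper above works without pyspark installed
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session built this way, a Delta table on MinIO would be read with `spark.read.format("delta").load("s3a://<bucket>/<table>")`, where the bucket and table path are placeholders.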
My main issue is with the output format of the Airbyte S3 connector. It currently supports only three formats: CSV, Avro, and JSON Lines (JSONL).
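One common workaround, offered here only as a hedged sketch and not as an Airbyte feature, is a two-step approach. Airbyte lands the data in MinIO in one of its supported formats, and a Spark job then rewrites those files as a Delta table. The format mapping reflects Spark's built-in readers: JSONL is read with the `json` source, and `avro` needs the external spark-avro package. The helper names and the assumption that Airbyte's CSV output has a header row are hypothetical. Depending on the connector version and settings, records may also carry Airbyte metadata columns (such as `_airbyte_data`) that need flattening; that step is not handled here.

```python
# Airbyte S3 output format -> Spark DataFrameReader format name
AIRBYTE_TO_SPARK_FORMAT = {"csv": "csv", "jsonl": "json", "avro": "avro"}


def spark_reader_format(airbyte_format):
    key = airbyte_format.strip().lower()
    if key not in AIRBYTE_TO_SPARK_FORMAT:
        raise ValueError(f"unsupported Airbyte output format: {airbyte_format!r}")
    return AIRBYTE_TO_SPARK_FORMAT[key]


def land_to_delta(spark, raw_path, delta_path, airbyte_format, mode="overwrite"):
    """Read the files Airbyte wrote under raw_path and write them as a Delta table.

    mode="overwrite" rebuilds the table from everything under raw_path, which
    keeps repeated runs idempotent; incremental loads would need a MERGE instead.
    """
    reader = spark.read.format(spark_reader_format(airbyte_format))
    if airbyte_format.strip().lower() == "csv":
        # Assumption: Airbyte's CSV output includes a header row
        reader = reader.option("header", "true")
    df = reader.load(raw_path)
    df.write.format("delta").mode(mode).save(delta_path)
    return df
```

Overwrite is used as the default because this sketch re-reads the whole landing path on every run; appending would duplicate rows. For large tables, Delta's `MERGE` (via `delta.tables.DeltaTable`) on a primary key is the usual alternative.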
What is the recommended way to solve this problem? I suspect many companies are trying to build this same pipeline.
Are there plans to release this feature in an upcoming version?
Should we implement this feature ourselves? If so, is there good documentation on how to get started?
Or is there another way of going about it?
Thanks,