Destination S3: add delta lake/delta table support #16322

@mustafa-rmd

Description

My current requirement is to build the following data pipeline:
PostgreSQL (Source)
Airbyte
MinIO - S3-compatible storage (Destination)
Apache Spark, configured with MinIO and Delta Lake, since Spark on its own doesn't support ACID transactions.

The goal is to have Airbyte move data from PostgreSQL (Source) to MinIO storage (Destination) and save it in Delta format. Spark will then read the data from S3, expecting it to be in Delta format.
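For context, here is a minimal sketch of the Spark side of this setup, i.e. how I would expect to read a Delta table from MinIO once the data lands there. The endpoint, credentials, bucket path, and function names are placeholders I made up for illustration, and the sketch assumes the Delta Lake and hadoop-aws packages are already available to Spark.

```python
def delta_minio_spark_conf(endpoint, access_key, secret_key):
    """Build Spark settings for reading Delta tables from MinIO over s3a.

    All argument values are placeholders supplied by the caller.
    """
    return {
        # Enable Delta Lake's SQL extension and catalog
        # (assumes the Delta Lake package is on the Spark classpath).
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # Point the Hadoop S3A connector at MinIO instead of AWS S3.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed as endpoint/bucket, not bucket.endpoint.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Use TLS only when the endpoint URL is https.
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(endpoint.startswith("https")).lower(),
    }


def read_delta_table(conf, table_path):
    """Create a Spark session from `conf` and load a Delta table as a DataFrame."""
    # Imported lazily so the config helper can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("read-airbyte-delta")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    return spark.read.format("delta").load(table_path)


# Hypothetical usage (endpoint, keys, and path are placeholders):
# conf = delta_minio_spark_conf("http://minio:9000", "ACCESS_KEY", "SECRET_KEY")
# df = read_delta_table(conf, "s3a://my-bucket/airbyte/my_table")
```

This read only works if the destination path already contains a Delta table (data files plus the `_delta_log` directory), which is exactly what the S3 connector does not produce today.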

My main issue is with the output format of the Airbyte S3 connector. Currently it only supports 3 output formats: CSV, Avro and JSON Lines (JSONL).

What is the recommended way to solve this problem? I think many companies are trying to build this kind of data pipeline.
Is there a plan to release this feature in an upcoming release?
Should we implement this feature ourselves? If so, is there good documentation on how to get started?
Or is there another way to go about it?

Thanks,
