
Support Reading data from Databricks Delta #20188

@damccorm

Description

Databricks Delta is an open source storage layer that runs on top of different filesystems. The current implementation of Delta is tightly coupled to Spark, so Beam cannot depend on it without breaking Beam portability.

However, there is now an open specification of Delta's protocol:
https://github.com/delta-io/delta/blob/master/PROTOCOL.md
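
To make the protocol-based option concrete, here is a rough sketch of how a reader could work out which data files are currently part of a table by replaying the JSON commit files in `_delta_log`, without any Spark dependency. The function name `active_files` is made up for illustration. The sketch assumes a local filesystem and deliberately ignores checkpoints, protocol version checks, path decoding, and everything else a real reader would need to handle per the specification.

```python
import json
import os


def active_files(table_path):
    """Replay JSON commits in _delta_log and return the set of live data file paths.

    Hypothetical helper, not part of Beam or Delta. Assumes no checkpoints.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files are named <zero-padded version>.json, so lexical order
    # is also commit order.
    commits = sorted(
        name for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-len(".json")].isdigit()
    )
    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                # Each line is one JSON action keyed by its action type.
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live
```

The returned paths are relative to the table root as recorded in the log; a Beam source built on this idea would still have to resolve them against the filesystem and read the Parquet files themselves.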

Another possible approach is to investigate whether Beam could use a manifest-based approach, as Presto does:
https://docs.databricks.com/delta/presto-integration.html
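
As a rough illustration of the manifest idea, the sketch below collects the data file paths listed in the manifest files that Delta can generate for Presto-style readers. The helper name `manifest_paths` is hypothetical, and the layout is an assumption to verify against the linked documentation: a `_symlink_format_manifest` directory under the table root containing one or more text files named `manifest` (one per partition directory for partitioned tables), each listing one data file path per line.

```python
import os


def manifest_paths(table_path, manifest_dir="_symlink_format_manifest"):
    """Collect data file paths from every `manifest` file under manifest_dir.

    Hypothetical helper. Assumes a local filesystem and the directory
    layout described above.
    """
    root = os.path.join(table_path, manifest_dir)
    paths = []
    # Walk in sorted directory order so the result is deterministic.
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        if "manifest" in filenames:
            with open(os.path.join(dirpath, "manifest")) as fh:
                paths.extend(line.strip() for line in fh if line.strip())
    return paths
```

The resulting list could then be handed to an existing file-based Parquet read in Beam. The trade-off is that the reader only sees the table as of the last time the manifest was generated, so this would not track new commits on its own.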

Imported from Jira BEAM-10159. Original Jira may contain additional context.
Reported by: iemejia.
