Add Apache XTable provider #40962

@gyli

Description

Apache XTable translates table metadata among data lake formats, allowing users to read from a data lake with tools that don't have native support for its format.
XTable can be executed with a command like:

java -jar xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml [--hadoopConfig hdfs-site.xml] [--convertersConfig converters.yaml] [--icebergCatalogConfig catalog.yaml]

An Airflow operator can be created to wrap this command and accept each of the XTable YAML configs either as a file path or as a dict.
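
To illustrate the idea, here is a minimal sketch of the command-building logic such an operator could use. The names `build_xtable_command` and `materialize_config` are hypothetical, not part of XTable or Airflow; only the CLI flags come from the command above. Dict inputs are written out as JSON, on the assumption that XTable's YAML parser accepts JSON-style flow syntax (YAML 1.2 is designed as a superset of JSON); PyYAML's `yaml.safe_dump` could be used instead.

```python
import json
import os
import tempfile
from typing import Any, List, Mapping, Optional, Union

# A config is either a path to an existing YAML file or an in-memory dict.
ConfigInput = Union[str, Mapping[str, Any]]


def materialize_config(config: ConfigInput, workdir: str) -> str:
    """Return a file path for the config, writing dict input to a temp file."""
    if isinstance(config, str):
        return config  # already a file path, pass it through unchanged
    fd, path = tempfile.mkstemp(suffix=".yaml", dir=workdir)
    with os.fdopen(fd, "w") as fh:
        # Assumption: JSON output is acceptable as YAML for the XTable CLI.
        json.dump(config, fh)
    return path


def build_xtable_command(
    jar_path: str,
    dataset_config: ConfigInput,
    workdir: str,
    hadoop_config: Optional[str] = None,  # XML file, so path only
    converters_config: Optional[ConfigInput] = None,
    iceberg_catalog_config: Optional[ConfigInput] = None,
) -> List[str]:
    """Build the `java -jar ...` argument list shown in the description."""
    cmd = [
        "java", "-jar", jar_path,
        "--datasetConfig", materialize_config(dataset_config, workdir),
    ]
    if hadoop_config is not None:
        cmd += ["--hadoopConfig", hadoop_config]
    optional_yaml = (
        ("--convertersConfig", converters_config),
        ("--icebergCatalogConfig", iceberg_catalog_config),
    )
    for flag, value in optional_yaml:
        if value is not None:
            cmd += [flag, materialize_config(value, workdir)]
    return cmd
```

An operator's `execute()` method could call `build_xtable_command` inside a `tempfile.TemporaryDirectory()` and pass the resulting list to `subprocess.run(cmd, check=True)`, so that generated config files are cleaned up after the run.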

Use case/motivation

AWS provides an example XTableOperator for XTable. This blog has a good explanation of the open table formats XTable supports. However, this example operator is essentially an MVP and is packaged as an MWAA plugin. We can create an Apache XTable provider to make it available to more Airflow users and to support more flexible user input.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Metadata

    Labels

    kind:feature, kind:new provider request, needs-triage