[Enhancement] Provide Optional Built-in Hadoop Dependencies #40

@chenghuichen

Description

Apache Paimon has a strong dependency on the Hadoop Java environment, which consequently affects pypaimon. Apache Iceberg handles this differently: for example, its InMemoryCatalog doesn't have any external dependencies, and only HadoopCatalog requires Hadoop.

This leads to two constraints for pypaimon:

  • Usage Complexity: Users must set up and maintain a Hadoop environment even for simple use cases.
  • Testing Limitations: When integrating pypaimon into an external framework, writing unit tests becomes impossible due to the mandatory Hadoop Java dependency.

We propose to provide optional, built-in Hadoop Java dependencies in pypaimon to address these issues while maintaining the flexibility for users to configure their own Hadoop environment.
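
To make the intended behavior concrete, here is a minimal sketch of one possible resolution order: a user-configured Hadoop environment takes precedence, and the built-in jars are used only as a fallback. The helper name `resolve_hadoop_classpath` and the `bundled_jar_dir` parameter are hypothetical and not part of pypaimon; `HADOOP_CLASSPATH` is used here only as an example of a user-provided setting, and the real configuration key may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hadoop_classpath(
    env: Mapping[str, str],
    bundled_jar_dir: Optional[Path] = None,
) -> str:
    """Hypothetical helper: choose the Hadoop classpath for the JVM.

    Not an existing pypaimon API; it only illustrates the proposed
    "user config first, built-in jars second" behavior.
    """
    # 1. A Hadoop environment configured by the user always wins.
    user_classpath = env.get("HADOOP_CLASSPATH", "").strip()
    if user_classpath:
        return user_classpath

    # 2. Otherwise fall back to jars shipped with the package, if any.
    if bundled_jar_dir is not None and bundled_jar_dir.is_dir():
        jars = sorted(str(p) for p in bundled_jar_dir.glob("*.jar"))
        if jars:
            return os.pathsep.join(jars)

    # 3. Neither source is available: fail with an actionable message.
    raise RuntimeError(
        "No Hadoop dependencies found: set HADOOP_CLASSPATH or install "
        "the optional built-in Hadoop jars."
    )
```

With this ordering, existing deployments that already configure Hadoop keep working unchanged, while simple use cases and unit tests could rely on the built-in jars alone.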
