Apache Paimon depends heavily on the Hadoop Java environment, and pypaimon inherits this dependency. Apache Iceberg takes a different approach: for example, its InMemoryCatalog has no external dependencies, and only HadoopCatalog requires Hadoop.
This imposes two constraints on pypaimon:
- Usage Complexity: Users must set up and maintain a Hadoop environment even for simple use cases.
- Testing Limitations: When pypaimon is integrated into an external framework, writing unit tests becomes impossible because of the mandatory Hadoop Java dependency.
We propose shipping optional, built-in Hadoop Java dependencies with pypaimon to address these issues, while preserving the flexibility for users to configure their own Hadoop environment.