Skip to content

Move Iceberg-only API classes to iceberg-api module #3241

@andygrove

Description

@andygrove

Summary

Move classes that are exclusively used by Iceberg (not used by any non-@IcebergApi Comet code) from common into the new iceberg-api module, added in #3240. This will improve the separation of concerns and make the Iceberg API boundary clearer.

Classes to Move

The following 4 classes are only referenced by other @IcebergApi classes and exist purely for Iceberg integration:

Class Current Location Purpose
ParquetColumnSpec common/src/main/java/org/apache/comet/parquet/ Bridge class to pass column descriptors between Comet and Iceberg (since Iceberg shades Parquet)
RowGroupReader common/src/main/java/org/apache/comet/parquet/ PageReadStore implementation for row group reading
WrappedInputFile common/src/main/java/org/apache/comet/parquet/ Wraps Iceberg's InputFile to implement Parquet's InputFile interface
AbstractCometSchemaImporter common/src/main/java/org/apache/arrow/c/ Base class for schema import functionality

Important Considerations

Packaging/Shading Concerns

⚠️ This change requires careful attention to packaging and shading:

  1. Iceberg shades Parquet - ParquetColumnSpec exists specifically because Iceberg uses shaded Parquet classes. Any module structure changes must preserve this compatibility.

  2. JAR structure - Ensure the classes remain accessible to Iceberg after the move. The final packaged JAR must include these classes in a location Iceberg can access.

  3. Maven shade plugin configuration - Review and update shade plugin configuration if needed to ensure:

    • Classes are included in the shaded JAR
    • Package relocations don't break Iceberg's access
    • No duplicate classes across modules
  4. Dependency direction - The iceberg-public-api module currently only contains tests. If we move source classes there, we need to:

    • Update module dependencies
    • Ensure common module can still depend on these classes if needed
    • Consider if a separate iceberg-api module (non-test) is more appropriate

Testing Requirements

  • Verify Iceberg integration tests pass after the move
  • Verify shaded JAR contains all necessary classes
  • Test with actual Iceberg to ensure no classloader issues
  • Run iceberg-public-api module tests

Alternative Approaches

  1. Keep in common, just document - Leave classes in common but document they're Iceberg-specific
  2. New iceberg-api source module - Create a separate source module (not test-only) for Iceberg API classes
  3. Subpackage organization - Move to org.apache.comet.iceberg.* package within common

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions