

@andygrove (Member) commented Jan 22, 2026

Rationale

Apache Iceberg depends on Comet, so Comet has a public API that must be maintained. This API is not currently well documented.

This PR adds an @IcebergApi annotation to the public API.

There is also a new page in the contributor guide documenting this API.

I generated this information by analyzing the latest code on Iceberg's main branch.

🤖 Generated with Claude Code

andygrove and others added 2 commits January 21, 2026 17:24
Add documentation detailing all Comet classes and methods that form
the public API used by Apache Iceberg. This helps contributors
understand which APIs may affect Iceberg integration and need
backward compatibility considerations.

The documentation covers:
- org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions,
  ParquetColumnSpec, column readers, BatchReader, Native JNI methods
- org.apache.comet: CometSchemaImporter
- org.apache.comet.vector: CometVector
- org.apache.comet.shaded.arrow: RootAllocator, ValueVector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a custom Java annotation @IcebergApi to mark all classes, methods,
constructors, and fields that form the public API used by Apache Iceberg.
This makes it easy to identify which APIs need backward compatibility
considerations when making changes.

The annotation is applied to:
- org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions,
  WrappedInputFile, ParquetColumnSpec, AbstractColumnReader, ColumnReader,
  BatchReader, MetadataColumnReader, ConstantColumnReader, Native, TypeUtil,
  Utils
- org.apache.comet: CometSchemaImporter
- org.apache.comet.vector: CometVector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
andygrove and others added 2 commits January 21, 2026 17:35
Add annotations to:
- AbstractColumnReader.nativeHandle (protected field accessed by Iceberg
  subclasses)
- AbstractCometSchemaImporter.close() (called by Iceberg)

Also update documentation to include these APIs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codecov-commenter commented Jan 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.04%. Comparing base (f09f8af) to head (7b58965).
⚠️ Report is 864 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3237      +/-   ##
============================================
+ Coverage     56.12%   60.04%   +3.92%     
- Complexity      976     1429     +453     
============================================
  Files           119      170      +51     
  Lines         11743    15774    +4031     
  Branches       2251     2606     +355     
============================================
+ Hits           6591     9472    +2881     
- Misses         4012     4981     +969     
- Partials       1140     1321     +181     


import org.apache.comet.vector.CometVector;

/** Base class for Comet Parquet column reader implementations. */
@IcebergApi
Contributor


If we annotate the class with @IcebergApi, do we still need @IcebergApi in L66, L101 and L119?

Member Author


Thanks for reviewing this, @huaxingao. My plan was to annotate every class, method, and field that Iceberg references. Some methods in annotated classes are not referenced by Iceberg, so they can be changed without affecting Iceberg compatibility. The idea is that nothing carrying the @IcebergApi annotation should be modified without careful consideration.
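A minimal sketch of how a marker annotation like @IcebergApi could be declared and applied member by member. The RUNTIME retention, the target list, and the ExampleReader class are assumptions for illustration; Comet's actual declaration is not shown in this thread. With a declaration like this, marking a class does not mark its members: reflection reports the annotation only where it is written, which is what lets unreferenced methods stay unmarked.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical declaration: RUNTIME retention so tests can find it via reflection.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
@interface IcebergApi {}

// Hypothetical class: only the members Iceberg references carry the annotation.
@IcebergApi
abstract class ExampleReader {
  // Accessed by Iceberg subclasses, so it is part of the contract.
  @IcebergApi protected long nativeHandle;

  // Called by Iceberg: changing it needs careful consideration.
  @IcebergApi
  public void close() {}

  // Not referenced by Iceberg: safe to change.
  public void internalHelper() {}
}

class AnnotationDemo {
  // True if the named public method is explicitly marked @IcebergApi.
  static boolean isMarked(String methodName) {
    try {
      return ExampleReader.class.getMethod(methodName).isAnnotationPresent(IcebergApi.class);
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```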

Add a new Maven module containing dedicated unit tests for all @IcebergApi
annotated classes, ensuring the public API contract with Apache Iceberg
remains stable and tested.

Key changes:
- Add iceberg-public-api module with 169 tests covering all @IcebergApi classes
- Fix CometVector constructor visibility (protected -> public) to match API annotation
- Add IcebergApiVerificationTest for reflection-based API verification
- Add tests for FileReader, BatchReader, ColumnReader, Native, TypeUtil, Utils
- Add tests for CometVector, CometSchemaImporter, WrappedInputFile

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
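Constructor visibility matters for reflection-based verification: Class.getConstructor searches only public constructors and throws NoSuchMethodException otherwise, so a protected constructor fails a getConstructor-based check. A sketch with hypothetical classes standing in for a vector class before and after such a fix (they are not Comet's CometVector):

```java
// Hypothetical stand-ins for a class before and after a visibility fix.
class VectorWithProtectedCtor {
  protected VectorWithProtectedCtor(int capacity) {}
}

class VectorWithPublicCtor {
  public VectorWithPublicCtor(int capacity) {}
}

class ConstructorCheck {
  // True only if a public constructor with exactly these parameter types exists.
  static boolean hasPublicConstructor(Class<?> type, Class<?>... params) {
    try {
      type.getConstructor(params); // only searches public constructors
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```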
@martin-g (Member)

How will this be maintained in the future?
Are Iceberg maintainers and contributors expected to open a PR here every time they start using a Comet method or class that is not yet annotated with @IcebergApi, or to remove @IcebergApi when they stop using an annotated method?
Or will it be the Comet developers' responsibility to "scan" the Iceberg code for new or obsolete usage?

@andygrove (Member Author)


The next step is #3240, which adds unit tests to verify that the API matches expectations. These tests will help prevent accidental changes to existing API methods.

We should probably also update the release process documentation to recommend auditing Iceberg's use of the API before creating a new release.
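As a sketch of what such an audit could look like, assuming @IcebergApi is retained at runtime: render every annotated public method as a sorted list of signatures, so that snapshots from two releases can be diffed. ApiSnapshot and Sample are hypothetical names used only for illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical runtime-retained marker (assumption).
@Retention(RetentionPolicy.RUNTIME)
@interface IcebergApi {}

// Hypothetical class under audit.
class Sample {
  @IcebergApi public boolean skipNextRowGroup() { return false; }
  @IcebergApi public void close() {}
  public void notUsedByIceberg() {}
}

class ApiSnapshot {
  // Renders "returnType name(paramTypes)" for each annotated public method,
  // sorted so the output is stable and can be diffed between releases.
  static List<String> snapshot(Class<?> type) {
    List<String> out = new ArrayList<>();
    for (Method m : type.getMethods()) {
      if (!m.isAnnotationPresent(IcebergApi.class)) {
        continue;
      }
      String params = Arrays.stream(m.getParameterTypes())
          .map(Class::getName)
          .collect(Collectors.joining(", "));
      out.add(m.getReturnType().getName() + " " + m.getName() + "(" + params + ")");
    }
    out.sort(null); // natural (lexicographic) order
    return out;
  }
}
```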

// Methods used by Iceberg
public void setRequestedSchemaFromSpecs(List<ParquetColumnSpec> specs)
public RowGroupReader readNextRowGroup() throws IOException
public void skipNextRowGroup()
Member


Suggested change
- public void skipNextRowGroup()
+ public boolean skipNextRowGroup()
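A void/boolean mismatch like this one is caught by a reflection test only if the test checks the return type, because getMethod matches on name and parameter types alone. A minimal sketch with a hypothetical stub class (not Comet's FileReader):

```java
import java.lang.reflect.Method;

// Hypothetical stub mirroring the suggested signature.
class StubFileReader {
  public boolean skipNextRowGroup() { return false; }
}

class ReturnTypeCheck {
  // getMethod ignores the return type, so compare it explicitly.
  static boolean returns(Class<?> type, String name, Class<?> expected) {
    try {
      Method m = type.getMethod(name);
      return m.getReturnType() == expected;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```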


// Constructor
public WrappedInputFile(org.apache.iceberg.io.InputFile inputFile)
Member


It actually accepts java.lang.Object.
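Accepting java.lang.Object instead of org.apache.iceberg.io.InputFile presumably lets Comet compile without a dependency on Iceberg; that is an assumption, since the thread does not state the reason. A minimal sketch of the general technique, invoking a method reflectively on an untyped argument. ObjectWrapper and FakeInputFile are hypothetical, and getLength() is used only because Iceberg's InputFile declares a method of that name.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for an Iceberg InputFile; no Iceberg dependency needed.
class FakeInputFile {
  public long getLength() { return 1024L; }
}

// Hypothetical wrapper that accepts Object, so it compiles without Iceberg on the classpath.
class ObjectWrapper {
  private final Object inputFile;

  ObjectWrapper(Object inputFile) {
    this.inputFile = inputFile;
  }

  // Looks up getLength() at runtime on whatever object was passed in.
  long getLength() {
    try {
      Method m = inputFile.getClass().getMethod("getLength");
      return (long) m.invoke(inputFile);
    } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException("Not an InputFile-like object: " + inputFile, e);
    }
  }
}
```

The trade-off is that a wrong argument type surfaces only at runtime, as an IllegalStateException here, rather than at compile time.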

@After
public void tearDown() throws IOException {
if (tempDir != null && Files.exists(tempDir)) {
Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
Member


Suggested change
- Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+ try (Stream<Path> stream = Files.walk(tempDir)) {
+   stream.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+ }

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/nio/file/Files.html#walk(java.nio.file.Path,java.nio.file.FileVisitOption...) says:

The returned stream contains references to one or more open directories. The directories are closed by closing the stream.
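A self-contained version of the suggested cleanup, with the try-with-resources header written out in full. TempDirCleanup is a hypothetical helper name; the test's tearDown could call it with tempDir.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class TempDirCleanup {
  // Deletes a directory tree. try-with-resources closes the stream returned by
  // Files.walk, which releases the directory handles it holds open.
  static void deleteRecursively(Path dir) throws IOException {
    if (dir == null || !Files.exists(dir)) {
      return;
    }
    try (Stream<Path> stream = Files.walk(dir)) {
      // Reverse order visits children before their parent directories.
      stream.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
    }
  }
}
```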

/** Checks if native library is available. */
protected static boolean isNativeLibraryAvailable() {
try {
Class.forName("org.apache.comet.NativeBase");
Member


This does not guarantee that the native library is available and loadable.
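A check that attempts the load itself reports linkage failures directly. A sketch, assuming the library can be loaded by name with System.loadLibrary; the helper name is hypothetical, and Comet's own loader may instead extract the library from its JAR, which this sketch does not model.

```java
class NativeLibraryCheck {
  // Hypothetical helper: attempts the actual load instead of only resolving a
  // Java class, and reports failure rather than propagating the error.
  static boolean isLoadable(String libName) {
    try {
      System.loadLibrary(libName);
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      return false;
    }
  }
}
```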

}

/**
* @deprecated since 0.10.0, will be removed in 0.11.0.
Member


Maybe it should be undeprecated?!
Currently it is both deprecated and important (used by Iceberg).
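If the method stays deprecated, a reflection check could at least flag the overlap. A sketch, assuming @IcebergApi is retained at runtime and the member carries the @Deprecated annotation in addition to the Javadoc @deprecated tag (the Javadoc tag alone is not visible to reflection). LegacyReader and DeprecationAudit are hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical runtime-retained marker (assumption).
@Retention(RetentionPolicy.RUNTIME)
@interface IcebergApi {}

// Hypothetical class reproducing the situation in the review comment.
class LegacyReader {
  @IcebergApi
  @Deprecated
  public void oldEntryPoint() {}

  @IcebergApi
  public void currentEntryPoint() {}
}

class DeprecationAudit {
  // Lists public methods that are both part of the Iceberg contract and deprecated:
  // candidates for un-deprecation or a coordinated migration in Iceberg.
  static List<String> conflicts(Class<?> type) {
    List<String> names = new ArrayList<>();
    for (Method m : type.getMethods()) {
      if (m.isAnnotationPresent(IcebergApi.class) && m.isAnnotationPresent(Deprecated.class)) {
        names.add(m.getName());
      }
    }
    return names;
  }
}
```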

@Test
public void testCurrentBatchMethodExists() throws NoSuchMethodException {
Method method = BatchReader.class.getMethod("currentBatch");
assertThat(method).isNotNull();
Member


Suggested change
- assertThat(method).isNotNull();

@Test
public void testCloseMethodExists() throws NoSuchMethodException {
Method method = BatchReader.class.getMethod("close");
assertThat(method).isNotNull();
Member


Suggested change
- assertThat(method).isNotNull();
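The suggested deletions are safe because Class.getMethod never returns null: if no matching public method exists it throws NoSuchMethodException, which already fails a test method declared with throws NoSuchMethodException. A small demonstration with a hypothetical StubBatchReader:

```java
import java.lang.reflect.Method;

// Hypothetical stub; only close() exists.
class StubBatchReader {
  public void close() {}
}

class GetMethodBehavior {
  // Describes what getMethod does for the given method name.
  static String lookup(String name) {
    try {
      Method m = StubBatchReader.class.getMethod(name);
      return "found " + m.getName();
    } catch (NoSuchMethodException e) {
      return "threw NoSuchMethodException";
    }
  }
}
```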

// Methods used by Iceberg
public static AbstractColumnReader getColumnReader(
DataType sparkType,
ColumnDescriptor descriptor,
Member


The parameter types do not match with https://github.com/apache/datafusion-comet/pull/3237/changes#diff-54de18a6f3ec3c2944f1628012f8c0b0af863da30419a8bc989eb6cd8ccb8cd1R39-R46

Suggested change
- ColumnDescriptor descriptor,
+ ParquetColumnSpec columnSpec,

CometSchemaImporter importer,
int batchSize,
boolean useDecimal128,
boolean isConstant
Member


Suggested change
- boolean isConstant
+ boolean useLazyMaterialization,
+ boolean useLegacyTimestamp
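Passing the full parameter list to getMethod pins the exact overload, so a test written against the documented signature would catch mismatches like the two suggested changes to getColumnReader. The sketch assumes both suggestions are applied to the parameter list shown in this thread; StubUtils and the Stub* parameter types are hypothetical placeholders for Comet's Utils, Spark's DataType, ParquetColumnSpec, and CometSchemaImporter.

```java
// Hypothetical empty stand-ins for the real parameter types.
class StubDataType {}
class StubColumnSpec {}
class StubSchemaImporter {}

// Hypothetical stub carrying the parameter list with both suggestions applied.
class StubUtils {
  public static Object getColumnReader(
      StubDataType sparkType,
      StubColumnSpec columnSpec,
      StubSchemaImporter importer,
      int batchSize,
      boolean useDecimal128,
      boolean useLazyMaterialization,
      boolean useLegacyTimestamp) {
    return null;
  }
}

class SignatureCheck {
  // getMethod matches the exact parameter types, so a documented signature
  // that differs (wrong type, missing or extra parameter) is not found.
  static boolean hasOverload(Class<?>... params) {
    try {
      StubUtils.class.getMethod("getColumnReader", params);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```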


1. Creates `CometSchemaImporter` with a `RootAllocator`
2. Uses `Utils.getColumnReader()` to create appropriate column readers
3. Calls `reset()` and `setPageReader()` for each row group
Member


Which reset() method?
Maybe:

Suggested change
- 3. Calls `reset()` and `setPageReader()` for each row group
+ 3. Calls `Native.resetBatch()` and `setPageReader()` for each row group

?!


4 participants