docs: add Iceberg public API documentation to contributor guide #3237
Add documentation detailing all Comet classes and methods that form the public API used by Apache Iceberg. This helps contributors understand which APIs may affect Iceberg integration and need backward compatibility considerations.

The documentation covers:
- org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions, ParquetColumnSpec, column readers, BatchReader, Native JNI methods
- org.apache.comet: CometSchemaImporter
- org.apache.comet.vector: CometVector
- org.apache.comet.shaded.arrow: RootAllocator, ValueVector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add a custom Java annotation @IcebergApi to mark all classes, methods, constructors, and fields that form the public API used by Apache Iceberg. This makes it easy to identify which APIs need backward compatibility considerations when making changes.

The annotation is applied to:
- org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions, WrappedInputFile, ParquetColumnSpec, AbstractColumnReader, ColumnReader, BatchReader, MetadataColumnReader, ConstantColumnReader, Native, TypeUtil, Utils
- org.apache.comet: CometSchemaImporter
- org.apache.comet.vector: CometVector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add annotations to:
- AbstractColumnReader.nativeHandle (protected field accessed by Iceberg subclasses)
- AbstractCometSchemaImporter.close() (called by Iceberg)

Also update documentation to include these APIs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #3237 +/- ##
============================================
+ Coverage 56.12% 60.04% +3.92%
- Complexity 976 1429 +453
============================================
Files 119 170 +51
Lines 11743 15774 +4031
Branches 2251 2606 +355
============================================
+ Hits 6591 9472 +2881
- Misses 4012 4981 +969
- Partials 1140 1321 +181
|
| import org.apache.comet.vector.CometVector; | ||
|
|
||
| /** Base class for Comet Parquet column reader implementations. */ | ||
| @IcebergApi |
If we annotate the class with @IcebergApi, do we still need @IcebergApi in L66, L101 and L119?
Thanks for reviewing this @huaxingao. My plan was to annotate every class, method, and field referenced from Iceberg. Some methods are not referenced, so it would be safe to modify them without affecting Iceberg compatibility. The idea is that we know not to modify anything with the @IcebergApi annotation without careful consideration.
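As a rough illustration of the per-member annotation approach described above, here is a minimal sketch. The annotation's retention and target settings, and the `ExampleReader` and `IcebergApiScan` classes, are assumptions made for this example and are not taken from the Comet source; the real `@IcebergApi` may be declared differently.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical marker annotation; the real @IcebergApi in Comet may differ. */
@Documented
@Retention(RetentionPolicy.RUNTIME) // assumed, so that tests can see it via reflection
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
@interface IcebergApi {}

/** Hypothetical class: only some of its members are referenced from Iceberg. */
@IcebergApi
class ExampleReader {
  @IcebergApi
  public void close() {}

  // Not annotated: Iceberg does not reference it, so it can change more freely.
  public void internalHelper() {}
}

class IcebergApiScan {
  /** Returns the sorted names of declared methods that carry @IcebergApi. */
  static List<String> annotatedMethods(Class<?> cls) {
    List<String> names = new ArrayList<>();
    for (Method m : cls.getDeclaredMethods()) {
      if (m.isAnnotationPresent(IcebergApi.class)) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }
}
```

Annotating individual members, rather than only the class, is what lets a scan like this tell `close()` apart from `internalHelper()`.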
Add a new Maven module containing dedicated unit tests for all @IcebergApi annotated classes, ensuring the public API contract with Apache Iceberg remains stable and tested.

Key changes:
- Add iceberg-public-api module with 169 tests covering all @IcebergApi classes
- Fix CometVector constructor visibility (protected -> public) to match API annotation
- Add IcebergApiVerificationTest for reflection-based API verification
- Add tests for FileReader, BatchReader, ColumnReader, Native, TypeUtil, Utils
- Add tests for CometVector, CometSchemaImporter, WrappedInputFile

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
How will this be maintained in the future? |
The next step is #3240 which adds unit tests to ensure that the API is as expected. This will help prevent accidental changes to the current API methods. We should probably update the release process documentation to suggest auditing the Iceberg API use before creating a new release. |
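A test that guards the API against accidental changes could check both the presence of a method and its exact return type. The sketch below is one possible shape for such a check; `ExampleFileReader` is a hypothetical stand-in rather than a real Comet class, and `hasPublicMethod` is a helper invented for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical stand-in for an API class; not a real Comet class. */
class ExampleFileReader {
  public boolean skipNextRowGroup() {
    return false;
  }
}

class ApiSignatureCheck {
  /**
   * Returns true if cls exposes a public method with the given name, parameter
   * types, and exact return type. Class.getMethod never returns null; it throws
   * NoSuchMethodException when no matching public method exists.
   */
  static boolean hasPublicMethod(
      Class<?> cls, String name, Class<?> returnType, Class<?>... paramTypes) {
    try {
      Method m = cls.getMethod(name, paramTypes);
      return Modifier.isPublic(m.getModifiers()) && m.getReturnType() == returnType;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```

Checking the return type as well as the name means that a change from `boolean` to `void`, for example, would fail the test rather than pass unnoticed.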
| // Methods used by Iceberg | ||
| public void setRequestedSchemaFromSpecs(List<ParquetColumnSpec> specs) | ||
| public RowGroupReader readNextRowGroup() throws IOException | ||
| public void skipNextRowGroup() |
| public void skipNextRowGroup() | |
| public boolean skipNextRowGroup() |
|
|
||
| ```java | ||
| // Constructor | ||
| public WrappedInputFile(org.apache.iceberg.io.InputFile inputFile) |
datafusion-comet/common/src/main/java/org/apache/comet/parquet/WrappedInputFile.java
Line 40 in 640dd03
| public WrappedInputFile(Object inputFile) { |
The actual parameter type is java.lang.Object.
| @After | ||
| public void tearDown() throws IOException { | ||
| if (tempDir != null && Files.exists(tempDir)) { | ||
| Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); |
| Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); | |
| try (Stream<Path> stream = Files.walk(tempDir)) { | |
| stream.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); | |
| } |
The returned stream contains references to one or more open directories. The directories are closed by closing the stream.
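A self-contained version of this cleanup pattern might look like the following sketch. The class and method names are made up for illustration; the `java.nio.file` calls are standard JDK APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class TempDirCleanup {
  /** Recursively deletes dir, closing the Files.walk stream so open directories are released. */
  static void deleteRecursively(Path dir) throws IOException {
    if (dir == null || !Files.exists(dir)) {
      return; // nothing to clean up
    }
    try (Stream<Path> stream = Files.walk(dir)) {
      // Reverse order visits children before their parent directories.
      stream
          .sorted(Comparator.reverseOrder())
          .forEach(
              p -> {
                try {
                  Files.delete(p);
                } catch (IOException e) {
                  throw new UncheckedIOException(e);
                }
              });
    }
  }
}
```

Unlike `File::delete`, which returns `false` on failure, `Files.delete` throws, so a failed cleanup surfaces in the test run instead of being silently ignored.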
| /** Checks if native library is available. */ | ||
| protected static boolean isNativeLibraryAvailable() { | ||
| try { | ||
| Class.forName("org.apache.comet.NativeBase"); |
This does not guarantee that the native library is available and loadable.
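One way to probe loadability more directly is to attempt the load and catch the link error. This is only a sketch: `NativeLibraryProbe` and the `libName` parameter are hypothetical, and Comet's own library loading logic may work differently (for example, by extracting the library from a JAR first).

```java
class NativeLibraryProbe {
  /**
   * Returns true only if the JVM can actually load the named native library.
   * libName is a hypothetical parameter for illustration.
   */
  static boolean isLoadable(String libName) {
    try {
      System.loadLibrary(libName);
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      // The library is missing from java.library.path, is incompatible,
      // or loading is not permitted.
      return false;
    }
  }
}
```

A test base class could call a probe like this and skip native-dependent tests when it returns `false`.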
| } | ||
|
|
||
| /** | ||
| * @deprecated since 0.10.0, will be removed in 0.11.0. |
Maybe it should be undeprecated?
Currently it is both deprecated and important (used by Iceberg).
| @Test | ||
| public void testCurrentBatchMethodExists() throws NoSuchMethodException { | ||
| Method method = BatchReader.class.getMethod("currentBatch"); | ||
| assertThat(method).isNotNull(); |
| assertThat(method).isNotNull(); |
| @Test | ||
| public void testCloseMethodExists() throws NoSuchMethodException { | ||
| Method method = BatchReader.class.getMethod("close"); | ||
| assertThat(method).isNotNull(); |
| assertThat(method).isNotNull(); |
| // Methods used by Iceberg | ||
| public static AbstractColumnReader getColumnReader( | ||
| DataType sparkType, | ||
| ColumnDescriptor descriptor, |
The parameter types do not match with https://github.com/apache/datafusion-comet/pull/3237/changes#diff-54de18a6f3ec3c2944f1628012f8c0b0af863da30419a8bc989eb6cd8ccb8cd1R39-R46
| ColumnDescriptor descriptor, | |
| ParquetColumnSpec columnSpec, |
| CometSchemaImporter importer, | ||
| int batchSize, | ||
| boolean useDecimal128, | ||
| boolean isConstant |
| boolean isConstant | |
| boolean useLazyMaterialization, | |
| boolean useLegacyTimestamp |
|
|
||
| 1. Creates `CometSchemaImporter` with a `RootAllocator` | ||
| 2. Uses `Utils.getColumnReader()` to create appropriate column readers | ||
| 3. Calls `reset()` and `setPageReader()` for each row group |
Which reset() method?
Maybe:
| 3. Calls `reset()` and `setPageReader()` for each row group | |
| 3. Calls `Native.resetBatch()` and `setPageReader()` for each row group |
?!
Rationale
Iceberg depends on Comet, therefore Comet has a public API that needs to be maintained. This isn't documented very well.
This PR adds an @IcebergApi annotation to the public API. There is also a new page in the contributor guide documenting this API.
I generated this information by analyzing the latest from Iceberg's main branch.
🤖 Generated with Claude Code