@Gezi-lzq
Protobuf to Avro Conversion Enhancements
  • Added LogicalMapProtobufData: This new class annotates Protobuf map fields with Iceberg's LogicalMap logical type, so downstream conversion preserves MAP semantics instead of defaulting to ARRAY<record<key,value>>.
  • Updated ProtoToAvroConverter: It now uses the new LogicalMapProtobufData singleton, applies logical type conversions (e.g., for timestamps), and consistently unwraps union schemas to their non-NULL element before conversion. It also improves handling of optional and oneof fields, ensuring correct Avro nullability and presence semantics.
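
To illustrate the difference between the two representations, here is a minimal, self-contained sketch. It does not use the Avro or Iceberg APIs: `Entry`, `ArraySchema`, `convert`, and the `"map"` label are hypothetical stand-ins chosen for illustration, and the actual classes in this PR may work differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogicalMapSketch {
    /** One key/value record, i.e. the element type of ARRAY<record<key,value>>. */
    static final class Entry {
        final String key;
        final int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for an array schema that may carry a logical-type label. */
    static final class ArraySchema {
        final String logicalType; // "map" when annotated, null otherwise (assumed label)

        ArraySchema(String logicalType) {
            this.logicalType = logicalType;
        }
    }

    /**
     * Without the annotation the entries stay a plain list of records;
     * with it, a downstream consumer can rebuild a real map.
     */
    static Object convert(ArraySchema schema, List<Entry> entries) {
        if (!"map".equals(schema.logicalType)) {
            return entries; // ARRAY<record<key,value>> semantics
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Entry e : entries) {
            result.put(e.key, e.value);
        }
        return result; // MAP semantics
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(new Entry("a", 1), new Entry("b", 2));
        System.out.println(convert(new ArraySchema(null), entries) instanceof List);
        System.out.println(convert(new ArraySchema("map"), entries) instanceof Map);
    }
}
```

The point of the annotation is that the wire shape stays the same (an array of key/value records) while the schema carries enough information for the consumer to choose MAP.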
Avro Schema Union Handling and Binder Precomputation
  • Improved union schema resolution in RecordBinder: Unions are now unwrapped only if they contain exactly one non-NULL type; unsupported multi-type unions are rejected with a clear error. This prevents incorrect assumptions about union schemas and improves error messages.
  • Refactored binder precomputation for nested types: The logic for creating binders for STRUCT, LIST, and MAP types is now modularized into dedicated methods, supporting both native Avro MAP and ARRAY-of-records representations, and correctly handling struct keys/values and unioned structs.
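
The union rule described above can be sketched roughly as follows. This is an illustration only, not the code in RecordBinder: the `Kind` enum is a hypothetical stand-in for Avro schema types, and `resolveUnion` and its error message are assumed names and wording.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionResolutionSketch {
    /** Hypothetical stand-in for Avro schema types; not the Avro API. */
    enum Kind { NULL, INT, STRING, RECORD }

    /**
     * Returns the single non-NULL branch of a union.
     * Throws if the union has zero or more than one non-NULL branch.
     */
    static Kind resolveUnion(List<Kind> branches) {
        List<Kind> nonNull = new ArrayList<>();
        for (Kind kind : branches) {
            if (kind != Kind.NULL) {
                nonNull.add(kind);
            }
        }
        if (nonNull.size() != 1) {
            throw new IllegalArgumentException(
                "Unsupported union: expected exactly one non-NULL type but found "
                    + nonNull.size() + " in " + branches);
        }
        return nonNull.get(0);
    }

    public static void main(String[] args) {
        // Optional field: [NULL, STRING] unwraps to STRING.
        System.out.println(resolveUnion(Arrays.asList(Kind.NULL, Kind.STRING)));
        try {
            // Multi-type union: rejected instead of silently picking a branch.
            resolveUnion(Arrays.asList(Kind.NULL, Kind.INT, Kind.STRING));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast on multi-type unions keeps the binder from guessing which branch a value belongs to.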

@Gezi-lzq Gezi-lzq requested review from Copilot and removed request for 1sonofqiu, superhx and woshigaopp November 18, 2025 17:05
Copilot finished reviewing on behalf of Gezi-lzq November 18, 2025 17:08
Copilot AI left a comment

Pull Request Overview

This PR enhances Protobuf-to-Avro data handling by introducing LogicalMap support for proper MAP type preservation and improving union schema handling. The changes include:

  • A new LogicalMapProtobufData class that annotates Protobuf map fields with Iceberg's LogicalMap logical type
  • Refactored ProtoToAvroConverter with improved union unwrapping and timestamp conversion
  • Enhanced RecordBinder with better union validation and modularized binder precomputation
  • Comprehensive test coverage additions for Protobuf conversion scenarios

Key Changes

  • LogicalMap Support: Protobuf map fields now preserve MAP semantics instead of defaulting to ARRAY<record<key,value>>
  • Union Handling: Improved validation to reject non-optional unions with multiple non-NULL types, with clear error messages
  • Test Coverage: Added ~900 lines of new tests covering edge cases, optional fields, oneofs, and complex nested structures

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| LogicalMapProtobufData.java | New class extending ProtobufData to annotate map fields with LogicalMap logical type |
| ProtoToAvroConverter.java | Refactored conversion logic with consistent union unwrapping and improved timestamp handling |
| RecordBinder.java | Enhanced union resolution with validation and modularized binder creation for nested types |
| ProtobufRegistryConverter.java | Updated to use LogicalMapProtobufData singleton |
| CodecSetup.java | Added public accessor for LogicalMap instance |
| ProtobufRegistryConverterUnitTest.java | New comprehensive test suite (719 lines) |
| ProtobufRegistryConverterTest.java | Added map field test case |
| ProtoToAvroConverterTest.java | New unit tests for converter edge cases |
| AvroRecordBinderTypeTest.java | New comprehensive type conversion tests (1000 lines) |
| AvroRecordBinderTest.java | Refactored to focus on specific binding scenarios |

@Gezi-lzq Gezi-lzq enabled auto-merge (squash) November 19, 2025 02:57
@Gezi-lzq Gezi-lzq merged commit 147264b into main Nov 19, 2025
6 checks passed
@Gezi-lzq Gezi-lzq deleted the feat/pb-convert branch November 19, 2025 03:02
Gezi-lzq added a commit that referenced this pull request Nov 19, 2025
Gezi-lzq added a commit that referenced this pull request Nov 19, 2025