Copilot AI commented Sep 15, 2025

This PR adds support for reading and comparing Protocol Buffer (protobuf) data to the stream-diff tool, addressing the feature request to support protobuf data sources alongside the existing CSV and JSON support. Only JSON-serialized protobuf is read at this stage; binary protobuf is left for future work.

What's Added

New Protobuf Data Reader

  • JSON-serialized protobuf support: Reads protobuf messages that have been serialized as JSON (.jsonpb files), a common format for streamed protobuf data
  • Flexible type aliases: Accepts both protobuf and proto as valid source types for convenience
  • Nested structure handling: Automatically flattens nested protobuf messages using dot-notation (e.g., profile.preferences.theme)
  • Array support: Properly handles repeated fields and nested arrays
  • Recursive JSON parsing: Optional support for JSON strings embedded within protobuf fields
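
The dot-notation flattening described above can be sketched in a few lines of Go. This is a minimal illustration, not the reader's actual code: the function names `flatten` and `flattenRecord` are hypothetical, and keying array elements by index (for example `tags.0`) is an assumption about how repeated fields might be represented.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// flatten walks a decoded JSON value and stores every leaf in out under a
// dot-notation path. Hypothetical helper; array elements are keyed by index,
// which is an assumption and may differ from the PR's behavior.
func flatten(prefix string, v any, out map[string]any) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			flatten(joinPath(prefix, k), child, out)
		}
	case []any:
		for i, child := range t {
			flatten(joinPath(prefix, strconv.Itoa(i)), child, out)
		}
	default:
		// Scalars (string, float64, bool, nil) are leaves.
		out[prefix] = v
	}
}

// joinPath appends key to prefix with a dot, skipping the dot at the root.
func joinPath(prefix, key string) string {
	if prefix == "" {
		return key
	}
	return prefix + "." + key
}

// flattenRecord decodes one JSON-serialized message and flattens it.
func flattenRecord(line []byte) (map[string]any, error) {
	var decoded any
	if err := json.Unmarshal(line, &decoded); err != nil {
		return nil, err
	}
	out := make(map[string]any)
	flatten("", decoded, out)
	return out, nil
}
```

With this sketch, a record such as `{"profile":{"preferences":{"theme":"dark"}}}` yields a single field named `profile.preferences.theme`.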

Example Usage

source:
  type: protobuf  # or "proto" for short
  path: path/to/your/data.jsonpb
  parser_config:
    json_in_string: false  # Usually not needed for protobuf JSON
  sampler:
    sample_size: 1000  # Number of records to sample for schema detection
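
To illustrate what enabling `json_in_string` could mean, here is a hedged Go sketch that re-parses string values which themselves contain JSON. The function name `expandJSONStrings` and the "starts with `{` or `[`" heuristic are assumptions for illustration; the PR does not show how the option is implemented.

```go
package main

import (
	"encoding/json"
	"strings"
)

// expandJSONStrings replaces string values that hold a JSON object or array
// with their decoded form, recursively. Hypothetical helper: the prefix
// check below is an assumed heuristic, not the tool's documented behavior.
func expandJSONStrings(v any) any {
	switch t := v.(type) {
	case string:
		s := strings.TrimSpace(t)
		if strings.HasPrefix(s, "{") || strings.HasPrefix(s, "[") {
			var decoded any
			if err := json.Unmarshal([]byte(s), &decoded); err == nil {
				// The decoded value may contain further embedded JSON.
				return expandJSONStrings(decoded)
			}
		}
		// Not valid JSON: keep the original string untouched.
		return t
	case map[string]any:
		for k, child := range t {
			t[k] = expandJSONStrings(child)
		}
		return t
	case []any:
		for i, child := range t {
			t[i] = expandJSONStrings(child)
		}
		return t
	default:
		return v
	}
}
```

Strings that merely look like JSON but fail to parse are left as plain strings, so turning the option on should not lose data in this sketch.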

Schema Generation Integration

The protobuf reader plugs into the existing schema generation system:

  • Automatic type detection: Identifies numeric, string, datetime, object, and array types from protobuf data
  • Nested field flattening: Converts complex protobuf messages into flat field structures for comparison
  • Statistical analysis: Generates the same comprehensive field statistics as CSV and JSON sources
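
As a rough picture of the type detection listed above, the following Go sketch classifies values decoded by `encoding/json`. The function name `inferType`, the returned labels, and the use of RFC 3339 to recognize datetimes are assumptions; the `"null"` and `"boolean"` labels are additions for completeness and are not named in this PR.

```go
package main

import "time"

// inferType maps a value decoded by encoding/json to a coarse type label.
// Hypothetical helper: label names and the RFC 3339 datetime check are
// assumptions, not the schema generator's actual rules.
func inferType(v any) string {
	switch t := v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case float64:
		// encoding/json decodes every JSON number into float64 by default.
		return "numeric"
	case string:
		if _, err := time.Parse(time.RFC3339, t); err == nil {
			return "datetime"
		}
		return "string"
	case map[string]any:
		return "object"
	case []any:
		return "array"
	default:
		return "unknown"
	}
}
```

One detail worth keeping in mind: the proto3 JSON mapping writes 64-bit integers as JSON strings and `google.protobuf.Timestamp` as an RFC 3339 string, so a detector along these lines would label the former as `string` unless it also tries to parse numeric strings.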

Test Coverage

  • Test suite: Added tests covering the protobuf reader
  • Realistic test data: Created sample protobuf data with nested structures, arrays, and various data types
  • Schema generation tests: Verify field detection and type inference
  • Integration tests: Verify that protobuf sources work with the existing comparison logic

Technical Implementation

The implementation follows the existing patterns established by the CSV and JSON readers:

  • Implements the DataReader interface for consistent behavior
  • Extends the factory pattern in datareader.New() to support protobuf types
  • Reuses existing schema generation and field analysis infrastructure
  • Maintains backward compatibility with all existing functionality
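
The factory extension can be pictured with the Go sketch below. It is a simplified stand-in, not the project's code: the real `DataReader` interface and the signature of `datareader.New()` are not shown in this PR, so the single `Kind` method, the `newReader` function, and the reader struct names are all hypothetical.

```go
package main

import "fmt"

// DataReader is a hypothetical, reduced stand-in for the project's interface.
type DataReader interface {
	Kind() string
}

// Placeholder reader types; the real readers would also expose read methods.
type csvReader struct{ path string }
type jsonReader struct{ path string }
type protobufReader struct{ path string }

func (csvReader) Kind() string      { return "csv" }
func (jsonReader) Kind() string     { return "json" }
func (protobufReader) Kind() string { return "protobuf" }

// newReader mimics a factory that dispatches on the configured source type.
// Both "protobuf" and "proto" select the same reader, as described above.
func newReader(sourceType, path string) (DataReader, error) {
	switch sourceType {
	case "csv":
		return csvReader{path: path}, nil
	case "json":
		return jsonReader{path: path}, nil
	case "protobuf", "proto":
		return protobufReader{path: path}, nil
	default:
		return nil, fmt.Errorf("unsupported source type %q", sourceType)
	}
}
```

Adding the new source type as one more `case` leaves the existing CSV and JSON branches untouched, which is how a factory of this shape keeps backward compatibility.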

This change lets users compare protobuf data streams using the same schema detection, field analysis, and reporting capabilities that were previously available only for CSV and JSON data sources.

Dependencies

Added google.golang.org/protobuf v1.36.9 for future binary protobuf support (currently used minimally for JSON format handling).


Co-authored-by: clickstefan <5954967+clickstefan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support also for reading & comparing Protobufs." to "Add comprehensive protobuf support for reading & comparing protobuf data" Sep 15, 2025
Copilot finished work on behalf of clickstefan September 15, 2025 01:13
Copilot AI requested a review from clickstefan September 15, 2025 01:13