Extensibility vs Forward Compatibility vs Structured data in JSON and binary formats #294
This is a summary of what we've encountered while trying to support CloudEvents at Google. This is less an argument for or against property bags and more a summary of the concerns we have at Google and the tradeoffs the working group needs to make.
Forward compatibility, under the semantic versioning we use across spec versions, is less flexible in binary formats.
The Extensibility problem:
Your Protobuf-based services will sometimes sit in the middle between a producer that speaks JSON and a consumer that expects yet something different. The producer will cook up an extension that you've never seen and you MUST forward that downstream.
The only way to solve that is for all extensions, in Proto, to flow in an extensions bag (!) that holds key/value pairs since you can't assign stable numbers to fields that your code generator didn't know about and that nobody will ever give you a proto schema for.
Extension keys are strings, not numbers. And yes, extensions in proto, Thrift, and Avro will run long on the wire, since they have to carry that metadata.
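To make that concrete, here is a minimal sketch of the forwarding problem in Python. The `KNOWN_ATTRIBUTES` set and the helper names are hypothetical; the set stands in for whatever fields the generated proto code was compiled against.

```python
# Hypothetical set of spec-defined CloudEvents attributes the middle
# service was compiled against; anything else is an unknown extension.
KNOWN_ATTRIBUTES = {"specversion", "type", "source", "id", "time", "data"}

def to_bag(event: dict) -> dict:
    """Move unknown top-level attributes into a string-keyed 'extensions'
    bag, as a schema-bound (e.g. proto) intermediary would have to."""
    known = {k: v for k, v in event.items() if k in KNOWN_ATTRIBUTES}
    known["extensions"] = {k: v for k, v in event.items()
                           if k not in KNOWN_ATTRIBUTES}
    return known

def from_bag(event: dict) -> dict:
    """Flatten the bag back to top-level attributes for a JSON consumer."""
    flat = {k: v for k, v in event.items() if k != "extensions"}
    flat.update(event.get("extensions", {}))
    return flat

# An extension the middle service has never seen still round-trips.
incoming = {"specversion": "0.2", "type": "t", "source": "s", "id": "1",
            "comexampleext": "value"}
assert from_bag(to_bag(incoming)) == incoming
```

The bag is string-keyed precisely because the intermediary cannot assign stable field numbers to attributes it has never seen.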
XML, JSON, and all binary formats that lean on the JSON Maps/Arrays/Values model (EXI, AMQP, BSON, MsgPack, etc.) are doing perfectly fine with "expect more" inline extensibility. We don't need to chew through theory for that: it's provably working in thousands of projects and countless APIs, and it's a solved problem in most serializers.
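As a plain-Python illustration (the attribute names here are made up), an "expect more" consumer needs no schema change at all: unknown attributes are just map entries, and they survive a round trip untouched:

```python
import json

raw = '{"specversion": "0.2", "type": "t", "id": "1", "newattr": 42}'
event = json.loads(raw)

# Read the attributes you know; tolerate and forward the rest.
known_type = event["type"]
unknown = {k: v for k, v in event.items()
           if k not in {"specversion", "type", "id"}}

# Unknown top-level attributes survive re-serialization with no
# special handling, because JSON is a map.
assert json.loads(json.dumps(event))["newattr"] == 42
```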
The part that I don't understand in this whole line of argumentation is (a) how JSON even plays a role here, because that format is totally orthogonal to Protobuf, (b) why you think an abstract data model without an explicit extensions bag makes it impossible for you to create such a bag, as needed, in a proprietary wire-format implementation that you fully control, and (c) why you appear to believe that unrelated on-wire formats must be structurally identical.
"Each specification that defines how to serialize a CloudEvent will define how extension attributes will appear."
That means, literally, that if you want and need an "extensions" bag in your proto event format, you are fully empowered to make one.
It's common best practice for any implementation that handles multiple wire formats/protocols to have a "neutral" internal in-memory representation that renders into/from the supported wire formats using serialization helpers. We support four protocols on our message brokers, where you can send with one and receive with any of the others; that works by way of a canonical internal format for the envelope that holds all metadata, and it's exactly equivalent to what we have here. Of those, AMQP has a partially predefined, schematized binary projection.
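A sketch of that pattern, with hypothetical names (`Envelope`, `render_json_style`, `render_bag_style`); the point is only that one canonical in-memory envelope can project into a top-level-attribute format and a bag-based one without losing information:

```python
from dataclasses import dataclass

# Hypothetical neutral in-memory envelope: all metadata lives in one
# canonical structure, independent of any wire format.
@dataclass
class Envelope:
    attributes: dict  # spec-defined and extension attributes, flat
    data: bytes

def render_json_style(env: Envelope) -> dict:
    """JSON-style binding: all attributes appear at the top level."""
    return {**env.attributes, "data": env.data.decode()}

def render_bag_style(env: Envelope) -> dict:
    """A schematized binary binding might split known fields from a bag."""
    known = {"type", "id", "source"}
    return {
        **{k: v for k, v in env.attributes.items() if k in known},
        "extensions": {k: v for k, v in env.attributes.items()
                       if k not in known},
        "data": env.data.decode(),
    }

env = Envelope({"type": "t", "id": "1", "source": "s", "myext": "x"}, b"hi")
# Both projections carry the same information, so a broker can receive
# on one binding and deliver on the other via the canonical envelope.
```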
Lastly, I am flabbergasted by how you can think that a demand for "json implementations of the CE spec be compatible with proto lang" is even remotely appropriate to bring to the WG. Protobuf is proprietary Google technology under Google copyright and control.
There's a lot in here but I think it comes down to a few things:
1 - you're asking for new spec-defined properties to only be added in major versions of the spec, even if they're optional. Can you point to an existing spec that has this requirement? I can't think of one off-hand, and it goes against normal semver semantics.
2 - you're asking for our JSON serialization to align with proto's JSON serialization. This appears to be asking us to give up the freedom to design our own valid JSON serialization rules. For example, in #295 it says "The standard JSON of this protobuf format is not compatible from the official [CloudEvents JSON encoding][CE_JSON_ENCODING] at the time of writing" - so should we be expecting a PR from Google soon to change our JSON to match it?
3 - there still appears to be a disconnect when I see things like "Event consumers supporting binary formats: Cannot easily handle arbitrary top-level attributes", because the PR specifically says bindings can decide how to serialize extensions - which means they can add a bag if they want. But this also implies that binary formats cannot handle new optional top-level properties, when I know this isn't true - not even for proto, because the spec specifically describes how unknown fields are parsed and retained: https://developers.google.com/protocol-buffers/docs/proto3#unknowns
A couple of things that I think were left out of @rachelmyers's comments:
1 - The proposed proto binding in #295 describes a rather odd set of rules that need to be followed - see: https://github.com/cloudevents/spec/pull/295/files#diff-fa5ef5e6cd5d55a6800e45a7281530aaR193 . Some things that concern me are:
But I guess this is what @rachelmyers meant when she said "We know that the JSON is clunkier and this makes the promotion process for JSON-only systems more complicated" when there is a bag.
It would appear that these rules being specified in #295 would have to be moved out from that one binding spec and into our JSON spec, if not the main spec.
2 - It seems to me that you're suggesting that the appearance of new top-level properties (even if spec-defined by v.minorNext) is problematic for proto in general, and this isn't true. I'm also struggling to understand why no other JSON spec has the issues you're raising, just ours.