-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve quote issue for HTTP Binary Transport Encoding. #413
Conversation
Signed-off-by: Scott Nichols <nicholss@google.com>
http-transport-binding.md
Outdated
[JSON values][json-value] for well-known context attributes as defined by the | ||
[CloudEvents specification][ce] or [CloudEvents Extensions][extensions] MAY omit | ||
the surrounding single or double quote. All values are assumed to be a string | ||
unless well-known. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I get this correctly - if I send a not well-known attribute of type Integer
or Binary
over HTTP Binary, a valid implementation of the HTTP Binary Transport MUST transform the attribute into a String
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it wouldn't be better to not talk about "well-known" attributes since that implies spec defined attributes would be treated differently from extensions, but instead talk about the data type of the attribute. Meaning, for types like 'string' we remove the quotes. For integers, we're kind of stuck when extensions are copied from http headers into whatever CE representation is being used.
Another option is to make all attributes in CE variants of strings. Then if someone wants to convert it to some other datatype when they extract it from the in-memory representation of the CE, that's up to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a valid implementation of the HTTP Binary Transport MUST transform the attribute into a String
It always sends it as a string in the header. We are currently pretending that a string is an escaped byte blob and that is a really strange choice for http headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implies spec defined attributes would be treated differently from extensions
I am saying exactly that. The exception is for known extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Well-known" is difficult because that set might change in CloudEvents v.Next and it might include promotion of previously proprietary attributes into the standard. For proprietary attributes you can't omit the quotes, but for well-known ones you can and thus the promotion of the attribute might break existing code. I understand that we could solve that through versioning, but I consider that a dirty trick.
There are three levels of strings here:
- The string that holds the JSON data and which is stored in the HTTP header
- The JSON string object that holds string data
- The string data
The quotes belong to 2. If you parse the string, you first get a JSON object for which you then get inferred type info and cast it to a JSON string and then you convert to a platform string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It always sends it as a string in the header. We are currently pretending that a string is an escaped byte blob and that is a really strange choice for http headers.
Yes, I'm aware that at HTTP level, we're always sending strings.
But I am talking about the CloudEvents type system when the event is transformed out of HTTP into something else (e.g. an in-memory representation of an SDK).
Maybe I'm wrong, but - as far as I understand your proposed text - if I'm sending an attribute myintegerattr
of type Integer
, it'll come back as type String
. In C-like syntax, int myintegerattr
will come back as String myintegerattr
.
So the type information gets lost, and the event does not conform to the types a consumer may expect. Or, put another way, if I send the same event in Binary and Structured mode, I get different events back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. This difference in how we deal with things like Integers, combined with our inability to promote extensions in a dot-release leads, me to think that CloudEvents should treat everything as strings - at the spec level. What any particular implementation does with those values, and how they expose them to their users, is an implementation choice.
Of course, that will mean that JSON CEs will now have integers quoted, but TBH that's less weird than quoted HTTP Header strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple out for this problem is to wrap the value:
Binary to Structured:
ce-custom-num: 1
converts to{"custom":{"num":"1"}}
. <-- this is bad.
ce-custom: {"num":1}
converts to{"custom":{"num":1}}
. <-- this is good.
I think it is an easy way to get the best of both worlds. Don't flatten the map. Don't support Strings at the root level of extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, that will mean that JSON CEs will now have integers quoted, but TBH that's less weird than quoted HTTP Header strings.
Yes, it is a tradeoff decision. But we should also consider the likelihood of one format being used over the other. E.g. if 50% of the future users will use the Binary encoding, then it makes sense to optimize for a less-weird Binary encoding. However, if only 5% use it, and the other 95% use formats that properly support Integers (correct me if I'm wrong, but that seems to be all others formats?), then I'd rather have a weird design choice for the 5%, and a non-weird design choice for the 95%.
At this point it is mostly guesswork, but personally I'd not optimize for HTTP Binary, because I don't think it'll be the major way that CloudEvents will be transported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Knative only supports Binary mode fwiw. The reason being that the consumer can still decode the payload in the same way it did before.
http-transport-binding.md
Outdated
```text | ||
ce-specversion: 0.2 | ||
ce-type: com.example.someevent | ||
ce-custom: {"my":"value"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know examples aren't normative, but I'd prefer if we explicitly call out if this is a String
which happens to contain JSON, and not Map
according to the CE type system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a Map
encoded as a String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A String
is encoded as a JSON String and a Map
is encoded as a JSON Object. The Map
and String
are unrelated in CloudEvents itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but shouldn't the CloudEvents-Map
be encoded differently: https://github.com/cloudevents/spec/blob/master/http-transport-binding.md#3131-http-header-names ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maps are allowed to be represented as strings or promoted into the json struct and both should be equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3wscott I'm sorry, I must be blind - could you point me to the part of the spec that says so? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cneijenhuis I think this line, unless I read it wrong:
If the datacontenttype value is either "application/json" or any media type with a structured +json suffix, the implementation MUST translate the data attribute value into a JSON value, and set the data attribute of the envelope JSON object to this JSON value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3wscott If I get it correctly, this only applies to the data
attribute in the json format - but we're not talking about the data
attribute here (or the json format).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! yes, you are right.
Given the discussion, I am inclined to suggest that Integers are not allowed at the top level of the context, but are allowed as keys in maps. And if you want to preserve the type info, those maps should not be flattened in binary mode. Further, if everything is a string, then they can be unquoted, single or double quoted for headers binary mode. Thoughts? |
Signed-off-by: Scott Nichols <nicholss@google.com>
@n3wscott We had this months-long discussion last year on how to promote extensions into the main spec, and why extensions can therefore not live in a separate extensions bag/map. I don't have a strong opinion either way, but I favor not having to have that discussion again 😉 Counter-proposal: Why not put all context attributes into a (single) header?
No quote issue, no strange flattening of maps etc. |
@cneijenhuis we use that model with the single JSON header for Azure Service Bus for all system-defined properties (see https://docs.microsoft.com/en-us/rest/api/servicebus/message-headers-and-properties) and I wish we would not have done that and kept it all just flat not least because it's pretty terrible to handle in the debugger and in logs. Mind that the example in the referenced doc for property parsing rules is confusing and I'll ask to have that clarified. |
I put this on the agenda for today, so please come prepared to speak up - we really need to decide how to deal with this. I'm inclined to say everything is a variant of string at the spec/wire level (including map values). |
What was the result of the meeting? |
@n3wscott we skipped it because both you and Adam weren't on the call and wanted your input. It's on this agenda again for this week. |
Hello mates@ |
Signed-off-by: Scott Nichols <nicholss@google.com>
Signed-off-by: Scott Nichols <nicholss@google.com>
I can't be on the call tomorrow (2/5) owing to a clash. .. my $0.02.. I would prefer "not" to stuff all the context into a single HTTP header, that will result in additional parsing when implementations might want to do routing based on those properties. A counter proposal might be to prefix strings/integers with some type identifier. Whatever we do i don't think we should rely on any sort of 'inference' - the mappings need to be explicit. |
I meant to write-up a more concrete proposal/PR, but ran out of time, but here's an idea I've been toying with:
This avoids the entire issue and allows our spec to (pretty much) focus just on what is seen on the wire - we basically don't talk about types at all. Maps are just a grouping mechanism, and technically we could drop those too and just say that if people want to group attributes then they should use common prefix names (e.g. myattrfoo, myattrbar, myattrzoo, ...). But ignoring the Map discussion... what do people think about the other parts... making everything strings? |
I like the direction of @duglin's proposal ... i do think we should retain the type information from the original event. So maybe that means I would hate to lose all the potential value of the defined type system just because the HTTP binding transport-frame looks "odd" .. |
Currently, we have two possible Samplinghttps://github.com/cloudevents/spec/blob/master/extensions/sampled-rate.md Note that the As a side note, I'm also suspicious of the formulation of this extension -- It is not specified how Sequencehttps://github.com/cloudevents/spec/blob/master/extensions/sequence.md
|
The Sequence extension does not have an It can logically contain an Integer, but this isn't related to the CE Type - i.e. the |
@duglin To hammer your point home, you can also lean on |
@@ -207,7 +207,7 @@ specification, header names are case-insensitive. | |||
##### 3.1.3.2 HTTP Header Values | |||
|
|||
The value for each HTTP header is constructed from the respective attribute's | |||
[JSON value][json-value] representation, compliant with the [JSON event | |||
unquoted [JSON value][json-value] representation, compliant with the [JSON event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An encoding that isn't using quotes is no longer using JSON. "Unquoted JSON value" doesn't make sense if the only JSON value type for which quotes are used are strings and the quotes are significant for type inference. The convenient thing about using JSON here was that we get to borrow all of the type system and encoding and character set rules for when we need to distinguish between numbers and strings and complex types and we get automatic type inference so that generic frameworks can surface up the right types in spite of being unaware of specific rules of extensions.
If we choose to revert to "it's all just strings" and we would then still care about being able to represent numbers and complex types, we need to have a common model for how to restrict those strings, or it would be up to any individual extension to define such constraints locally within their scope.
For instance, we might keep "integer" and mandate that attributes of that type MUST be encoded, for instance, per https://tools.ietf.org/html/rfc7159.html#section-6 - or we abandon the type model altogether and have that encoding rule as a constraint next to the attribute(s) that need it.
For URI-reference and Timestamp we already have such constraints in place by referring to RFC3986 and RFC 3339 as a type-level constraint and it seems to make sense to do that just once and do the same for Integer.
What we're definitely losing with dropping the quotes is type inference for values of extensions that a generic CloudEvents framework doesn't know the specs for, because you can't easily tell whether 41751 is a number or a legitimate string (solution: it's my postal code and therefore I wanted it to be a string). I think that's a little sad, because we're sacrificing that capability for optics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is apples to oranges in regards to JSON encoding. JSON needs the "
because JSON is bytes. HTTP headers are strings. We are currently doing a hack to put bytes as sting to be interpreted as a string because it has quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what the apples and the oranges are. We currently define the header value as to be JSON encoded:
"The value for each HTTP header is constructed from the respective attribute's JSON value representation"
The JSON encoding of a string is a string in quotes, again rendered as a string. The JSON encoding of an integer is a stringified integer, rendered as a string. You're adding "unquoted" into the definition above, but you can't just change the rules of JSON from the outside.
The consequence of dropping the quotes is simply that we're no longer encoding strings in compliance with JSON, i.e. we need to drop the JSON reference altogether and just say that everything is a string and then constrain the string content either at the type level (if we want to keep those) or at the attribute level.
I agree I don't want to tackle this here, but I think this is also a mistake in the spec. How do you deal with objects that have dashes in their name? |
@n3wscott We already tackled this, the solution is https://github.com/cloudevents/spec/blob/master/spec.md#attribute-naming-convention - no dashes allowed. |
Shouldn't we really be starting from the spec and changing the type system (or constraining which types can be used where) - I don't see a change to that in this PR. This seems to be a case of the tail-wagging-the-dog in that the transport-binding spec(s) need to honor and support the defined type system. Basically this statement needs to be changed first.. |
+1 to changing the type system. Right now, each of the types has the following usage:
The last two types are specializations of The use of How much would it simplify the spec and implementations if context attributes were a simple |
@evankanderson @JemDay @n3wscott We need a type system if we believe that we can reuse encoding/formatting rules across multiple attributes. It's more consistent and easier to framework if we say "timestamp is an expression RFC 3339" once and then the @evankanderson on JSON is a cheap way to borrow a bunch of such rules and their implementations all at once. Since we don't seem to like that JSON mandates quotes around strings ("A string begins and ends with quotation marks." RFC8259 Sec. 7) we no longer can borrow JSON's rules, at least not the one for strings, and thus either put the encoding rules into a type system table or we put them by the attributes. What speaks for retaining the type table is that the per-type encoding really needs to go where the problem arises and that is indeed in the HTTP spec, i.e. the main spec defines the types and the HTTP spec defines the string projection; AMQP doesn't have the problem we're debating here because it has a built-in type system and the mapping is straightforward. |
@clemensv - Fully agree with your comments; the gist of my point was that in these sorts of situations we should really be looking at the type-system first and then at the transport bindings. IMHO the HTTP binding spec could be changed to support the defined type-system if the current binding doesn't work. |
I see two options here (ignoring Maps for now): 1 - everything is a string I think this would involve the spec dropping all mentions of types w.r.t. serialization. If someone defines an attribute as something other than a string then from a CE spec serialization perspective we treat it as a string. (I wonder if we'd need to define the "type<->string" conversion rules). If an impl of CE wants to convert non-string attributes to some other type for their users is up to them and out of scope for the spec. 2 - encode the type into the header name Similar to 1, we serialize everything as a string, but we encode the type into the attribute name (at least for http headers - like @JemDay suggested). For example: Even though option 2 is probably more "pure", I still prefer option 1 because it keeps our spec simple and leaves the serialization up to something above us - if they even want to deal with it. If we were developing a full blown typing system instead of just a spec that is meant to only add a few bits of metadata to existing messages, then I could be convinced that we need a more robust system, but we've been talking about how CE is meant to be light and minimal.... dealing with just strings helps keep us in that well defined box IMO. |
Problem brought up in the call: A is a producer which produces JSON output
A -> B -> C When We currently have 4.5 types which share the same JSON representation: |
We could take a page from Amazons SNS and send along the type info for extensions that are unknown to the declared version of the spec: https://docs.aws.amazon.com/sns/latest/dg/sns-message-attributes.html See: Work will have to be done to understand how to bind this data to each transport, in the case of something like AMQP or Proto, there is no need to make it because it is already in the transport. But for the case of json -> AMQP we have an out to preserve the type. |
I am at the OASIS AMQP TC F2F meeting in Berlin and we discussed this yesterday for how we can solve the AMQP side of the problem, and I think that’s a suitable approach for CloudEvents in general. In AMQP we will introduce a relaxed type model, which allows alternative representations for our base types. That means a timestamp can, for instance, also be represented by a string that is conformant with an ISO8601/RFC3339 date/time expression. It can also be represented with a 64Bit UNIX epoch and string representations of that. AMQP libraries will be required to perform a conversion towards the canonical representation when they encounter an alternative representation. In effect, that might mean that the type conversion might wait until the point when the value is being consumed. If a value in a map is of the type date, it might be stored in the map as a string, but we will provide a helper that will provide validation/conversion for the app. We assume here that if the app touches a property, it has a preconceived notion of what type it expects the property to be. If there’s a preassigned type for a value (all of the well-known CE attributes), then the SDK will perform the respective conversion and the client always gets to see a “normal” type, in spite of the value having come across the wire in any of the permitted representations. That is then also true for extensions with a strongly typed representation in the SDK. Summary: Each logical type can have one or many wire representations, explicitly including stringified ones. |
@clemensv Making this change to AMQP does not solve the issue of you not knowing what an extension's type when transmitted as a string. It possibly makes it worse if you happen to have a string that looks like a timestamp or number and the receiver is expecting a string, no? |
The way I look at it is like this....
|
I would prefer to keep the type system intact as-is and require that transport bindings maintain that type information. As such i would (re)propose that the HTTP binding be modified to encode the type information in the generated property name eg |
My attempt at getting this cleared up is to make the type system better: #432 |
Fixes #396
I am proposing no quotes. All values inside the CloudEvents context attributes assumed to be strings.
Signed-off-by: Scott Nichols nicholss@google.com