Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differing Interpretations of Protobuf Spec in Java and Go SDKs #759

Open
zach-robinson opened this issue Jan 21, 2022 · 33 comments
Open

Differing Interpretations of Protobuf Spec in Java and Go SDKs #759

zach-robinson opened this issue Jan 21, 2022 · 33 comments

Comments

@zach-robinson
Copy link

On the use of the text_data and binary_data fields in the protobuf-format spec, it says:

When the type of the data is text, the value MUST be stored in the text_data property.
datacontenttype SHOULD be populated with the appropriate media-type.
When the type of the data is binary the value MUST be stored in the binary_data property.
datacontenttype SHOULD be populated with the appropriate media-type.

The Java SDK's serializer always uses the text_data field when the datacontenttype is application/json, application/xml or text/*. Otherwise the binary_data field is used. However, in the Go SDK's deserializer binary_data is treated literally, while text_data gets serialized using the appropriate serializer for the datacontenttype. When a CloudEvent with a datacontenttype of application/json is serialized into protobuf using the Java SDK and then sent to a service using the Go SDK and deserialized there, this results in the JSON in the text_data field being interpreted as a JSON-String when it should be treated as a JSON-Object. So if I serialize this CloudEvent into protobuf using the Java SDK:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "subject": null,
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "application/json",
    "data" : {
        "appinfoA" : "abc",
        "appinfoB" : 123,
        "appinfoC" : true
    }
}

and then send the resulting protobuf object to a service using the Go SDK and deserialize it there back into JSON format, the result is:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "subject": null,
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "application/json",
    "data" : "{ \"appinfoA\" : \"abc\", \"appinfoB\" : 123, \"appinfoC\" : true }"
}

This seems to be unintended behavior, but I as far as I can tell, neither the Java nor the Go SDK implementations of the Protobuf format are technically wrong according to the way the spec is written, so I decided to raise this issue here to get clarification. Is this a bug in one or both of these libraries? Should the Protobuf format spec be more specific in how the binary_data and text_data fields should be used? Or is this behavior somehow intended?

@duglin
Copy link
Contributor

duglin commented Jan 24, 2022

Nice issue. @n3wscott @lance @JemDay any comments?

When the type of the data is text, the value MUST be stored in the text_data property

Should JSON really be just "text" in protobuf if it's going to treat it as a JSON string on the other side? If it were treated as binary would that avoid this issue?

@JemDay
Copy link

JemDay commented Jan 24, 2022

I'm curious is this a protobuf-format encoding issue or a json-format encoding issue - in the example above from the Go SDK is that the general behavior when the data is JSON (which I believe is in conflict with the JSON format specification).

What this does call-out is the need for defined inter-op or compliance tests for the various format implementations to ensure they do play well with one another.

@JemDay
Copy link

JemDay commented Jan 24, 2022

An example unit-test input for a Proto formatted event with a JSON data payload for the java-sdk is here, it's a little simplistic and could potentially warrant a complex example.

Note: The unit-test files use the JSON encoding for a protobuf object NOT the CE JSON format encoding.

@zach-robinson
Copy link
Author

zach-robinson commented Jan 24, 2022

I'm curious is this a protobuf-format encoding issue or a json-format encoding issue - in the example above from the Go SDK is that the general behavior when the data is JSON (which I believe is in conflict with the JSON format specification).

This issue is specific to the protobuf-deserializer in Go. It has to do with the difference in behavior that exists when the data is contained in the binary_data field vs the text_data field:

// NOTE: There are some issues around missing data content type values that
// are still unresolved. It is an optional field and if unset then it is
// implied that the encoding used for the envelope was also used for the
// data. However, there is no mapping that exists between data content types
// and the envelope content types. For example, how would this system know
// that receiving an envelope in application/cloudevents+protobuf know that
// the implied data content type if missing is application/protobuf.
//
// It is also not clear what should happen if the data content type is unset
// but it is known that the data content type is _not_ the same as the
// envelope. For example, a JSON encoded data value would be stored within
// the BinaryData attribute of the protobuf formatted envelope. Protobuf
// data values, however, are _always_ stored as a protobuf encoded Any type
// within the ProtoData field. Any use of the BinaryData or TextData fields
// means the value is _not_ protobuf. If content type is not set then have
// no way of knowing what the data encoding actually is. Currently, this
// code does not address this and only loads explicitly set data content
// type values.
contentType := ""
if container.Attributes != nil {
	attr := container.Attributes[datacontenttype]
	if attr != nil {
		if stattr, ok := attr.Attr.(*pb.CloudEventAttributeValue_CeString); ok {
			contentType = stattr.CeString
		}
	}
}
switch dt := container.Data.(type) {
case *pb.CloudEvent_BinaryData:
	e.DataEncoded = dt.BinaryData
	// NOTE: If we use SetData then the current implementation always sets
	// the Base64 bit to true. Direct assignment appears to be the only way
	// to set non-base64 encoded binary data.
	// if err := e.SetData(contentType, dt.BinaryData); err != nil {
	// 	return nil, fmt.Errorf("failed to convert binary type (%s) data: %s", contentType, err)
	// }
case *pb.CloudEvent_TextData:
	if err := e.SetData(contentType, dt.TextData); err != nil {
		return nil, fmt.Errorf("failed to convert text type (%s) data: %s", contentType, err)
	}

The SetData function uses the built-in Go serializer corresponding to the provided datacontenttype. The serializer converts the object to it's []byte representation, so if a struct is passed in it will be interpreted as a JSON-Object:

package main

import (
	"fmt"

	"github.com/cloudevents/sdk-go/v2/event"
)

type myObject struct {
	Hello string `json:"hello"`
}

func main() {
	e := event.New()
	e.SetSpecVersion("1.0")
	e.SetType("com.type")
	e.SetSource("/")
	e.SetID("123")
	e.SetData("application/json", &myObject{Hello: "world"})
	fmt.Printf("%v\n", e)
}

This yields the output:

Context Attributes,
  specversion: 1.0
  type: com.type
  source: /
  id: 123
  datacontenttype: application/json
Data,
  {
    "hello": "world"
  }

However if you attempt to serialize a string, then it is interpreted as a JSON-String, which is what causes the behavior we are seeing here:

package main

import (
	"fmt"

	"github.com/cloudevents/sdk-go/v2/event"
)

func main() {
	e := event.New()
	e.SetSpecVersion("1.0")
	e.SetType("com.type")
	e.SetSource("/")
	e.SetID("123")
	e.SetData("application/json", "{\"hello\":\"world\"}")
	fmt.Printf("%v\n", e)
}

This yields the output:

Context Attributes,
  specversion: 1.0
  type: com.type
  source: /
  id: 123
  datacontenttype: application/json
Data,
  "{\"hello\":\"world\"}"

Here is where the differing interpretations I mentioned come into play. Any datacontenttype could theoretically be encoded as binary, so the Go authors seem to have intentionally made a distinction here based on whether the text_data or binary_data field is used for the message. The serializer in Go always uses the binary_data field for non-protobuf datacontenttype:

container.Data = &pb.CloudEvent_BinaryData{
	BinaryData: e.Data(),
}
if e.DataContentType() == ContentTypeProtobuf {
	anymsg := &anypb.Any{
		TypeUrl: e.DataSchema(),
		Value:   e.Data(),
	}
	container.Data = &pb.CloudEvent_ProtoData{
		ProtoData: anymsg,
	}

This means that the Go protobuf serializer/deserializer will always round-trip successfully. However, the Java-SDK seems to have interpreted the spec to mean that there should be a one-to-one mapping between datacontenttype and the data envelope being used:

// If it's a proto message we can handle that directly.
if (data instanceof ProtoCloudEventData) {
    final ProtoCloudEventData protoData = (ProtoCloudEventData) data;
    if (protoData.getMessage() != null) {
        protoBuilder.setProtoData(Any.pack(protoData.getMessage()));
    }
} else {
    if (Objects.equals(dataContentType, PROTO_DATA_CONTENT_TYPE)) {
        // This will throw if the data provided is not an Any. The protobuf CloudEvent spec requires proto data to be stored as
        // an Any. I would be amenable to allowing people to also pass raw unwrapped protobufs, but it complicates the logic here.
        // Perhpas that can be a follow up if there is a need.
        Any dataAsAny = null;
        try {
            dataAsAny = Any.parseFrom(data.toBytes());
        } catch (InvalidProtocolBufferException e) {
            throw CloudEventRWException.newDataConversion(e, "byte[]", "com.google.protobuf.Any");
        }
        protoBuilder.setProtoData(dataAsAny);
    } else if (isTextType(dataContentType)) {
        protoBuilder.setTextDataBytes(ByteString.copyFrom(data.toBytes()));
    } else {
        ByteString byteString = ByteString.copyFrom(data.toBytes());
        protoBuilder.setBinaryData(byteString);
    }
}

In the Java code, when converting into protobuf the decision of whether to use binary_data or text_data is based on the result of isTextType(dataContentType):

// The proto spec says all text data should go into the text field. It is really difficult to figure out every case
// of text data based on the media type though, so I am just going to check for some common cases.
private static boolean isTextType(String type) {
    if (type == null) {
        return false;
    }
    return type.startsWith("text/") || "application/json".equals(type) || "application/xml".equals(type);
}

So this only pops up when sending protobuf CloudEvents between services written in different languages. You can tell from the comments though that the spec was interpreted differently by the authors of each. In the Go code we have:

For example, a JSON encoded data value would be stored within the BinaryData attribute of the protobuf formatted envelope.

Where as the Java code explicitly use the text_data field when the datacontenttype is set to application/json.

@JemDay
Copy link

JemDay commented Jan 24, 2022

This does deserve some conversation as this is obviously not a good situation.

I'm not a Go expert so I can't see where the specific "encoding format" is being applied - I'm curious what happens when you use a JSON format in those examples as they should produce the same output .. the JSON format encoding requires that the datacontentype be checked to see how the data is represented on the wire.

IN essence the only real difference here is that the JSON format requires the SDK to look for JSON data and represent it a set way, and the Proto spec extends that test to cover all textual media types - which granted is a bit of a wider net to cast but base don the same basic set of checks.

What you have shone a light on here is that the Java code has a short-coming as it's not checking for datacontenttype endsWith("+json") or endsWith("+xml")

I feel an issue coming on...

@zach-robinson
Copy link
Author

What you have shone a light on here is that the Java code has a short-coming as it's not checking for datacontenttype endsWith("+json") or endsWith("+xml")

I don't know that that is explicitly needed. application/json and application/xml should always be the correct MIME types for JSON and XML respectively. Also, this wouldn't fix the original issue that I was pointing out. In fact the problem here comes down to the interpretation of what qualifies as text_data vs binary_data. The authors of the Java-SDK seem to consider JSON to be a form of text_data, while the Go-SDK authors clearly consider it to be binary_data, which is what is causing the issue. Whether intentionally or not, the Go-SDK always treats text_data as a string-type rather than maintaining the original serialization of the data as it does for binary_data.

@JemDay
Copy link

JemDay commented Jan 25, 2022

Thanks @zach-robinson for flagging this - I feel this is a subject for the working group to discuss and clarify behavior as appropriate. Putting textual data into a binary construct raises the concern around how character-encoding schemes should be communicated.

My personal view is that the Go-SDK isn't adhering to the spirit of the protobuf format specification:

When the type of the data is text, the value MUST be stored in the text_data property.

  • datacontenttype SHOULD be populated with the appropriate media-type.

@lance
Copy link
Member

lance commented Jan 25, 2022

My personal view is that the Go-SDK isn't adhering to the spirit of the protobuf format specification

I agree with this.

@duglin
Copy link
Contributor

duglin commented Jan 26, 2022

What does When the type of the data is text mean? What defines something as text? Does it mean string? I'm not sure I would consider a JSON object a "string" or "text".

@JemDay
Copy link

JemDay commented Jan 27, 2022

ok, I'll bite - from either http://json.org or the ECMA standard you'll find this type of language "JSON is a lightweight, text-based, language-independent syntax for defining data interchange formats"

And I'd say text is "a sequence of characters".

If you put a "sequence of characters" into a "binary property" then you'd also need a mechanism to indicate the character encoding schema that was used so that it is correctly interpreted - thus the whole preamble construct for XML documents.

Granted JSON documents are always meant to be exchanged in UTF-8 so given a content-type that indicates JSON you should be able to reliably process it.

The protobuf string scalar type is UTF-8 encoded so there's no danger of mismatch when the string data is demarshalled.

@JemDay
Copy link

JemDay commented Jan 27, 2022

And in closing ... I 'may' be incorrect with my encoding comment for JSON, the standard does allow for other unicode encoding schemes which then demands introspection : See Section 3. Encoding here

@duglin
Copy link
Contributor

duglin commented Jan 27, 2022

True, JSON is text, but I think this line from the intro comment is interesting:

...this results in the JSON in the text_data field being interpreted as a JSON-String...

If text_data is {}, how would a receiver know if that is an empty JSON object or a string with a value of {} ? Both are valid JSON and would satisfy the app/json content-type. I don't know protobuf at all so this may not make any sense, but if this ambiguity I'm asking about is possible, then it seems like JSON (while technically "text") might need to treated as binary and text_data should be reserved for just "string" data types.

@jskeet
Copy link

jskeet commented Jan 28, 2022

Personally I'd say that if a JSON string with content "{}" is to be stored in a CloudEvent using the protobuf event format, it should be in text_data with content of "{}" including the double quotes. In other words, any JSON value (object, string, true/false, numeric) should be in text_data with the same string content that it would appear as if it were in a standalone JSON document.

The C# SDK has different ways of handling data in general - the CloudEvent.Data property is just object, and each event format specifies how it handles different object types. Here's what I'm proposing the C# SDK does for protobuf event format:

When encoding CloudEvents in structured mode, three kinds of data are supported, as indicated in the
event format. Text is stored in the CloudEvent.TextData field; binary data is stored
in the CloudEvent.BinaryData field; protobuf messages are stored in the
CloudEvent.ProtoData field. In the last case, the message is packed in an
Any message, to preserve information about which message is encoded, unless the message
is already an Any in which case it is stored directly. (This prevents "double-encoding"
when a CloudEvent is decoded and then re-encoded.) Attempts to serialize CloudEvents with any other data type
will fail. Derived classes can specialize all of this behavior by overriding
EncodeStructuredModeData(CloudEvent, V1.CloudEvent

When decoding CloudEvents in structured mode, text and binary data payloads are represented as strings and byte
arrays respectively. Protobuf message payloads are represented using the Any wrapper, without
attempting to "unpack" the message. This avoids any requirement for the underlying message type to be
known by the application consuming the CloudEvent. (The data may be stored for later processing by another
application with more awareness, for example.) Derived classes can specialize all of this behavior by
overriding DecodeStructuredModeData(V1.CloudEvent, CloudEvent).

This doesn't explicitly document what is meant by "binary" and "text" in the first paragraph, but the implementation is currently "byte arrays count as binary; strings count as text". This means if you want to put a JSON object (or JSON string) into the event, you (the caller) have to set the Data property to be the appropriate string representation, or binary if you really want to.

Structured mode event handling does not use the datacontenttype attribute at all in the protobuf event formatter - at least in the local copy I've got here. (None of this is committed.)

I don't think this issue needs to delay the protobuf event format 1.0 ratification, but we may want to add clarifying guidelines for SDKs. (We may want to think about the bigger issue of just how we specify SDK conformance in general, beyond this specific issue.)

@JemDay
Copy link

JemDay commented Feb 1, 2022

@jskeet - I was with you until the words ".. including the double quotes.."

My argument is that there are no quotes around a JSON value unless that value is a string; so the text_data field should contain:

[ { "author" : "dickens"}, { "author", "Shakespeare"} ]

And not

"[ { \"author\" : \"dickens\"}, { \"author\", \"Shakespeare\"} ]"

Essentially you MUST be able to take the content of text_data and pass it directly to a JSON parser, the parser would misinterpret the content if it had a leading quote (")

@jskeet
Copy link

jskeet commented Feb 1, 2022

@JemDay: But that "including the double quotes" was in the context of a JSON string value (emphasis mine):

Personally I'd say that if a JSON string with content "{}" is to be stored in a CloudEvent using the protobuf event format, it should be in text_data with content of "{}" including the double quotes.

An empty JSON object should be stored without the double quotes, yes.

That is my answer to @duglin's question here:

If text_data is {}, how would a receiver know if that is an empty JSON object or a string with a value of {} ?

In my view, if the text_data is {} (and the content type is application/json) then it represents an object value. If the text_data is "{}" then it represents a string value.

I suspect we're in violent agreement, and the lack of clarity is due to ambiguity between "a string that contains JSON" and "a JSON string value". In C#, to demonstrate the difference:

string json = "{}"; // This is just a string which could be parsed as a JSON object
JValue value = new JValue("{}"); // This is a JSON string value that happens to contain curly braces

Are we now actually in agreement?

@JemDay
Copy link

JemDay commented Feb 1, 2022

@jskeet - I believe we are sir :-)

@duglin duglin transferred this issue from cloudevents/spec Feb 17, 2022
@duglin
Copy link
Contributor

duglin commented Feb 17, 2022

Move to the sdk-go repo per the 2/17/2022 call.

@sghung
Copy link

sghung commented Apr 25, 2022

Any update on this issue?

@n3wscott
Copy link
Member

Apart from the confusion around a json string, I think this thread came to the wrong conclusion. I think the go sdk is doing this correctly.

in the original question, it has this example:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "subject": null,
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "application/json",
    "data" : {
        "appinfoA" : "abc",
        "appinfoB" : 123,
        "appinfoC" : true
    }
}

Which claims the java impl puts the json output into the proto data string. But I think this is incorrect, the output of marshaling a json object is not a string, it is a byte array. Casting to a string changes the value and the parsers on the other side can't make an assumption about that cast to be able to undo the cast. It makes the following non-deterministic:

    "data" : {
        "appinfoA" : "abc",
        "appinfoB" : 123,
        "appinfoC" : true
    }

vs

    "data" : "{ \"appinfoA\" : \"abc\", \"appinfoB\" : 123, \"appinfoC\" : true }"

The proto TextData should be the string { "appinfoA" : "abc", "appinfoB" : 123, "appinfoC" : true }, but the sdk must interpret this as the string "{ \"appinfoA\" : \"abc\", \"appinfoB\" : 123, \"appinfoC\" : true }".

This is exactly the same problem with parsing a cloudevent in structured mode. The spec says you need send json in json format and not a string.

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "id" : "C234-1234-1234",
    "data" : {
        "this" : "is sent as binary and is a json object"
    }
}
{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "id" : "C234-1234-1234",
    "data" : "{\"this\" : \"is a json string and not an object\"}"
}

So, I think the go sdk is working as intended. The issue lies in the java sdk's misinterpretation of the output of the json marshaler. return type.startsWith("text/") || "application/json".equals(type) || "application/xml".equals(type);

@JemDay
Copy link

JemDay commented Jul 19, 2022

Has somebody tried verifying the behavior using the test files from the Java SDK.... you should be able to create a protobuf object from the proto-json (and that's a proto encoding NOT a CloudEvent encoding) representation and then demarshall it using the GO SDK.

The various proto to wire-level representations are here : https://github.com/cloudevents/sdk-java/tree/master/formats/protobuf/src/test/resources/v1

The JSON data payload is very simple .. it just caries an empty JSON object.

Ideally we should align on the wire-representation(s) and then ensure the SDKs handle them coherently.

I can't make the call tomorrow (7/20) unfortunately, and apologies for the multiple edits on this - I blame autocorrect.

@duglin
Copy link
Contributor

duglin commented Jul 20, 2022

@n3wscott wrote:

The proto TextData should be the string { "appinfoA" : "abc", "appinfoB" : 123, "appinfoC" : true }, but the sdk must interpret this as the string "{ "appinfoA" : "abc", "appinfoB" : 123, "appinfoC" : true }".

Scott - I'm wondering if comparing this to how things work for HTTP would help find the right answer. In HTTP, how do the SDKs deal with an HTTP body of:

{}

if a) content-type==app/json
and b) content-type==text/plain ?

Do both result in the same thing (json object vs just string) ?

the output of marshaling a json object is not a string

certainly true for golang, but I'm not sure this is a true statement in general since I think most people would say that JSON is a string serialization of data, not a binary one.

@n3wscott
Copy link
Member

n3wscott commented Jul 20, 2022

if a) content-type==app/json

it would be an empty json object.

and b) content-type==text/plain

it would be the string "{}"

I should add some calcification, the content-type signals how you should read and decode the byte buffer in the body, you could choose to read that buffer in and send it to a json unmarshaller for text/plain but you have make that choice in code to ignore the content type provided.

This is not exactly the case for the proto, it is received as a string, and you will have to explicitly choose to cast that string to a byte array to be something the json unmarshaller can act on. This is not something we can do programmatically without some introspection and assumption of the intent of the inbound string.

@duglin
Copy link
Contributor

duglin commented Jul 20, 2022

In that case, wouldn't:

  • datacontenttype=app/json
  • text_data={}

result in a JSON Object not a string as you suggest?

(I edited my previous comment)

@n3wscott
Copy link
Member

The simpler answer for your original question:

b) content-type==text/plain

it is invalid and the parser will throw an error. A smart parser will read in the body as the content type (string) and then will not know how to parse as json (bytes)

@JemDay
Copy link

JemDay commented Jul 20, 2022

if a) content-type==app/json
. it would be an empty json object.
and b) content-type==text/plain
it would be the string "{}"

That's exactly the expected behavior if you were to switch the datacontenttype in the Protobuf examples.

I'm a bit lost now as to whether we're talking purely about JSON over HTTP or JSON over a Protobuf encoded CE :-(

@duglin
Copy link
Contributor

duglin commented Jul 20, 2022

But you're assuming a JSON parser only takes binary data. That's not true for all languages. I'm looking at some Java samples and they all seem to assume it's a string - which is a fair assumption given JSON is a stringified format.

I guess it depends on whether the use of text_data and binary_data should override the datacontenttype attribute or not. I was assuming it wouldn't.

@n3wscott
Copy link
Member

n3wscott commented Jul 20, 2022

That's exactly the expected behavior if you were to switch the datacontenttype in the Protobuf examples.

I am not so sure, because how do I send the string "{}"?

I think if the proto gives me a string, and the content type is app/json, I should see that as a json string, and not try to cast it back to a byte array and then parse it.

As @JemDay is suggesting, if I get a proto text data as "{}", it will be parsed as a json string of {}, popping off one level of quotes.

I am saying we should see the proto text data field as having explicit quotes around the body, and any json parsing will always result in the string of the text data contents.

vs

Assume if datacontenttype is app/json, assume the text data field is bytes without the outer quotes.

I think this ends up in a broken world where if I do send a simple json string:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "id" : "C234-1234-1234",
    "data" : "this is text"
}

I could get either of these in the proto and they are equivalent:

(after : I am trying to show bytes on the wire)

binary_data:"this is text"
text_data:this is text

@duglin
Copy link
Contributor

duglin commented Jul 20, 2022

based on previous comments, I think:

I am saying we should see the proto text data field as having explicit quotes around the body

is the point of disagreement. When a string is serialized in json, (per json.org) it starts with quotes so I'm not sure why we would remove the quotes when we put it into proto - it's part of the business data/payload. W/o it you can't tell if 1.5 is a string or a number. Taken from the JSON format samples:

content-type: application/json

1.5

absence or presence of quotes around the 1.5 means a lot. So, modifying your example:

text_data: 1.5

is that a string or a number? If we assume "missing quotes" it's a string. But then how can I represent a number? Am I forced to use binary_data ? Feels wrong.

@n3wscott
Copy link
Member

is that a string or a number?

I would say it is the string "1.5" because it is formatted as a string. If you want that to be the number 1.5, but the bytes in the binary field. If it is json encoded, it will be 1.5

@n3wscott
Copy link
Member

I think the larger misunderstanding here is thinking proto.text_data is equivalent to the body of http, and it is not. We get access to the proto.text_data after the wire encoding has been interpreted. text_data is a string. Proto has an extra step we should respect in the type system.

@JemDay
Copy link

JemDay commented Jul 20, 2022

proto.text_data is not meant to mean anything other than "it's a character sequence" so if that's needed to carry an empty JSON Object it would be two characters long {}; if it's meant to represent an empty JSON Array it would also be two characters long []

So .. From an SDK perspective the Protobuf (and XML!!) sdks are able to say to their parent applications the 'data' of the CloudEvent was a character sequence and here it is ... if desired they can go one step further and say "based on the datacontenttype it happens to be JSON, so here's the JSON Value so you don't have to parse it"

When I did the Protobuf spec and eventually the XML spec is was heavily influenced by the way the JSON spec had evolved to enable 3 types of data, binary, character-sequence, and JSON and ended up replicating that pattern, so in both of them you see binary-data, character-sequence-data, and format-native-data.

To Scott's point there is no association or relationship to an HTTP body (or any other transport for that matter) - we should also think in terms of the CE Information Model and consider how that model maps into a 'format' and that is completely independent of AMQP/HTTP/KAFKA etc..

@JemDay
Copy link

JemDay commented Jul 20, 2022

And to continue that point ... if the JSON Value is a string containing the word hello then the proto.text_data would be 7 characters long - "hello"- including the "s because that's what makes it a JSON String value.

@JemDay
Copy link

JemDay commented Jul 20, 2022

That behavior I'd want to verify to ensure the Java SDK works like that, I don't believe I had a test-case for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants