Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Payload and Versioning #40

Closed
xibz opened this issue Jun 9, 2022 · 4 comments · Fixed by #55, #63 or #79
Closed

Payload and Versioning #40

xibz opened this issue Jun 9, 2022 · 4 comments · Fixed by #55, #63 or #79
Assignees
Labels
roadmap Items on the roadmap spec Specification - Core CDEvents
Milestone

Comments

@xibz
Copy link
Contributor

xibz commented Jun 9, 2022

I attended cdCon this week, and had a lovely chat with @afrittoli and wanted to highlight a couple issues we discussed.

Format

Currently the spec makes payload a nested object, but it NEEDs to be a first class citizen. The reason for this is anyone adopting this spec would have to break all their customers using their service that may or may not be following the spec. Further, anything that uses plugins, grpc, or something similar would be completely broken by the current standard. This is a non-starter for anyone who ensures they are never any breaking changes to their customers, ie AWS, Google, Azure.

I propose three alternative approaches.

Headers

What the spec currently has is this

{
    "specversion" : "1.0",
    "type" : "com.github.pull_request.opened",
    "source" : "https://github.com/cloudevents/spec/pull",
    "subject" : "123",
    "id" : "A234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "text/xml",
    "data" : "<much wow=\"xml\"/>" // Notice the nesting of the data
}

So instead we'd have headers, X-CloudEvents-Id for example. This makes the payload unchanged, which is huge for any end user. Looking at logs as well as other things remains unchanged otherwise these changes may surprise a user and break a user especially if they are doing anything with those logs, like log parsing.

And now our example above looks like

POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: some-host
Content-Type: text/xml
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
X-CloudEvents-Id: A234-1234-1234
X-CloudEvents-Spec-Version: 1.0
X.CloudEvents-Type: com.github.pull_request.opened
X-CloudEvents-Source: https://github.com/cloudevents/spec/pull
X-CloudEvents-Subject: 123
X-CloudEvents-Time: 2018-04-05T17:31:00Z
X-CloudEvents-Metadata: "comexampleextension1=value; comexampleothervalue=5"

<much wow=\"xml\"/>

The X-CloudEvents-Metadata would act as any custom key values needed by the service. The header format of foo=bar; baz=qux is widely used in HTTP headers when multiple key values are needed within a header. See Strict-Transport-Security.

The major benefits with going with this approach is the payload is COMPLETELY untouched and thus making completely backwards compatible with any pre-existing service.

Payload and Metadata

Another approach is to make pre-existing payloads that customers have a first class citizen, as in making that untouched, and adding a field within the payload that would house all the event information.

<much_wow>
        <somefield>hello world!</somefield>
        <cloud_events_metadata>
                <type>com.github.pull_request.opened</type>
                <source>https://github.com/cloudevents/spec/pull</source>
                <subject>123</subject>
                <time>2018-04-05T17:31:00Z</time>
                <comexampleextension1>value</comexampleextension1>
                <comexampleothervalue>5</comexampleothervalue>
                <id>A234-1234-1234</id>
                <spec-version>1.0</spec-version>
        </cloud_events_metadata>
</much_wow>

Hybrid

In this approach we use headers as well as a new cloud events field. Things like X-CloudEvents-Id could be a header, and the customer exclusive data like comexampleextension1 could be in the payload field.

Versioning

I know none of this is currently in the spec/standard, but I definitely think it's worth discussing. Basically how do you version and support interoperability between all the possibilities?

For instance, some service A could be on version 1, and some service B could be on version 2. Are major versions always going to be backwards compatible, so sending requests between these services should always work?

Further, types have to have a version as well. The com.github.pull_request.opened type may have a completely different payload years later. That adds a whole layer of complexity... In addition, adding a new type to the spec would have to be a major version bump of the spec, since anyone using version 1.x and 1.y adds a new type, then 1.x will have no idea on how to deal with that.

How does major versioning look like with interoperability? Does EVERY service need to update to the new major version? How do we plan on handling that???

@afrittoli
Copy link
Contributor

afrittoli commented Jun 14, 2022

I attended cdCon this week, and had a lovely chat with @afrittoli and wanted to highlight a couple issues we discussed.

Thanks @xibz - this is really useful feedback!

Format

Currently the spec makes payload a nested object, but it NEEDs to be a first class citizen. The reason for this is anyone adopting this spec would have to break all their customers using their service that may or may not be following the spec. Further, anything that uses plugins, grpc, or something similar would be completely broken by the current standard. This is a non-starter for anyone who ensures they are never any breaking changes to their customers, ie AWS, Google, Azure.

I propose three alternative approaches.

Headers

What the spec currently has is this

{
    "specversion" : "1.0",
    "type" : "com.github.pull_request.opened",
    "source" : "https://github.com/cloudevents/spec/pull",
    "subject" : "123",
    "id" : "A234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "text/xml",
    "data" : "<much wow=\"xml\"/>" // Notice the nesting of the data
}

The CloudEvents spec, and consequently the CDEvents one, support multiple transport bindings, including HTTP, MQTT, NATS and kafka.
Messages in the example are usually represented a single JSON which includes headers and payload, but the actual content of an event on the wire will depend on the specific binding and mode selected.

The format you included in there matches the structured content mode of the HTTP binding in CloudEvents.

For CDEvents we don't have to prescriptive of the mode, but until now we have used binary content mode which matches with what you describe below, and is the default mode in the CloudEvents SDKs.

Further to this, the CDEvents binding to CloudEvents is being discussed in #35.
Some of the fields in CDEvents fit nicely in CloudEvents headers, and will be included there. The rest will go in the payload, and we've been discussing in #35 how to structure it.

POST /someresource HTTP/1.1
Host: cdevents.example.com
ce-specversion: 1.0
ce-type: dev.cdevents.taskrun.started
ce-time: 2022-06-14T03:56:24Z
ce-id: 1234-1234-1234
ce-source: /staging/tekton/
ce-subject: /namespace/taskrun-123

Content-Type: application/json; charset=utf-8
Content-Length: nnnn

{
   "meta" : {
      "version" : "draft",
      "timestamp" : "2022-06-14T03:50:24Z"
   },
   "subject": {
     "id": "/namespace/taskrun-123",
     "type: "taskrun",
     "content": {
       "task": "my-task",
       "URL": "/apis/tekton.dev/v1beta1/namespaces/default/taskruns/my-taskrun-123"
       "pipelinerun": {
         "id": "/somewherelse/pipelinerun-123",
          "source": "/staging/jenkins/"
       }
     }
   },
  "extensions": [{}], // documented extensions could go here
  "payload": // free form nested data goes here
}

So instead we'd have headers, X-CloudEvents-Id for example. This makes the payload unchanged, which is huge for any end user. Looking at logs as well as other things remains unchanged otherwise these changes may surprise a user and break a user especially if they are doing anything with those logs, like log parsing.

And now our example above looks like

POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: some-host
Content-Type: text/xml
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
X-CloudEvents-Id: A234-1234-1234
X-CloudEvents-Spec-Version: 1.0
X.CloudEvents-Type: com.github.pull_request.opened
X-CloudEvents-Source: https://github.com/cloudevents/spec/pull
X-CloudEvents-Subject: 123
X-CloudEvents-Time: 2018-04-05T17:31:00Z
X-CloudEvents-Metadata: "comexampleextension1=value; comexampleothervalue=5"

<much wow=\"xml\"/>

The X-CloudEvents-Metadata would act as any custom key values needed by the service. The header format of foo=bar; baz=qux is widely used in HTTP headers when multiple key values are needed within a header. See Strict-Transport-Security.

The major benefits with going with this approach is the payload is COMPLETELY untouched and thus making completely backwards compatible with any pre-existing service.

This is a very good point. Services that already send CloudEvents could send CDEvents to existing consumers and they would still be able to parse the message (until they are ready to take advantage of CDEvents).
Some examples are Keptn, Jenkins and Tekton: all already produce CloudEvents.

This approach matches what we implemented in the SDKs for now and what we used in the initial POC, with the only difference that we used multiple CloudEvents extensions, one for each piece of data from the CDEvents spec.

The disadvantages of this approach that I can see are:

  • human readability of the messages: HTTP headers would be harder to parse than a JSON payload
  • encoding and decoding: we may have data that does not fit well in an HTTP header so we would need to do things like URL encoding for the data
  • the amount of HTTP headers and their size. Most web-servers will limit to 4k or 8k. This is still a lot of space, so perhaps this wouldn't be an issue very soon

Payload and Metadata

Another approach is to make pre-existing payloads that customers have a first class citizen, as in making that untouched, and adding a field within the payload that would house all the event information.

        <wow: xmlns="much">xml</wow:>
        <cloud_events_metadata>
                <comexampleextension1>value</comexampleextension1>
                <comexampleothervalue>5</comexampleothervalue>
                <id>A234-1234-1234</id>
                <spec-version>1.0</spec-version>
                <type>com.github.pull_request.opened</type>
                <source>https://github.com/cloudevents/spec/pull</source>
                <subject>123</subject>
                <time>2018-04-05T17:31:00Z</time>
        </cloud_events_metadata>

I think this might be difficult. You would still need a root element, and that would have to be the one from the original payload, if we want backward compatibility. However, the CDEvents SDK would not be able to parse the root element of the payload anymore. One way around the issue could be to have an ce-cdevents-container header that specifies the name of the root element that contains CDEvents data.

Hybrid

In this approach we use headers as well as a new cloud events field. Things like X-CloudEvents-Id could be a header, and the customer exclusive data like comexampleextension1 could be in the payload field.

My proposal would be include in the SDKs a mode that allows keeping all CDEvents data in headers, but defaulting to a CDEvents payload which includes a payload key for extra data that producers may want. The format is use could be identified via the content/type header.
Producers that expect to have non-CDEvents consumers could use the headers-only approach, with some limitations that may come from the use of headers. The SDK on producing and receiving events would be able to parse both modes.

Versioning

I know none of this is currently in the spec/standard, but I definitely think it's worth discussing. Basically how do you version and support interoperability between all the possibilities?

For instance, some service A could be on version 1, and some service B could be on version 2. Are major versions always going to be backwards compatible, so sending requests between these services should always work?

Further, types have to have a version as well. The com.github.pull_request.opened type may have a completely different payload years later. That adds a whole layer of complexity... In addition, adding a new type to the spec would have to be a major version bump of the spec, since anyone using version 1.x and 1.y adds a new type, then 1.x will have no idea on how to deal with that.

How does major versioning look like with interoperability? Does EVERY service need to update to the new major version? How do we plan on handling that???

@afrittoli
Copy link
Contributor

@xibz About versioning, I totally agree we should find an answer for the questions you're posing. I would like to keep the discussion about that in a separate issue to avoid crossing threads of discussion. I'd be happy to copy that part in a new issue unless you'd prefer to do it yourself.

@afrittoli
Copy link
Contributor

@xibz is the implementation of customData sufficient to address this, or is there still some scope to be covered?

@afrittoli afrittoli self-assigned this Dec 13, 2022
@afrittoli afrittoli added this to the v0.2 milestone Dec 13, 2022
@afrittoli afrittoli added spec Specification - Core CDEvents roadmap Items on the roadmap labels Dec 13, 2022
@afrittoli
Copy link
Contributor

@xibz I will close this for now. Feel free to re-open or create a new issue if there is still something that needs to be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
roadmap Items on the roadmap spec Specification - Core CDEvents
Projects
Status: Done
2 participants