Kafka transport binding v2 #337

Merged
merged 2 commits into from Jun 27, 2019

Conversation

@bluemonk3y
Contributor

bluemonk3y commented Nov 1, 2018

Kafka transport binding for CloudEvents, similar to the HTTP binding and proposed NATS, MQTT, AMQP bindings.

Follow-on PR from #300

Signed-off-by: Neil Avery <neil@confluent.io>

@duglin duglin mentioned this pull request Nov 1, 2018
@duglin

Collaborator

duglin commented Nov 1, 2018

CI error is ok since it'll be automagically fixed once merged. But the DCO issue is real.

kafka-transport-binding.md (outdated)

The receiver of the event can distinguish between the two content modes by
inspecting the `cloudEvents_contentType` property of the Kafka message. If the
value is prefixed with the CloudEvents media type `application/cloudevents`, indicating the use of a known [event

@duglin

duglin Nov 1, 2018

Collaborator

wrap at 80


#### 3.1.3. Metadata Headers

All [CloudEvents][CE] attributes with exception of `data` MUST be individually

@duglin

duglin Nov 1, 2018

Collaborator

should we include extensions in this too?

@bluemonk3y

bluemonk3y Nov 2, 2018

Author Contributor

yep

@bluemonk3y

Contributor Author

bluemonk3y commented Nov 1, 2018

> CI error is ok since it'll be automagically fixed once merged. But the DCO issue is real.

@duglin - yep I will fix that tomorrow

kafka-transport-binding.md (outdated)

##### 3.1.3.1 Property Names

Cloud Event attributes are prefixed with "cloudEvents_" for use in the

@matzew

matzew Nov 7, 2018

Member

`cloudEvents_` ? instead of "cloudEvents_" ?

@matzew

matzew Nov 7, 2018

Member

I wonder if we should use some sort of fqn: io.cloudevents. to avoid potential clashes ?

@bluemonk3y

bluemonk3y Nov 7, 2018

Author Contributor

We are using cloudEvents_ as it has been generally adopted by other transport bindings. I believe we also need to avoid '.' characters. We should be consistent across all bindings.
@duglin - your thoughts on this?

@duglin

duglin Nov 7, 2018

Collaborator

I like consistency, but I think we need to be consistent with what users of the transport expect. I did a quick search and came across this: https://streamsets.com/documentation/controlhub/3.5.0/help/pdesigner/datacollector/UserGuide/Origins/KConsumer.html
and it talks about things like ssl.truststore.location and com.sun.security.auth.module.Krb5LoginModule. So Kafka does seem to be ok with '.'.
I don't know enough about Kafka to know if FQNs would be more "expected" or not, I'll leave that to others - I'd be ok with either.

@bluemonk3y

bluemonk3y Nov 8, 2018

Author Contributor

@clemensv - what is your opinion on this (above)? Do we adopt language packaging/namespace semantics like io.cloudevents, or cloudEvents_? It does raise the question about event propagation between transports.

@gwenshap

gwenshap Nov 20, 2018

Just for the record, Apache Kafka never managed to decide on whether "." or "_" are preferred as separators. Configuration parameters are always "." separated (as the Streamset doc shows), but topics are often namespaced with "_" (for example, the internal "__consumer_offsets" topic). I've definitely seen both used in "the wild" (a fact that was sometimes inconvenient as the project had to deal with converting topic names to metric names...)

Either separator will be acceptable to the Apache Kafka community (or rather, either will be objected to, by different sets of people).

(duglin: I edited the comment to escape the underscore since it wasn't showing up and was being treated as the "turn on italics" symbol)


* `eventTime` maps to `cloudEvents_eventTime`
* `eventID` maps to `cloudEvents_eventID`
* `cloudEventsVersion` maps to `cloudEvents_cloudEventsVersion`
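
As a rough illustration of the mapping listed above, the sketch below adds and strips the prefix. The helper names `to_header_name` and `from_header_name` are hypothetical, and the prefix is a parameter because its final form is still under discussion in this thread.

```python
from typing import Optional

# Hypothetical helpers for illustration only; not part of any CloudEvents SDK.
DEFAULT_PREFIX = "cloudEvents_"  # prefix used in the draft text under review


def to_header_name(attribute: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Return the Kafka header name for a CloudEvents attribute name."""
    return prefix + attribute


def from_header_name(header: str, prefix: str = DEFAULT_PREFIX) -> Optional[str]:
    """Return the attribute name, or None when the header lacks the prefix."""
    if header.startswith(prefix):
        return header[len(prefix):]
    return None


for name in ("eventTime", "eventID", "cloudEventsVersion"):
    print(name, "maps to", to_header_name(name))
```

Keeping the prefix in one place means a later switch to a shorter prefix would touch a single constant.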

@matzew

matzew Nov 7, 2018

Member

+1 on the general concept of this.

Our group has been doing exactly the same:
https://github.com/rh-event-flow/jcloudevents/blob/master/kafka/src/main/java/io/streamzi/cloudevents/kafka/util/KafkaHeaderUtil.java#L42

OK, the biggest difference is that ours is built on the internal RecordHeader(s) API and uses no prefix.

Perhaps it should be based on the Header(s) interface API

@duglin

duglin Jun 13, 2019

Collaborator

s/cloudEventsVersion/specversion/

kafka-transport-binding.md (outdated)
@matzew matzew mentioned this pull request Nov 7, 2018
@matzew

Member

matzew commented Nov 7, 2018

@bluemonk3y we have some POC code for this, and it would be nice to get it into the CloudEvents SDK for Java.

See my proposal here:
cloudevents/sdk-java#5

Looking forward to working with you on that.

------------------- key ----------------------
Key: mykey

@gunnarmorling

gunnarmorling Nov 7, 2018

Contributor

The Kafka message key is where things are still a bit blurry to me. How will its value be obtained from the CloudEvents event? Have you foreseen some extractor function or similar?

@gunnarmorling

gunnarmorling Nov 8, 2018

Contributor

Thinking more about that one, perhaps a header key should be defined whose value, if present, will be propagated as key of the Kafka message, e.g. cloudEvents_messageKey. Alternatively, the binding, wherever it's running, could be configured with some kind of path expression which gets applied to the data value to extract the value, e.g. /customer/id.
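
A minimal sketch of the path-expression idea, assuming the event `data` has already been decoded into nested dictionaries. The function name `extract_by_path` and the sample data are hypothetical; nothing here is defined by the binding.

```python
from typing import Any, Optional


def extract_by_path(data: Any, path: str) -> Optional[Any]:
    """Walk a slash-separated path such as "/customer/id" through nested dicts.

    Returns None when any segment is missing or a non-dict value is reached
    before the path is exhausted.
    """
    current = data
    for segment in path.strip("/").split("/"):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Illustrative payload; the field names are made up for this example.
event_data = {"customer": {"id": "c-42", "name": "Ada"}}
key = extract_by_path(event_data, "/customer/id")
print(key)
```

Returning `None` on a missing path leaves it to the caller to decide whether to send the message without a key or reject the event.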

@bluemonk3y

bluemonk3y Nov 8, 2018

Author Contributor

Yep, there is a PR for this: #218

@gunnarmorling

gunnarmorling Nov 8, 2018

Contributor

Ah yes, thanks. I've commented over there, too. Seems that'd need resolution first.

@clemensv

clemensv Nov 8, 2018

Contributor

The conflict we have here is that the partition key is a transport specific concern (and the requirements may vary across different transport options) and that the publisher ought not to care about what transport the event route gets bound to. We may also have the situation that an event gets first published via MQTT from a little device and then gets put on Kafka by a device gateway.

I think this binding needs to define a rule by which a key is constructed from the event rather than expecting that the event brings it along.

@AndrewJSchofield

AndrewJSchofield Nov 19, 2018

It seems to me that the natural way to get a key for a Kafka message is to use the source.

@gwenshap

gwenshap Nov 20, 2018

I'm not sure if source is the right place - I'd expect all events produced by the same application to have an identical source, while partitioning typically applies to events produced by the same source.

I like the proposal from @clemensv in #218:
"instead of putting the burden of producing an event key on the client and then only having one, any transport that requires particular constructs such as keys should define a mechanism by which you can harvest/synthesize a key from an incoming CloudEvent as some sort of transform. The spec doesn't need to prescribe how -- it just needs to say that that's how the key materializes. A transform that just cooks up a random key might also be valid if that's what you want."

KafkaConnect already has "key extraction" transformation, exactly because external records require mapping to Kafka keys, and the logic for doing so varies between use-cases.

@duglin

duglin Nov 20, 2018

Collaborator

@gwenshap @clemensv that would mean that the kafka transport spec would leave the determination of the "key" as an exercise for the implementer (or admin configuring the transport) ?

@gwenshap

gwenshap Nov 20, 2018

I would hope a reasonable implementation of transport would allow plugging in the "key selection" logic, since we can't know in advance what it will be. (Although I'm still quite new to CloudEvents, so take my opinions with a bit of salt)

@gunnarmorling

gunnarmorling Mar 27, 2019

Contributor

Just added a comment on this over at #218. The idea is that, instead of having fully pluggable logic for retrieving the key, there could be a mechanism to simply select the key from a given event field when sending a cloud event to a Kafka topic.

@clemensv

Contributor

clemensv commented Nov 8, 2018

The header prefixes might be a bit generous. I think we'll end up collapsing them to "ce" across the board.

@matzew

Member

matzew commented Dec 12, 2018

> I think we'll end up collapsing them to "ce" across the board.

+1

kafka-transport-binding.md (outdated)
@JemDay

Contributor

JemDay commented Mar 19, 2019

It seems this PR is a bit behind the 0.2 spec version ...

@duglin

Collaborator

duglin commented May 10, 2019

ping @hschaffner - please review and add any comments w.r.t. why the partitionkey extension should not be needed.

ping @clemensv - please add comments about the call-back mechanism you mentioned in PR #429

@bluemonk3y now that #429 is merged, can you rebase this one to deal with the merge conflicts? Also, you may need to edit some of the text to deal with recent spec changes - such as the names of the attributes being changed (or lower-cased).

thanks everyone for your patience.

@bluemonk3y

Contributor Author

bluemonk3y commented May 10, 2019

@duglin - yep as soon as I get a chance.

@duglin

Collaborator

duglin commented May 28, 2019

@bluemonk3y sorry to pester :-) but any chance of getting an update of this one? I get the sense we're nearing a pretty big milestone and this is one of the bigger outstanding items for us.

@bluemonk3y

Contributor Author

bluemonk3y commented May 29, 2019

@duglin - sure, I'm currently away but will pick it up on Monday.

@gunnarmorling

Contributor

gunnarmorling commented May 29, 2019

> now that #429 is merged

@duglin, I reckon you mean #218?

Could you (or someone else) perhaps give a summary of the latest state of the discussion? The closed PR #218 adds the partitionkey attribute, but it actually might be removed again as per #430?

What @clemensv describes in #218 (comment) makes sense to me: make this a concern of the transport binding, which would allow for a callback function for retrieving the partition/message key on a per-event basis. The spec-defined partitionkey attribute would still remain in place as the basis for the default behavior in case no custom callback has been configured for the transport and a value can be defined by the producer.

One open question to me is how the callback would be defined. What exactly is it? E.g. a (Java) class implementing a certain interface for extracting the key based on an incoming event? Or would the spec just define that a callback mechanism exists, leaving the concrete mechanism up to specific implementations?

Thanks!
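
One way to picture the callback idea with `partitionkey` as the default is sketched below. It assumes an event represented as a plain dictionary; `KeyMapper`, `default_key`, and `resolve_key` are hypothetical names, and the spec text in this PR does not define any such interface.

```python
from typing import Callable, Optional

# Hypothetical types: an event is modelled as a plain dict for illustration.
Event = dict
KeyMapper = Callable[[Event], Optional[str]]


def default_key(event: Event) -> Optional[str]:
    """Default behavior: use the producer-supplied partitionkey, if any."""
    return event.get("partitionkey")


def resolve_key(event: Event, key_mapper: Optional[KeyMapper] = None) -> Optional[str]:
    """Use the configured callback when present, else fall back to the default."""
    mapper = key_mapper if key_mapper is not None else default_key
    return mapper(event)


# Sample event; attribute values are illustrative.
event = {
    "id": "1234-1234-1234",
    "source": "/mycontext/subcontext",
    "partitionkey": "customer-7",
}

print(resolve_key(event))                          # no callback configured
print(resolve_key(event, lambda e: e["source"]))   # custom per-event callback
```

Whether the callback is a class, a function, or a configuration entry would be an implementation decision; the sketch only shows the fallback order.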

@duglin duglin added the needs work label May 30, 2019
@duglin

Collaborator

duglin commented Jun 4, 2019

@bluemonk3y the travis errors seem real. Also, can you sign your commits?

@duglin duglin added the try-for-v1.0 label Jun 8, 2019
@duglin

Collaborator

duglin commented Jun 10, 2019

Even though I didn't tag this as "v1.0" (it's "try-for-v1.0"), it's been lingering for a while so it'll be first on the PR review list this week - please look it over when you get a chance.


This example shows the *binary* mode mapping of an event into the
Kafka message. All other CloudEvents attributes
are mapped to Kafka Header fields with prefix `cloudEvents_`.
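
To make the binary-mode split concrete, here is a hedged sketch that places `data` in the message value and every other attribute in a prefixed header. It assumes the event is a dictionary, that `data` is JSON-serializable, and that header values are carried as UTF-8 bytes; `to_binary_message` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any, Dict, List, Tuple


def to_binary_message(
    event: Dict[str, Any], prefix: str = "cloudEvents_"
) -> Tuple[List[Tuple[str, bytes]], bytes]:
    """Split an event into (headers, value) following the binary-mode idea.

    Every attribute except `data` becomes a prefixed header; `data` alone
    becomes the message value. JSON encoding of `data` is an assumption.
    """
    headers = [
        (prefix + name, str(value).encode("utf-8"))
        for name, value in event.items()
        if name != "data"
    ]
    data = event.get("data")
    value = b"" if data is None else json.dumps(data).encode("utf-8")
    return headers, value


# Attribute values borrowed from the example in this PR; `data` is made up.
event = {
    "eventTime": "2018-04-05T03:56:24Z",
    "eventID": "1234-1234-1234",
    "source": "/mycontext/subcontext",
    "data": {"hello": "world"},
}
headers, value = to_binary_message(event)
print(headers)
print(value)
```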

@clemensv

clemensv Jun 13, 2019

Contributor

ce_ might just be sufficient.

@fabiojose

fabiojose Jun 20, 2019

Contributor

LGTM


Examples:

* `eventTime` maps to `cloudEvents_eventTime`

@clemensv

clemensv Jun 13, 2019

Contributor

needs update

@duglin

duglin Jun 13, 2019

Collaborator

s/eventTime/time/

Example for the [JSON format][JSON-format]:

``` text
content-type: application/cloudevents+json; charset=UTF-8

@clemensv

clemensv Jun 13, 2019

Contributor

This shows no prefix

@duglin

duglin Jun 19, 2019

Collaborator

Is this missing a ce_ or is the sentence in the first paragraph of this section incorrect?

@duglin

duglin Jun 19, 2019

Collaborator

I think the intro paragraph is wrong

placed into the Kafka message value section
using an [event format](#14-event-formats).

In the *binary* content mode, the value of the event `data` attribute MUST be

@duglin

duglin Jun 13, 2019

Collaborator

would it make sense to move this paragraph up so it's next to the one on line 52, that also talks about "binary" mode?

Examples:

* `eventTime` maps to `cloudEvents_eventTime`
* `eventID` maps to `cloudEvents_eventID`

@duglin

duglin Jun 13, 2019

Collaborator

s/eventID/id/

@duglin

Collaborator

duglin commented Jun 13, 2019

@hschaffner did you have any comments on this one? In particular around the key stuff

@duglin

Collaborator

duglin commented Jun 17, 2019

@bluemonk3y rebase needed - and there are some outstanding comments

kafka-transport-binding.md (outdated)
cloudEvents_eventTime: "2018-04-05T03:56:24Z"
cloudEvents_eventID: "1234-1234-1234"
cloudEvents_source: "/mycontext/subcontext"
ce_specVersion: "0.1"

@duglin

duglin Jun 17, 2019

Collaborator

s/specVersion/specversion/

@duglin

duglin Jun 17, 2019

Collaborator

and "0.4-wip"

README.md (outdated)
json-format.md (outdated)
kafka-transport-binding.md (outdated)
------------------- value --------------------
{
"specVersion" : "0.1",

@duglin

duglin Jun 17, 2019

Collaborator

s/specVersion/specversion/
s/0.1/0.4-wip/

@bluemonk3y

bluemonk3y Jun 19, 2019

Author Contributor

I believe I fixed the wonky rebase!

@duglin

Collaborator

duglin commented Jun 17, 2019

I think the rebase got a little wonky

Kafka transport binding for CloudEvents, similar to the HTTP binding and proposed NATS, MQTT, AMQP bindings.

Signed-off-by: Neil Avery <neil@confluent.io>
### 2.1. data Attribute

The `data` attribute is assumed to contain opaque application data that is
encoded as declared by the `contentType` attribute.

@duglin

duglin Jun 19, 2019

Collaborator

contentType or "datacontenttype" ?

@duglin

duglin Jun 19, 2019

Collaborator

hmm I think I might be wrong, but I wanted to verify.

here.

The receiver of the event can distinguish between the two content modes by
inspecting the `ce_contentType` [Header][Kafka-Message-Header] of the

@duglin

duglin Jun 19, 2019

Collaborator

s/ce_contentType/ce_datacontenttype/ I think (but it's the "T" that jumped out at me)

kafka-transport-binding.md (outdated)

#### 3.3.1. Kafka Content-Type

The [Kafka `ce_contentType`] property field MUST be set to the media

@duglin

duglin Jun 19, 2019

Collaborator

"T" -> "t"

@duglin

duglin Jun 19, 2019

Collaborator

Actually, I think the "ce_" here is incorrect since this isn't a CE property. Right?

@bluemonk3y

bluemonk3y Jun 21, 2019

Author Contributor

agree

------------------ headers -------------------
ce_contentType: application/cloudevents+json; charset=UTF-8

@duglin

duglin Jun 19, 2019

Collaborator

I think the "ce_" here is incorrect.

@bluemonk3y

bluemonk3y Jun 21, 2019

Author Contributor

agree - removed
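
With the prefix dropped from the content-type header, a receiver could tell the two content modes apart roughly as follows. This is only a sketch: the header name `content-type` is taken from the JSON example quoted earlier in the review and may not match the final text, and `content_mode` is a hypothetical helper.

```python
from typing import Dict, Union

# Media type prefix that signals structured mode, per the draft text.
STRUCTURED_PREFIX = "application/cloudevents"


def content_mode(
    headers: Dict[str, Union[str, bytes]], header_name: str = "content-type"
) -> str:
    """Return "structured" or "binary" based on the content-type header.

    The header name is an assumption; a missing header is treated as binary.
    """
    value = headers.get(header_name, "")
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if value.strip().lower().startswith(STRUCTURED_PREFIX):
        return "structured"
    return "binary"


print(content_mode({"content-type": b"application/cloudevents+json; charset=UTF-8"}))
print(content_mode({"content-type": b"application/json"}))
```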

@@ -1,4 +1,4 @@
-# CloudEvent Specs for Proprietary Protocols and Encodings
+### CloudEvent Specs for Proprietary Protocols and Encodings

@duglin

duglin Jun 19, 2019

Collaborator

I think this change is invalid

@bluemonk3y

bluemonk3y Jun 21, 2019

Author Contributor

reverted

spec.md Outdated
@@ -58,6 +58,14 @@ that does not support that feature will then silently ignore that part of the
message. The sender needs to be prepared for the situation where a receiver
ignores that feature.

For clarity, when a feature is marked as "OPTIONAL" this means that it is

@duglin

duglin Jun 19, 2019

Collaborator

I think the edits in this doc are invalid

@bluemonk3y

bluemonk3y Jun 21, 2019

Author Contributor

@duglin - I believe I have applied the changes correctly ;) - would you mind reviewing?

Kafka transport binding for CloudEvents, similar to the HTTP binding and proposed NATS, MQTT, AMQP bindings.

Signed-off-by: Neil Avery <neil@confluent.io>
@duglin

Collaborator

duglin commented Jun 21, 2019

@bluemonk3y looks good to me.

Can I get one more LGTM and then I'll merge!!

@duglin

Collaborator

duglin commented Jun 27, 2019

Approved on the 6/20 call with the minor rebase fixes

@duglin

Collaborator

duglin commented Jun 27, 2019

Approved, again :-) , on the 6/28 call - but this time with the rebase fixes

@duglin duglin merged commit 7c3bdb1 into cloudevents:master Jun 27, 2019
1 of 2 checks passed
continuous-integration/travis-ci/pr: The Travis CI build failed
DCO
@duglin

Collaborator

duglin commented Jun 27, 2019

@bluemonk3y et al... thanks a ton for your patience and hard work on this one!!!

10 participants