Renaming/broadening "source" to "topic", consolidating source-id/source-type into "subject" #95
spec.md
 * Constraints:
-  * REQUIRED
   * MUST be a non-empty string
+  * OPTIONAL. The subject may be self-evident from the "source" context.
I'd prefer to put this explanatory text in the Description field.
But either way: s/may/might/
I'll move it.
spec.md
@@ -161,9 +161,11 @@ that contains both context and data).
 system might identify the CRM system as the "source", might further qualify the
 event as 'new-customer-added' in the "event-type" relative to its "namespace",
 and then further qualify the subject of the event (the new record) with the content
-of this field. The subject is a free-form string defined by the publisher.
+of this property. The subject is a free-form string defined by the publisher. The
+property is optional, because the subject might already be self-evident from
Perhaps: "The property might not be present, because the subject..."?
It's a bit weird to tap dance around OPTIONAL here. Shall I uppercase or move it back?
I think the bigger concern, at least to me, is saying it's OPTIONAL twice: once here and once below. Duplicating normative statements isn't good; it opens the possibility of someone updating one of them but not the other, and then we're inconsistent.
If we have a constraints section, we should also allow rationalization of the constraint in that section, IMO. I'm moving it back.
I don't necessarily object to that. I'm just trying to avoid two sections that both give lots of explanatory text, with people not knowing what text goes where, so they end up duplicating thoughts.
Actually, on a side note, this is partially why I tend to prefer to not have a "constraints" section because it forces this kind of split. I'd prefer just a set of sentences with the RFC keywords in the appropriate spots and they can live next to the explanatory text. But I didn't want to drastically change the original format of the spec when we first pulled it in. But if people are open to the change I can take a pass at a PR.
ping @ac360 - any thoughts on this?
Providing descriptions without mentioning the constraints will in many cases be hard. @duglin I agree with you that a split of the text is hard to achieve. On the other hand, I see some value in a restrictions section that only lists the restrictions without providing additional text. An implementor of the spec who just wants to quickly check the restrictions of an attribute might benefit from that. Using the same normative statements in the description would be even better than paraphrasing them, because it would help maintain consistency between description and constraints.
I'm sorry, I don't really understand what's going on here. It looks like this diff
Lastly, I don't understand what sort of data would go in this field, or what it means to be called … I'm having trouble forming any opinion about this PR given my inability to figure out what's going on. (I see the chatter in the associated issue, which unfortunately seems to echo my confusion rather than help clear it up.)
I renamed source-id. Here's the scenario: the emitter of the event, the "source", is some object that you subscribe to. Let's say that's a storage service container/bucket. You are interested in when a new object is created in that container/bucket.
To do any generic pub/sub filtering on that event with pattern matching or startsWith/endsWith in generic infrastructure that doesn't want to reason about the details of the event, you still need a hint about what the event is actually about, i.e. which blob was created. Thus:
@clemensv your use case makes me wonder whether the source (or "entity that generated the event") should be the bucket or the newly created resource. Based on how I look at it, I can see an argument for either one. My concern is that we now have, sort of, two "sources": 1) the entity you subscribed to for the event stream, and 2) the entity that did something that caused the event, which may or may not be the same as no. 1, though I suspect in some (many?) cases it will be. That means we then need a very clear set of rules so people know when to use one vs. the other, since they might appear to overlap. I'm not trying to cram too much into the "source", but couldn't this information be included within the "source" value? Perhaps as the tail part of the URL path or a query parameter? You could still do your regex filtering that way, I believe.
You will always have a notion of a thing that emits events and of matters going on inside that thing that the events are about. Same as with email, or GitHub issues: I am the source, identified by my SMTP URI, and the subject line is what I want to tell you today. Since we say the "source" SHOULD be a URI, but we're not mandating it (and I do think there are good reasons not to), having a separate descriptor for the subject makes sense, and there's plenty of prior art in messaging to look at for that split.
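The source/subject split argued for above can be illustrated with a small generic filter. This is a minimal sketch, assuming a hypothetical `Event` envelope with `source`, `subject` and `event_type` fields (names chosen for illustration, not taken from any spec draft) and made-up `example.com` URLs. The filter matches on the source plus a prefix/suffix of the subject and never looks at the payload.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal envelope; field names are illustrative only."""
    source: str             # the thing that emits events, e.g. a storage bucket
    subject: Optional[str]  # what inside the source the event is about
    event_type: str


def matches(event: Event, source: str, subject_prefix: str = "",
            subject_suffix: str = "") -> bool:
    """Generic filter that inspects routing metadata only, never the payload."""
    if event.source != source:
        return False
    subject = event.subject or ""  # a missing subject behaves like ""
    return subject.startswith(subject_prefix) and subject.endswith(subject_suffix)


def select(events: Iterable[Event], **criteria) -> List[Event]:
    return [e for e in events if matches(e, **criteria)]


events = [
    Event("https://storage.example.com/buckets/photos", "2018/03/cat.jpg", "object-created"),
    Event("https://storage.example.com/buckets/photos", "2018/03/notes.txt", "object-created"),
    Event("https://storage.example.com/buckets/docs", "2018/03/cat.jpg", "object-created"),
]

jpgs = select(events, source="https://storage.example.com/buckets/photos",
              subject_suffix=".jpg")
print([e.subject for e in jpgs])  # ['2018/03/cat.jpg']
```

Keeping the subject out of the source string means the source comparison stays an exact match, while the looser prefix/suffix matching is confined to the subject.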
I'm amending this proposal by renaming "source" to "topic" and providing a broader definition that allows for both a concrete concept of an originating source and a more abstract notion of classification that isn't primarily bound to a concrete origin. Origin/provenance considerations can be enormously complex in some verticals and need to be resolved by processing the event payload. The rationale for this proposal of consolidating the route/dispatch metadata to
* Resource path, relative URI reference:
  /tenant/group/type/myresource
* Machine component, relative URI reference:
  /robot/drives/3/temperature
Request: add clarification for URI segments, like /namespaces/{namespaceId}/buckets/{bucketId}/objects or /robots/{robotId}/drives/{driveId}/temperature
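One way to read the requested segment notation is as a URI template where each `{name}` binds exactly one path segment. A minimal sketch under that assumption (this is a simplified hand-rolled matcher, not an RFC 6570 implementation; `compile_template` and `match_template` are hypothetical helpers):

```python
import re
from typing import Dict, Optional

# Matches a {name} placeholder; the name must be a valid regex group name.
_PLACEHOLDER = re.compile(r"(\{[A-Za-z_][A-Za-z0-9_]*\})")


def compile_template(template: str) -> "re.Pattern[str]":
    """Compile '/robots/{robotId}/drives/{driveId}/temperature' into a regex in
    which each {name} matches exactly one non-empty path segment."""
    regex = ""
    # re.split with a capturing group alternates literal text (even indices)
    # and captured placeholders (odd indices).
    for i, part in enumerate(_PLACEHOLDER.split(template)):
        if i % 2 == 1:
            regex += f"(?P<{part[1:-1]}>[^/]+)"
        else:
            regex += re.escape(part)
    return re.compile(regex)


def match_template(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the bound segment values if `path` fits `template`, else None."""
    m = compile_template(template).fullmatch(path)
    return m.groupdict() if m else None


template = "/robots/{robotId}/drives/{driveId}/temperature"
print(match_template(template, "/robots/r2/drives/3/temperature"))
# {'robotId': 'r2', 'driveId': '3'}
print(match_template(template, "/robots/r2/drives/3/humidity"))
# None
```

Because a placeholder never matches `/`, a value such as `a/b` cannot silently span two segments.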
The following was motivated specifically by some of the discussion on the WG call today... To what extent is this spec prescribing behavior for the middleware? It was my understanding that the primary goal was to describe a common format for events without making assumptions about how those attributes would be interpreted, since event brokers don't all behave the same way, even when they use common terminology. For example, while "topics" are common across brokers, not all brokers support hierarchical and wildcard subscriptions. Some only support subscribing at the granularity of the literal topic name; in that case, a common pattern is to create derived streams from a more general upstream topic. A given broker's topic-naming rules won't always be flexible enough to support the range of syntax used in these examples, such as URIs (Kafka, for example). So it could lead to confusion between the logical and the "physical" topic. Also, including the topic name in the event itself is typically not a requirement. A consumer subscribes to the topic on the broker, and that information is often provided via declarative configuration within the context of the consuming application. Consumers might subscribe to multiple topics, and producers might even send the same event to multiple topics.
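The logical-vs-physical topic concern can be made concrete with Kafka. This is a minimal sketch, assuming Kafka's topic-name validation allows only ASCII letters, digits, `.`, `_` and `-`, caps names at 249 characters, and rejects `.` and `..` (check this against the broker version you actually run). `to_physical_topic` is a hypothetical mapping, not part of any Kafka API.

```python
import re

# Assumption: mirrors Kafka's topic-name validation rules (see lead-in).
_KAFKA_LEGAL = re.compile(r"[A-Za-z0-9._-]+")
_KAFKA_MAX_LEN = 249


def is_legal_kafka_topic(name: str) -> bool:
    if name in (".", ".."):
        return False
    return len(name) <= _KAFKA_MAX_LEN and _KAFKA_LEGAL.fullmatch(name) is not None


def to_physical_topic(logical: str) -> str:
    """Hypothetical mapping from a URI-style logical topic to a Kafka-legal name:
    '/' becomes '.', any other illegal character becomes '_'."""
    dotted = logical.strip("/").replace("/", ".")
    return re.sub(r"[^A-Za-z0-9._-]", "_", dotted)


logical = "/robot/drives/3/temperature"
print(is_legal_kafka_topic(logical))  # False: '/' is not allowed
physical = to_physical_topic(logical)
print(physical)                       # robot.drives.3.temperature
# The mapping is lossy: a different logical topic collapses onto the same name.
print(to_physical_topic("/robot/drives.3/temperature") == physical)  # True
```

Any such mapping has to pick an escaping scheme, and a lossy one (as here) means the value carried in an event header may no longer identify the physical topic unambiguously.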
@markfisher I agree that the term topic is misleading. In my opinion it belongs to the domain of messaging and not to events. I was actually happy with the term source. Unfortunately I wasn't able to attend the calls this week, so I don't know what concerns people have regarding source. What about origin?
From my perspective, the whole purpose of this spec is to separate source/producer and destination/action/consumer from transport/middleware. This proposal completely changes what we're doing here.
Source is clear to me; Topic is not clear, and is perhaps even misleading. I'm not sure I understand the thought process.
@notque @ultrasaurus @deissnerk @markfisher "Topic" is a concept that is broader than, and a full superset of, "source". It often reflects the originating context, but it also works in cases where a party other than the originator needs to publish an event on behalf of the same context. We even have a fresh example of that in the discussion of the usage scenarios. The abstract topic concept also happens to map straight onto numerous existing middleware platforms and onto RESTful resource graphs, i.e. onto HTTP.
From the mailing list, I'm also putting the following on the record: Topics and Subjects Presentation Doc
As supporting evidence for how broadly the topic concept is supported across the industry, and by existing standards and platforms that many users of this specification will most certainly want to leverage, consider this incomplete list of examples:
HTTP WebHook specs:
Messaging Standards:
IoT Platforms:
Cloud Event Routers (Push):
Cloud Event Ingestion Platforms:
Cloud and On-Prem PubSub brokers:
It's impossible to argue that the "topic" concept and terminology choice isn't grounded in an existing industry consensus. For the few deviations from the word "Topic" in the list above, there are actually good reasons. RabbitMQ uses "exchange" and ActiveMQ uses "address" for their top-level concept to avoid a name clash with the JMS topic that they both also implement on top of their base construct. Event Hubs doesn't use "Topic" because the same name is already taken by Service Bus as a top-level concept. "Streams" in MQ disambiguates from native topics.
I don't think anyone is arguing that. In fact, my comments were motivated by the fact that topics are ubiquitous but unable to map one-to-one in a consistent way to either the syntax or semantics that you are suggesting. So it would inevitably lead to situations where the "topic" header value does not match the actual topic name where the event has been published. It's not common to include the topic name in messages in the first place. I'd even consider it an anti-pattern, since it's possible to have a many-to-many relationship between events and topics.
@ultrasaurus - regarding this comment: we have a shared definition of the goals. In those goals, we specifically call out that "... producer and consumer can be developed and deployed independently. A producer can generate events before a consumer is listening, and a consumer can express an interest in an event or class of events that is not yet being produced." That level of decoupling is typically achieved by introducing "middleware" intermediaries between producer and consumer. The minimal intermediary typically used for this is the dispatch logic of an HTTP request framework that maps paths to method invocations. Node.js "Express", ASP.NET, and Apache Struts all fit the middleware definition in the usage scenario PR. Their "topic" is the request target as defined in HTTP; the request URI is generally a reference into a resource graph, most simply into the file system. The abstract nature of "topic" indeed strongly echoes that of a resource in REST, and middleware is a more generalized notion than the caches, proxies, and gateways that Fielding enumerates in his seminal dissertation. Please explain how my proposal concretely contradicts the existing shared definition of goals.
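The "minimal intermediary" described above, path-based dispatch in an HTTP framework, can be sketched in a few lines. This is a toy illustration, not how Express, ASP.NET, or Struts are actually implemented. `PathDispatcher`, `route` and `dispatch` are hypothetical names. The request target is resolved to the handler registered for its longest matching path-segment prefix, so producers and handlers only agree on the path, not on each other.

```python
from typing import Callable, Dict, Optional

# A handler receives the full request target and the request body.
Handler = Callable[[str, bytes], str]


class PathDispatcher:
    """Toy dispatcher: routes a request target to the handler registered for the
    longest matching path-segment prefix."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def route(self, prefix: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for a path prefix ('/' = catch-all)."""
        def register(fn: Handler) -> Handler:
            self._routes[prefix.rstrip("/")] = fn
            return fn
        return register

    def dispatch(self, target: str, body: bytes) -> Optional[str]:
        # Try the full path, then drop one trailing segment at a time.
        path = target.rstrip("/")
        while True:
            handler = self._routes.get(path)
            if handler is not None:
                return handler(target, body)
            if not path:
                return None  # no handler, not even a catch-all
            path = path.rsplit("/", 1)[0] if "/" in path else ""


app = PathDispatcher()


@app.route("/robot/drives")
def on_drive_event(target: str, body: bytes) -> str:
    return f"drive event for {target}: {body.decode()}"


print(app.dispatch("/robot/drives/3/temperature", b"71.5"))
# drive event for /robot/drives/3/temperature: 71.5
print(app.dispatch("/tenant/group/type/myresource", b""))
# None
```

Matching by whole segments rather than raw string prefixes keeps `/robot/drivesX` from being routed to the `/robot/drives` handler.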
Closing in support of #123
See issue #94
Restated "source-id" as "subject", which now more clearly serves to qualify the event in relation to the source, which is missing as an explicit concept. I also dropped "source-type" in this proposal, as we already have namespace/event-type as the event qualifier and namespace/source/subject as context qualifiers.
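The two qualifiers described above can be sketched as a small record. This is a minimal sketch under stated assumptions: `EventContext` and its method names are hypothetical, and it assumes namespace, event-type and source are required non-empty strings. The subject is optional but, when present, keeps the "MUST be a non-empty string" rule from the constraints in the diff. The CRM values echo the spec's "new-customer-added" example, with made-up identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EventContext:
    """Hypothetical context record following this proposal: "source-type" is
    dropped and "source-id" becomes an optional "subject"."""
    namespace: str
    event_type: str
    source: str
    subject: Optional[str] = None  # may be omitted if self-evident from source

    def __post_init__(self) -> None:
        # Assumption: the other attributes are required non-empty strings.
        for name in ("namespace", "event_type", "source"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be a non-empty string")
        # When present, subject keeps the "MUST be a non-empty string" rule.
        if self.subject is not None and not self.subject:
            raise ValueError("subject, if present, must be a non-empty string")

    def event_qualifier(self) -> Tuple[str, str]:
        """What happened: namespace + event-type."""
        return (self.namespace, self.event_type)

    def context_qualifier(self) -> Tuple[str, ...]:
        """Where it happened: namespace + source, plus subject if present."""
        base = (self.namespace, self.source)
        return base + (self.subject,) if self.subject is not None else base


ctx = EventContext("com.example.crm", "new-customer-added",
                   "https://crm.example.com", subject="customers/4711")
print(ctx.event_qualifier())    # ('com.example.crm', 'new-customer-added')
print(ctx.context_qualifier())  # ('com.example.crm', 'https://crm.example.com', 'customers/4711')
```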