Renaming/broadening "source" to "topic", consolidating source-id/source-type into "subject" #95

Closed
wants to merge 4 commits into from

Conversation

clemensv
Contributor

See issue #94

Restated "source-id" as "subject", which now more clearly serves to qualify the event in relation to the source, which is missing as an explicit concept. I also dropped the "source-type" in this proposal as we already have namespace/event-type as event qualifier and namespace/source/subject as context qualifiers.

spec.md Outdated
* Constraints:
* REQUIRED
* MUST be a non-empty string
* OPTIONAL. The subject may be self-evident from the "source" context.
Collaborator

I'd prefer to put this explanatory text in the Description field.
But either way: s/may/might/

Contributor Author

I'll move it.

spec.md Outdated
@@ -161,9 +161,11 @@ that contains both context and data).
system might identify the CRM system as the "source", might further qualify the
event as 'new-customer-added' in the "event-type" relative to its "namespace",
and then further qualify the subject of the event (the new record) with the content
of this field. The subject is a free-form string defined by the publisher.
of this property. The subject is a free-form string defined by the publisher. The
property is optional, because the subject might already be self-evident from
Collaborator

Perhaps: The property might not be present, because the subject...
?

Contributor Author

It's a bit weird to tap dance around OPTIONAL here. Shall I uppercase or move it back?

Collaborator

I think the bigger concern, at least to me, is saying it's OPTIONAL twice - once here and once below. Duplicating normative statements isn't good; it opens the possibility of someone updating one of them but not the other - then we're inconsistent.

Contributor Author

If we have a constraints section, we should also allow rationalization of the constraint in that section, IMO. I'm moving it back.

Collaborator

I don't necessarily object to that - I'm just trying to avoid two sections that give lots of explanatory text and people not knowing what text goes where - so they end up duplicating thoughts.

Collaborator

Actually, on a side note, this is partially why I tend to prefer to not have a "constraints" section because it forces this kind of split. I'd prefer just a set of sentences with the RFC keywords in the appropriate spots and they can live next to the explanatory text. But I didn't want to drastically change the original format of the spec when we first pulled it in. But if people are open to the change I can take a pass at a PR.

Collaborator

ping @ac360 - any thoughts on this?

Contributor

@deissnerk deissnerk Feb 21, 2018

Providing descriptions without mentioning the constraints will in many cases be hard. @duglin I agree with you that a split of the text is hard to achieve. On the other hand I see some value in a restrictions section that only lists the restrictions without providing additional text. An implementor of the spec who just wants to quickly check the restrictions of an attribute might benefit from that. Using the same normative statements in the description would be even better than paraphrasing them, because it would help maintain consistency between description and constraints.

@maplebed

maplebed commented Feb 28, 2018

I'm sorry, I don't really understand what's going on here. It looks like this diff

  • removes source-type
  • removes source-id
  • adds subject

Could those be separate diffs, or at least separate the removal of source-type and source-id from the addition of subject?

Lastly, I don't understand what sort of data would go in this field, or what it means to be called subject. Is this trying to evoke images of the Subject line in an email?

I'm having trouble forming any opinion about this PR given my inability to figure out what's going on. (I see the chatter in the associated issue, which unfortunately seems to echo my confusion rather than help clear it up.)

@clemensv
Contributor Author

clemensv commented Mar 1, 2018

I renamed source-id.

Here's the scenario:

The emitter of the event, the "source", is some object that you subscribe to. Let's say that's a storage service container/bucket. You are interested when a new object is created in that container/bucket.

namespace: azure.com
source: /subscriptions/{id}/providers/Microsoft.Storage/storageAccounts/myStorageAccount
eventType: Microsoft.Storage.BlobCreated

To do any generic pub/sub filtering on that event with pattern matching or startsWith/endsWith in generic infra that doesn't want to reason about the details of the event, you still need a hint as to what that event is actually about, i.e. which blob was created. Thus:

subject: /blobServices/default/containers/testcontainer/blobs/testfile.txt 
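
A minimal sketch of the kind of generic filtering described above, assuming the event context is available as a plain dictionary with the four attributes from this example. The `matches` helper and its parameter names are hypothetical and not part of the spec.

```python
# Hypothetical event context using the attribute values from the example above.
event = {
    "namespace": "azure.com",
    "source": "/subscriptions/{id}/providers/Microsoft.Storage/storageAccounts/myStorageAccount",
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/testcontainer/blobs/testfile.txt",
}


def matches(event, event_type=None, subject_prefix="", subject_suffix=""):
    """Return True if the event passes a context-only filter.

    The filter never looks at the event payload, only at eventType and subject.
    """
    if event_type is not None and event.get("eventType") != event_type:
        return False
    subject = event.get("subject", "")
    return subject.startswith(subject_prefix) and subject.endswith(subject_suffix)


# Select only .txt blobs created in "testcontainer".
wanted = matches(
    event,
    event_type="Microsoft.Storage.BlobCreated",
    subject_prefix="/blobServices/default/containers/testcontainer/",
    subject_suffix=".txt",
)
```

The filter needs no knowledge of the storage service's payload format; it works on the context attributes alone.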

@duglin
Collaborator

duglin commented Mar 1, 2018

@clemensv your use case makes me wonder whether the source (or "entity that generated the event") should be the bucket or the newly created resource? Based on how I look at it I can see an argument for either one. My concern is that we now have, sort of, two "sources": 1) the entity to which you subscribed for the event stream, and 2) the entity that did something that caused the event, which may or may not be the same as no. 1 - but I suspect in some (many?) cases will be. Which means we then have to have a very clear set of rules so people know when to use one vs the other since they might appear to overlap.

Not trying to cram too much into the 'source', but couldn't this information be included within the "source" value? Perhaps as the tail part of the URL path or query parameter? You can still do your regex filtering that way I believe.
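
As a rough illustration of this alternative, the sketch below assumes the blob path is simply appended as the tail of the "source" path and filters it with a regular expression. The combined value and the pattern are made up for illustration only.

```python
import re

# Hypothetical "source" value with the blob path appended as the tail of the path.
source = (
    "/subscriptions/{id}/providers/Microsoft.Storage/storageAccounts/myStorageAccount"
    "/blobServices/default/containers/testcontainer/blobs/testfile.txt"
)

# Assumed filter: any .txt blob in "testcontainer" under any storage account.
pattern = re.compile(
    r"/storageAccounts/[^/]+/blobServices/default/containers/testcontainer/blobs/.+\.txt$"
)

is_match = pattern.search(source) is not None
```

In this form the pattern itself has to encode where the account path ends and the blob path begins.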

@clemensv
Contributor Author

clemensv commented Mar 1, 2018

You will always have a notion of a thing that emits events and matters that are going on inside of that thing that the events are about. Same as with email. Or GitHub issues. I am the source, identified by my SMTP URI, and the subject line is what I want to tell you today.

Since we say the "source" SHOULD be a URI, but we're not mandating it (and I do think there are good reasons not to), having a separate descriptor for the subject makes sense, and there's plenty of prior art to look at in messaging for that split.

@clemensv clemensv mentioned this pull request Mar 1, 2018
@clemensv clemensv changed the title from "Renaming source-id to subject, dropping source-type" to "Renaming/broadening "source" to "topic", consolidating source-id/source-type into "subject"" Mar 7, 2018
@clemensv
Contributor Author

clemensv commented Mar 7, 2018

I'm amending this proposal by renaming "source" to "topic" and providing a broader definition that allows for both a concrete concept of originating source and also a more abstract notion of classification that isn't bound to a concrete origin as a primary consideration. Origin/provenance considerations can be enormously complex in some verticals and need to be resolved by processing of the event payload.

The rationale for this proposal of consolidating the route/dispatch metadata to topic, subject, and eventType, including links to prior art and current practice across a range of platforms and infrastructures is laid out in detail in this issue: #112

@clemensv clemensv mentioned this pull request Mar 21, 2018
* Resource path, relative URI reference:
/tenant/group/type/myresource
* Machine component, relative URI reference:
/robot/drives/3/temperature

Request: add clarification for URI segments, like /namespaces/{namespaceId}/buckets/{bucketId}/objects or /robots/{robotId}/drives/{driveId}/temperature
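
One way to read such segment templates, sketched under the assumption that each `{name}` placeholder stands for exactly one path segment and that placeholder names are valid Python identifiers. The helper names `template_to_regex` and `extract` are hypothetical.

```python
import re


def template_to_regex(template):
    """Compile a template such as '/robots/{robotId}/drives/{driveId}/temperature'.

    Each {name} placeholder becomes a named group matching one path segment.
    """
    # re.split with one capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def extract(template, path):
    """Return the placeholder values found in path, or None if it does not match."""
    m = template_to_regex(template).match(path)
    return m.groupdict() if m else None


values = extract(
    "/robots/{robotId}/drives/{driveId}/temperature",
    "/robots/r2/drives/3/temperature",
)
```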

@markfisher

The following was motivated specifically by some of the discussion on the WG call today...

To what extent is this spec prescribing behavior for the middleware? It was my understanding that the primary goal was to describe a common format for events without making assumptions about the way those attributes would be interpreted, since event brokers don't all behave the same way, even when they use common terminology.

For example, while there's common usage of "topics" across brokers, not all brokers support hierarchical and wildcard subscriptions. Some only support subscribing at the granularity of the literal topic name. In the latter case, it can be a common pattern to create derived streams from a more general upstream topic.
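
To make the difference concrete, here is a small sketch contrasting literal topic matching with a simplified wildcard scheme loosely modeled on MQTT-style topic filters (`+` for one level, a trailing `#` for all remaining levels). Both functions are illustrative assumptions, not the behavior of any particular broker.

```python
def literal_match(subscription, topic):
    """Brokers without wildcard support: the names must be identical."""
    return subscription == topic


def wildcard_match(subscription, topic):
    """Simplified hierarchical matching.

    '+' matches exactly one level; '#' is only honored as the last level
    and matches any remaining levels (including none).
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == "#":
            return i == len(sub_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


topic = "robot/drives/3/temperature"
by_wildcard = wildcard_match("robot/drives/+/temperature", topic)
by_literal = literal_match("robot/drives/+/temperature", topic)
```

On a broker that only supports literal names, the same subscription string would simply name a different topic.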

Topic naming on a specific broker (Kafka, for example) won't always be flexible enough to support the range of syntax used in these examples, such as URIs. So it could lead to confusion between the logical and "physical" topic. Also, it's not typically a requirement to include the topic name in the event itself. A consumer subscribes to the topic on the broker, and often that information is provided via declarative configuration within the context of the consuming application. Consumers might subscribe to multiple topics, and producers might even send the same event to multiple topics.

@deissnerk
Contributor

@markfisher I agree that the term topic is misleading. In my opinion it belongs to the domain of messaging and not to events. I was actually happy with the term source. Unfortunately I wasn't able to attend the calls this week. So I don't know what concerns people have regarding source. What about origin?
With the terms topic and subject I don't see any attribute left that is specific for an event instead of a message. Probably this is what Vaughn Vernon meant when he addressed the group on the mailing list a few weeks ago. Maybe it is a good preparation for tomorrow's call to think of reasons why the spec is called CloudEvents and not CloudMessages.

@ultrasaurus
Contributor

From my perspective, the whole purpose of this spec is to separate source/producer and destination/action/consumer from transport/middleware. From my perspective, this proposal completely changes what we're doing here.

@notque
Contributor

notque commented Mar 22, 2018

Source is clear to me, Topic is not clear, and perhaps even misleading. Not sure I understand the thought process.

@clemensv
Contributor Author

clemensv commented Mar 23, 2018

@notque @ultrasaurus @deissnerk @markfisher

"Topic" is a concept that is broader than and a full superset of "source". It often reflects the originating context, but also works in cases where any party other than the originator needs to publish an event on behalf of the same context. We even have a fresh example of that in the discussion of the usage scenarios.

The abstract topic concept also happens to map straight to numerous existing middleware platforms and also maps right onto RESTful resource graphs, i.e. onto HTTP.

From the mailing list, I'm also putting the following on the record:

Topics and Subjects Presentation Doc
Topics and Subjects Presentation Screencast

As supporting evidence for how broadly the topic concept is supported across the industry and with existing standards and platforms that many users of this specification will most certainly want to leverage, consider this incomplete list of examples:

HTTP WebHook specs:

Messaging Standards:

IoT Platforms:

Cloud Event Routers (Push)

Cloud Event Ingestion Platforms

Cloud and On-Prem PubSub brokers:

It's impossible to argue that the "topic" concept and terminology choice isn't grounded in an existing industry consensus. For the few deviations from the word "Topic" in the list above, there are actually good reasons. RabbitMQ uses "exchange" and ActiveMQ uses "address" for their top-level concept to avoid a name clash with the JMS topic that they also both implement on top of their base construct. Event Hubs doesn't use "Topic" because we have the same name already taken by Service Bus as a top level concept. "Streams" in MQ disambiguates from native topics.

@markfisher

It's impossible to argue that the "topic" concept and terminology choice isn't grounded in an existing industry consensus

I don't think anyone is arguing that. In fact, my comments were motivated by the fact that topics are ubiquitous but unable to map one-to-one in a consistent way to either the syntax or semantics that you are suggesting. So it would inevitably lead to situations where the "topic" header value does not match the actual topic name where the event has been published. It's not common to include the topic name in messages in the first place. I'd even consider it an anti-pattern, since it's possible to have a many-to-many relationship between events and topics.

@clemensv
Contributor Author

@ultrasaurus - regarding this comment

We have a shared definition of the goals.

In those goals, we specifically call out that "... producer and consumer can be developed and deployed independently. A producer can generate events before a consumer is listening, and a consumer can express an interest in an event or class of events that is not yet being produced."

That level of decoupling is typically achieved by introducing "middleware" intermediaries between producer and consumer. The minimal intermediary typically used for this is the dispatch logic of an HTTP request framework that maps paths to method invocations.

NodeJS "Express", ASP.NET, and Apache Struts all fit the middleware definition of the usage scenario PR.

Their "topic" is the request target as defined in HTTP; the request URI is generally a reference to a graph, most simply into the file system.
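
A minimal sketch of such path-based dispatch, assuming a plain in-memory routing table. The `route` decorator and `dispatch` function are hypothetical and do not reflect the API of any of the frameworks named above.

```python
# Hypothetical routing table: request target (the "topic" in this analogy) -> handler.
routes = {}


def route(path):
    """Register the decorated function as the handler for path."""
    def register(handler):
        routes[path] = handler
        return handler
    return register


@route("/orders")
def list_orders():
    return "listing orders"


def dispatch(path):
    """Look up the handler for the request target and invoke it."""
    handler = routes.get(path)
    if handler is None:
        return "404 not found"
    return handler()


result = dispatch("/orders")
```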

The abstract nature of "topic" indeed strongly echoes that of a resource in REST, and middleware is a more generalized notion than caches, proxies and gateways that Fielding enumerates in his seminal dissertation.

Please explain how my proposal concretely contradicts the existing shared definition of goals.

@clemensv
Contributor Author

Closing in support of #123

@clemensv clemensv closed this Mar 23, 2018
@clemensv clemensv mentioned this pull request Mar 23, 2018

8 participants