Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing "subject" #406

Merged
merged 3 commits into from Apr 10, 2019
Merged

Introducing "subject" #406

merged 3 commits into from Apr 10, 2019

Conversation

clemensv
Copy link
Contributor

Based on last week's debate on a missing context qualifier and the Knative project apparently now running into similar implementation issues for generic dispatch infra that we're having in Azure Event Grid, I am herewith reintroducing "subject" for consideration.

The (extensive) argument for the "subject" field is in the (closed) issue #112. I'm also talking through it in the audio track attached to this presentation https://github.com/cloudevents/spec/blob/master/share/2018-03-22-TopicsAndSubjects.pptx and the presentation itself illustrates the topic ("source") and subject split.

Signed-off-by: Clemens Vasters clemensv@microsoft.com

Copy link
Member

@n3wscott n3wscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. Nits over the text needs to match the other texts in the spec.

a simple and efficient string-suffix filter for that subset of events.
* Constraints:
* OPTIONAL
* MUST be a non-empty string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Present, MUST be a non-empty string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no pushy?

spec.md Outdated
a subscriber will typically subscribe to events emitted by a `source`,
but the `source` identifier alone might not be sufficient as a qualifier for
any specific event if the `source` context has internal sub-structure.
* Example: A subscriber might register interest for when new blobs are created
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other fields, Example is after the Constrains section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can shuffle that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the event source has an internal sub-structure, then the event consumer could be interested in any part of the sub-structure. How could the producer know which part of the sub-structure the consumer is interested in, thus which part to put in the subject? Also since it is an internal sub-structure of the source, how could the consumer know how to interpret the semantics of the subject?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is up to the pub/sub infrastructure to provide. In Event Grid we provide filters that let you run a string expression over the subject to filter that down. The publisher doesn’t know, it just publishes for the scope it was asked to deliver events for.

@@ -215,6 +215,30 @@ help intermediate gateways determine how to route the events.
* urn:event:from:myapi/resourse/123
* mailto:cncf-wg-serverless@lists.cncf.io

### subject
* Type: `String`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Subject is a URIRef? and it relates to the source uri?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's some sort of subject identifier. I can be a filename or just some name of a thing that the source knows. I wouldn't constrain that as much as source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URIRef does all:

this is a valid uri ref

http://so.is/this

distinguish separate occurrences of a same-named blog having been created;
the name of the newly created blob is carried in `subject`.

Identifying the subject of the event in context metadata (opposed to only in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For source, we have these examples:

Would using subject turn them into:

* source: https://github.com/cloudevents/spec/
* subject: pull/123

* source: /cloudevents/spec/pull/123
* subject: pull/123

* source: urn:event:from:myapi/resourse/123
* subject: UPDATE `<-- ? picked a verb`

* source: mailto:cncf-wg-serverless@lists.cncf.io
* subject: Help 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last example bring the point home rather nicely. That's exactly what subject is.

@JemDay
Copy link
Contributor

JemDay commented Mar 19, 2019

I think it might be clearer to have some richer use-case examples added to the 'example' section as opposed to embedding them inside the description of each context attribute.

Does subject now become part of the idempotency set ?

@duglin
Copy link
Collaborator

duglin commented Mar 19, 2019

I believe we currently have "subject" as part of "source" (even if it's just implied by what's in the spec). Is this PR suggesting we pull it out because in order for generic middleware (e.g. filters) to do their job we can't count on them to understand the format of the source URI to know which parts to extract? And that the filter expression would be too hard to construct even if the entity creating the filter expression did know the format of the source URI?

Not pushing back, I just want to fully understand the reason for the splitting up of 'source' - plus this would all be good info to put into the Primer.

@mrutkows
Copy link

Before adding additional fields one should clearly identify the set of use cases it intends to solve to as well exhibit what the field would be used for. If we are merely separating some partial path from a URI for convenience, it does not seem like an additional field is warranted. URI (sources) that share the same paths (up to a point) indicate same source and as paths are added/they represent and distinguish sub-areas (components or subject areas) within the same "family" of sources.

@evankanderson
Copy link
Contributor

I believe the intent of this is to make it clear what portions of the current source field SHOULD be used to identify a unique producer (see also discussion in #391 and #403 about uniqueness of id being on a per-event-producer basis).

Three issues I'm aware of with the current definition of source:

  1. Middleware requires some sort of omniscience (knowledge of all event producers and the schema of their source attributes) in order to be able to determine the event producer given a source URL. In some cases, the source may represent a specific object, like a pull request or an individual object in an object store, but the producing system is GitHub or the object store implementation. In some cases, this may align on hostname boundaries, but in other cases, multiple APIs may be served from a single hostname, or the same API may be served from different hostnames (see Slack).

  2. Consumers who wish to consume events from a particular producer need a way to indicate what producer they are interested in (related to point 1) Again, if the producer uses a specific source URI of a particular resource in the system (as suggested by our examples), the consumer is then required to specify "the set of sources I'm interested in" as some sort of glob/pattern match/prefix match, depending on the system. If the source URI particularly represents the event producing system, and the subject represents the individual resource described by the event. Side note: I have a slight concern about conflating subject in this context with subject in an access control context; I might prefer producer and source or service and resource.

  3. Currently, source is allowed to be a relative URI ref, but it's not clear what that URI ref is intended to be relative to. If we separate source and subject, subject could logically be a reference relative to the source. I realize that this isn't directly in scope of this PR, but it is related. Similarly, we might benefit by suggesting that each event producer should have a (single) unique source URI which could be used by Clarify scope of eventID's uniqueness #391 and Add some additional design points for ID #403 to allow an external observer to make assumptions about the scope of uniqueness of id values.

@clemensv
Copy link
Contributor Author

@duglin The "subject" is currently necessarily embedded in "source", and since that's a URI it's hard for us to give hard rules for where the context given by the "subject" sits in that URI and therefore filtering is difficult. In Event Grid, we're making a product decision to put "subject" behind the '#' in the URI and thus effectively treat it as a distinct field. In pub/sub, you subscribe to events from a source and that source itself may raise various types of events on behalf of concepts associated with it, and the "subject" identifies those concepts.

@clemensv
Copy link
Contributor Author

@JemDay it's not part of the idempotency set. For that, 'source' and 'id' remain sufficient. I agree that we need more examples and in-depth explanation of how the attributes hang together.

@clemensv
Copy link
Contributor Author

@mrutkows @evankanderson If you haven't done so, please read through issue #112 which lays out in lot of detail why I'm reintroducing the field and it also enumerates a lot of examples of prior art in existing messaging and eventing infrastructure.

Copy link
Contributor

@cneijenhuis cneijenhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To build on Cathys comment, I am also struggling with how the subject is supposed to look like when we have several containers, and some things can sometimes be the subject, and sometimes a container.

Event Type Source
com.github.Org.Created /cloudevents
com.github.Repo.Created /cloudevents/cobol-sdk
com.github.PR.Created /cloudevents/spec/pull/123
com.github.PR.Review.Created /cloudevents/spec/pull/123/review/456
com.github.PR.Review.Comment.Created /cloudevents/spec/pull/123/review/456/comment/789

How would these look like with source + subject? It isn't clear whether the last one should be /comment/789 or /pull/123/review/456/comment/789.

(This isn't a theoretical issue for me, I can also provide some commerce use cases where we have objects that also act as containers)


What I see as positive is that some (existing) middleware allows filtering on metadata attributes, but only via a prefix. Splitting the source ideally makes it easier to filter. E.g. if the DCO checker should run for all CloudEvents repos, I logically want to match /cloudevents/*/pull/*, but I can't do that with a prefix alone. If it is split into /cloudevents/spec/ and /pull/123, I can match for /cloudevents/* and /pull/*.

However it depends on the event producer splitting the source in the "correct" way for the consumer. It reminds me of the issue with the partition key - the "correct" split can be different per consumer.

The perfect solution IMO is to have advanced matching on a URI-reference, but giving existing middleware this may still be an iterative improvement.

spec.md Outdated Show resolved Hide resolved
@duglin
Copy link
Collaborator

duglin commented Apr 4, 2019

@clemensv can you sign your commits?

@clemensv
Copy link
Contributor Author

clemensv commented Apr 4, 2019

@duglin @cneijenhuis used a novel feature which i haven't yet found on Github myself where you can contribute little patches to the patch, and accepting those signs differently:

commit 13241fb (HEAD -> patch-subject, origin/patch-subject)
Author: Christoph Neijenhuis christoph.neijenhuis@commercetools.de
Date: Wed Apr 3 18:34:19 2019 +0200

Update spec.md

Co-Authored-By: clemensv <clemens@vasters.com>

I did a soft reset and re-committed, but the DCO checker should accept these or we should rule out that feature.

@cneijenhuis
Copy link
Contributor

Sorry for that. I like the suggested changes feature, because a) the diff view is really nice and b) in theory it should help save time to incorporate suggestions.
I'm guessing you could manually add the Signed-off-by ... to the commit message in the GitHub UI, but it makes the process more clunky.

(The problem exists with all edits through the GitHub UI unfortunately...)

@duglin
Copy link
Collaborator

duglin commented Apr 4, 2019

rebase needed

@@ -1,4 +1,4 @@
# CloudEvents - Version 0.2
# CloudEvents - Version 0.2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated version adds a extra whitespace to the end of 0.2.

the `data` payload) is particularly helpful in generic subscription filtering
scenarios where middleware is unable to interpret the `data` content. In the
above example, the subscriber might only be interested in blobs with names
ending with '.jpg' or '.jpeg' and the subject attribute allows for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/subject/subject/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this paragraph. Would it be helpful to add something about how putting this info into the "source" URI is just as hard to parse/filter-on ? And that's why we creating this property. Before this property I think people were pushed towards merging it into 'source' - if they wanted it outside of 'data'. Or, if this is too much text for the spec, perhaps in the primer?

* Constraints:
* OPTIONAL
* MUST be a non-empty string
* Example:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this example because it shows how all of the key properties relate to each other, and better explains what people should expect in each. So, can we move this example out from under the definition of this one property so it can be view in the bigger context? And then show "id" and "type" too - or even a total CE.

@duglin
Copy link
Collaborator

duglin commented Apr 4, 2019

@clemensv can you rebase this - then I can merge it.

Approved on the 4/4 call.

Clemens Vasters added 3 commits April 9, 2019 18:57
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
@duglin duglin merged commit ef193d5 into cloudevents:master Apr 10, 2019
- /cloudevents/spec/pull/123
- urn:event:from:myapi/resourse/123
- mailto:cncf-wg-serverless@lists.cncf.io
* Type: `URI-reference`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- is more clear than * that it is suppose to be a list. The rest of the spec uses -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants