Propose holistic simplification of source- attributes #129

Closed
wants to merge 3 commits into from

Contributor

@inlined inlined commented Mar 27, 2018

Overview

This PR drops source-types, applies namespacing to event-type and source, and clarifies that source path need not include the source-authority.

My goal is to build on Clemens' Usage Scenarios and revisit our attributes in related groups. This PR looks at event-type, namespace, and source, which are attributes that describe common metadata for the individual occurrence. I would like to cover the fields related to parsing data (event-type-version, schema-url, content-type) in a future PR.

Changes

  • Drops source-type; there is no clear use according to the usage
    scenarios.
  • Consolidates namespace and the source-authority
  • Clarifies that source-path should not be redundant with
    source-authority/namespace
  • Adds namespacing to event-type
  • Drops documentation for "source" that was redundant with each
    of its subfields. Opts for a name prefix instead.
  • Leans on the URI spec to name parts of source (e.g. source-id is now source-path)

Usage Scenarios

IoT Case

This case demonstrates an IoT vendor who may use UUIDs for the sensors they sell and use non-hierarchical attributes for filtering/routing events in an event-connected system. Note that the IoT device's source may not be addressable from the Action which receives the event data.

{
  "source-authority": "iotvendor.net",
  "source-path": "12345678901234-1234-1234-1234-12345678",
  "source-labels": {
    "sensor-type": "vibration",
    "sensor-deployment": "house1234",
    "sensor-location": "window"
  },
  "event-type": "net.iotvendor.window.break"
}

Hosted service

This use case focuses on a developer who extends their application by receiving events from a managed cloud service.

{
  "source-authority": "storage.googleapis.com",
  "source-path": "projects/_/buckets/myBucket/objects/foo/bar.jpg",
  "source-labels": {
    "tier": "preprod"
  },
  "event-type": "google.storage.object.archive"
}

Deployed software

This case demonstrates the value of splitting the concept of "namespace" to both the event-type and the source. Here "bigcompany.com" has deployed GitHub enterprise inside their corporate network. The event-type is still associated with GitHub (the software author) and the source is associated with "bigcompany.com". This example shows a pull request creation event for the "project88/backend" repo.

{
  "source-authority": "code.corp.bigcompany.com",
  "source-path": "project88/backend",
  "event-type": "com.github.pull.create"
}
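
To make the split concrete, here is a minimal Python sketch of how a consumer might join the two proposed attributes back into a single URI reference. The helper name `source_uri` and the scheme-less `//authority/path` form are assumptions for illustration; the PR itself does not define a canonical joined form.

```python
def source_uri(event: dict) -> str:
    """Join the proposed source-authority and source-path into one URI reference."""
    authority = event["source-authority"]
    # source-path is not expected to repeat the authority, so it is appended as-is.
    path = event.get("source-path", "").lstrip("/")
    # "//authority/path" is a scheme-less network-path reference (RFC 3986, section 4.2).
    return f"//{authority}/{path}" if path else f"//{authority}"


deployed = {
    "source-authority": "code.corp.bigcompany.com",
    "source-path": "project88/backend",
    "event-type": "com.github.pull.create",
}

print(source_uri(deployed))  # //code.corp.bigcompany.com/project88/backend
```

Note that the event-type namespace (com.github) and the source authority (code.corp.bigcompany.com) stay independent, which is the point of the "Deployed software" scenario.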

I'd really like feedback related to:

  • Whether routing software (one of the official use cases) has any requirements related to this pull request.
  • Do we need a maximum number of source-labels? If so, what is a reasonable upper bound?
  • Should we specify that software MUST treat a missing source-labels the same as an empty source-labels?
  • Should we specify rules for how event-type must be interpreted (e.g. using DNS rules: names are matched regardless of case, but SHOULD be returned in their original case)?
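
As an illustration of the last two questions, the sketch below shows one possible answer to each: a missing source-labels is read as an empty one, and event-type values are compared without regard to case while the original string is left untouched. Both helper names are hypothetical, and neither behavior is mandated by this PR.

```python
def labels_of(event: dict) -> dict:
    """Treat a missing "source-labels" attribute the same as an empty one."""
    return event.get("source-labels") or {}


def same_event_type(a: str, b: str) -> bool:
    """DNS-style comparison: match regardless of case, never rewrite the inputs."""
    return a.lower() == b.lower()


event = {"event-type": "com.GitHub.pull.create"}

print(labels_of(event))                                               # {}
print(same_event_type(event["event-type"], "com.github.pull.create"))  # True
print(event["event-type"])                                            # com.GitHub.pull.create
```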

@duglin
Collaborator

duglin commented Mar 27, 2018

@inlined Obviously, since it's your PR you're free to keep it as-is, but it seems to me that it might be worth splitting the event-type discussion from the source* discussion, since it would probably be easier to reason about them separately and the discussion would be more focused. And, from your write-up, it does look like each one could be adopted independently of the other (i.e. I don't see a direct link between the two write-ups).

To the specific proposal:

  • source-attributes feels like a variant of our extensions property. While yours has a bit more focus to its definition, in both cases people will need to know about the additional fields in advance, which means that whether they appear in one location or the other, they should end up having the same semantic meaning. Also, if we merge them then we don't have to worry about writing text to explain when to use one vs the other, or what happens if an attribute appears in the wrong spot. That could avoid a lot of unnecessary bike-shedding. Simply naming the extensions attributes with a prefix of source-... would provide the same syntactic grouping as your source-attributes.

  • You've split source into 2 parts. I've heard from others that they may want to do the same thing in follow-on PRs to Source as a URI #123 . This would allow for a very focused discussion. Meaning people could discuss why a certain aspect of the source URI should be pulled out w/o getting bogged-down in all of the other discussions that come with our broader "source" discussions. Would you be ok with that approach?

@inlined
Contributor Author

inlined commented Mar 27, 2018

@duglin you make a lot of good points. RE:

  • Splitting PR: I'm totally fine with doing this if/when actual controversy or bike-shedding emerges. I'm testing a theory that we've had trouble in the past because we looked at isolated fields when many of them express a concept in concert. Event type got roped in inductively: the URI split didn't have a clear purpose until it became clear that I was removing namespace and applying it in two places.
    Should people agree on the URI split but want to discuss the other changes further, I can easily splice those changes off and then resubmit the delta or rebase it into this one.
  • attributes as extensions: In order to avoid collisions with other extensions, it seems like you're suggesting extensions.attributes.key = value? I nominated this for promotion to the top level spec because system labels are a very common concept that should merit common behavior. Also, it's my assumption that extensions should not be required for canonical use cases, and this seems fairly core to many IoT use cases. I can see a much easier case, though, that attributes can be spliced into their own PR.

* Drops source-type; there is no clear use according to the usage
scenarios.
* Consolidates namespace and the source-authority
* Clarifies that source-path should not be redundant with
source-authority/namespace
* Adds namespacing to event-type
* Drops documentation for "source" that was redundant with each
  of its subfields. Opts for a name prefix instead.

Signed-off-by: Thomas Bouldin <inlined@google.com>
@inlined
Contributor Author

inlined commented Mar 28, 2018

(rebase squashed to kick Travis and get it to correctly mark this PR as clean)

@rachelmyers
Contributor

> it seems to me that it might be worth splitting the event-type discussion from the source* discussion since it would probably be easier to reason about them separately

@duglin, since these are attributes that work together, it makes sense to me to have a conversation about them together.

Signed-off-by: Thomas Bouldin <inlined@google.com>
@inlined
Contributor Author

inlined commented Mar 28, 2018

FYI: Renamed source-attributes to source-labels to better align with existing cloud platforms, K8s, and the Istio attribute vocabulary

* Examples:
* customer.created
* com.github.pull.create

nitpicking: com.github.pull.create reads as a command rather than an event that took place.

Contributor

Note: GitHub docs use "create" for a notification that something has been created... not that we need to conform to some specific vendor. We could include a few divergent examples to be clear that we're not mandating anything, or consider some guidance about naming if someone wants to see whether there's a trend in existing implementations or another spec we want to borrow from.

Contributor

agree this is a nit, and suggest it could be addressed in a follow-on PR

As a consumer of an event, "event-type", to me, denotes whether it is a storage event, a messaging event, a timer event, etc. Then, depending on the type of event, it could have different properties. For example, a storage event could have properties like "bucket name" and "action performed on that bucket (create, update, or delete)", while a messaging event could have properties like "topic of the message".

Contributor Author

@SeanFeldman I lean 20% more towards infinitive tense for a few reasons:

  1. It's nice that these names can be reused for other policies in the source software. E.g. only some developers will have "com.github.pull.create" permission in a repository.
  2. There seems to be precedent today for events to be named in the infinitive. E.g. onClick rather than onClicked

The first scenario describes a permission, not an event: permission to create a PR.
The second example, OnClick, is the historical name of an event handler for the click event. You wouldn't call it "WhenClick" but rather "WhenClicked", had that been the convention back in the day.

I might remain in the minority with my belief that an event is a manifestation of something immutable that happened and therefore should be described in the past tense 🙂

spec.md Outdated
* Constraints:
* OPTIONAL
* Keys MUST match the regular expression `[_.a-z0-9]+`
* Keys SHOULD use the character "." is as namespace separator
Contributor

Leave out "is"

Contributor Author

whoops. This is what happens when I copy+paste from other standards.
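
A minimal sketch of how a producer or router might check label keys against the pattern quoted in this draft. The helper name `invalid_label_keys` is hypothetical, and the pattern is assumed to apply to the whole key. Note that the hyphen is not in the character class, so hyphenated keys such as sensor-type from the IoT scenario earlier in the thread would not pass as the constraint is written.

```python
import re

# Pattern copied from the draft constraint; assumed to match the entire key.
LABEL_KEY = re.compile(r"[_.a-z0-9]+")


def invalid_label_keys(labels: dict) -> list:
    """Return the keys that do not satisfy the draft key constraint."""
    return [key for key in labels if not LABEL_KEY.fullmatch(key)]


labels = {
    "tier": "preprod",
    "com.example.env": "staging",   # "." used as a namespace separator
    "sensor-type": "vibration",     # contains "-", which the pattern excludes
}

print(invalid_label_keys(labels))  # ['sensor-type']
```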

Signed-off-by: Thomas Bouldin <inlined@google.com>
@inlined
Contributor Author

inlined commented Mar 29, 2018

Can I get a 👍 or 👎 on this comment for whether source-labels should just be labels?

@inlined
Contributor Author

inlined commented Mar 29, 2018

Can I get a 🎉 to suggest that the source URI should be more strictly clarified to include an authority, and a ❤️ to suggest that the URI should be split to enforce that it includes an authority?

@inlined
Contributor Author

inlined commented Mar 29, 2018

Please vote and explain why in the next 24hrs. I'll break this up into smaller PRs then. As I'll explain in the labels PR, I didn't leave a vote for labels as the source URI query string. There are many intuitive cases where the source should include literally a URI that an end-user has interacted with. These query fragments are application-level whereas labels are typically infrastructure-level. They have different meanings and are frankly considered with different levels of security (I trust the labels that come from my deployment. I verify the query fragments that come from my users)

CC @cathyhongzhang @yaronha

@deissnerk
Contributor

@inlined if you make it just labels, what is labeled then? The event or still the source? I would like a general concept for labels, but it still needs to be clear what is labeled.

@yaronha
Contributor

yaronha commented Mar 29, 2018

@inlined I think we need a single namespaced URI for source. It also seems like this is confusing the event details with the source: the object bar.jpg is not the source of the event but an event detail, and the source is the object/blob service. Note that a typical object bucket update event may even include multiple object names, as in the AWS S3 event schema, so which one would you place in the source? Concatenate them all?

IMO the event can be a notification from the blob service with cloudprovider.blob.update-message event type (or a common type if we get to some consensus on those down the road), the source is the blob service URI, and the details like the list of objects which were updated, other details like the modification time, end user who updated those, etc. are all part of the message body (data).
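
The alternative described above can be sketched as follows. This is only an illustration of the proposed split between context and body: the function name, the service URI, and the shape of the "data" field are hypothetical, and only the event-type string comes from the comment.

```python
def blob_update_event(service_uri: str, object_names: list) -> dict:
    """Build an event whose source is the emitting service, not an individual object."""
    return {
        # Context: generic attributes only.
        "source": service_uri,
        "event-type": "cloudprovider.blob.update-message",
        # Body: event-specific details, interpreted via the event-type.
        "data": {"objects": [{"name": name} for name in object_names]},
    }


event = blob_update_event(
    "//blob.cloudprovider.example",      # hypothetical blob service URI
    ["foo/bar.jpg", "foo/baz.jpg"],      # several objects in one notification
)

print(event["source"])                                  # //blob.cloudprovider.example
print([obj["name"] for obj in event["data"]["objects"]])  # ['foo/bar.jpg', 'foo/baz.jpg']
```

With this shape, a notification covering several objects needs no special handling in the source attribute.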

@SeanFeldman

Agree with @yaronha about source-path. It's a specific case rather than general source.

One remark - I personally refer to events in the past tense. It's something that has happened and not an instruction to take place. As such, should be also in the past tense. E.g.: blob.updated rather than blob-update. Saying that, one can say I'm nitpicking 🙂

@duglin
Collaborator

duglin commented Mar 30, 2018

@inlined will try to have a response/vote by EOD tomorrow - just slightly past your 24hr mark :-) but it takes time to try to gather the view-points.

@ultrasaurus
Contributor

@deissnerk @yaronha in this context everything is an attribute of the event. The source is an event detail. My understanding of our work on describing shared metadata is that we are trying to figure out which details of events are commonly useful to include, to help with routing and aggregation and to simplify the work of a consumer.

The source-path could be named source-id (as it was previously); however, calling it a "path" allows us to easily reference the URI spec, so that there is a clear and commonly accepted character set, and so that sources with multiple components in the reference to the thing that generated the event have a clear way to specify them.

Does that help clarify? or am I missing something?

* myorg/myrepo
* my/long/ftp/path.jpg

### source-labels

Do they refer to the "properties" of the source?
If so, I feel "source-properties" is a better name, since "property" feels like something intrinsic while "label" feels like an add-on.

Is it TRUE that different event source types could have different source-labels? Are we going to leave the definition of source-labels to event producers?

Contributor Author

I proposed source-labels because that's the use case I had in mind. I'm leaving myself open to being convinced that other systems would be able to add their own labels as well. If we allow this, we should have a discussion about whether they are explicitly annotating the source (letting us keep it source-labels) or if they're adding other annotations that shouldn't be tied to the source.

The labels that I'm used to interacting with are under the developer's control, though the developer's software might automatically manage some labels with the developer's credentials. Some examples I've used in the past:

  • Google was rolling out some new infrastructure near a major conference. They told speakers to add a specific label on our project if we wanted to defer migration. Because projects (like many resources on Google Cloud) support labels, the team was able to skip over projects that would be used for on-stage demos without any coordination from any other team or a new permanent attribute on their API like "defer_2017_rollout": true
  • Firebase releases its own simplified tooling on top of Google Cloud Functions, which provides our customers with a golden road. We want to track when customers use this tooling, so we add a label that says our CLI managed the deploy. Since our CLI experience is K8S style (the customer gives us source and we make the project look like that source), we use this label to avoid deleting functions that are being managed with other tools.

You seem to know dramatically more about IoT and can interpret how the label concept would apply there. It seems totally reasonable to me that the user portal for managing devices would manage labels on those devices. It's also true that there could be a bit of disconnect originally from vendors that they use different label names to refer to similar concepts. Personally, I think this problem will be smaller in practice than deciding that every IoT device can only be correlated by one property.

@yaronha
Contributor

yaronha commented Mar 30, 2018

@ultrasaurus I don't think we are aligned.

  1. IMO, in a FedEx analogy, CloudEvents is the sticker on the package that ships an event message. The sticker contains info on From: (source shipper), a barcode (event ID), a date (timestamp), and some basic data to help track the shipment. The label does not repeat the content of the package.

Our goal is to make sure events get out of one system, arrive at a different system, and can be understood by the receiver. I don't see why it makes sense to take so much data from within the event message and place it in the context (label/envelope).

  2. I really fail to understand how bar.jpg is a source. I can understand how the messaging service can be a source, how the blob service is a source, and even how the person or machine placing the jpg may be a source, but it doesn't make sense that the object is a source, since it didn't emit any event. At best the object name is the subject, and even that is arguable since the same event may cover multiple objects.

Take the AWS example (I placed code in Slack): SNS fires an event in which it is the source (no info about the object). In the SNS payload there is an S3 service message where the S3 service is the source, and in that there are a bunch of records which describe the modified objects. Such a model makes sense to me. Granted, we may choose to make SNS invisible, but the S3 service is still the event issuer, not the object name.

  3. I'm not sure why one would want so much context data running around without a proper schema. It is impossible to interpret the meaning of those fields; anyone who wants to can just peek into the event body and use the event-type to figure out the meaning of the fields.

  4. @inlined re labels: can you explain why you would need labels in the context and why they are associated with source? Who will use them and for what (and why can't they read the event itself)? If we add something like this to address routing or observing, I suggest something more generic, like K8s annotations or private HTTP headers, which can also be namespaced (e.g. xx.com/my-attr), rather than something specific to or associated with source, especially when the meaning of source isn't getting much consensus.

@cathyhongzhang

It seems like we are still not in sync as to the semantics of "Event Source". From an event consumer's point of view, what I care about is the detailed information about the event and event source. For example, is it a storage event, a messaging event, a streaming event, or a timer event? If it is a storage event, is the event triggered by a bucket creation, modification, or deletion? What is the bucket name? If it is a messaging event, what is the topic of the message, who is the user that triggered the message, etc.? If we can give self-explanatory names to these attributes, that is great. Otherwise, we may need more clarification on what each attribute name means.

As to the name of the "source-label", I think having the "source" keyword there makes it clear that the labels are associated with the "source". But as stated in my inline comment, I think a better name is "source property".

@yaronha
Contributor

yaronha commented Apr 2, 2018

@cathyhongzhang why would you expect to have the event details as context fields?
The only thing you need in the context is the event type/schema and content-type. Based on these you know how the event data fields are organised, and those can be very detailed.

The cloud-event context fields are NOT replacing the event body fields. They are merely the fields which allow us to understand the type and organisation of the event, plus a few generic attributes for observers to watch (ID, Time, Source). Every field which is event specific (e.g. bucket name) is not a generic attribute by definition.

If we standardize event schemas for common events like IoT sensor reports, blob service updates, etc., those will be in a separate scope of work.

@duglin
Collaborator

duglin commented Apr 2, 2018

@inlined sorry for the delay but with vacations/holidays I didn't get as quick feedback as I had hoped. From our side we're leaning towards a model where the source is a URI but whether or not there's an authority in there would be an implementation choice.

inlined added a commit to inlined/cloudevents that referenced this pull request Apr 2, 2018
This commit has been spliced off of cloudevents#129

Signed-off-by: Thomas Bouldin <inlined@google.com>
@duglin
Collaborator

duglin commented Apr 4, 2018

@inlined should we close this one since you've opened up new PRs that appear to cover at least some of these same aspects?

inlined added a commit to inlined/cloudevents that referenced this pull request Apr 10, 2018
This commit has been spliced off of cloudevents#129

Signed-off-by: Thomas Bouldin <inlined@google.com>
@inlined
Contributor Author

inlined commented Apr 10, 2018

Will close. Was waiting because the last part, URIs with authorities, was not yet spun off into a sub-PR. WDYT about adding a SHOULD clause that the URI should include an authority component?

When setting up the infrastructure to subscribe for events, we'll want more than a relative URI to know what service to wire up.

@inlined inlined closed this Apr 10, 2018
@duglin
Collaborator

duglin commented Apr 11, 2018

@inlined saying "a 'source' SHOULD have an authority component to the URI" seems reasonable to me.
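
A minimal sketch of how tooling might check the proposed SHOULD clause, using the standard library's URI parser. The helper name `has_authority` is hypothetical; since the clause is a SHOULD, a checker like this would more plausibly warn than reject.

```python
from urllib.parse import urlsplit


def has_authority(source: str) -> bool:
    """Report whether a source URI reference carries an authority component."""
    # urlsplit exposes the authority component (RFC 3986) as "netloc".
    return bool(urlsplit(source).netloc)


print(has_authority("https://code.corp.bigcompany.com/project88/backend"))  # True
print(has_authority("//storage.googleapis.com/projects/_/buckets/myBucket"))  # True
print(has_authority("project88/backend"))  # False: relative reference, no service to wire up
```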

inlined added a commit to inlined/cloudevents that referenced this pull request Apr 15, 2018
This commit has been spliced off of cloudevents#129

Signed-off-by: Thomas Bouldin <inlined@google.com>
duglin pushed a commit that referenced this pull request Apr 15, 2018
* Add namespacing to an event.

This commit has been spliced off of #129

Signed-off-by: Thomas Bouldin <inlined@google.com>

* Update serialization.md to namespace event-type.

Noticed that "namespace" was still in the serialization
docs, even though it was removed from spec.md. Fixed.

Signed-off-by: Thomas Bouldin <inlined@google.com>

* Soften stance on namespacing to SHOULD.

Accept wording suggestions.

This closes issue #32

Signed-off-by: Thomas Bouldin <inlined@google.com>

* Clarify the meaning of the event-type package namespace.

It is explicitly OK for one software organization
to emit event-types of another organization, so
long as they are conforming to the standard set
by the organization in the namespace.

Signed-off-by: Thomas Bouldin <inlined@google.com>
clemensv pushed a commit to clemensv/spec that referenced this pull request May 16, 2018
* Add namespacing to an event.

This commit has been spliced off of cloudevents#129

Signed-off-by: Thomas Bouldin <inlined@google.com>

* Update serialization.md to namespace event-type.

Noticed that "namespace" was still in the serialization
docs, even though it was removed from spec.md. Fixed.

Signed-off-by: Thomas Bouldin <inlined@google.com>

* Soften stance on namespacing to SHOULD.

Accept wording suggestions.

This closes issue cloudevents#32

Signed-off-by: Thomas Bouldin <inlined@google.com>

* Clarify the meaning of the event-type package namespace.

It is explicitly OK for one software organization
to emit event-types of another organization, so
long as they are conforming to the standard set
by the organization in the namespace.

Signed-off-by: Thomas Bouldin <inlined@google.com>