Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api] Event data model review #1

Closed
bruth opened this issue Sep 15, 2017 · 7 comments
Closed

[api] Event data model review #1

bruth opened this issue Sep 15, 2017 · 7 comments

Comments

@bruth
Copy link
Contributor

bruth commented Sep 15, 2017

For those interested, this is a request for review of the event data model. This includes primarily the public Event type and the related internal protobuf Event message.

  • Do the fields make sense?
  • Is the scope too narrow or too broad?
  • Are there additional common fields that are missing?
@gracebrownecodes
Copy link

The fields make sense and I think the scope is just right for the use cases you describe in the readme. The only question I have is about the concept of time in the data model. Based on your comment in the protobuf definition, I think you are currently thinking about event time as the time it is "ingested" by the stream backend service. There are two other times that could be of interest: publish and consume. Do you think fields to hold those times are worth adding?

@bruth
Copy link
Contributor Author

bruth commented Sep 18, 2017

Thanks for the feedback. You are correct about the time question. Often they are differentiated between "event time" and "received time". The event time is set by the client when the event actually took place and the received time is set once the event has been received by the server.

This is often useful if clients can generate events locally even if they are "offline". The classic example is playing a game on a phone and your connection cuts out or you hop on an airplane. Upon landing, the client reconnects and all of the events are sent.

"Windowing" in a stream processing context often relies on this event vs. receive time. The event time can be used to put the event in the bucket (window) it applies to even if it is received late by the server.

Supporting the event time is simple, but I am not sure if that windowing use case applies.

@gracebrownecodes
Copy link

One potential use of event time would be in the case where we are turning a relational source into an event stream. The producer could set the event time to the appropriate creation or update time from the source, which would be much more useful than the received time (since they will likely all be received in a small window determined by the producer schedule).

@bruth
Copy link
Contributor Author

bruth commented Sep 18, 2017

That is a valid point. Yea assuming the data are written to an intermediate source first (CSV file) then there definitely would be a delay, like with sql-extractor.

The fundamental question for any field on this Event envelope comes down to whether it is domain specific (and should be put in the event data) or it is generalizes well enough to be a top-level field. I think in this case it is well defined enough. Many stream processing frameworks have this idea baked in so it is not unfounded.

@gracebrownecodes
Copy link

Changes look good except I'm slightly concerned about adding the event time in this library. If we have some events coming from Clarity without times, the consumer needs to know that the event doesn't have a time in Clarity and therefore the time actually means a different thing.

@bruth
Copy link
Contributor Author

bruth commented Sep 19, 2017

I see what you mean, but it is intended to be when the event was published.. if it corresponds to, say, when a record was updated last (which is separate from when it was extracted), then that could be set as the event time. But that is the job of the producer. Otherwise setting the event time can be interpreted as "as far as we know this was the time it occurred".

@bruth bruth closed this as completed Dec 12, 2017
@bruth
Copy link
Contributor Author

bruth commented Dec 12, 2017

See #6 for new discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants