[api] Event data model review #1
Comments
The fields make sense and I think the scope is just right for the use cases you describe in the readme. The only question I have is about the concept of time in the data model. Based on your comment in the protobuf definition, I think you are currently thinking about event time as the time it is "ingested" by the stream backend service. There are two other times that could be of interest: publish and consume. Do you think fields to hold those times are worth adding?
Thanks for the feedback. You are correct about the time question. These are often differentiated as "event time" and "received time": the event time is set by the client when the event actually took place, and the received time is set once the event has been received by the server. This is useful when clients can generate events locally even while "offline". The classic example is playing a game on a phone when your connection cuts out or you hop on an airplane; upon landing, the client reconnects and all of the buffered events are sent. "Windowing" in a stream processing context often relies on this event vs. received time distinction: the event time can be used to put the event in the bucket (window) it applies to even if it is received late by the server. Supporting the event time is simple, but I am not sure whether that windowing use case applies here.
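To make the event-time vs. received-time distinction concrete, here is a minimal Go sketch of tumbling-window assignment keyed by event time. The `Event` struct, its field names, and `windowStart` are illustrative placeholders (the library's implementation language and actual types are not shown here), not this project's API.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical, simplified stand-in for the envelope under
// discussion; only the two timestamps relevant to windowing are shown.
type Event struct {
	EventTime    time.Time // set by the client when the event actually took place
	ReceivedTime time.Time // set by the server once the event is received
}

// windowStart buckets an event into a fixed-size tumbling window keyed by
// its event time, so a late-arriving event still lands in the window it
// applies to rather than the window in which it was received.
func windowStart(e Event, size time.Duration) time.Time {
	return e.EventTime.Truncate(size)
}

func main() {
	// The event occurred while the client was offline on a plane...
	occurred := time.Date(2021, 3, 1, 10, 7, 0, 0, time.UTC)
	// ...and was only received by the server an hour later, after landing.
	e := Event{EventTime: occurred, ReceivedTime: occurred.Add(time.Hour)}

	fmt.Println(windowStart(e, 15*time.Minute)) // 2021-03-01 10:00:00 +0000 UTC
}
```

Even though the event arrives an hour late, keying on event time places it in the 10:00–10:15 bucket it belongs to, which is the windowing behavior described above.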
One potential use of event time would be in the case where we are turning a relational source into an event stream. The producer could set the event time to the appropriate creation or update time from the source, which would be much more useful than the received time (since they will likely all be received in a small window determined by the producer schedule).
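As a rough illustration of that producer-side mapping (again using a placeholder `Event` type and made-up field names, not this project's API), the extractor would carry the row's update time forward as the event time:

```go
package main

import (
	"fmt"
	"time"
)

// row is a hypothetical record extracted from a relational source.
type row struct {
	ID        int64
	UpdatedAt time.Time
}

// Event is a simplified placeholder envelope; only the relevant fields are shown.
type Event struct {
	EventTime time.Time
	Data      []byte
}

// toEvent converts an extracted row into an event, carrying the row's
// update time forward as the event time. Without this, every event from a
// batch extraction would share roughly the same received time, determined
// by the extraction schedule rather than by the source data.
func toEvent(r row, payload []byte) Event {
	return Event{EventTime: r.UpdatedAt, Data: payload}
}

func main() {
	r := row{ID: 42, UpdatedAt: time.Date(2021, 2, 28, 18, 30, 0, 0, time.UTC)}
	e := toEvent(r, []byte(`{"id": 42}`))
	fmt.Println(e.EventTime) // 2021-02-28 18:30:00 +0000 UTC
}
```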
That is a valid point. Yeah, assuming the data are written to an intermediate source first (e.g. a CSV file), there definitely would be a delay, as with sql-extractor. The fundamental question for any field on this Event envelope comes down to whether it is domain specific (and should be put in the event data) or whether it generalizes well enough to be a top-level field. I think in this case it is well defined enough. Many stream processing frameworks have this idea baked in, so it is not unfounded.
Changes look good, except I'm slightly concerned about adding the event time in this library. If we have some events coming from Clarity without times, the consumer needs to know that the event doesn't have a time in Clarity and therefore the time actually means a different thing.
I see what you mean, but it is intended to be when the event was published. If it corresponds to, say, when a record was last updated (which is separate from when it was extracted), then that could be set as the event time. But that is the job of the producer. Otherwise, setting the event time can be interpreted as "as far as we know, this was the time it occurred".
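A small, hypothetical consumer-side sketch of that interpretation, assuming the same placeholder `Event` shape as in the earlier sketches: treat a missing event time as "use the received/publish time instead".

```go
package main

import (
	"fmt"
	"time"
)

// Event reuses the simplified placeholder envelope from the sketches above.
type Event struct {
	EventTime    time.Time
	ReceivedTime time.Time
}

// effectiveTime is a hypothetical consumer-side helper: prefer the event
// time when the producer set one, otherwise fall back to the received time
// and read it as "as far as we know, this is when it occurred".
func effectiveTime(e Event) time.Time {
	if e.EventTime.IsZero() {
		return e.ReceivedTime
	}
	return e.EventTime
}

func main() {
	withoutEventTime := Event{ReceivedTime: time.Now()}
	fmt.Println(effectiveTime(withoutEventTime).Equal(withoutEventTime.ReceivedTime)) // true
}
```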
See #6 for new discussion. |
For those interested, this is a request for review of the event data model. This includes primarily the public `Event` type and the related internal protobuf `Event` message.