Investigate whether to use message header timestamp vs payload timestamp #1
Comments
Looks like time handling is super complex and not well documented. Found a related discussion in the Streams docs suggesting that the timestamps built into Kafka may represent event-time, processing-time, or ingestion-time: https://kafka.apache.org/0110/documentation/streams/core-concepts#streams_time

When a Connector creates a SourceRecord it can optionally use null for the timestamp: https://kafka.apache.org/26/javadoc/org/apache/kafka/connect/source/SourceRecord.html

A Streams app at least recognizes that a timestamp may be invalid; see: https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/streams/processor/UsePreviousTimeOnInvalidTimestamp.html
There appear to be two critical config options, message.timestamp.type and message.timestamp.difference.max.ms: https://kafka.apache.org/documentation/#message.timestamp.difference.max.ms They can be set per topic; otherwise the global broker values are used. The default type is "CreateTime" (application event-time), and the default difference max is the largest possible long value (thousands of centuries, or something ridiculous). One concern is that if we start putting EPICS TIME_DBR timestamps into the built-in timestamp field, the timestamp might occasionally differ by more than message.timestamp.difference.max.ms, but likely only if someone changes the default value. It also isn't clear what setting the type to LogAppendTime actually does, besides disabling the possibility that the broker will reject messages due to message.timestamp.difference.max.ms. Specifically, it isn't clear whether LogAppendTime is mostly just a hint to consumers about how to interpret the timestamp, or whether the broker literally overrides the timestamp provided via the SourceRecord API and inserts its own timestamp.
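The two behaviors in question can be sketched as a toy model. This is not the actual broker code, just a minimal illustration of how the two settings interact; the function name and constants are invented for illustration:

```python
import time

# Illustrative constants mirroring the message.timestamp.type values
# discussed above; the surrounding logic is a sketch, not broker internals.
LOG_APPEND_TIME = "LogAppendTime"
CREATE_TIME = "CreateTime"

def broker_accept(producer_ts_ms, timestamp_type, max_diff_ms, now_ms=None):
    """Sketch of how the broker treats a producer-supplied timestamp.

    With LogAppendTime the broker discards the producer timestamp and
    stamps its own clock; with CreateTime it keeps the producer value
    but rejects the record when it differs from the broker clock by
    more than message.timestamp.difference.max.ms.
    Returns (accepted, stored_timestamp_ms).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if timestamp_type == LOG_APPEND_TIME:
        return True, now_ms          # producer timestamp overwritten
    if abs(now_ms - producer_ts_ms) > max_diff_ms:
        return False, None           # record rejected
    return True, producer_ts_ms      # event-time preserved

# An IOC clock one hour ahead of the broker, with a 10-minute limit:
now = 1_700_000_000_000
ok, ts = broker_accept(now + 3_600_000, CREATE_TIME, 600_000, now_ms=now)
# ok is False: the record would be rejected under CreateTime.
ok, ts = broker_accept(now + 3_600_000, LOG_APPEND_TIME, 600_000, now_ms=now)
# ok is True and ts == now: the broker substituted its own clock.
```

With the default difference max (Long.MAX_VALUE), the CreateTime rejection branch is effectively unreachable, which matches the concern above that rejection only becomes a risk if someone tightens that setting.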
Looks like the LogAppendTime type does result in overwriting whatever timestamp the producer provides: wurstmeister/kafka-docker#269

CreateTime can be problematic if an IOC has a misconfigured clock. For example: https://medium.com/@jiangtaoliu/a-kafka-pitfall-when-to-set-log-message-timestamp-type-to-createtime-c17846813ca3 Note that the message offset provides an ordering, but the timestamp does not guarantee one, at least not with the CreateTime type.

So it is looking like it is probably best to embrace the default CreateTime type and put the TIME_DBR timestamp into the built-in timestamp field in Kafka. Users can flip the config to LogAppendTime if they simply want stronger timestamp ordering guarantees (but lose the monitor timestamp info). If they want both, they would need to use a Connect Single Message Transform (SMT) to hoist the timestamp from the header into a real field, then route the message to a different topic configured with LogAppendTime.
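The hoist-then-reroute idea could look something like the following Connect worker config fragment, using the stock InsertField and RegexRouter transforms. This is a sketch: the transform aliases, the `event_ts` field name, and the `-appendtime` topic suffix are all made up for illustration, and the target topic would need message.timestamp.type=LogAppendTime set separately:

```properties
transforms=HoistTimestamp,Reroute

# Copy the record's built-in timestamp into a payload field.
transforms.HoistTimestamp.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.HoistTimestamp.timestamp.field=event_ts

# Route the record to a parallel topic configured with LogAppendTime.
transforms.Reroute.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.Reroute.regex=(.*)
transforms.Reroute.replacement=$1-appendtime
```

That way the monitor event-time survives in the payload (`event_ts`) while the built-in timestamp field gets the broker's append time on the rerouted topic.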
Currently we place the EPICS monitor event timestamp in the Kafka message payload. This works. However, should we be using the built-in timestamp found in the Kafka message header to avoid an unnecessary additional field in each message? See:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
Looks like even if you choose the native "CreateTime" timestamp type, the timestamp may still have implications for topic compaction / partitioning. More investigation is needed.
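For reference, the per-topic override mentioned in the comments can be applied with the stock CLI tool. A sketch, assuming a broker on localhost and a hypothetical topic name `epics-monitor`:

```shell
# Set the timestamp type for one topic, overriding the broker default
# (log.message.timestamp.type):
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name epics-monitor \
  --alter --add-config message.timestamp.type=LogAppendTime

# Inspect the effective per-topic settings:
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name epics-monitor --describe
```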