

Investigate whether to use message header timestamp vs payload timestamp #1

Closed
slominskir opened this issue Aug 3, 2020 · 3 comments
Labels
question Further information is requested

Comments

@slominskir
Member

Currently we place the EPICS monitor event timestamp in the Kafka message payload. This works. However, should we be using the built-in timestamp in the Kafka message header (the record-level timestamp field added by KIP-32) to avoid an unnecessary additional field in each message? See:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message

It looks like even if you choose the native "CreateTime" timestamp type, the timestamp may still have implications for topic compaction / partitioning. More investigation is needed.

@slominskir slominskir added the question Further information is requested label Aug 3, 2020
@slominskir
Member Author

Looks like time handling is super complex and not well documented.

Found related discussion in streams docs suggesting that timestamps built into Kafka may represent event-time, processing-time, or ingestion-time: https://kafka.apache.org/0110/documentation/streams/core-concepts#streams_time
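To make the three notions concrete for a single EPICS monitor update, here is a minimal sketch. It is a hypothetical model (the enum, class, and field names are made up, not Kafka Streams APIs) of which clock each semantic reads: with the default CreateTime type the record timestamp is whatever the producer supplied (event-time if we copy in the TIME_DBR stamp), with LogAppendTime it becomes ingestion-time, and processing-time is just the consuming application's wall clock.

```java
// Hypothetical model of the three time semantics described in the Kafka Streams docs.
// None of these names are Kafka APIs; they only label which clock is being read.
enum TimeSemantic { EVENT_TIME, INGESTION_TIME, PROCESSING_TIME }

final class MonitorUpdate {
    final long iocEventMs;      // when the IOC stamped the PV change (TIME_DBR), in epoch ms
    final long brokerAppendMs;  // when the broker appended the record to the log
    final long consumerNowMs;   // when a downstream application processes the record

    MonitorUpdate(long iocEventMs, long brokerAppendMs, long consumerNowMs) {
        this.iocEventMs = iocEventMs;
        this.brokerAppendMs = brokerAppendMs;
        this.consumerNowMs = consumerNowMs;
    }

    /** Returns the timestamp an application would observe under the given semantic. */
    long timestampFor(TimeSemantic semantic) {
        switch (semantic) {
            case EVENT_TIME:      return iocEventMs;      // CreateTime with a producer-supplied event stamp
            case INGESTION_TIME:  return brokerAppendMs;  // LogAppendTime: broker clock at append
            case PROCESSING_TIME: return consumerNowMs;   // wall clock of the processing app
            default: throw new IllegalArgumentException("unknown semantic: " + semantic);
        }
    }
}
```

The point of the model is that only event-time preserves the monitor timestamp; the other two depend on when the record happens to reach the broker or the consumer.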

When a Connector creates a SourceRecord it can optionally use null for the timestamp: https://kafka.apache.org/26/javadoc/org/apache/kafka/connect/source/SourceRecord.html
See Also: confluentinc/kafka-connect-jdbc#311
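As far as we can tell from the producer docs, a null timestamp does not stay empty: the Connect worker hands the record to a KafkaProducer, and the producer stamps any record without a timestamp with its own current time. A minimal sketch of that fallback, using a hypothetical helper (not Connect or producer code):

```java
// Hypothetical helper mirroring the documented producer behavior:
// when a record carries no timestamp (null), the producer uses its own clock.
final class CreateTimeFallback {
    private CreateTimeFallback() {}

    /**
     * @param sourceRecordTimestamp timestamp given to the SourceRecord, or null if none was set
     * @param producerNowMs         the producer's current time in epoch ms
     * @return the CreateTime value that would be sent to the broker
     */
    static long effectiveCreateTime(Long sourceRecordTimestamp, long producerNowMs) {
        return sourceRecordTimestamp != null
                ? sourceRecordTimestamp  // e.g. the EPICS TIME_DBR stamp, converted to ms
                : producerNowMs;         // producer "now", i.e. closer to ingestion time
    }
}
```

So passing null effectively trades the monitor's event time for the connector host's send time.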

A Streams app at least recognizes that a timestamp may be invalid; see UsePreviousTimeOnInvalidTimestamp: https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/streams/processor/UsePreviousTimeOnInvalidTimestamp.html

@slominskir
Member Author

There appear to be two critical config options:

https://kafka.apache.org/documentation/#message.timestamp.difference.max.ms
https://kafka.apache.org/documentation/#message.timestamp.type

They can be set per topic; otherwise the broker-wide defaults (log.message.timestamp.type and log.message.timestamp.difference.max.ms) apply. The default type is "CreateTime" (application event-time), and the default max difference is Long.MAX_VALUE milliseconds (roughly 292 million years, i.e. effectively unlimited). One concern is that if we start putting EPICS TIME_DBR timestamps into the built-in timestamp field, a timestamp might occasionally differ from the broker clock by more than message.timestamp.difference.max.ms, but that is only likely if someone lowers the default value.

It also isn't clear what setting the type to LogAppendTime actually does, besides removing the possibility that the broker rejects messages due to message.timestamp.difference.max.ms. Specifically, is LogAppendTime mostly just a hint to consumers about how to interpret the timestamp, or does the broker literally override the timestamp provided via the SourceRecord API and insert its own?
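The documented rejection rule can be modeled roughly as follows: for a CreateTime topic the broker compares each record's timestamp with its own clock and rejects the record if the absolute difference exceeds message.timestamp.difference.max.ms, and the setting is ignored for LogAppendTime topics. This is a simplified sketch of the documented semantics, not the broker's source; the class and method names are hypothetical.

```java
// Simplified, hypothetical model of the documented CreateTime validation rule.
final class TimestampDiffCheck {
    /** Default value of message.timestamp.difference.max.ms (Long.MAX_VALUE). */
    static final long DEFAULT_MAX_DIFF_MS = Long.MAX_VALUE;

    private TimestampDiffCheck() {}

    /**
     * Returns true if the broker would accept the record.
     * The difference check only applies to CreateTime topics.
     */
    static boolean accepted(boolean logAppendTime, long recordTsMs, long brokerNowMs, long maxDiffMs) {
        if (logAppendTime) {
            return true; // config ignored; the broker assigns its own timestamp
        }
        // Assumes realistic epoch-ms values, so the subtraction cannot overflow.
        long diff = Math.abs(recordTsMs - brokerNowMs);
        return diff <= maxDiffMs;
    }
}
```

For example, an IOC whose clock is one hour behind would only be rejected if someone lowered the limit below 3,600,000 ms; with the default it is always accepted.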

@slominskir
Member Author

Looks like LogAppendTime type does result in overwriting whatever timestamp the producer provides: wurstmeister/kafka-docker#269
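So the topic's timestamp type decides which clock ends up in the stored record. A minimal sketch of that rule as we understand it (the enum and class are hypothetical illustrations, not broker code):

```java
// Hypothetical model of how the topic's timestamp type determines the stored timestamp.
enum TopicTimestampType { CREATE_TIME, LOG_APPEND_TIME }

final class StoredTimestamp {
    private StoredTimestamp() {}

    static long resolve(TopicTimestampType type, long producerTsMs, long brokerAppendMs) {
        // LogAppendTime: the broker overwrites whatever the producer sent.
        // CreateTime: the producer-supplied value (e.g. the EPICS monitor time) is kept.
        return type == TopicTimestampType.LOG_APPEND_TIME ? brokerAppendMs : producerTsMs;
    }
}
```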

CreateTime can be problematic if an IOC has a misconfigured clock. For example: https://medium.com/@jiangtaoliu/a-kafka-pitfall-when-to-set-log-message-timestamp-type-to-createtime-c17846813ca3

Note that the message offset (index) provides an ordering within a partition, but the timestamp does not guarantee one, at least not with the CreateTime type.
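A small illustration of why offsets, not CreateTime timestamps, give the ordering: within a partition, offsets increase in append order, but producer-supplied timestamps can go backwards (for example, two IOCs with slightly different clocks feeding the same partition). The helper name and the timestamp values in the usage check are made up.

```java
// Hypothetical check: are timestamps non-decreasing when records are read in offset order?
final class OffsetOrdering {
    private OffsetOrdering() {}

    /** timestampsInOffsetOrder[i] is the timestamp of the record at the i-th offset. */
    static boolean timestampsMonotonic(long[] timestampsInOffsetOrder) {
        for (int i = 1; i < timestampsInOffsetOrder.length; i++) {
            if (timestampsInOffsetOrder[i] < timestampsInOffsetOrder[i - 1]) {
                return false; // a later offset carries an earlier timestamp
            }
        }
        return true;
    }
}
```

A consumer that needs event order under CreateTime therefore has to sort or window by timestamp itself rather than assume offset order matches it.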

So it looks like it is probably best to embrace the default CreateTime type and put the TIME_DBR timestamp into Kafka's built-in timestamp field. Users can flip the config to LogAppendTime if they simply want stronger timestamp ordering guarantees (but they lose the monitor timestamp info). If they want both, they would need to use a Connect Single Message Transform (SMT) to hoist the timestamp from the header into a real value field, then route the message to a different topic configured with LogAppendTime.
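Kafka record timestamps are milliseconds since the Unix epoch, while an EPICS epicsTimeStamp counts seconds (secPastEpoch) and nanoseconds (nsec) since the EPICS epoch of 1990-01-01 00:00:00 UTC, so the TIME_DBR stamp has to be converted before it goes into the SourceRecord timestamp. A minimal sketch, assuming the monitor callback exposes those two fields; the helper name is hypothetical. Note that the sub-millisecond part is truncated, since Kafka timestamps only have millisecond resolution.

```java
// Hypothetical helper converting an EPICS timestamp to Kafka's epoch-millisecond timestamp.
final class EpicsTime {
    /** Seconds between the POSIX epoch (1970-01-01) and the EPICS epoch (1990-01-01), UTC. */
    static final long POSIX_TIME_AT_EPICS_EPOCH = 631_152_000L;

    private EpicsTime() {}

    /**
     * @param secPastEpoch seconds since 1990-01-01 00:00:00 UTC (unsigned 32-bit in C, so pass it as a long)
     * @param nsec         nanoseconds within the second (0..999,999,999)
     * @return milliseconds since 1970-01-01 00:00:00 UTC, usable as the record timestamp
     */
    static long toKafkaTimestampMs(long secPastEpoch, long nsec) {
        // Shift to the POSIX epoch, scale to ms, and drop anything below 1 ms.
        return (secPastEpoch + POSIX_TIME_AT_EPICS_EPOCH) * 1_000L + nsec / 1_000_000L;
    }
}
```

The result would be passed as the Long timestamp argument of the SourceRecord constructor instead of being written into the payload.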
