
KAFKA-3025 Added timestamp to Message and use relative offset. #764

Closed

Conversation

becketqin
Contributor

See KIP-31 and KIP-32 for details.

A few notes on the patch:

  1. This patch implements KIP-31 and KIP-32. It includes the features in KAFKA-3025, KAFKA-3026, and KAFKA-3036.
  2. All unit tests passed.
  3. The unit tests were run with both the new and the old message format.
  4. When message format conversion occurs during consumption, the consumer will not be able to detect a message-size-too-large situation. I did not try to fix this because the situation seems rare and should only happen during the migration phase.

val NumRecoveryThreadsPerDataDirProp = "num.recovery.threads.per.data.dir"
val AutoCreateTopicsEnableProp = "auto.create.topics.enable"
val MinInSyncReplicasProp = "min.insync.replicas"
val MessageTimestampTypeProp = "messge.timestamp.type"
Contributor

Typo in "messge"?

Contributor

It wasn't clear to me from the KIP: is this a broker-wide setting, or can it be overridden for each topic?

Contributor Author

@hachikuji I was also thinking about that. Currently, all configurations in LogConfig are per-topic configurations, and the message timestamp type is a legitimate log config, so right now it is a per-topic configuration. I can see some benefit to this from a migration point of view: most topics are owned by some application, so we can start using the new format once all the clients of a topic have migrated. And in the final state, we can choose to leave the topics whose owners are not able to migrate on the old format and still have zero-copy.

Contributor

Thanks for the explanation. Makes sense to me. By the way, I've only done a quick pass on this patch so far, but I'm planning to spend a bit more time on it in the next couple of days.

@becketqin force-pushed the KAFKA-3025 branch 2 times, most recently from 7eee789 to be41afe on January 21, 2016
@apovzner
Contributor

@becketqin Would you add the remaining KIP-31 and KIP-32 work to this patch (the client-side work and the timestamp in the produce response)? Or would that be a different patch?

@becketqin
Contributor Author

@apovzner This patch contains all the features in KIP-31 and KIP-32. The rest of the work is probably adding integration tests. I have already added some unit tests, but we can add more if needed.

@@ -12,6 +12,8 @@
*/
package org.apache.kafka.clients.producer;

import org.apache.kafka.common.record.Record;

/**
* A key/value pair to be sent to Kafka. This consists of a topic name to which the record is being sent, an optional
* partition number, and an optional key and value.
Contributor

Since we are adding a timestamp field to ProducerRecord, I think we should add a comment to the ProducerRecord class description about the meaning of the timestamp, what happens if the user sets it to null, etc.

@apovzner
Contributor

@becketqin My question about the remaining KIP-31 and KIP-32 work was based on outdated info -- I had not refreshed my window and did not see that you added the client-side implementation and the timestamp in the produce response. I see now that you also updated the PR description, thanks!

@@ -494,6 +495,16 @@ private long waitOnMetadata(String topic, long maxWaitMs) throws InterruptedExce
return time.milliseconds() - begin;
}

private long getTimestamp(String topic, Long timestamp) {
// If log append time is used for the topic, we overwrite the timestamp to avoid server-side re-compression.
if (metadata.isUsingLogAppendTime(topic))
Contributor

Related to my other comment about learning the timestamp type for a topic: the first set of produced messages will not have timestamp == INHERITED_TIMESTAMP if the timestamp type == LogAppendTime, right? If setting the timestamp to INHERITED_TIMESTAMP is required for compressed messages to work, does that mean we have a bug?

Contributor Author

It is not required to be INHERITED_TIMESTAMP, but it is good if it is.

As I answered in your other comment, if a topic is using LogAppendTime and a broker receives a message whose timestamp is not INHERITED_TIMESTAMP, it will overwrite the timestamp and do the recompression. So the first batch from a new producer might cause recompression on the broker side, but after that no recompression should be needed. I will add some comments to make this clearer.
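To make that rule concrete, here is a minimal sketch of when the broker must re-compress under LogAppendTime, as described in this iteration of the patch (the class name and sentinel value are assumptions for illustration, not the patch's actual code):

class LogAppendTimeSketch {
    static final long INHERITED_TIMESTAMP = -1L; // assumed sentinel value

    // Under LogAppendTime, a non-inherited producer timestamp forces the
    // broker to rewrite the message timestamp and re-compress the batch;
    // an inherited one lets the broker skip the re-compression cost.
    static boolean needsRecompression(long producerTimestamp, boolean topicUsesLogAppendTime) {
        return topicUsesLogAppendTime && producerTimestamp != INHERITED_TIMESTAMP;
    }
}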

@guozhangwang
Contributor

@becketqin Here are some high-level thoughts about the protocol:

  • Basically we want the consumer to return the timestamp of the type specified by the topic even for compressed message sets, but without additional information the consumer would not know whether LogAppendTime or LogCreationTime is used. And as @apovzner mentioned, just setting the wrapper message's timestamp to the max of all inner message timestamps and letting the consumer check whether the wrapper timestamp is that max does not work perfectly, since 1) it requires the consumer to always decompress the whole message before returning anything to the user, which restricts the buffer memory management we want to add in the future, and 2) there is a corner case where LogAppendTime is used, the broker overrides the wrapper timestamp, and it happens to equal the max of the inner timestamps.

I think we can add this information to the attribute field of the message, which currently uses only 2 bits for the four compression types. Instead, we can treat it as a bit mask where the first 3 bits (or, to be safer, 4) are reserved for the compression codec, leaving us a total of 8 (or 16) supported compression types, and use the fourth (fifth) bit to indicate whether the wrapper timestamp (for LogAppendTime, since it is overridden) or the inner timestamp (for LogCreationTime) should be used to set the consumer record's timestamp (see the sketch after this list).

And with this, neither the producer nor the consumer needs to learn about this per-topic config from metadata responses, which makes the client change simpler and adoption in other languages easier.

  • I am curious if the ducktape integration tests will be added in another PR?
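A minimal sketch of the attribute layout this proposal implies (the constant names and the exact bit position are assumptions for illustration, not necessarily what the patch ended up using):

final class AttributeSketch {
    static final byte COMPRESSION_CODEC_MASK = 0x07; // low 3 bits -> up to 8 codecs
    static final byte TIMESTAMP_TYPE_MASK = 0x08;    // bit 3: 0 = CreateTime, 1 = LogAppendTime

    // Pack a compression codec and the timestamp type into one attribute byte.
    static byte encode(byte codec, boolean logAppendTime) {
        byte attributes = (byte) (codec & COMPRESSION_CODEC_MASK);
        if (logAppendTime)
            attributes |= TIMESTAMP_TYPE_MASK;
        return attributes;
    }

    // Consumers check this bit to decide whether to use the (overridden)
    // wrapper timestamp or the inner message timestamps.
    static boolean usesLogAppendTime(byte attributes) {
        return (attributes & TIMESTAMP_TYPE_MASK) != 0;
    }
}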

@becketqin
Contributor Author

@guozhangwang Using the attribute field is a good approach. It also lets consumers know the timestamp type. To make sure I understand your suggestion correctly:

  1. The producer simply sends messages assuming the broker uses CreateTime, i.e. both the attributes and the timestamp will use CreateTime.
  2. If log append time is used, the broker only overrides the outer message's Attribute field and Timestamp field to use LogAppendTime.
  3. When the consumer sees a message, it checks both the magic byte and the attribute field to see which timestamp is used (if magic > 0), and then decides whether to override the inner messages' timestamps.

Another thing is that we still need to decompress the entire compressed message, for the reason I mentioned in one of the comments. Given the stream compression used by the producer, we do not have an actual "relative offset compared with the last message" until we close the batch; we only have the "relative offset compared with the first message" when we write a message into a batch. Because the outer message only has the absolute offset of the last message, in order to get the absolute offset of an inner message we have to decompress the entire compressed message to find the "relative offset compared with the last message" and then compute the absolute offset.

I feel this is fine for the new consumer because we are delivering messages to users in batches anyway.
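A small sketch of the offset reconstruction described above, assuming the wrapper carries the absolute offset of the last inner message and each inner message stores an offset relative to the first message of the batch (hypothetical helper, for illustration only):

// The whole batch must be decompressed first, because only the last inner
// message's relative offset anchors the computation.
static long[] absoluteOffsets(long wrapperOffset, long[] innerRelativeOffsets) {
    long lastRelative = innerRelativeOffsets[innerRelativeOffsets.length - 1];
    long[] absolute = new long[innerRelativeOffsets.length];
    for (int i = 0; i < innerRelativeOffsets.length; i++)
        absolute[i] = wrapperOffset - (lastRelative - innerRelativeOffsets[i]);
    return absolute;
}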

@becketqin
Contributor Author

BTW, currently CompressionCodecMask is set to 0x7, so it is 3 bits. Changing that to 4 bits is backward compatible, so that should be fine.

@guozhangwang
Contributor

@becketqin Yes, your understanding is correct.

I was initially thinking about the possibility of NOT decompressing the whole message once we add the memory management feature in the future, so that we could buffer fewer decompressed messages. But that seems impossible now, which may still be fine for us, so let's forget about it.

@becketqin
Contributor Author

@apovzner @guozhangwang I updated the patch with Guozhang's proposal. I will add integration tests in a separate PR. The intended tests are:

  1. Change the timestamp type on the fly.
  2. Test message format version upgrade.

I actually want to do end-to-end tests using different versions of producers and consumers, but I am not sure that is possible with the current integration tests because it requires different client jars.

@apovzner
Contributor

@becketqin You can do end-to-end compatibility testing with system tests. Take a look at compatibility_test.py. It currently tests the 0.9.X Java producer against 0.8.X brokers and the 0.8.X consumer against 0.9.X brokers; both succeed on the expected failure. You can add a couple more system tests to cover newer brokers with older producers and/or consumers. Note that you would need to update vagrant/base.sh to get Kafka release 0.9.0.0.

int partition,
long offset,
long timestamp,
Record.TimestampType timestampType,
Contributor

If we are exposing the timestamp type in ConsumerRecord, should we declare TimestampType outside of Record?

Contributor Author

It is in KafkaProducer, line 437. We just need a one-liner now.

@apovzner
Contributor

@becketqin Maybe I missed it, but I don't see where the producer assigns a timestamp if the user does not specify one in ProducerRecord. The code was there before; maybe it got accidentally removed with the recent changes?

@becketqin
Contributor Author

@apovzner Thanks for the pointer on the compatibility tests. Extracting the timestamp type makes sense, given we already did that for CompressionType. I will change the server side as well.

* 2. If the message is using log append time and is an uncompressed message, this method will overwrite the
* timestamp of the message.
*/
private def validateTimestamp(message: Message,
Contributor

The comment above the method no longer matches the implementation -- we now only check the acceptable range for CreateTime timestamps.

@apovzner
Contributor

@becketqin I reviewed the KIP-32 part of the patch (I did not go into detail on the KIP-31 related changes). Using the timestamp type in the attributes made the producer/consumer code much cleaner! I made minor comments; otherwise it looks good to me.

@becketqin
Contributor Author

@apovzner Thanks for the review. @guozhangwang @junrao Would you help take a look at the patch? Thanks.

@becketqin
Contributor Author

The test failure is intermittent and is not related to this change.

<p><b>For a rolling upgrade:</b></p>

<ol>
<li> Update server.properties file on all brokers and add the following properties: inter.broker.protocol.version=CURRENT_KAFKA_VERSION(e.g. 0.8.2, 0.9.0.0), message.format.version=CURRENT_KAFKA_VERSION(e.g. 0.8.2, 0.9.0.0) </li>
Contributor

The message.format.version change is an optimization, so it's not really required. We can probably just cover it in the section on performance impact.

@junrao
Contributor

junrao commented Feb 18, 2016

Thanks for the patch. Looks good overall; I just left a few minor comments. Also, in TopicCommand, when listing the available config options, could we add a description saying that messageFormat will be ignored if it's not consistent with the inter-broker protocol setting?

} catch {
case e: IOException => throw new KafkaException("Error in validating messages while appending to log '%s'".format(name), e)
}
appendInfo.lastOffset = offset.get - 1
// If log append time is used, we put the timestamp assigned to the messages in the append info.
if (config.messageTimestampType == TimestampType.LOG_APPEND_TIME)
appendInfo.timestamp = now
Contributor

Is there a reason why we don't pass the timestamp as a parameter to analyzeAndValidateMessageSet? That would mean timestamp could be a val instead of a var. It's a straightforward change, but it would mean reading config.messageTimestampType outside the synchronized block. Is that a problem?

Contributor Author

@ijuma If we do that, it seems possible to cause an inconsistent ordering of message offsets and timestamps. For example, message A arrives and is stamped t1 by the broker, but before it is appended to the log, message B arrives, is stamped t2 (t2 > t1), and gets appended to the log. After that, message A is appended. In this case, message A has a smaller timestamp but a larger offset than message B, which is a bit confusing.

We could put everything in the synchronized block, but that seems not worth doing if we only want to change a var to a val.

Contributor

Makes sense, thanks.

@becketqin
Contributor Author

@junrao Thanks for the patient review. I think I have addressed the previous comments. Could you take another look?

convertedMessageSet = fileMessageSet.toMessageFormat(Message.MagicValue_V1)
verifyConvertedMessageSet(convertedMessageSet, Message.MagicValue_V1)

def verifyConvertedMessageSet(convertedMessageSet: MessageSet, magicByte: Byte) {
Contributor

Can this be private?

Contributor Author

It seems we cannot add a scope modifier to a code block. The compiler gives the following error:

/core/src/test/scala/unit/kafka/log/FileMessageSetTest.scala:260: illegal start of statement (no modifiers allowed here)
    private def verifyConvertedMessageSet(convertedMessageSet: MessageSet, magicByte: Byte) {

verifyConvertedMessageSet itself seems private by nature and is only accessible within testMessageFormatConversion.

@junrao
Contributor

junrao commented Feb 19, 2016

@becketqin: Thanks for the latest patch. It looks good to me. Once you address the last few minor comments, I can merge it in.

@junrao
Contributor

junrao commented Feb 19, 2016

@becketqin : Thanks a lot for working on the patch! LGTM

asfgit closed this in 45c8195 on Feb 19, 2016
@ijuma
Contributor

ijuma commented Feb 19, 2016

Nice work @becketqin. And the reviewers too. :)

@becketqin
Contributor Author

Thanks @junrao and @ijuma so much for the great help with the review!

@guozhangwang
Contributor

@junrao @becketqin Some of the streams tests became incorrect when the timestamp was added. For example, in ProcessorStateManagerTest:

new ConsumerRecord<>(persistentStoreTopicName, 2, 0L, offset, TimestampType.CREATE_TIME, key, 0) should be

new ConsumerRecord<>(persistentStoreTopicName, 2, offset, 0L, TimestampType.CREATE_TIME, key, 0)

Actually, I'm wondering whether it would hurt to keep the old constructor for ConsumerRecord, with default values of 0L and TimestampType.CREATE_TIME, and revert all the changes in the streams tests. That way we can hold off on incorporating the metadata timestamp until it is supported.

@ijuma
Contributor

ijuma commented Feb 25, 2016

@guozhangwang Well spotted. I actually wanted to suggest moving TimestampType before the timestamp to make this kind of error harder, but I only noticed this potential problem late in the process and wasn't sure it was worth the effort. Having real bugs instead of theoretical ones adds motivation.

Personally, I would prefer not to add the old ConsumerRecord constructor back, as ConsumerRecord is used outside of streams too. Maybe we could add a utility method in streams in the meantime?

@guozhangwang
Contributor

@ijuma Makes sense. The only place streams uses ConsumerRecord directly is in TimestampExtractor. What kind of utility method do you have in mind?

@ijuma
Contributor

ijuma commented Feb 26, 2016

@guozhangwang I just mean a method like newConsumerRecord that behaves exactly like the old constructor. Then you could revert the changes in the streams tests and do a search-and-replace in the streams folder.
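For illustration, such a helper might look like the sketch below. The class name, its location, and the import paths are hypothetical; the constructor argument order and the 0L / CREATE_TIME defaults come from the discussion above:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.record.TimestampType;

final class StreamsTestRecords {
    // Behaves like the pre-KIP-32 constructor: fills in the new timestamp
    // fields with 0L and CREATE_TIME so existing tests stay unchanged.
    static <K, V> ConsumerRecord<K, V> newConsumerRecord(String topic, int partition,
                                                         long offset, K key, V value) {
        return new ConsumerRecord<>(topic, partition, offset, 0L,
                                    TimestampType.CREATE_TIME, key, value);
    }
}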

@guozhangwang
Contributor

@ijuma Sounds good.
