CC-4318 Fix off-by-one error for offset reporting to Kafka topic #425
Conversation
Thanks, @cyrusv. I have a question below; I'm not convinced this is the right approach.
   */
-  public Long committedOffset() {
-    return committedOffset;
+  public long offset() {
@cyrusv, I'm not sure I understand why we're returning 1 past what was last committed to HDFS. We need to verify that what is committed to the consumer offsets exactly matches what is committed to HDFS; I'm not convinced this really identified/fixed the bug.
Second, `offset` is initialized to `-1L`, so could this method ever return that? Before this change, the offset getter might return null, but the code in DataWriter looked for this case and never wrote it into the offsets.
Finally, we lost the JavaDoc; can we put it back to explain what offset this actually returns?
So I wonder if this is the source of the confusion: the consumer offset should point to the offset of the next record to be consumed, whereas the prior change recorded the last record committed to HDFS as the consumer offset.
It'd be good to document this. The `offset()` method should explain that it returns 1 past the last committed offset and why (to match the consumer offsets). Also, we still need to handle the -1 case mentioned above.
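To make the convention above concrete, here is a minimal sketch (all names hypothetical, not the connector's actual code): Kafka's committed consumer offset is the position of the *next* record to consume, so it must be one past the last record durably written to HDFS.

```java
public class OffsetConvention {
    // Sentinel meaning "nothing committed yet", mirroring the -1 discussed above.
    private long lastWrittenOffset = -1L;

    void recordWritten(long recordOffset) {
        lastWrittenOffset = recordOffset;
    }

    // Offset to hand back for the consumer-offset commit: one past the last
    // record written to HDFS. Note the sentinel leaks through as -1 + 1 = 0,
    // so callers would still need to skip the commit when nothing has been
    // written yet, which is the -1 handling the review asks about.
    long offsetToCommit() {
        return lastWrittenOffset + 1;
    }

    public static void main(String[] args) {
        OffsetConvention w = new OffsetConvention();
        System.out.println(w.offsetToCommit()); // 0: nothing written yet
        w.recordWritten(41);
        System.out.println(w.offsetToCommit()); // 42: next record to consume
    }
}
```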
Thanks for the feedback, @rhauch -- I've renamed some variables and added some documentation to clarify the work done here. For reference, the work is most closely related to 284fbc2.
I decided to stick with a primitive with a `-1` initial value, since a lot of logic in the TopicPartitionWriter tests against `-1`, and I think keeping this code as-is (relative to pre-284fbc2) with a clearer name will provide the most stability. I could be convinced that a `null` initial value would be preferable if we'd rather have NPEs than misleading long values, but I decided it wasn't worth the tradeoff, since we've seen this structure work successfully in many production environments.
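The tradeoff weighed here can be sketched in a few lines (hypothetical names, not the connector's fields): a primitive `-1` sentinel fails silently when a check is missed, while a boxed `Long` with a `null` sentinel fails fast with an NPE.

```java
public class SentinelStyles {
    // Style 1: primitive with a -1 sentinel. Callers must remember to test it.
    static long primitiveOffset = -1L;

    // Style 2: boxed Long with a null sentinel. A forgotten check fails fast
    // with an NPE instead of silently propagating a misleading long value.
    static Long boxedOffset = null;

    public static void main(String[] args) {
        // Primitive: a missed check quietly does arithmetic with -1.
        long wrong = primitiveOffset + 1; // 0 -- looks like a valid offset
        System.out.println(wrong);

        // Boxed: unboxing null throws immediately at the point of misuse.
        try {
            long alsoWrong = boxedOffset + 1;
            System.out.println(alsoWrong);
        } catch (NullPointerException e) {
            System.out.println("NPE on missed null check");
        }
    }
}
```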
Thanks, @cyrusv. I think it's worth trying to minimize the changes and risk, so I have a few suggestions below. Perhaps the biggest is to realize that the TopicPartitionWriter has no notion of "consumer", and so "consumer" in fields and method names doesn't really make sense.
-  offset = -1L;
-  committedOffset = null;
+  // The next offset to consume after the last commit (one more than the last offset written to HDFS)
+  committedConsumerOffset = -1;
I'm not sure that using "Consumer" in the variable name makes sense, because this offset really is all about the record offset that this writer has committed to HDFS.
I'm also not a fan of removing or renaming the old `offset` variable, mostly because that incurs changes on more lines, potentially affects the logic (especially on line 337), and makes this PR larger than it needs to be. Because the old `committedOffset` field would now hold the same value, why not remove `committedOffset` instead and add a comment around line 568 explaining why the `+ 1` is used? Then the JavaDoc for `committedOffset()` can be changed as you have it in this PR.
   */
-  public Long committedOffset() {
-    return committedOffset;
+  public long committedConsumerOffset() {
Again, this method doesn't know about the consumer; this offset literally is 1 past the offset of the last record committed to HDFS. Perhaps rename it to `nextOffset()` instead (as suggested by your test).
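A minimal sketch of the rename suggested above (the class body and JavaDoc wording here are hypothetical, not the actual `TopicPartitionWriter` code): name the method for what it returns and document the `+ 1` once.

```java
public class TopicPartitionWriterSketch {
    // Offset of the last record written to HDFS; -1 means nothing written yet.
    private long offset = -1L;

    /**
     * Returns the offset of the next record to be processed: one past the
     * offset of the last record committed to HDFS. This matches Kafka's
     * consumer-offset convention, where the committed offset is the position
     * of the next record to consume.
     */
    public long nextOffset() {
        return offset + 1;
    }

    public static void main(String[] args) {
        TopicPartitionWriterSketch w = new TopicPartitionWriterSketch();
        w.offset = 99; // pretend record 99 was just committed to HDFS
        System.out.println(w.nextOffset()); // 100
    }
}
```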
src/main/java/io/confluent/connect/hdfs/TopicPartitionWriter.java
Consumer groups were reporting current-offset as one less than log-end-offset in the fully caught-up state. This fixes the topic's perception of the current offset to match the offset we are storing in HDFS.
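The symptom in the commit message reduces to simple arithmetic; here is an illustrative sketch (the concrete offset values are made up): if the connector commits the last written offset instead of last + 1, reported lag (log-end-offset minus current-offset) never reaches 0 even when fully caught up.

```java
public class LagSketch {
    public static void main(String[] args) {
        long lastWritten = 41;  // offset of the last record written to HDFS
        long logEndOffset = 42; // next offset the broker will assign

        // Committing the last written offset leaves lag stuck at 1.
        long lagIfCommittingLast = logEndOffset - lastWritten;
        // Committing last + 1 (the next record to consume) shows 0: caught up.
        long lagIfCommittingNext = logEndOffset - (lastWritten + 1);

        System.out.println(lagIfCommittingLast); // 1
        System.out.println(lagIfCommittingNext); // 0
    }
}
```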
One more request to fix JavaDoc.
One more request to fix JavaDoc. Otherwise this looks great.
LGTM. Thanks, @cyrusv!