
[SPARK-41375][SS] Avoid empty latest KafkaSourceOffset #38898

Closed

wecharyu wants to merge 2 commits into apache:master from wecharyu:SPARK-41375


Conversation

@wecharyu
Contributor

@wecharyu wecharyu commented Dec 3, 2022

What changes were proposed in this pull request?

Add an empty-offset filter in latestOffset() for the Kafka source, so that the offset remains unchanged if Kafka returns no topic partitions during the fetch.
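
For illustration, a minimal standalone sketch of this kind of guard might look like the following (the names and signature are placeholders for this explanation, not the exact code in the patch):

    import org.apache.kafka.common.TopicPartition

    // Sketch only: if the fetch transiently returns no topic partitions,
    // keep the previously known offsets instead of recording an empty map.
    def resolveLatestOffsets(
        current: Map[TopicPartition, Long],
        fetched: Map[TopicPartition, Long]): Map[TopicPartition, Long] = {
      if (fetched.isEmpty) current else fetched
    }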

Why are the changes needed?

KafkaOffsetReader may fetch an empty set of partitions in some extreme cases, such as querying partitions while the Kafka cluster is reassigning them. This produces an empty PartitionOffsetMap (even though the topic partitions themselves are unchanged), which is stored in committedOffsets after runBatch().

In the next batch, partitions are fetched normally and the actual offsets are obtained, but when the batch's data is fetched in KafkaOffsetReaderAdmin#getOffsetRangesFromResolvedOffsets(), every partition in endOffsets is treated as a new partition because startOffsets is empty. These "new partitions" then start from the earliest offsets, which causes data duplication.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add a unit test.

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

@jerrypeng jerrypeng left a comment


Can you write a unit test for this?

@wecharyu
Contributor Author

wecharyu commented Dec 6, 2022

Can you write a unit test for this?

It seems a bit difficult to write a unit test that covers the case where empty partitions are fetched from the Kafka cluster; any ideas would be appreciated.

@jerrypeng
Contributor

@wecharyu can you run one batch and then delete all the partitions?


Contributor

@jerrypeng jerrypeng left a comment


To avoid data duplication in the extreme cases where Spark fetches an empty latest Kafka source offset.

@wecharyu how does an empty latest Kafka source offset cause data duplication?

@wecharyu
Contributor Author

wecharyu commented Dec 8, 2022

@jerrypeng the empty offset will be stored in committedOffsets. When the next batch runs, the following code records an empty map as the startOffset in newBatchesPlan:

r.copy(startOffset = Some(start), endOffset = Some(end))

Then, while fetching partitions, all of them are considered "new partitions" and start from the earliest offsets, which produces duplicate data.
  override def getOffsetRangesFromResolvedOffsets(
      fromPartitionOffsets: PartitionOffsetMap,
      untilPartitionOffsets: PartitionOffsetMap,
      reportDataLoss: String => Unit): Seq[KafkaOffsetRange] = {
    // Find the new partitions, and get their earliest offsets
    val newPartitions = untilPartitionOffsets.keySet.diff(fromPartitionOffsets.keySet)
    val newPartitionInitialOffsets = fetchEarliestOffsets(newPartitions.toSeq)
    if (newPartitionInitialOffsets.keySet != newPartitions) {
      // We cannot get from offsets for some partitions. It means they got deleted.
      val deletedPartitions = newPartitions.diff(newPartitionInitialOffsets.keySet)
      reportDataLoss(
        s"Cannot find earliest offsets of ${deletedPartitions}. Some data may have been missed")
    }
    logInfo(s"Partitions added: $newPartitionInitialOffsets")
    newPartitionInitialOffsets.filter(_._2 != 0).foreach { case (p, o) =>
      reportDataLoss(
        s"Added partition $p starts from $o instead of 0. Some data may have been missed")
    }

@HeartSaVioR
Contributor

Could you please summarize the JIRA description in the PR template, especially the "Root Cause" part? Also, is it a "known" issue for the Kafka consumer?

Also, please note that we changed the default offset fetching mechanism from consumer group assignment in Kafka to an active fetch via AdminClient, which does not have this issue.
https://github.com/apache/spark/blob/master/docs/ss-migration-guide.md#upgrading-from-structured-streaming-33-to-34

That said, your test case should set spark.sql.streaming.kafka.useDeprecatedOffsetFetching to true so that it does not run with the default config.

@HeartSaVioR
Contributor

HeartSaVioR commented Dec 8, 2022

I'm trying to understand the case. If my understanding is correct, the new test just triggers the same behavior rather than reproducing the actual problem, right? In the new test, recognizing all topic partitions as new ones and processing all records in the next microbatch is arguably NOT wrong behavior to me, hence I really would like to understand the actual problem.

According to the JIRA description, the actual problem is that Kafka can "transiently" give no topic partitions as the assignment when it performs reassignment among consumers, specifically here:

      consumer.poll(0)
      val partitions = consumer.assignment()

where we expect Kafka to assign topic partitions to this consumer after calling poll.

Do I understand correctly?

(If I'm on the right track, the fix helps more for queries where failOnDataLoss is turned "on". Previously the query would just fail with a surprising and incorrect error message (correct from Spark's point of view, though), and after this fix the query won't fail.)

  }

  test("SPARK-41375: empty partitions should not record to latest offset") {
    val topicPrefix = newTopic()
Contributor


Please set spark.sql.streaming.kafka.useDeprecatedOffsetFetching to true. You can do this by leveraging withSQLConf(...map of explicit config here...) { ...test code here... }
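
As a rough sketch, assuming the suite mixes in Spark's SQL test helper that provides withSQLConf, the wrapping could look like:

      // Force the deprecated (consumer-based) offset fetching path for this test.
      withSQLConf("spark.sql.streaming.kafka.useDeprecatedOffsetFetching" -> "true") {
        // existing test body for SPARK-41375 goes here
      }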

Contributor


Well, never mind. We're not reproducing the actual problem here, so this seems sufficient.

@wecharyu
Contributor Author

wecharyu commented Dec 8, 2022

@HeartSaVioR yes, you are right: the actual problem is that we may unexpectedly fetch empty partitions in one batch, and in the next batch we fetch the real partitions again. The new test is just used to mock the empty partitions, but it also makes sense not to record an empty offset for the empty partitions.

Contributor

@HeartSaVioR HeartSaVioR left a comment


+1 Nice fix!

@HeartSaVioR
Contributor

HeartSaVioR commented Dec 8, 2022

Thanks! Merging to master/3.3.

HeartSaVioR pushed a commit that referenced this pull request Dec 8, 2022
### What changes were proposed in this pull request?

Add an empty-offset filter in `latestOffset()` for the Kafka source, so that the offset remains unchanged if Kafka returns no topic partitions during the fetch.

### Why are the changes needed?

KafkaOffsetReader may fetch an empty set of partitions in some extreme cases, such as querying partitions while the Kafka cluster is reassigning them. This produces an empty `PartitionOffsetMap` (even though the topic partitions themselves are unchanged), which is stored in `committedOffsets` after `runBatch()`.

In the next batch, partitions are fetched normally and the actual offsets are obtained, but when the batch's data is fetched in `KafkaOffsetReaderAdmin#getOffsetRangesFromResolvedOffsets()`, every partition in endOffsets is treated as a new partition because startOffsets is empty. These "new partitions" then start from the earliest offsets, which causes data duplication.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add a unit test.

Closes #38898 from wecharyu/SPARK-41375.

Authored-by: wecharyu <yuwq1996@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 043475a)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@wecharyu
Contributor Author

wecharyu commented Dec 8, 2022

@jerrypeng @HeartSaVioR thanks for your review!

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022