[SPARK-25214][SS][FOLLOWUP] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`

## What changes were proposed in this pull request?

This is a follow-up PR for #22207 to fix a potentially flaky test. `processAllAvailable` doesn't work for continuous processing, so we should not use it in a continuous query.

## How was this patch tested?

Jenkins.

Closes #22230 from zsxwing/SPARK-25214-2.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
zsxwing committed Aug 25, 2018
1 parent 6c66ab8 commit c17a8ff
Showing 1 changed file with 6 additions and 2 deletions.
```diff
@@ -80,7 +80,7 @@ trait KafkaMissingOffsetsTest extends SharedSQLContext {
   }
 }
 
-class KafkaDontFailOnDataLossSuite extends KafkaMissingOffsetsTest {
+class KafkaDontFailOnDataLossSuite extends StreamTest with KafkaMissingOffsetsTest {
 
   import testImplicits._
 
@@ -165,7 +165,11 @@ class KafkaDontFailOnDataLossSuite extends KafkaMissingOffsetsTest {
       .trigger(Trigger.Continuous(100))
       .start()
     try {
-      query.processAllAvailable()
+      // `processAllAvailable` doesn't work for continuous processing, so just wait until the last
+      // record appears in the table.
+      eventually(timeout(streamingTimeout)) {
+        assert(spark.table(table).as[String].collect().contains("49"))
+      }
     } finally {
       query.stop()
     }
```
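The fix relies on ScalaTest's `eventually(timeout(...))` helper, which retries an assertion until it passes or the timeout expires. As a rough standalone sketch of that polling pattern (the helper name, signature, and retry interval below are hypothetical illustrations, not ScalaTest's actual implementation):

```scala
import scala.concurrent.duration._

// Hypothetical sketch: re-run `check` until it succeeds or `timeout` elapses,
// sleeping `interval` between attempts, like ScalaTest's `eventually`.
def eventuallySketch[T](timeout: FiniteDuration, interval: FiniteDuration = 100.millis)(check: => T): T = {
  val deadline = timeout.fromNow
  var lastError: Throwable = null
  while (deadline.hasTimeLeft()) {
    try {
      return check                       // condition held: return its result
    } catch {
      case e: Throwable =>               // condition not yet true: remember why and retry
        lastError = e
        Thread.sleep(interval.toMillis)
    }
  }
  throw new AssertionError(s"condition not met within $timeout", lastError)
}
```

This polls-until-true style is the right tool here because a continuous query has no "all available data processed" marker to block on; the test can only observe the sink converging to the expected state.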
