Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27494][SS] Null keys/values don't work in Kafka source v2 #24441

Closed
wants to merge 6 commits into from

Conversation

@uncleGen
Copy link
Contributor

uncleGen commented Apr 23, 2019

What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null keys or values.

  • When processing a null key, all of the following keys in the same partition will be null. This is a correctness bug.
  • When processing a null value, it will throw NPE.

How was this patch tested?

add new unit tests

@uncleGen uncleGen changed the title Null values don't work in Kafka source v2 [SPARK-27494][SS] Null values don't work in Kafka source v2 Apr 23, 2019
@SparkQA
Copy link

SparkQA commented Apr 23, 2019

Test build #104829 has finished for PR 24441 at commit 5cb9502.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 23, 2019

pending on adding new unit test

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104852 has finished for PR 24441 at commit 74bfb87.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104858 has finished for PR 24441 at commit aadad55.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 24, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104864 has finished for PR 24441 at commit aadad55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
...est/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala Outdated Show resolved Hide resolved
@cloud-fan
Copy link
Contributor

cloud-fan commented Apr 24, 2019

good catch!

...est/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala Outdated Show resolved Hide resolved
@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104885 has finished for PR 24441 at commit b6e9e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104886 has finished for PR 24441 at commit 2f30711.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
...est/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala Outdated Show resolved Hide resolved
@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104889 has finished for PR 24441 at commit 9dd38eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
...est/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala Outdated Show resolved Hide resolved
@uncleGen uncleGen force-pushed the uncleGen:SPARK-27494 branch to 45330de Apr 25, 2019
@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104891 has finished for PR 24441 at commit 45330de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104893 has finished for PR 24441 at commit d91977d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 25, 2019

Although SPARK-27494 is reported on only v2, can we have a test coverage for Kafka source v1 too?

@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 25, 2019

@dongjoon-hyun make sense, what about finishing it in a follow-up pr? v1 is already tested in KafkaMicroBatchV1SourceSuite

@cloud-fan
Copy link
Contributor

cloud-fan commented Apr 25, 2019

@dongjoon-hyun v1 is already tested because the test is in KafkaMicroBatchSourceSuiteBase

@uncleGen uncleGen closed this Apr 25, 2019
@uncleGen uncleGen reopened this Apr 25, 2019
Copy link
Contributor

gaborgsomogyi left a comment

Minor things found, basically looks good.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 25, 2019

Got it. Thanks, @uncleGen and @cloud-fan !

@SparkQA
Copy link

SparkQA commented Apr 26, 2019

Test build #104920 has finished for PR 24441 at commit d2ffea1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@uncleGen uncleGen force-pushed the uncleGen:SPARK-27494 branch to 174321a Apr 26, 2019
@SparkQA
Copy link

SparkQA commented Apr 26, 2019

Test build #104921 has finished for PR 24441 at commit 174321a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 26, 2019

@cloud-fan Take a second glance please.

Copy link
Member

zsxwing left a comment

LGTM

@cloud-fan cloud-fan closed this in d2656aa Apr 26, 2019
cloud-fan added a commit that referenced this pull request Apr 26, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes #24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

cloud-fan commented Apr 26, 2019

thanks, merging to master/2.4!

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 26, 2019

Thank you all!

@zsxwing zsxwing changed the title [SPARK-27494][SS] Null values don't work in Kafka source v2 [SPARK-27494][SS] Null keys/values don't work in Kafka source v2 Apr 26, 2019
kai-chi added a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kai-chi added a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kai-chi added a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kai-chi added a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.