[SPARK-32832][SS] Use CaseInsensitiveMap for DataStreamReader/Writer options #29702

dongjoon-hyun · 2020-09-10T01:03:48Z

What changes were proposed in this pull request?

This PR aims to fix the nondeterministic behavior of DataStreamReader/Writer options when the same option key is specified with different letter case, as in the following.

scala> spark.readStream.format("parquet").option("paTh", "1").option("PATH", "2").option("Path", "3").option("patH", "4").option("path", "5").load()
org.apache.spark.sql.AnalysisException: Path does not exist: 1;

Why are the changes needed?

This will make the behavior deterministic.
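
To illustrate the difference, here is a minimal, self-contained sketch. It does not use Spark: `LowerKeyOptions` and `OptionsDemo` are hypothetical names that stand in for the idea behind `CaseInsensitiveMap`, and they are not the real class. The sketch assumes the old behavior came from storing five distinct keys in a hash map and only later resolving them case-insensitively, so the winner depended on iteration order.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for a case-insensitive option store.
// This is NOT Spark's CaseInsensitiveMap, only an illustration of the idea.
final case class LowerKeyOptions(entries: Map[String, String] = Map.empty) {
  // Normalize the key so "paTh", "PATH" and "path" all address the same entry.
  def option(key: String, value: String): LowerKeyOptions =
    copy(entries = entries + (key.toLowerCase(Locale.ROOT) -> value))

  // Lookups normalize the key the same way.
  def get(key: String): Option[String] =
    entries.get(key.toLowerCase(Locale.ROOT))
}

object OptionsDemo {
  // The same sequence of option calls as in the PR description.
  val calls: Seq[(String, String)] =
    Seq("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "patH" -> "4", "path" -> "5")

  // Case-sensitive storage keeps five distinct keys, so a later
  // case-insensitive resolution has five candidates to choose from.
  val caseSensitive: Map[String, String] = calls.toMap

  // Case-insensitive storage applies the calls in order; the last one wins.
  val caseInsensitive: LowerKeyOptions =
    calls.foldLeft(LowerKeyOptions()) { case (opts, (k, v)) => opts.option(k, v) }

  def main(args: Array[String]): Unit = {
    println(caseSensitive.size)           // prints 5
    println(caseInsensitive.entries.size) // prints 1
    println(caseInsensitive.get("PATH"))  // prints Some(5)
  }
}
```

With normalization at insertion time, the result depends only on the order of the `option` calls, not on the iteration order of the underlying map.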

Does this PR introduce any user-facing change?

Yes, but the previous behavior was nondeterministic.

How was this patch tested?

Pass the newly added test cases.

dongjoon-hyun · 2020-09-10T01:11:47Z

cc @cloud-fan and @HeartSaVioR

cloud-fan · 2020-09-10T01:52:57Z

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala

@@ -535,5 +536,5 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo

  private var userSpecifiedSchema: Option[StructType] = None

-  private var extraOptions = new scala.collection.mutable.HashMap[String, String]
+  private var extraOptions = CaseInsensitiveMap[String](Map.empty)


DataStreamReader supports reading DS v2; shall we get the original map at that place?

Thanks. It's updated now.
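
One possible reading of this suggestion, sketched under stated assumptions: the option store resolves keys case-insensitively but also keeps the keys as the user last wrote them, so a consumer that wants the original spelling (for example a DS v2 source) can still get it. `CasePreservingOptions` and `CasePreservingDemo` are hypothetical names for illustration only; this is neither the code in this PR nor Spark's actual `CaseInsensitiveMap`.

```scala
import java.util.Locale

// Hypothetical sketch: case-insensitive lookups plus a case-preserving view.
final class CasePreservingOptions private (val originalMap: Map[String, String]) {
  private def norm(key: String): String = key.toLowerCase(Locale.ROOT)

  // Lower-cased view used for lookups.
  private val lowered: Map[String, String] =
    originalMap.map { case (k, v) => norm(k) -> v }

  def get(key: String): Option[String] = lowered.get(norm(key))

  // Drop any existing key that differs only by case, then add the new entry,
  // so originalMap never holds two spellings of the same option.
  def updated(key: String, value: String): CasePreservingOptions =
    new CasePreservingOptions(
      originalMap.filterNot { case (k, _) => norm(k) == norm(key) } + (key -> value))
}

object CasePreservingOptions {
  val empty: CasePreservingOptions = new CasePreservingOptions(Map.empty)
}

object CasePreservingDemo {
  val opts: CasePreservingOptions =
    CasePreservingOptions.empty.updated("paTh", "1").updated("path", "5")

  def main(args: Array[String]): Unit = {
    println(opts.get("PATH"))  // prints Some(5)
    println(opts.originalMap)  // prints Map(path -> 5)
  }
}
```

Here `originalMap` is what a case-sensitive consumer would receive, while `get` serves callers that do not care about case.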

SparkQA · 2020-09-10T06:03:12Z

Test build #128475 has finished for PR 29702 at commit ed225b9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-10T06:30:13Z

Test build #128477 has finished for PR 29702 at commit 19f1486.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-09-10T06:41:14Z

Thank you again, @cloud-fan. Merged to master/3.0.

…options

This PR aims to fix nondeterministic behavior on DataStreamReader/Writer options like the following.

```scala
scala> spark.readStream.format("parquet").option("paTh", "1").option("PATH", "2").option("Path", "3").option("patH", "4").option("path", "5").load()
org.apache.spark.sql.AnalysisException: Path does not exist: 1;
```

This will make the behavior deterministic.

Yes, but the previous behavior was nondeterministic.

Pass the newly added test cases.

Closes #29702 from dongjoon-hyun/SPARK-32832.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2f85f95)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

SparkQA · 2020-09-10T07:05:03Z

Test build #128483 has finished for PR 29702 at commit 3c48139.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

…iter options

### What changes were proposed in this pull request?

This is a backport of #29702.

This PR aims to fix nondeterministic behavior on DataStreamReader/Writer options like the following.

```scala
scala> spark.readStream.format("parquet").option("paTh", "1").option("PATH", "2").option("Path", "3").option("patH", "4").option("path", "5").load()
org.apache.spark.sql.AnalysisException: Path does not exist: 1;
```

### Why are the changes needed?

This will make the behavior deterministic.

### Does this PR introduce _any_ user-facing change?

Yes, but the previous behavior was nondeterministic.

### How was this patch tested?

Pass the newly added test cases.

Closes #29707 from dongjoon-hyun/SPARK-32832-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

[SPARK-32832][SS] Use CaseInsensitiveMap for DataStreamReader/Writer …

ed225b9

…options

probot-autolabeler bot added SQL STRUCTURED STREAMING labels Sep 10, 2020

fix test name

19f1486

cloud-fan reviewed Sep 10, 2020

Address comments

3c48139

cloud-fan approved these changes Sep 10, 2020

dongjoon-hyun closed this in 2f85f95 Sep 10, 2020

dongjoon-hyun deleted the SPARK-32832 branch September 10, 2020 06:48

dongjoon-hyun mentioned this pull request Sep 10, 2020

[SPARK-32832][SS][2.4] Use CaseInsensitiveMap for DataStreamReader/Writer options #29707

Closed
