[SPARK-36680][SQL][FOLLOWUP] Files with options should be put into resolveDataSource function #47370

logze · 2024-07-16T12:37:09Z

What changes were proposed in this pull request?

When reading CSV, JSON, and other file sources, pass the options to the data source built in the rules.resolveDataSource method so that they take effect.

This is a bug fix for #46707. cc @szehon-ho

Why are the changes needed?

For the following SQL, the options passed in the WITH clause do not take effect, because the rules.resolveDataSource method does not pass them on when it constructs the data source:

 SELECT * FROM csv.`/test/data.csv` WITH (`header` = true, 'delimiter' = '|')
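The shape of the bug can be modeled in isolation. The sketch below is not Spark code: `FileSourceSketch` and both `resolve*` functions are hypothetical stand-ins, and only the argument names (`className`, `paths`, `options`) mirror the diff in this PR. It shows how options are silently lost when the constructor call omits them and kept when they are forwarded.

```scala
// Hypothetical stand-in for a data source description; NOT Spark's DataSource class.
final case class FileSourceSketch(
    className: String,
    paths: Seq[String],
    options: Map[String, String] = Map.empty)

object ResolveSketch {
  // Before the fix (modeled): the options argument is accepted but never forwarded,
  // so the source falls back to its empty default.
  def resolveWithoutOptions(
      ident: Seq[String],
      options: Map[String, String]): FileSourceSketch =
    FileSourceSketch(className = ident.head, paths = Seq(ident.last))

  // After the fix (modeled): the options are passed through to the source.
  def resolveWithOptions(
      ident: Seq[String],
      options: Map[String, String]): FileSourceSketch =
    FileSourceSketch(className = ident.head, paths = Seq(ident.last), options = options)
}
```

With `ident = Seq("csv", "/test/data.csv")` and the two options from the query above, the first function yields a source with no options, while the second keeps both `header` and `delimiter`.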

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test in SQLQuerySuite

Was this patch authored or co-authored using generative AI tooling?

No


szehon-ho

Looks good to me, minor nit. Thanks for supporting Files DS

szehon-ho · 2024-07-16T18:56:24Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

+      sparkSession,
+      paths = Seq(ident.last),
+      className = ident.head,
+      options = unresolved.options.asCaseSensitiveMap.asScala.toMap)


I think we can remove .asCaseSensitiveMap

Thank you, I have removed asCaseSensitiveMap.

logze · 2024-07-17T11:12:57Z

cc @huaxingao @cloud-fan, could you review this PR? This is my first PR in the Spark community. Thank you :)

cloud-fan · 2024-07-17T14:26:51Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

@@ -19,6 +19,8 @@ package org.apache.spark.sql.execution.datasources

 import java.util.Locale

+import scala.jdk.CollectionConverters.MapHasAsScala


I think it's more common to just import scala.jdk.CollectionConverters._
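For readers unfamiliar with the converters, here is a minimal sketch of what the wildcard import enables. It uses a plain `java.util.HashMap` as a stand-in for the options object in the diff (an assumption made to keep the example free of Spark dependencies); `OptionsConversionSketch` and its methods are hypothetical names.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// The wildcard import brings in asScala for all supported Java collections,
// including the Map extension that MapHasAsScala provides on its own.
import scala.jdk.CollectionConverters._

object OptionsConversionSketch {
  // Wraps the Java map as a Scala view, then copies it into an immutable Map.
  def toScalaOptions(javaOptions: JMap[String, String]): Map[String, String] =
    javaOptions.asScala.toMap

  // Builds a small Java map resembling the options in the PR's example query.
  def sampleOptions(): Map[String, String] = {
    val raw = new JHashMap[String, String]()
    raw.put("header", "true")
    raw.put("delimiter", "|")
    toScalaOptions(raw)
  }
}
```

Importing only `MapHasAsScala` would work equally well here; the wildcard form is a style preference that matches what most of the codebase already does, per the comment above.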

cloud-fan · 2024-07-17T14:27:27Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

+
+  test("SPARK-36680: Files hint options should be put into resolveDataSource function") {
+    val df1 = spark.range(100).toDF()
+    withTempPath(f => {


nit:

withTempPath { f => }
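To illustrate the suggested brace style, here is a self-contained sketch of a helper that lends a temporary path to a block. `withTempPathSketch` is a hypothetical helper written for this example, not the test utility used in SQLQuerySuite, and its cleanup only handles the case where the block writes a single regular file.

```scala
import java.nio.file.{Files, Path}

object TempPathSketch {
  // Hypothetical helper: lends a not-yet-existing path to `body` and cleans up afterwards.
  def withTempPathSketch[T](body: Path => T): T = {
    val dir = Files.createTempDirectory("sketch")
    val path = dir.resolve("data")
    try body(path)
    finally {
      // Simplified cleanup: assumes the block created at most one regular file at `path`.
      Files.deleteIfExists(path)
      Files.deleteIfExists(dir)
    }
  }

  // Brace style, as suggested in the review: no extra parentheses around the lambda.
  def demo(): Boolean =
    withTempPathSketch { f =>
      Files.write(f, "a|b".getBytes("UTF-8"))
      Files.exists(f)
    }
}
```

The form `withTempPathSketch(f => { ... })` compiles to the same thing; the brace form simply drops one layer of nesting.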

cloud-fan · 2024-07-17T14:28:51Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

+        """.stripMargin
+      )
+      checkAnswer(df2, df1)
+      df2.queryExecution.analyzed foreach {


To make the test more explicit, I suggest doing:

    val relations = df2.queryExecution.analyzed.collect {
      case LogicalRelation(fs: HadoopFsRelation, _, _, _) => fs
    }
    assert(relations.length == 1)
    assert(relations.head.options == ...)

logze · 2024-07-18T02:37:09Z

@cloud-fan Thank you very much. I modified the code according to your suggestion; could you take another look?

cloud-fan · 2024-07-18T07:58:14Z

thanks, merging to master!

…solveDataSource function

### What changes were proposed in this pull request?

When reading csv, json and other files, pass the options parameter to the rules.resolveDataSource method to make the options parameter effective.

This is a bug fix for [apache#46707](apache#46707) szehon-ho

### Why are the changes needed?

For the following SQL, the options parameter passed in does not take effect. This is because the rules.resolveDataSource method does not pass the options parameter during the datasource construction process

```
SELECT * FROM csv.`/test/data.csv` WITH (`header` = true, 'delimiter' = '|')
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test in SQLQuerySuite

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47370 from logze/hint-options.

Authored-by: lizongze <lizongze@xiaomi.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

logze added 3 commits July 15, 2024 16:48

[SPARK-36680][SQL][FOLLOWUP] Files hint options should be put into re…

17cb9ab

…solveDataSource function

add unit test

9e1f5bc

fix code style

d402899

github-actions bot added the SQL label Jul 16, 2024

szehon-ho reviewed Jul 16, 2024


remove asCaseSensitiveMap

a21e9cd

cloud-fan reviewed Jul 17, 2024


make tests more explicit and fix rules import

9e2e420

cloud-fan approved these changes Jul 18, 2024


cloud-fan closed this in 1a428c1 Jul 18, 2024

logze deleted the hint-options branch July 18, 2024 11:48
