
[SPARK-51496][SQL] CaseInsensitiveStringMap comparison should ignore case #50275

Closed
wants to merge 3 commits

Conversation

huaxingao
Contributor

What changes were proposed in this pull request?

Since both commandOptions and dsOptions are CaseInsensitiveStringMap objects, I think we should convert the keys and values to lowercase before comparing them
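A minimal sketch of the normalization this proposes, using a hypothetical lowerKeys helper (the real comparison lives in V2Writes.mergeOptions; only key case is normalized here for brevity):

```
import java.util.Locale

// Hypothetical helper: lower-case the keys before comparing, since both maps
// originate from CaseInsensitiveStringMap instances.
def lowerKeys(options: Map[String, String]): Map[String, String] =
  options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
```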

Why are the changes needed?

In the Iceberg/Spark 4.0 integration, I got a few assertion errors:

assertion failed
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:264)
	at org.apache.spark.sql.execution.datasources.v2.V2Writes$.org$apache$spark$sql$execution$datasources$v2$V2Writes$$mergeOptions(V2Writes.scala:128)

The assertion error occurs when comparing commandOptions and dsOptions; the cases of the keys don't match.

assert(commandOptions == dsOptions)
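For illustration, with hypothetical values: Scala Map equality is case-sensitive on keys, so two option maps holding the same option under differently cased keys are not equal.

```
val commandOptions = Map("TABLE_NAME" -> "t1")  // key case as passed to .option(...)
val dsOptions      = Map("table_name" -> "t1")  // key lower-cased via CaseInsensitiveStringMap
assert(commandOptions == dsOptions)             // throws java.lang.AssertionError
```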

Does this PR introduce any user-facing change?

No

How was this patch tested?

I have verified that the iceberg tests can pass with this fix.

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Mar 14, 2025
@pan3793
Member

pan3793 commented Mar 14, 2025

Because one of commandOptions and dsOptions was canonicalized but the other wasn't? If so, would it be better to do the canonicalization on the caller side too?

@huaxingao
Contributor Author

@pan3793
When setting the option, Spark doesn't change the key to lower case. When creating a DataSourceV2Relation, the keys in dsOptions are converted to lower case.
I think there are two ways to fix the problem: the first is what I did in this PR, and the second is to convert the key to lower case in DataFrameWriter.option. The second way doesn't work, because in the Iceberg test

    df2.select("id", "data")
            .sort("data")
            .write()
            .format("org.apache.iceberg.spark.source.ManualSource")
            .option(ManualSource.TABLE_NAME, manualTableName)

ManualSource.TABLE_NAME is upper case. If calling .option(ManualSource.TABLE_NAME, manualTableName) changed the key to lower case, I would get

Missing property TABLE_NAME
java.lang.IllegalArgumentException: Missing property TABLE_NAME
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
	at org.apache.iceberg.spark.source.ManualSource.getTable(ManualSource.java:64)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:98)
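A simplified stand-in (not the real Iceberg ManualSource) for why lower-casing the key in DataFrameWriter.option would break such a source: it looks up its property under the original, upper-case key.

```
// Hypothetical stand-in for a source that reads its property with an exact,
// upper-case key (the real ManualSource uses Guava Preconditions instead).
val TABLE_NAME = "TABLE_NAME"

def getTable(properties: java.util.Map[String, String]): Unit =
  require(properties.containsKey(TABLE_NAME), s"Missing property $TABLE_NAME")

getTable(java.util.Map.of("table_name", "t1"))  // fails: the key was lower-cased upstream
```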

@huaxingao
Contributor Author

cc @cloud-fan

@cloud-fan
Contributor

Shall we be case-preserving and not lower-case the keys when creating DataSourceV2Relation?

@huaxingao
Contributor Author

Shall we be case-preserving and not lower-case the keys when creating DataSourceV2Relation?

When creating a CaseInsensitiveStringMap, the keys are changed to lower case in the constructor. This dsOptions is then used to create the DataSourceV2Relation.
Since DataSourceV2Relation.options is a CaseInsensitiveStringMap, I feel changing the keys to lower case is the correct behavior.
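As an illustration of that constructor behavior (hypothetical values; assumes a Spark build on the classpath):

```
import scala.jdk.CollectionConverters._
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Keys are stored lower-cased and look-ups ignore case; iterating the map
// therefore yields the lower-cased keys that end up in DataSourceV2Relation.
val options = new CaseInsensitiveStringMap(Map("TABLE_NAME" -> "t1").asJava)

options.get("table_name")   // "t1"
options.get("TABLE_NAME")   // "t1"
options.asScala.keySet      // Set("table_name")
```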

@@ -125,7 +125,14 @@ object V2Writes extends Rule[LogicalPlan] with PredicateHelper {
// for DataFrame API cases, same options are carried by both Command and DataSourceV2Relation
// for DataFrameV2 API cases, options are only carried by Command
// for SQL cases, options are only carried by DataSourceV2Relation
assert(commandOptions == dsOptions || commandOptions.isEmpty || dsOptions.isEmpty)

The way we turn a CaseInsensitiveStringMap into a Scala Map is r.options.asScala.toMap; I think we should change it to r.options.asCaseSensitiveMap.asScala.toMap to be case preserving.
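A small illustration of that contrast (relationOptions stands in for r.options; values are hypothetical):

```
import scala.jdk.CollectionConverters._
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Converting through asScala yields the lower-cased keys, while going through
// asCaseSensitiveMap keeps the original casing.
val relationOptions = new CaseInsensitiveStringMap(Map("TABLE_NAME" -> "t1").asJava)

val lowerCased     = relationOptions.asScala.toMap                       // Map(table_name -> t1)
val casePreserving = relationOptions.asCaseSensitiveMap().asScala.toMap  // Map(TABLE_NAME -> t1)
```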

@cloud-fan
Contributor

thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in bfe63a3 Mar 20, 2025
cloud-fan pushed a commit that referenced this pull request Mar 20, 2025
[SPARK-51496][SQL] CaseInsensitiveStringMap comparison should ignore case

Closes #50275 from huaxingao/mergeOptions.

Authored-by: huaxingao <huaxin.gao11@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit bfe63a3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@huaxingao
Contributor Author

Thanks @cloud-fan
