[SPARK-48356][FOLLOW UP][SQL] Improve FOR statement's column schema inference #51053
cc @cloud-fan @dejankrak-db @miland-db @dusantism-db please review
@@ -122,6 +122,8 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
private lazy val nameToIndex: Map[String, Int] = SparkCollectionUtils.toMapWithIndex(fieldNames)
private lazy val nameToIndexCaseInsensitive: CaseInsensitiveMap[Int] =
CaseInsensitiveMap[Int](nameToIndex.toMap)
lazy val nameToDataType: collection.immutable.Map[String, DataType] =
`StructType` is a public API, so we should only add new methods when we have to. It is also on the Spark Connect side, which means users would need to upgrade the client version.
Can we build this map on the caller side?
Changed. Wasn't aware of it, thanks!
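For illustration, building the map on the caller side could look roughly like the sketch below. It relies only on the public `StructType.fields`, `StructField.name`, and `StructField.dataType` members; the `SchemaMaps` object, the `nameToDataType` helper, and the sample schema are hypothetical names chosen for this example and are not the code that was merged in this PR.

```scala
import org.apache.spark.sql.types._

// Hypothetical caller-side helper (not part of Spark): derives a
// field-name -> DataType map without adding a member to StructType.
object SchemaMaps {
  def nameToDataType(schema: StructType): Map[String, DataType] =
    // Names are matched exactly as written; if two fields share a name,
    // toMap keeps the last one.
    schema.fields.map(f => f.name -> f.dataType).toMap
}

object SchemaMapsExample {
  def main(args: Array[String]): Unit = {
    // Assumed sample schema with a nested ARRAY<STRUCT<..>> column.
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("items", ArrayType(StructType(Seq(StructField("name", StringType)))))
    ))

    val types = SchemaMaps.nameToDataType(schema)
    assert(types("id") == IntegerType)
    assert(types("items").isInstanceOf[ArrayType])
  }
}
```

Keeping the helper next to its only caller avoids widening the public `StructType` surface, which is the concern raised in the review comment above.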
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala
…ptingExecutionNode.scala
The linter failure is unrelated, thanks, merging to master!
…nference
### What changes were proposed in this pull request?
This pull request changes `FOR` statement to infer column schemas from the query DataFrame, and no longer implicitly infer column schema in SetVariable. This is necessary due to type mismatch errors with complex nested types, e.g. `ARRAY<STRUCT<..>>`.
### Why are the changes needed?
Bug fix for FOR statement.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit test that specifically targets problematic case.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #51053 from davidm-db/for_schema_inference.
Lead-authored-by: David Milicevic <david.milicevic@databricks.com>
Co-authored-by: David Milicevic <163021185+davidm-db@users.noreply.github.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 23e6274)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This pull request changes `FOR` statement to infer column schemas from the query DataFrame, and no longer implicitly infer column schema in SetVariable. This is necessary due to type mismatch errors with complex nested types, e.g. `ARRAY<STRUCT<..>>`.
Why are the changes needed?
Bug fix for FOR statement.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New unit test that specifically targets problematic case.
Was this patch authored or co-authored using generative AI tooling?
No.