-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-50066][SQL] Codegen Support for SchemaOfXml
(by Invoke & RuntimeReplaceable)
#48594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…timeReplaceable)
import org.apache.spark.sql.types.{ArrayType, DataType, StructType} | ||
import org.apache.spark.unsafe.types.UTF8String | ||
|
||
case class SchemaOfXmlEvaluator(options: Map[String, String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about to place it inside of a XmlExpressionEvalUtils
object as you did for JSON expressions in JsonExpressionEvalUtils
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, updated. Thanks!
+1, LGTM. Merging to master. |
Thank you, @panbingkun and @MaxGekk . |
Thank all, @MaxGekk and @dongjoon-hyun ❤️ |
dataType, | ||
"schemaOfXml", | ||
Seq(Literal(xmlInferSchema, xmlInferSchemaObjectType), child), | ||
Seq(xmlInferSchemaObjectType, child.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should set returnNullable
to false, otherwise it's a regression. SchemaOfXml#nullable
always return false.
cc @panbingkun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I have fixed this issue with followup #48987. I think scheme_of_xml
, schema_of_csv
and schema_of_json
has similar issues, so I will fix them all at once.
Thanks!
…lable as false ### What changes were proposed in this pull request? The pr is following up [schema_of_json](#48473), [schema_of_xml](#48594) and [schema_of_csv](#48595), to make returnNullable as false. ### Why are the changes needed? As `cloud-fan`'s comment https://github.com/apache/spark/pull/48594/files#r1860534460, we should follow the original logic, otherwise it's a regression. https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L846 https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala#L166 https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala#L141 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48987 from panbingkun/SPARK-50066_FOLLOWUP. Authored-by: panbingkun <panbingkun@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
The pr aims to add
Codegen
Support forschema_of_xml
.Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GA & Existed UT (eg: XmlFunctionsSuite#
*schema_of_xml*
)Was this patch authored or co-authored using generative AI tooling?
No.