Skip to content

Conversation

panbingkun
Copy link
Contributor

What changes were proposed in this pull request?

The pr aims to add Codegen Support for schema_of_xml.

Why are the changes needed?

  • improve codegen coverage.
  • simplified code.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA & Existed UT (eg: XmlFunctionsSuite#*schema_of_xml*)

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Oct 22, 2024
@panbingkun panbingkun marked this pull request as ready for review November 12, 2024 11:49
@panbingkun
Copy link
Contributor Author

cc @MaxGekk @cloud-fan

import org.apache.spark.sql.types.{ArrayType, DataType, StructType}
import org.apache.spark.unsafe.types.UTF8String

case class SchemaOfXmlEvaluator(options: Map[String, String]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to place it inside of a XmlExpressionEvalUtils object as you did for JSON expressions in JsonExpressionEvalUtils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, updated. Thanks!

@MaxGekk
Copy link
Member

MaxGekk commented Nov 13, 2024

+1, LGTM. Merging to master.
Thank you, @panbingkun.

@MaxGekk MaxGekk closed this in bc9b259 Nov 13, 2024
@dongjoon-hyun
Copy link
Member

Thank you, @panbingkun and @MaxGekk .

@panbingkun
Copy link
Contributor Author

Thank all, @MaxGekk and @dongjoon-hyun ❤️

dataType,
"schemaOfXml",
Seq(Literal(xmlInferSchema, xmlInferSchemaObjectType), child),
Seq(xmlInferSchemaObjectType, child.dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should set returnNullable to false, otherwise it's a regression. SchemaOfXml#nullable always return false.

cc @panbingkun

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have fixed this issue with followup #48987. I think scheme_of_xml, schema_of_csv and schema_of_json has similar issues, so I will fix them all at once.
Thanks!

cloud-fan pushed a commit that referenced this pull request Nov 28, 2024
…lable as false

### What changes were proposed in this pull request?
The pr is following up [schema_of_json](#48473), [schema_of_xml](#48594) and [schema_of_csv](#48595), to make  returnNullable as false.

### Why are the changes needed?
As `cloud-fan`'s comment https://github.com/apache/spark/pull/48594/files#r1860534460, we should follow the original logic, otherwise it's a regression.

https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L846

https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala#L166

https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala#L141

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48987 from panbingkun/SPARK-50066_FOLLOWUP.

Authored-by: panbingkun <panbingkun@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants