@panbingkun
Contributor

@panbingkun panbingkun commented Oct 23, 2024

What changes were proposed in this pull request?

This PR adds codegen support for the xpath* expressions, including:

  • xpath_boolean
  • xpath_short
  • xpath_int
  • xpath_long
  • xpath_float
  • xpath_double
  • xpath_string
  • xpath

Why are the changes needed?

  • Improve codegen coverage.
  • Simplify the code.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Passes GA and the existing UTs (e.g., XPathFunctionsSuite, XPathExpressionSuite, CollationSQLExpressionsSuite#*XPath*, CollationExpressionWalkerSuite).

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Oct 23, 2024
@panbingkun panbingkun changed the title from "[SPARK-50081][SQL] Codegen Support for XPath...(by Invoke & RuntimeReplaceable)" to "[SPARK-50081][SQL] Codegen Support for XPath*(by Invoke & RuntimeReplaceable)" Oct 23, 2024
@panbingkun panbingkun marked this pull request as ready for review October 23, 2024 08:35
@panbingkun
Contributor Author

cc @MaxGekk @cloud-fan

case LongType => XPathLongEvaluator(path)
case FloatType => XPathFloatEvaluator(path)
case DoubleType => XPathDoubleEvaluator(path)
case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
Contributor Author

@panbingkun panbingkun Oct 23, 2024

Because we need to support collations (e.g., StringType(...)), we have to write it this way.
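
One plausible reading of this comment, sketched below: a pattern such as `case StringType =>` compares against a single default object, so a string type carrying a non-default collation falls through, whereas a type test matches any instance. The classes here are simplified, hypothetical stand-ins (not Spark's actual definitions), and the assumption that the default type is modeled as an object extending the class is labeled in the code.

```scala
object CollationMatchSketch {
  // Hypothetical stand-ins: a string type parameterized by a collation id,
  // with a default object that extends the class (an assumption for illustration).
  sealed abstract class DataType
  class StringType(val collationId: Int) extends DataType
  case object StringType extends StringType(0)
  case object LongType extends DataType

  // Matches only the default StringType object (stable-identifier pattern, uses ==).
  def byObject(dt: DataType): String = dt match {
    case LongType => "long"
    case StringType => "string"
    case _ => "unsupported"
  }

  // Matches every instance of the StringType class, whatever its collation id.
  def byType(dt: DataType): String = dt match {
    case LongType => "long"
    case dt if dt.isInstanceOf[StringType] => "string"
    case _ => "unsupported"
  }
}
```

With these stand-ins, `byObject(new StringType(1))` falls through to the default branch, while `byType(new StringType(1))` takes the string branch.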

Member

Will it work after your changes? Just in case, are there any tests? cc @stefankandic @uros-db

Contributor

@uros-db uros-db Oct 23, 2024

@panbingkun please add expression-level tests (using checkEvaluation) for xpath_* functions
see for example: CollationStringExpressionsSuite

Contributor Author

> Will it work after your changes? Just in case, are there any tests? cc @stefankandic @uros-db

My earlier description may have been misleading. Let me rephrase: it is written this way so that the test case CollationExpressionWalkerSuite#SPARK-48280: Expression Walker for SQL query examples passes:

for (funInfo <- funInfos.filter(f => !toSkip.contains(f.getName))) {
  for (query <- "> .*;".r.findAllIn(funInfo.getExamples).map(s => s.substring(2))) {
    try {
      val resultUTF8 = sql(query)
      withSQLConf(SqlApiConf.DEFAULT_COLLATION -> "UTF8_LCASE") {
        val resultUTF8Lcase = sql(query)
        assert(resultUTF8.collect() === resultUTF8Lcase.collect())
      }
    } catch {
      case e: SparkRuntimeException => assert(e.getCondition == "USER_RAISED_EXCEPTION")
      case other: Throwable => throw other
    }
  }
}
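
To make the query extraction in the loop above concrete, here is a small standalone sketch of the same regex applied to an examples string. The `examples` text and the `extractQueries` helper are illustrative assumptions; only the regex and the `substring(2)` call are taken from the snippet.

```scala
object ExampleQuerySketch {
  // Illustrative text shaped like the "Examples:" section of a function's documentation.
  val examples: String =
    """
      |    Examples:
      |      > SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
      |       3
      |""".stripMargin

  // Same extraction as in the test loop: find each "> ...;" line, then drop the leading "> ".
  def extractQueries(text: String): List[String] =
    "> .*;".r.findAllIn(text).map(s => s.substring(2)).toList
}
```

Each extracted query is then run twice in the test, once with the default collation and once with UTF8_LCASE, and the two results are compared.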

Contributor Author

> @panbingkun please add expression-level tests (using checkEvaluation) for xpath_* functions see for example: CollationStringExpressionsSuite

Currently, it is not supported. My description caused a misunderstanding, and I have corrected it.

@panbingkun panbingkun requested a review from MaxGekk October 23, 2024 11:50
Member

@MaxGekk MaxGekk left a comment

@panbingkun Could you resolve conflicts, please.

@panbingkun
Contributor Author

> @panbingkun Could you resolve conflicts, please.

Updated, thanks!

case LongType => XPathLongEvaluator(path)
case FloatType => XPathFloatEvaluator(path)
case DoubleType => XPathDoubleEvaluator(path)
case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
Member

nit: just for consistency w/ other places:

Suggested change
- case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
+ case _: StringType => XPathStringEvaluator(path)

case FloatType => XPathFloatEvaluator(path)
case DoubleType => XPathDoubleEvaluator(path)
case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
case ArrayType(elementType, _) if elementType.isInstanceOf[StringType] =>
Member

Suggested change
- case ArrayType(elementType, _) if elementType.isInstanceOf[StringType] =>
+ case ArrayType(_: StringType, _) =>
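
As a rough check that the guard form and the nested type pattern select the same values, here is a minimal sketch. The types below are simplified, hypothetical stand-ins that only reuse the names from the diff; they are not Spark's actual classes.

```scala
object NestedPatternSketch {
  // Hypothetical stand-ins for illustration only.
  sealed abstract class DataType
  class StringType(val collationId: Int) extends DataType
  case object IntegerType extends DataType
  case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Original form: bind the element type, then test it in a guard.
  def guardStyle(dt: DataType): Boolean = dt match {
    case ArrayType(elementType, _) if elementType.isInstanceOf[StringType] => true
    case _ => false
  }

  // Suggested form: a type pattern nested inside the extractor pattern.
  def typePatternStyle(dt: DataType): Boolean = dt match {
    case ArrayType(_: StringType, _) => true
    case _ => false
  }
}
```

Both functions accept an array of any `StringType` instance and reject other element types, so the suggestion reads as a purely stylistic change under these assumptions.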


object XPathEvaluatorFactory {
def create(dataType: DataType, path: UTF8String): XPathEvaluator = {
dataType match {
Member

Matching on the dataType doesn't look nice. How about distributing the instantiations across the expressions:

  • Define in XPathExtract:
  protected def evaluator: XPathEvaluator
  • and override it in the children:
  @transient override lazy val evaluator = XPathBooleanEvaluator(pathUTF8String)
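
The shape of this suggestion can be sketched in a self-contained way. This is not the PR's code: the classes below are simplified stand-ins that borrow names from the comment, use a plain String path instead of UTF8String, and call the JDK's javax.xml.xpath API directly. `XPathDouble` and `eval` are hypothetical names added for illustration.

```scala
import java.io.StringReader
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.xml.sax.InputSource

object XPathEvaluatorSketch {
  // Stand-in evaluator interface; each subclass fixes the XPath return type.
  trait XPathEvaluator {
    def evaluate(xml: String): Any
  }

  case class XPathBooleanEvaluator(path: String) extends XPathEvaluator {
    @transient private lazy val xpath = XPathFactory.newInstance().newXPath()
    override def evaluate(xml: String): Any =
      xpath.evaluate(path, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN)
  }

  case class XPathDoubleEvaluator(path: String) extends XPathEvaluator {
    @transient private lazy val xpath = XPathFactory.newInstance().newXPath()
    override def evaluate(xml: String): Any =
      xpath.evaluate(path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER)
  }

  // The base expression declares the evaluator; no factory matching on a data type.
  abstract class XPathExtract {
    def path: String
    protected def evaluator: XPathEvaluator
    def eval(xml: String): Any = evaluator.evaluate(xml)
  }

  // Each child expression instantiates its own evaluator lazily.
  case class XPathBoolean(path: String) extends XPathExtract {
    @transient override lazy val evaluator: XPathEvaluator = XPathBooleanEvaluator(path)
  }

  case class XPathDouble(path: String) extends XPathExtract {
    @transient override lazy val evaluator: XPathEvaluator = XPathDoubleEvaluator(path)
  }
}
```

Declaring the evaluator per expression keeps the choice of evaluator next to the expression that needs it, so adding a new xpath variant would not require touching a central match.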

@MaxGekk
Member

MaxGekk commented Nov 23, 2024

+1, LGTM. Merging to master.
Thank you, @panbingkun.

@MaxGekk MaxGekk closed this in 656ece1 Nov 23, 2024