[SPARK-50081][SQL] Codegen Support for XPath*(by Invoke & RuntimeReplaceable)
#48610
    case LongType => XPathLongEvaluator(path)
    case FloatType => XPathFloatEvaluator(path)
    case DoubleType => XPathDoubleEvaluator(path)
    case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
Because we need to support collations (e.g., StringType(...)), we have to write it this way.
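A minimal sketch of why the type test matters, assuming the string type is modeled as a default object plus collation-specific instances. The types below are simplified stand-ins written for this illustration, not Spark's real classes, and the names `matchesCompanion` and `matchesTypeTest` are hypothetical.

```scala
object CollationMatchSketch {
  // Simplified stand-ins for illustration only; NOT Spark's real classes.
  sealed trait DataType
  case object LongType extends DataType

  // A string type that carries a collation id; equality depends on that id.
  class StringType(val collationId: Int) extends DataType {
    override def equals(other: Any): Boolean = other match {
      case s: StringType => s.collationId == collationId
      case _ => false
    }
    override def hashCode(): Int = collationId
  }

  // The default instance (collation id 0), usable as the pattern `case StringType`.
  object StringType extends StringType(0) {
    def apply(collationId: Int): StringType = new StringType(collationId)
  }

  // Stable-identifier pattern: matches only values equal to the default object.
  def matchesCompanion(dt: DataType): Boolean = dt match {
    case StringType => true
    case _ => false
  }

  // Type test: matches any StringType instance, whatever its collation.
  def matchesTypeTest(dt: DataType): Boolean = dt match {
    case d if d.isInstanceOf[StringType] => true
    case _ => false
  }
}
```

Under this model, a collated instance such as `StringType(1)` falls through `case StringType` but is caught by the type test.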
Will it work after your changes? Just in case, are there any tests? cc @stefankandic @uros-db
@panbingkun Please add expression-level tests (using checkEvaluation) for the xpath_* functions; see, for example, CollationStringExpressionsSuite.
> Will it work after your changes? Just in case, are there any tests? cc @stefankandic @uros-db
My earlier description may have caused a misunderstanding. Let me rephrase: it is written this way so that the test case CollationExpressionWalkerSuite#"SPARK-48280: Expression Walker for SQL query examples" passes:
spark/sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Lines 735 to 745 in be18b7c
    for (funInfo <- funInfos.filter(f => !toSkip.contains(f.getName))) {
      for (query <- "> .*;".r.findAllIn(funInfo.getExamples).map(s => s.substring(2))) {
        try {
          val resultUTF8 = sql(query)
          withSQLConf(SqlApiConf.DEFAULT_COLLATION -> "UTF8_LCASE") {
            val resultUTF8Lcase = sql(query)
            assert(resultUTF8.collect() === resultUTF8Lcase.collect())
          }
        } catch {
          case e: SparkRuntimeException => assert(e.getCondition == "USER_RAISED_EXCEPTION")
          case other: Throwable => throw other
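The query-extraction step in the quoted loop can be tried in isolation. The sketch below reuses the same regex and `substring(2)` call from the snippet; the `examples` text is a hypothetical input shaped like a function's examples section, and the object name is made up for this illustration.

```scala
object ExampleQuerySketch {
  // Hypothetical examples text: each query line starts with "> " and ends with ";".
  val examples: String =
    """
      |    Examples:
      |      > SELECT xpath_boolean('<a><b>1</b></a>','a/b');
      |       true
      |      > SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
      |       3
      |""".stripMargin

  // Same regex as in the quoted test: match "> ...;" on a line,
  // then drop the two-character "> " prefix to get the bare SQL text.
  val queries: List[String] =
    "> .*;".r.findAllIn(examples).map(s => s.substring(2)).toList
}
```

Only the two prompt lines match; the result lines carry no `> ` prefix and are skipped. In the quoted test, each extracted query is then run under both the default collation and UTF8_LCASE, and the collected results are compared.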
> @panbingkun please add expression-level tests (using checkEvaluation) for xpath_* functions; see for example: CollationStringExpressionsSuite
Currently, it is not supported. My description caused a misunderstanding, and I have corrected it.
MaxGekk left a comment:
@panbingkun Could you resolve the conflicts, please?
Updated, thanks!
    case LongType => XPathLongEvaluator(path)
    case FloatType => XPathFloatEvaluator(path)
    case DoubleType => XPathDoubleEvaluator(path)
    case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
nit: just for consistency w/ other places:
    - case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
    + case _: StringType => XPathStringEvaluator(path)
    case FloatType => XPathFloatEvaluator(path)
    case DoubleType => XPathDoubleEvaluator(path)
    case dt if dt.isInstanceOf[StringType] => XPathStringEvaluator(path)
    case ArrayType(elementType, _) if elementType.isInstanceOf[StringType] =>
    - case ArrayType(elementType, _) if elementType.isInstanceOf[StringType] =>
    + case ArrayType(_: StringType, _) =>
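The two review suggestions replace guards with typed patterns without changing what is matched. A small sketch of that equivalence follows, using simplified stand-in types that are not Spark's real classes; the method names and returned labels are hypothetical.

```scala
object PatternStyleSketch {
  // Simplified stand-ins for illustration only; NOT Spark's real classes.
  sealed trait DataType
  class StringType(val collationId: Int) extends DataType
  case object IntegerType extends DataType
  case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Guard style, as in the original diff lines.
  def withGuards(dt: DataType): String = dt match {
    case d if d.isInstanceOf[StringType] => "string"
    case ArrayType(elementType, _) if elementType.isInstanceOf[StringType] => "list"
    case _ => "other"
  }

  // Typed-pattern style, as in the review suggestions.
  def withTypedPatterns(dt: DataType): String = dt match {
    case _: StringType => "string"
    case ArrayType(_: StringType, _) => "list"
    case _ => "other"
  }
}
```

Both functions classify every input the same way; the typed-pattern form is shorter and avoids binding a name that is only used in the guard.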
    object XPathEvaluatorFactory {
      def create(dataType: DataType, path: UTF8String): XPathEvaluator = {
        dataType match {
Chasing the dataType doesn't look nice. How about distributing the instantiations across the expressions:
- Define in XPathExtract:
    protected def evaluator: XPathEvaluator
- and override it in the children:
    @transient override lazy val evaluator = XPathBooleanEvaluator(pathUTF8String)
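The shape of this suggestion can be sketched as follows. All classes here are hypothetical stand-ins that only mirror the names in the comment; the evaluators merely describe themselves and perform no real XPath evaluation, and the actual classes in the PR may look different.

```scala
object EvaluatorSketch {
  // Hypothetical stand-in evaluators; they only report what they are.
  trait XPathEvaluator { def describe: String }
  case class XPathBooleanEvaluator(path: String) extends XPathEvaluator {
    def describe: String = s"boolean evaluator for $path"
  }
  case class XPathLongEvaluator(path: String) extends XPathEvaluator {
    def describe: String = s"long evaluator for $path"
  }

  // The base expression declares the evaluator but does not choose it,
  // so no central match on the data type is needed.
  abstract class XPathExtract {
    def path: String
    protected def evaluator: XPathEvaluator
    def describe: String = evaluator.describe
  }

  // Each child picks its own evaluator, created lazily on first use.
  class XPathBoolean(val path: String) extends XPathExtract {
    @transient override protected lazy val evaluator: XPathEvaluator =
      XPathBooleanEvaluator(path)
  }
  class XPathLong(val path: String) extends XPathExtract {
    @transient override protected lazy val evaluator: XPathEvaluator =
      XPathLongEvaluator(path)
  }
}
```

With this layout, adding a new xpath_* expression means adding one subclass with its own override rather than extending a factory's match on the data type.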
+1, LGTM. Merging to master.
What changes were proposed in this pull request?
The PR aims to add codegen support for xpath*, including:
- xpath_boolean
- xpath_short
- xpath_int
- xpath_long
- xpath_float
- xpath_double
- xpath_string
- xpath
Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GA & existing UTs (e.g., XPathFunctionsSuite, XPathExpressionSuite, CollationSQLExpressionsSuite#*XPath*, CollationExpressionWalkerSuite).
Was this patch authored or co-authored using generative AI tooling?
No.