Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-43935][SQL][PYTHON][CONNECT] Add xpath_* functions to Scala and Python #41470

Closed
wants to merge 6 commits into from

Conversation

panbingkun
Copy link
Contributor

What changes were proposed in this pull request?

Add following functions:

  • xpath
  • xpath_boolean
  • xpath_double
  • xpath_float
  • xpath_int
  • xpath_long
  • xpath_number
  • xpath_short
  • xpath_string

to:

  • Scala API
  • Python API
  • Spark Connect Scala Client
  • Spark Connect Python Client

Why are the changes needed?

for parity

Does this PR introduce any user-facing change?

Yes, new functions.

How was this patch tested?

  • Add New UT.

@zhengruifeng
Copy link
Contributor

zhengruifeng commented Jun 6, 2023

@panbingkun many thanks for working on this.

since a XPathXXX expression takes an Expression typed path, I think we should:
1, using Column typed path on scala side;
2, using ColumnOrName path on python side, when the input is a string, it will be treated as a column name.

also cc @HyukjinKwon @beliefer

@panbingkun
Copy link
Contributor Author

@panbingkun many thanks for working on this.

since a XPathXXX expression takes an Expression typed path, I think we should: 1, using Column typed path on scala side; 2, using ColumnOrName path, when the input is a string, it will be treated as a column name.

also cc @HyukjinKwon @beliefer

Ok, let me fix it.

* @group "xml_funcs"
* @since 3.5.0
*/
def xpath(xml: Column, path: String): Column =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to @HyukjinKwon 's suggestion, please change path as Column.

@@ -28,6 +29,7 @@ class XPathFunctionsSuite extends QueryTest with SharedSparkSession {
test("xpath_boolean") {
val df = Seq("<a><b>b</b></a>").toDF("xml")
checkAnswer(df.selectExpr("xpath_boolean(xml, 'a/b')"), Row(true))
checkAnswer(df.select(xpath_boolean(col("xml"), lit("a/b"))), Row(true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update them with
checkAnswer(df.select(xpath_boolean(col("xml"), lit("a/b"))), df.selectExpr("xpath_boolean(xml, 'a/b')"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to do so in such an easily observable situation?
@zhengruifeng

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine in this case, since it implicitly compares with above df.selectExpr("xpath_boolean(xml, 'a/b')")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@beliefer
Copy link
Contributor

beliefer commented Jun 8, 2023

LGTM+1.

@zhengruifeng
Copy link
Contributor

merged to master

czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
…d Python

### What changes were proposed in this pull request?
Add following functions:

- xpath
- xpath_boolean
- xpath_double
- xpath_float
- xpath_int
- xpath_long
- xpath_number
- xpath_short
- xpath_string

to:

- Scala API
- Python API
- Spark Connect Scala Client
- Spark Connect Python Client

### Why are the changes needed?
for parity

### Does this PR introduce _any_ user-facing change?
Yes, new functions.

### How was this patch tested?
- Add New UT.

Closes apache#41470 from panbingkun/SPARK-43935.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants