
[SPARK-43939][CONNECT][PYTHON] Add try_* functions to Scala and Python #41653

Closed · 8 commits

Conversation

@panbingkun (Contributor) commented Jun 19, 2023

What changes were proposed in this pull request?

Add the following functions (a short usage sketch follows below):

  • try_add
  • try_avg
  • try_divide
  • try_element_at
  • try_multiply
  • try_subtract
  • try_sum
  • try_to_binary
  • try_to_number
  • try_to_timestamp

to:

  • Scala API
  • Python API
  • Spark Connect Scala Client
  • Spark Connect Python Client
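
A minimal usage sketch of the new Scala API (this assumes an active `SparkSession`; the literals and column aliases are illustrative only):

import org.apache.spark.sql.functions._

spark.range(1).select(
  try_add(lit(Int.MaxValue), lit(1)).as("overflow_add"),  // overflow => null instead of an error
  try_divide(lit(1), lit(0)).as("div_by_zero"),           // divisor 0 => null
  try_to_number(lit("abc"), lit("999")).as("bad_parse")   // unparseable input => null
).show()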

Why are the changes needed?

For parity: these functions already exist in SQL but are missing from the Scala and Python APIs.

Does this PR introduce any user-facing change?

Yes, new functions.

How was this patch tested?

  • Added new unit tests.

* Returns `dividend`/`divisor`. It always performs floating point division. Its result is
* always null if `divisor` is 0.
*
* @note
Contributor:

In this PR, let's use `call_udf` for better parity.

panbingkun (author):

👌🏻

* Returns the sum of `left` and `right` and the result is null on overflow. The acceptable
* input types are the same as those of the `+` operator.
*
* @note
Contributor:

It should be supported naturally in Connect.

* Returns `left``*``right` and the result is null on overflow. The acceptable input types are the
* same with the `*` operator.
*
* @note
Contributor:

ditto

* Returns `left` - `right` and the result is null on overflow. The acceptable input types are the
* same as those of the `-` operator.
*
* @note
Contributor:

ditto


.. versionadded:: 3.5.0

Notes
Contributor:

ditto


.. versionadded:: 3.5.0

Notes
Contributor:

ditto


.. versionadded:: 3.5.0

Notes
Contributor:

ditto

* Returns the sum of `left` and `right` and the result is null on overflow. The acceptable
* input types are the same as those of the `+` operator.
*
* @note
Contributor:

Please check the similar functions in the vanilla Scala APIs.

* @since 3.5.0
*/
def try_add(left: Column, right: Column): Column = {
  call_udf("try_add", left, right)
}
Contributor:

Sorry, let's directly use `UnresolvedFunction` for this case, since there are already some similar cases in `functions`, e.g.:

/**
 * Left-pad the binary column with pad to a byte length of len. If the binary column is longer
 * than len, the return value is shortened to len bytes.
 *
 * @group string_funcs
 * @since 3.3.0
 */
def lpad(str: Column, len: Int, pad: Array[Byte]): Column = withExpr {
  UnresolvedFunction("lpad", Seq(str.expr, lit(len).expr, lit(pad).expr), isDistinct = false)
}
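
Applied to this PR, `try_add` could then look roughly like the following (a sketch of the suggested pattern, not necessarily the code that was finally merged):

def try_add(left: Column, right: Column): Column = withExpr {
  // Resolve the built-in "try_add" SQL function by name, as in the lpad example above.
  UnresolvedFunction("try_add", Seq(left.expr, right.expr), isDistinct = false)
}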

panbingkun (author):

Okay, let me first test whether there is any issue with the scaladoc.

zhengruifeng (Contributor):
Thank you @panbingkun, merged to master.

@try_remote_functions
def try_element_at(col: "ColumnOrName", extraction: "ColumnOrName") -> Column:
    """
    (array, index) - Returns element of array at given (1-based) index. If Index is 0, Spark will
Contributor:

@panbingkun For `try_element_at`, maybe we should unify the second parameter's name to `index`, in both Scala and Python.

panbingkun (author):

Let me fix it in a follow-up PR.

Contributor:

@panbingkun I just noticed that `try_element_at` uses the same parameter names as `element_at`, so the current commit is fine. We don't need any follow-up.
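
For context, `try_element_at` keeps the 1-based indexing of `element_at` but yields null instead of an error when the index exceeds the array length. A minimal Scala sketch (assuming a `SparkSession` named `spark`; the data is illustrative):

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(Seq(1, 2, 3)).toDF("arr")
df.select(
  try_element_at($"arr", lit(2)).as("second"),  // returns 2
  try_element_at($"arr", lit(9)).as("oob")      // index out of range => null
).show()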
