Easier Pandas UDFs #8

Closed
thunterdb opened this issue Jan 11, 2019 · 1 comment
@thunterdb
Contributor

TODO: move this to a google doc for proper design.

Most of this is already implemented in https://github.com/databricks/spark-pandas/blob/master/pandorable_sparky/typing.py#L86

A couple of changes can make the current Pandas UDFs friendlier for non-Spark users:

Support native Python types. Users should not have to know the Spark datatype objects.
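As a rough illustration of this point, the translation from native Python types to Spark types could be a plain lookup table. The helper name `to_spark_ddl` and the table below are hypothetical and not taken from the linked code; the type names on the right are standard Spark SQL type names, and `int` is mapped to `bigint` on the assumption that it should follow PySpark's usual inference of Python integers as long values.

```python
import datetime

# Hypothetical lookup table, for illustration only (not an existing API).
_PYTHON_TO_SPARK_DDL = {
    bool: "boolean",
    int: "bigint",
    float: "double",
    str: "string",
    bytes: "binary",
    datetime.date: "date",
    datetime.datetime: "timestamp",
}


def to_spark_ddl(tpe):
    """Return the Spark SQL type name for a native Python type."""
    try:
        return _PYTHON_TO_SPARK_DDL[tpe]
    except KeyError:
        raise TypeError("no Spark type known for {!r}".format(tpe))
```

With something like this in place, a user only ever writes `int` or `float`, and the library resolves the Spark type internally.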

Use Python 3 typing to express types. Spark requires type hints, and they can now be expressed naturally in Python 3 too. You can quickly declare a Python UDF with the following Pythonic syntax:

@spark_function
def my_udf(x: Col[int]) -> Col[double]:
  pass

The input type is optional, but when it is given it can be used for either casting or type checking, just like in Scala.
An alternative syntax for expressing the same thing in Python 2 could be:

@spark_udf(col_x=int, col_return=double)
def my_udf(x): pass
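To make the two syntaxes concrete, here is a minimal sketch of how both decorators could collect the same type information. It is not the implementation in the linked `typing.py`: `Col` is redefined here as a bare generic marker, the attributes `input_types` and `return_type` are hypothetical, `float` stands in for the `double` used above, and nothing is registered with Spark. It assumes Python 3.8+ for `typing.get_origin`.

```python
import typing

T = typing.TypeVar("T")


class Col(typing.Generic[T]):
    """Hypothetical marker meaning 'a column whose elements have type T'."""


def _element_type(hint):
    """Return int for Col[int]; return the hint unchanged otherwise."""
    if typing.get_origin(hint) is Col:
        return typing.get_args(hint)[0]
    return hint


def spark_function(func):
    """Python 3 form: read the column types from the type hints."""
    hints = typing.get_type_hints(func)
    ret = hints.pop("return", None)
    func.input_types = {n: _element_type(h) for n, h in hints.items()}
    func.return_type = None if ret is None else _element_type(ret)
    return func


def spark_udf(col_return, **col_types):
    """Python 2 style form: the same types passed as keyword arguments."""
    def decorator(func):
        # Strip the 'col_' prefix used in the example above (col_x -> x).
        func.input_types = {
            (k[4:] if k.startswith("col_") else k): v
            for k, v in col_types.items()
        }
        func.return_type = col_return
        return func
    return decorator


@spark_function
def plus_one(x: Col[int]) -> Col[float]:
    return x + 1.0


@spark_udf(col_x=int, col_return=float)
def plus_one_py2(x):
    return x + 1.0
```

Both decorated functions end up carrying the same metadata, so the rest of the machinery would not need to care which syntax was used.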

The same applies to reducers. The following function can be automatically inferred to be a UDAF because it returns an integer rather than a column.

@spark_function
def my_reducer(x: Col[int]) -> int:
  pass
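The inference described here could be sketched as a check on the return annotation: a `Col[...]` return means a column-to-column UDF, and a plain type means a reducer. The function name `infer_kind` and the string labels are hypothetical, and `Col` is again a stand-in marker rather than the class from the linked code.

```python
import typing

T = typing.TypeVar("T")


class Col(typing.Generic[T]):
    """Hypothetical marker for 'a column of T', as in the examples above."""


def infer_kind(func):
    """Classify func as 'udf' (column out) or 'udaf' (scalar out)."""
    ret = typing.get_type_hints(func).get("return")
    if ret is None:
        raise TypeError("a return type hint is required")
    return "udf" if typing.get_origin(ret) is Col else "udaf"


def my_udf(x: Col[int]) -> Col[float]:
    pass


def my_reducer(x: Col[int]) -> int:
    pass
```

Under these assumptions, `infer_kind(my_udf)` gives `"udf"` and `infer_kind(my_reducer)` gives `"udaf"`.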

Broadcasting. For example, for UDFs that take multiple columns as arguments, a scalar should be accepted and treated as a SQL lit().
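One possible shape for this, sketched without a Spark dependency: a decorator factory that wraps every non-column argument with a literal constructor before calling the UDF. The name `broadcast_scalars` and the stand-in classes are hypothetical. With PySpark, the two callables would presumably be `lambda a: isinstance(a, pyspark.sql.Column)` and `pyspark.sql.functions.lit`.

```python
import functools


def broadcast_scalars(is_column, lit):
    """Build a decorator that wraps non-column arguments with `lit`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            return func(*[a if is_column(a) else lit(a) for a in args])
        return wrapper
    return decorator


class FakeColumn:
    """Stand-in for a Spark column so the sketch runs without Spark."""

    def __init__(self, name):
        self.name = name


def fake_lit(value):
    """Stand-in for SQL lit(): wrap a scalar as a column expression."""
    return FakeColumn("lit({!r})".format(value))


@broadcast_scalars(lambda a: isinstance(a, FakeColumn), fake_lit)
def add(x, y):
    # Both arguments are columns by the time the body runs.
    return "{} + {}".format(x.name, y.name)
```

Calling `add(FakeColumn("a"), 2)` then behaves as if the caller had written the literal explicitly.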

Useful and relevant code:
https://github.com/databricks/spark-pandas/blob/master/pandorable_sparky/typing.py#L86
https://github.com/databricks/Koala/blob/master/potassium/utils/spark_utils.py
https://github.com/dvgodoy/handyspark/blob/master/handyspark/sql/schema.py

@thunterdb thunterdb added the P1 label Jan 18, 2019
@HyukjinKwon
Member

I believe this was implemented by #453
