[SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

## What changes were proposed in this pull request?

This PR proposes allowing grouped aggregate vectorized (pandas) UDFs to be registered for use in SQL statements, for instance:

```python
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("integer", PandasUDFType.GROUPED_AGG)
def sum_udf(v):
    return v.sum()

spark.udf.register("sum_udf", sum_udf)
q = "SELECT v2, sum_udf(v1) FROM VALUES (3, 0), (2, 0), (1, 1) tbl(v1, v2) GROUP BY v2"
spark.sql(q).show()
```

```
+---+-----------+
| v2|sum_udf(v1)|
+---+-----------+
|  1|          1|
|  0|          5|
+---+-----------+
```
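
For intuition, the per-group reduction that `sum_udf` performs can be reproduced with pandas alone. This is only a sketch of the semantics under stated assumptions, not how Spark executes the UDF: the `df` DataFrame and the `groupby(...).agg(...)` call are stand-ins introduced here for the `VALUES` table and the `GROUP BY v2` clause.

```python
import pandas as pd


def sum_udf(v: pd.Series) -> int:
    # Same body as the registered UDF: reduce one group's column to a scalar.
    return int(v.sum())


# Stand-in (assumption) for: VALUES (3, 0), (2, 0), (1, 1) tbl(v1, v2)
df = pd.DataFrame({"v1": [3, 2, 1], "v2": [0, 0, 1]})

# Stand-in (assumption) for: SELECT v2, sum_udf(v1) ... GROUP BY v2
result = df.groupby("v2")["v1"].agg(sum_udf).to_dict()
print(result)
```

Each group's `v1` values arrive as a single `pandas.Series`, which is why the function body can call `v.sum()` directly.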

## How was this patch tested?

Manual test and unit test.

Closes #22620 from HyukjinKwon/SPARK-25601.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
HyukjinKwon committed Oct 4, 2018
1 parent 79dd4c9 commit 927e527
Showing 0 changed files with 0 additions and 0 deletions.

1 comment on commit 927e527

HyukjinKwon commented on 927e527 Oct 4, 2018

I'm sorry, this empty commit was my mistake. I used a merge script with some custom fixes. Let me get rid of it to prevent such mistakes.
