[SPARK-28978] Support > 256 args to python udf #26442
Closed
What changes were proposed in this pull request?
On the worker, we express lambda functions as strings and then eval them to create a "mapper" function. This makes the code hard to read and limits the number of arguments a udf can support to 255 on Python <= 3.6.
This PR rewrites the mapper functions as nested functions instead of eval'd "lambda strings", which allows passing more than 255 arguments to a udf.
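A minimal sketch of the difference (the names below are illustrative only and do not match the actual code in python/pyspark/worker.py):

```python
def old_style_mapper(f, num_args):
    # Build a lambda as a string and eval() it. The generated call has one
    # explicit argument per column, and on Python <= 3.6 a call with more
    # than 255 explicit arguments is a SyntaxError.
    arg_list = ", ".join("a[%d]" % i for i in range(num_args))
    return eval("lambda a: f(%s)" % arg_list, {"f": f})


def new_style_mapper(f):
    # A plain nested function: unpacking with * sidesteps the 255-argument
    # limit and is easier to read and debug than an eval'd string.
    def mapper(a):
        return f(*a)
    return mapper
```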
Why are the changes needed?
The JIRA ticket associated with this issue describes how MLflow uses udfs to consume columns as features. This pattern isn't unique to MLflow, and a limit of 255 features is quite low.
Does this PR introduce any user-facing change?
Yes. Users can now pass more than 255 columns to a udf.
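For example (a minimal sketch, assuming a SparkSession is bound to `spark`):

```python
from pyspark.sql.functions import col, udf
from pyspark.sql.types import LongType

num_cols = 300  # anything above 255 previously failed on Python <= 3.6

# A single-row DataFrame with 300 columns, all derived from `id`.
df = spark.range(1).select(*[col("id").alias("c%d" % i) for i in range(num_cols)])

# A udf that consumes every column at once.
total = udf(lambda *args: sum(args), LongType())
df.select(total(*df.columns).alias("total")).show()
```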
How was this patch tested?
Added a unit test that passes more than 255 arguments to a udf.