[SPARK-23060][Python] New feature - apply method to extend rdd's functionality #20258

gianmarcodonetti · 2018-01-13T08:44:16Z

What changes were proposed in this pull request?

Extend the RDD class with an apply method.
The method should work like a pipe operator attached to the RDD class itself.
Example:

The idea is to have something like pandas' DataFrame.pipe:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pipe.html

How was this patch tested?

Manual tests. Easy patch.


HyukjinKwon

Isn't it just a helper function?

def apply(self, func):
    return func(self)

I don't think it's quite worth adding it.
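To make the point above concrete, here is a minimal sketch of what the proposed method amounts to. MiniRDD and double_all are hypothetical stand-ins invented for illustration (they are not part of PySpark or of this patch); the class wraps a plain list so the snippet runs without a Spark installation.

```python
class MiniRDD:
    """Hypothetical stand-in for an RDD; wraps a plain Python list."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, f):
        # Element-wise transformation, returning a new MiniRDD.
        return MiniRDD(f(x) for x in self.items)

    def apply(self, func):
        # The whole proposal: hand the object itself to func
        # and return whatever func returns.
        return func(self)


def double_all(rdd):
    # Hypothetical user-defined transformation over the whole dataset.
    return rdd.map(lambda x: x * 2)


rdd = MiniRDD([1, 2, 3])
via_method = rdd.apply(double_all).items  # method form
via_call = double_all(rdd).items          # plain function call
```

Both forms produce the same result; the method only changes where the function name appears in the expression.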

gianmarcodonetti · 2018-01-13T14:01:48Z

@HyukjinKwon in my opinion, it helps a lot.
My goal is to avoid this case:

final_rdd = func_3(func_2(func_1(initial_rdd)))

And allow this instead:

final_rdd = initial_rdd.apply(func_1).apply(func_2).apply(func_3)

More functional and readable...
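For comparison, the same left-to-right reading order can be had without touching the RDD class, using a small free function. The sketch below is an assumption-laden illustration, not something proposed in this thread: pipe is a hypothetical helper, the func_1..func_3 bodies are made up, and plain lists stand in for RDDs.

```python
from functools import reduce


def pipe(value, *funcs):
    """Apply funcs to value left to right: pipe(x, f, g) == g(f(x))."""
    return reduce(lambda acc, f: f(acc), funcs, value)


# Hypothetical transformations; plain lists stand in for RDDs.
def func_1(data):
    return [x + 1 for x in data]


def func_2(data):
    return [x * 10 for x in data]


def func_3(data):
    return [x for x in data if x > 20]


initial = [1, 2, 3]
nested = func_3(func_2(func_1(initial)))        # inside-out reading order
piped = pipe(initial, func_1, func_2, func_3)   # left-to-right reading order
```

The trade-off is that a helper like this lives in user code, whereas a method on the class would become part of the public API that has to be kept consistent across languages.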

HyukjinKwon · 2018-01-13T14:28:18Z

That resembles pipe as I pointed out in the JIRA. It's just a little trick and I don't think it's worth adding it for an API alone.

BTW, we should consider Java / Scala APIs and how it's going to work with Dataset and DataFrame too.

ueshin · 2018-01-16T05:20:27Z

Is this similar to Dataset.transform() in Java/Scala API? But we don't have similar APIs for RDDs.

HyukjinKwon · 2018-01-16T05:56:20Z

Oh, I see! Yeah, they look quite similar.

srowen · 2018-01-16T13:51:54Z

At best, the functionality already exists for the new API in a form. This should be closed.

AmplabJenkins · 2018-01-18T17:29:45Z

Can one of the admins verify this patch?

holdenk · 2018-02-26T22:54:50Z

I'm +1 to @srowen on this; I don't believe this is a change we're going to make to the API. @gianmarcodonetti please close this PR.

gianmarcodonetti added 2 commits January 12, 2018 16:31

added function apply to rdd

d8463db

refactor and add todo

8e79b46

HyukjinKwon reviewed Jan 13, 2018


gianmarcodonetti closed this Feb 27, 2018
