
[SPARK-16992][PYSPARK] use map comprehension in doc #14863

Closed
wants to merge 1 commit

Conversation

@gsemet (Contributor) commented Aug 29, 2016

The code is equivalent, but a comprehension is usually faster than a map() call.

@SparkQA commented Aug 29, 2016

Test build #64553 has finished for PR 14863 at commit 7a2621e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 1, 2016

I don't know if performance is important here. I'd rather either batch this together with other changes that apply this style consistently, or drop this one.

@gsemet (Contributor, Author) commented Sep 1, 2016

I agree. I would also prefer that the Spark examples "promote" good Python practice, i.e. replacing 'map' and 'filter' with list or dict comprehensions ('reduce' has no comprehension equivalent). Even though the 'map'/'filter' syntax may look closer to its equivalent on RDDs, they are not the same. I am not sure there is a consensus on this point in the "data science" community, but most Pythonistas now happily promote comprehensions over map/filter. Most of the time a comprehension is faster, especially when the result is converted to a list after the map.
'map' may be faster than a comprehension when no lambda is needed, and it is lazy on Python 3 (a generator expression gives the same laziness on Python 2 or 3), so one should know when to use each.

Long story short: if the Spark community agrees, I can look for these 'map'/'filter' calls in the examples and replace them with comprehensions.
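A minimal sketch of the trade-off described above (the data and variable names are hypothetical, not taken from the PR diff):

```python
# Sample data in the shape used by the Spark Python examples.
data = [(0, 18.0), (1, 19.0), (2, 8.0), (3, 5.0), (4, 2.2)]

# map/filter style: on Python 3, map() is lazy, so an explicit list()
# call is needed to materialize the result.
squared_map = list(map(lambda kv: (kv[0], kv[1] ** 2), data))

# Comprehension style: usually faster when a lambda would be needed,
# and generally considered more idiomatic Python.
squared_comp = [(k, v ** 2) for k, v in data]

# A generator expression keeps the laziness of Python 3's map().
squared_lazy = ((k, v ** 2) for k, v in data)

assert squared_map == squared_comp
assert list(squared_lazy) == squared_comp
```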

@srowen (Member) commented Sep 1, 2016

OK, well, I'd leave it to people here with more taste to agree on what's canonical, but I take your word for it. I'm mostly interested in consistency, if anything.

@gsemet (Contributor, Author) commented Sep 1, 2016

This is actually wrong: 'map()' returns a 'list', not a dict.
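A short illustration of the mix-up being pointed out (hypothetical data, not the PR's actual diff): map() alone yields a list on Python 2 or an iterator on Python 3, never a dict.

```python
pairs = [(0, 18.0), (1, 19.0)]

# map() does not produce a dict, regardless of Python version.
mapped = map(lambda kv: (kv[0], kv[1] * 2), pairs)
assert not isinstance(mapped, dict)

# To actually get a dict, wrap the pairs in dict(...) or,
# more idiomatically, use a dict comprehension:
as_dict = {k: v * 2 for k, v in pairs}
assert as_dict == {0: 36.0, 1: 38.0}
```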

@srowen (Member) commented Sep 1, 2016

OK you're saying the existing example doesn't work?

@gsemet (Contributor, Author) commented Sep 1, 2016

No, my proposal was wrong. I have updated it.

@@ -29,7 +29,7 @@
.getOrCreate()

# $example on$
data = [(0, 18.0,), (1, 19.0,), (2, 8.0,), (3, 5.0,), (4, 2.2,)]
@gsemet (Contributor, Author) commented on this line:
these extra commas are useless

@SparkQA commented Sep 1, 2016

Test build #64779 has finished for PR 14863 at commit 079665b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: Gaetan Semet <gaetan@xeberon.net>
@SparkQA commented Sep 5, 2016

Test build #64940 has finished for PR 14863 at commit f674f75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 12, 2016

OK, this one's trivial in any event so I'm OK to merge this much.

@asfgit asfgit closed this in b3c2291 Sep 12, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
The code is equivalent, but a comprehension is usually faster than a map() call.

Author: Gaetan Semet <gaetan@xeberon.net>

Closes apache#14863 from Stibbons/map_comprehension.
3 participants