
[SPARK-16992][PYSPARK] use map comprehension in doc #14863

Closed
wants to merge 1 commit

Conversation

@gsemet (Contributor) commented Aug 29, 2016

The code is equivalent, but a comprehension is usually faster than a map() call.

@SparkQA commented Aug 29, 2016

Test build #64553 has finished for PR 14863 at commit 7a2621e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 1, 2016

I don't know if performance is important here. I'd rather either batch this together with other changes that apply this style consistently, or drop this one.

@gsemet (Contributor, Author) commented Sep 1, 2016

I agree. I would also prefer that the Spark examples "promote" good Python practice, i.e. replacing 'map' and 'filter' with list or dict comprehensions ('reduce' has no comprehension equivalent). Even though the 'map'/'filter' syntax may look closer to its equivalent on RDDs, they are not the same. I am not sure there is a consensus on this point in the "data science" community, but most Pythonistas now happily promote comprehensions over map/filter. Most of the time a comprehension is faster, especially when the result is converted to a list after the map.
'map' may be faster than a comprehension when no lambda is needed, and it is lazy on Python 3 (a generator expression gives the same laziness on Python 2 or 3), so one should know when to use each.

Long story short: if the Spark community agrees, I can look for these 'map'/'filter' calls in the examples and replace them with comprehensions.
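A minimal sketch of the trade-off described above (the data and variable names are hypothetical, not taken from the PR diff):

```python
# Sample data in the shape used by the Spark Python examples.
data = [(0, 18.0), (1, 19.0), (2, 8.0), (3, 5.0), (4, 2.2)]

# map/filter style: on Python 3, map() is lazy, so an explicit list()
# call is needed to materialize the result.
squared_map = list(map(lambda kv: (kv[0], kv[1] ** 2), data))

# Comprehension style: usually faster when a lambda would be needed,
# and generally considered more idiomatic Python.
squared_comp = [(k, v ** 2) for k, v in data]

# A generator expression keeps the laziness of Python 3's map().
squared_lazy = ((k, v ** 2) for k, v in data)

assert squared_map == squared_comp
assert list(squared_lazy) == squared_comp
```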

@srowen (Member) commented Sep 1, 2016

OK, well, I'd leave it to people here with more taste to agree on what's canonical, but I take your word for it. I'm mostly interested in consistency, if anything.

@gsemet (Contributor, Author) commented Sep 1, 2016

This is actually wrong: 'map()' returns a 'list', not a dict.
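A short illustration of the mix-up being pointed out (hypothetical data, not the PR's actual diff): map() alone yields a list on Python 2 or an iterator on Python 3, never a dict.

```python
pairs = [(0, 18.0), (1, 19.0)]

# map() does not produce a dict, regardless of Python version.
mapped = map(lambda kv: (kv[0], kv[1] * 2), pairs)
assert not isinstance(mapped, dict)

# To actually get a dict, wrap the pairs in dict(...) or,
# more idiomatically, use a dict comprehension:
as_dict = {k: v * 2 for k, v in pairs}
assert as_dict == {0: 36.0, 1: 38.0}
```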

@srowen (Member) commented Sep 1, 2016

OK you're saying the existing example doesn't work?

@gsemet (Contributor, Author) commented Sep 1, 2016

No, my proposal was wrong. I have updated it.

@@ -29,7 +29,7 @@
.getOrCreate()

# $example on$
data = [(0, 18.0,), (1, 19.0,), (2, 8.0,), (3, 5.0,), (4, 2.2,)]
@gsemet (Contributor, Author) commented on this line:
these extra commas are useless

@SparkQA commented Sep 1, 2016

Test build #64779 has finished for PR 14863 at commit 079665b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: Gaetan Semet <gaetan@xeberon.net>
@SparkQA commented Sep 5, 2016

Test build #64940 has finished for PR 14863 at commit f674f75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 12, 2016

OK, this one's trivial in any event so I'm OK to merge this much.

@asfgit asfgit closed this in b3c2291 Sep 12, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
The code is equivalent, but a comprehension is usually faster than a map() call.

Author: Gaetan Semet <gaetan@xeberon.net>

Closes apache#14863 from Stibbons/map_comprehension.
3 participants