[SPARK-5616] [PySpark] Add examples for PySpark API #4417
2. Add module example for PySpark
|
The title of the PR should include the title of the JIRA. There is a typo throughout this PR: it's "broadcast" and not "boardcast". What are the other new files besides the main example file? |
I'm not sure it's clear what this is an example of. Yes, it uses broadcast variables, but the als.py example already does too. Why does it need to be done several times, and how does this show the difference versus non-broadcast variables? Answering that, plus adding comments, might make this more useful.
I have added some comments to explain why we use broadcast variables, and the example now prints a performance report:
Using broadcast: Iteration 0 cost time 0.829586982727
Using broadcast: Iteration 1 cost time 0.0809919834137
Using broadcast: Iteration 2 cost time 0.0794229507446
Don't use broadcast: Iteration 0 cost time 2.80766296387
Don't use broadcast: Iteration 1 cost time 2.83087706566
Don't use broadcast: Iteration 2 cost time 3.16146707535
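For readers following the thread, here is a minimal sketch of the kind of comparison those timings describe. It is not the code from this PR: the helper names (`build_lookup`, `sum_lookups`, `compare`) and the sizes are assumptions made up for illustration, and `sc` is assumed to be an already-created `SparkContext`.

```python
import time


def build_lookup(n):
    """Build a large read-only table; this is the data worth broadcasting."""
    return {i: i * i for i in range(n)}


def sum_lookups(table, keys):
    """Per-partition work: look up every key in the shared table."""
    return sum(table[k] for k in keys)


def compare(sc, n=100000, iterations=3, slices=10):
    """Time the same job with and without a broadcast variable.

    `sc` is an existing pyspark.SparkContext (assumed, not created here).
    """
    table = build_lookup(n)
    keys = sc.parallelize(range(n), slices)

    # Broadcast once; later jobs reuse the copy already held by the workers.
    shared = sc.broadcast(table)
    for i in range(iterations):
        start = time.time()
        keys.mapPartitions(lambda part: [sum_lookups(shared.value, part)]).sum()
        print("Using broadcast: Iteration %d cost time %s" % (i, time.time() - start))

    # No broadcast: `table` is captured by the closure and serialized
    # again for every job that uses it.
    for i in range(iterations):
        start = time.time()
        keys.mapPartitions(lambda part: [sum_lookups(table, part)]).sum()
        print("Don't use broadcast: Iteration %d cost time %s" % (i, time.time() - start))
```

Calling `compare(sc)` from a PySpark shell should print lines in the same shape as the report above; the actual numbers depend on the cluster and the table size.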
|
ok to test |
|
Test build #27037 has finished for PR 4417 at commit
|
|
Test build #27043 has finished for PR 4417 at commit
|
|
Test build #27044 has finished for PR 4417 at commit
|
|
Do I need to fix the Python style? I have used lint-python to check it.
|
@lazyman500 yes, this needs to pass python style checks. You can use |
|
Test build #27045 has finished for PR 4417 at commit
|
We changed the default broadcast factory to TorrentBroadcastFactory. Is there any reason to use Http as the default here?
@davies Thanks for your review.
I imitated the Scala example (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala).
I guess the author wanted to show how to change the broadcast type :)
Do I need to change the default broadcast factory to TorrentBroadcastFactory?
|
It's good to have these examples, thanks for working on it. I took a pass over it and left a few comments.
|
Can one of the admins verify this patch? |
|
I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks! |
add examples for PySpark