-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-1468] Modify the partition function used by partitionBy. #371
Conversation
…on function since the python hash function does not consistently assign the same value to None across python processes.
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13962/ |
Not sure why the build is failing as I'm pretty sure this change isn't touching any of those two things. |
Jenkins, retest this please. |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
@tyro89 Thanks for the fix, makes sense. Would you mind creating a JIRA for this on the Spark issue tracker? Also if there is a symptom or error that this causes that would be helpful to know (I'd guess it's just seeing the None key in multiple places on the reduce side of the shuffle). Otherwise if people run into this it will be hard for them to learn where/when it was fixed. |
Jenkins, test this please |
Sorry for the delay, just re-testing this before merging it. |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468 Author: Erik Selin <erik.selin@jadedpixel.com> Closes #371 from tyro89/consistent_hashing and squashes the following commits: 201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. (cherry picked from commit 8edc9d0) Signed-off-by: Matei Zaharia <matei@databricks.com>
Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468 Author: Erik Selin <erik.selin@jadedpixel.com> Closes #371 from tyro89/consistent_hashing and squashes the following commits: 201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. (cherry picked from commit 8edc9d0) Signed-off-by: Matei Zaharia <matei@databricks.com>
Thanks Erik! Merged this into branch-0.9, 1.0 and master. |
Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468 Author: Erik Selin <erik.selin@jadedpixel.com> Closes apache#371 from tyro89/consistent_hashing and squashes the following commits: 201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.
Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468 Author: Erik Selin <erik.selin@jadedpixel.com> Closes apache#371 from tyro89/consistent_hashing and squashes the following commits: 201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.
…erals [SPARK-24151] fix case sensitive literals
1. do not use uuid directly, to get the id quering by name 2. can not create flavor in public clouds, so let the tests fail first 3. only add one playbook terraform-provider-openstack-acceptance-test-public-clouds for all public clouds 4. add post.yaml to clean up the resources after the acctests Closes: theopenlab/openlab#125 Closes: theopenlab/openlab#136
Make partitionBy use a tweaked version of hash as its default partition function
since the python hash function does not consistently assign the same value
to None across python processes.
Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468