[SPARK-1468] Modify the partition function used by partitionBy. #371

Closed

tyro89 commented Apr 9, 2014

Make partitionBy use a tweaked version of hash as its default partition function,
since Python's built-in hash function does not consistently assign the same value
to None across Python processes.

Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468
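
To illustrate the idea described above, here is a minimal sketch of a partition function that pins None to a fixed value and otherwise defers to the built-in hash. The names `consistent_hash` and `assign_partition` are hypothetical and are not taken from the patch; the choice of 0 for None is likewise an assumption made for the example.

```python
def consistent_hash(key):
    """Hypothetical helper: built-in hash() with None pinned to a fixed value."""
    if key is None:
        # Assumption: any constant works, as long as every process uses the same one.
        return 0
    return hash(key)


def assign_partition(key, num_partitions, partition_func=consistent_hash):
    """Map a key to a partition index in the range [0, num_partitions)."""
    return partition_func(key) % num_partitions


# Group a few key/value pairs into two partitions.
pairs = [(None, 1), (1, 2), (None, 3), (2, 4)]
buckets = {}
for key, value in pairs:
    buckets.setdefault(assign_partition(key, 2), []).append((key, value))
```

With this sketch, every record keyed by None lands in the same partition no matter which process computes the index. It only addresses None; other key types keep whatever behavior the built-in hash gives them.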

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13962/

@tyro89
Author

tyro89 commented Apr 10, 2014

Not sure why the build is failing, as I'm pretty sure this change doesn't touch either of those two things.

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14003/

@pwendell
Contributor

@tyro89 Thanks for the fix, makes sense. Would you mind creating a JIRA for this on the Spark issue tracker? Also if there is a symptom or error that this causes that would be helpful to know (I'd guess it's just seeing the None key in multiple places on the reduce side of the shuffle).

Otherwise if people run into this it will be hard for them to learn where/when it was fixed.
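
The symptom guessed at above (the None key showing up in more than one place on the reduce side) can be simulated without Spark. The sketch below is only an illustration: the worker names and the per-process values of hash(None) are made-up numbers, and `shuffle` is a hypothetical stand-in for the real shuffle, not Spark code.

```python
from collections import defaultdict

NUM_PARTITIONS = 4

# Hypothetical per-process values of hash(None); made-up numbers for illustration only.
WORKER_NONE_HASH = {"worker-a": 5, "worker-b": 10}


def make_partition_func(none_hash):
    """Build a partition function that uses the given value as the hash of None."""
    def partition_func(key):
        h = none_hash if key is None else hash(key)
        return h % NUM_PARTITIONS
    return partition_func


def shuffle(records_by_worker, partition_func_for):
    """Route each worker's records to partitions using that worker's partition function."""
    partitions = defaultdict(list)
    for worker, records in records_by_worker.items():
        func = partition_func_for(worker)
        for key, value in records:
            partitions[func(key)].append((key, value))
    return dict(partitions)


records = {"worker-a": [(None, 1)], "worker-b": [(None, 2)]}

# Each worker hashes None differently, so the same key is split across partitions.
broken = shuffle(records, lambda w: make_partition_func(WORKER_NONE_HASH[w]))

# Every worker pins None to the same value, so the key stays together.
fixed = shuffle(records, lambda w: make_partition_func(0))
```

In the `broken` case the two None records end up in different partitions, so a reduce over that key would see it twice; in the `fixed` case both records share one partition.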

@tyro89
Author

tyro89 commented Apr 10, 2014

tyro89 changed the title from "Modify the partition function used by partitionBy." to "[SPARK-1468] Modify the partition function used by partitionBy." on Apr 10, 2014
@mateiz
Contributor

mateiz commented Jun 3, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Jun 3, 2014

Sorry for the delay, just re-testing this before merging it.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15395/

asfgit closed this in 8edc9d0 on Jun 3, 2014
asfgit pushed a commit that referenced this pull request Jun 3, 2014
Make partitionBy use a tweaked version of hash as its default partition function
since the python hash function does not consistently assign the same value
to None across python processes.

Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468

Author: Erik Selin <erik.selin@jadedpixel.com>

Closes #371 from tyro89/consistent_hashing and squashes the following commits:

201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.

(cherry picked from commit 8edc9d0)
Signed-off-by: Matei Zaharia <matei@databricks.com>
@mateiz
Contributor

mateiz commented Jun 3, 2014

Thanks Erik! Merged this into branch-0.9, 1.0 and master.

pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018
[SPARK-24151] fix case sensitive literals
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
1. Do not use the UUID directly; get the ID by querying by name.
2. Flavors cannot be created in public clouds, so let the tests fail first.
3. Add only one playbook,
terraform-provider-openstack-acceptance-test-public-clouds, for all
public clouds.
4. Add post.yaml to clean up the resources after the acceptance tests.

Closes: theopenlab/openlab#125
Closes: theopenlab/openlab#136
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020