Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12645] [SparkR] SparkR support hash function #10597

Closed
wants to merge 2 commits into from

Conversation

yanboliang
Copy link
Contributor

Add hash function for SparkR DataFrame.

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48757 has finished for PR 10597 at commit c41eb1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

shivaram commented Jan 5, 2016

cc @sun-rui

@SparkQA
Copy link

SparkQA commented Jan 6, 2016

Test build #48820 has finished for PR 10597 at commit 995fd06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor

sun-rui commented Jan 6, 2016

LGTM

@felixcheung
Copy link
Member

looks good. no conflict with base/stats

@shivaram
Copy link
Contributor

shivaram commented Jan 9, 2016

LGTM. Thanks @yanboliang - Merging this to master and branch-1.6

asfgit pushed a commit that referenced this pull request Jan 9, 2016
Add ```hash``` function for SparkR ```DataFrame```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10597 from yanboliang/spark-12645.

(cherry picked from commit 3d77cff)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@asfgit asfgit closed this in 3d77cff Jan 9, 2016
@yanboliang yanboliang deleted the spark-12645 branch January 10, 2016 07:27
@yhuai
Copy link
Contributor

yhuai commented Jan 12, 2016

It breaks hadoop 1 test (https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-sbt-hadoop-1.0/30/console). Can you take a look?

@shivaram
Copy link
Contributor

Hmm is hash not a supported function in 1.6 by any chance ? Are the other hadoop versions in branch-1.6 fine ?

@yhuai
Copy link
Contributor

yhuai commented Jan 12, 2016

looks like 2.2 tests are broken as well. How about we revert it from 1.6 for now? If it is good, can you do the revert? Thanks!

@shivaram
Copy link
Contributor

sure - sounds good to me. Can you open a JIRA and cc me and @yanboliang on it ?

@yhuai
Copy link
Contributor

yhuai commented Jan 12, 2016

let's just reuse the original jira. I will reopen it.

@yhuai
Copy link
Contributor

yhuai commented Jan 12, 2016

Actually, I am not sure if we should merge this to branch 1.6 since it is a new feature.

@shivaram
Copy link
Contributor

The SparkR module is still considered alpha, so we do include new features in minor updates when it is appropriate. In this case it was a small new function, so it seemed fine to me.

@felixcheung
Copy link
Member

it looks like hash is new in 2.0

  /**
   * Calculates the hash code of given columns, and returns the result as a int column.
   *
   * @group misc_funcs
   * @since 2.0
   */
  @scala.annotation.varargs
  def hash(cols: Column*): Column = withExpr {
    new Murmur3Hash(cols.map(_.expr))
  }

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1823

so shouldn't be in 1.6.0

@yhuai
Copy link
Contributor

yhuai commented Jan 12, 2016

@shivaram I will revert it from 1.6 branch.

@yhuai
Copy link
Contributor

yhuai commented Jan 12, 2016

reverted from 1.6

@shivaram
Copy link
Contributor

@yanboliang Could you test why this doesn't with branch 1.6 ?

@felixcheung
Copy link
Member

@shivaram in my comment above, hash was added only in 2.0.0.

@yanboliang
Copy link
Contributor Author

@shivaram Just like @felixcheung commented, the hash function was added only in 2.0.0. So revert it from branch 1.6 will fix the broken test.

@shivaram
Copy link
Contributor

Ah I didn't notice @felixcheung earlier comment. Thanks for clarifying. I guess there is nothing to do here as the JIRA rightly says this feature is fixed in 2.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants