[SPARK-12645] [SparkR] SparkR support hash function #10597

yanboliang · 2016-01-05T09:44:07Z

Add hash function for SparkR DataFrame.

SparkQA · 2016-01-05T10:17:08Z

Test build #48757 has finished for PR 10597 at commit c41eb1f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

shivaram · 2016-01-05T17:15:37Z

cc @sun-rui

SparkQA · 2016-01-06T04:04:23Z

Test build #48820 has finished for PR 10597 at commit 995fd06.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sun-rui · 2016-01-06T07:27:47Z

LGTM

felixcheung · 2016-01-06T07:52:00Z

looks good. no conflict with base/stats

shivaram · 2016-01-09T06:57:03Z

LGTM. Thanks @yanboliang - Merging this to master and branch-1.6

Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645. (cherry picked from commit 3d77cff) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

yhuai · 2016-01-12T20:27:30Z

It breaks hadoop 1 test (https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-sbt-hadoop-1.0/30/console). Can you take a look?

shivaram · 2016-01-12T20:39:40Z

Hmm is hash not a supported function in 1.6 by any chance ? Are the other hadoop versions in branch-1.6 fine ?

yhuai · 2016-01-12T20:42:30Z

looks like 2.2 tests are broken as well. How about we revert it from 1.6 for now? If it is good, can you do the revert? Thanks!

shivaram · 2016-01-12T20:43:29Z

sure - sounds good to me. Can you open a JIRA and cc me and @yanboliang on it ?

yhuai · 2016-01-12T20:45:00Z

let's just reuse the original jira. I will reopen it.

yhuai · 2016-01-12T20:45:41Z

Actually, I am not sure if we should merge this to branch 1.6 since it is a new feature.

shivaram · 2016-01-12T20:49:52Z

The SparkR module is still considered alpha, so we do include new features in minor updates when it is appropriate. In this case it was a small new function, so it seemed fine to me.

felixcheung · 2016-01-12T22:43:06Z

it looks like hash is new in 2.0

  /**
   * Calculates the hash code of given columns, and returns the result as a int column.
   *
   * @group misc_funcs
   * @since 2.0
   */
  @scala.annotation.varargs
  def hash(cols: Column*): Column = withExpr {
    new Murmur3Hash(cols.map(_.expr))
  }

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1823

so shouldn't be in 1.6.0
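
For illustration only, a minimal sketch of calling the function quoted above from Scala. It assumes Spark 2.0 or later is on the classpath. The object name, session settings, column names, and sample rows are made up for this example and do not come from the PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.hash

// Hypothetical example object; not part of the PR.
object HashExample {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hash-example")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with two columns.
    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // hash(cols: Column*) returns a single int column computed
    // from the given columns (Murmur3-based, per the quoted source).
    df.select($"key", $"value", hash($"key", $"value").as("hash")).show()

    spark.stop()
  }
}
```

Compiling the same code against a 1.6 build would presumably fail, since `functions.hash` is marked `@since 2.0` in the snippet above.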

yhuai · 2016-01-12T23:14:16Z

@shivaram I will revert it from 1.6 branch.

yhuai · 2016-01-12T23:16:04Z

reverted from 1.6

shivaram · 2016-01-14T18:35:48Z

@yanboliang Could you test why this doesn't work with branch 1.6?

felixcheung · 2016-01-15T03:24:51Z

@shivaram in my comment above, hash was added only in 2.0.0.

yanboliang · 2016-01-15T09:23:07Z

@shivaram Just like @felixcheung commented, the hash function was added only in 2.0.0. So reverting it from branch 1.6 will fix the broken test.

shivaram · 2016-01-15T15:33:52Z

Ah, I didn't notice @felixcheung's earlier comment. Thanks for clarifying. I guess there is nothing to do here, as the JIRA rightly says this feature is fixed in 2.0.0.

yanboliang added 2 commits January 6, 2016 11:02

SparkR support hash function

0728560

update implementation

995fd06

yanboliang force-pushed the spark-12645 branch from c41eb1f to 995fd06 Compare January 6, 2016 03:06

asfgit closed this in 3d77cff Jan 9, 2016

yanboliang deleted the spark-12645 branch January 10, 2016 07:27
