[SPARK-7157][SQL] add sampleBy to DataFrame #6769

Closed
mengxr wants to merge 3 commits

mengxr (Contributor) commented Jun 11, 2015

Add sampleBy to DataFrame. @rxin

SparkQA commented Jun 12, 2015

Test build #34725 has finished for PR 6769 at commit 832f7cc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 12, 2015

Test build #34729 has finished for PR 6769 at commit 4a14834.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 12, 2015

Test build #34739 has finished for PR 6769 at commit 991f26f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin (Contributor) commented Jun 24, 2015

Thanks. Merging this. Can you add a version of sampleBy that uses a Java map?

asfgit closed this in 0401cba Jun 24, 2015

rxin (Contributor) commented Jun 24, 2015

Note that I had to revert this patch because tests failed after merging. @mengxr can you submit a new PR with tests and a Java-friendly API? Thanks.

asfgit pushed a commit that referenced this pull request Jul 31, 2015
This was previously committed but then reverted due to test failures (see #6769).

Author: Xiangrui Meng <meng@databricks.com>

Closes #7755 from rxin/SPARK-7157 and squashes the following commits:

fbf9044 [Xiangrui Meng] fix python test
542bd37 [Xiangrui Meng] update test
604fe6d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
f051afd [Xiangrui Meng] use udf instead of building expression
f4e9425 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
8fb990b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
103beb3 [Xiangrui Meng] add Java-friendly sampleBy
991f26f [Xiangrui Meng] fix seed
4a14834 [Xiangrui Meng] move sampleBy to stat
832f7cc [Xiangrui Meng] add sampleBy to DataFrame
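
For readers unfamiliar with the feature under discussion: sampleBy draws a stratified sample without replacement, keeping each row with a probability chosen by the value of a key column. The idea can be sketched in plain Python as below. This is only an illustration and not the code from this patch; the function name `sample_by`, the `(key, value)` row layout, and the example data are all assumptions made for the sketch.

```python
import random
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical row layout for this sketch: (stratum key, payload).
Row = Tuple[Hashable, object]


def sample_by(
    rows: Iterable[Row],
    fractions: Dict[Hashable, float],
    seed: Optional[int] = None,
) -> List[Row]:
    """Keep each row with the probability assigned to its stratum key.

    Keys missing from `fractions` are treated as having fraction 0.0,
    so rows in those strata are never kept.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    sampled = []
    for key, value in rows:
        fraction = fractions.get(key, 0.0)
        # Independent Bernoulli trial per row; random() is in [0.0, 1.0).
        if rng.random() < fraction:
            sampled.append((key, value))
    return sampled


# Example data (made up): 300 rows spread evenly over keys 0, 1, 2.
rows = [(i % 3, i) for i in range(300)]

# Keep every row with key 0, roughly half of key 1, and none of key 2.
sample = sample_by(rows, {0: 1.0, 1: 0.5}, seed=42)
```

Because each row is an independent trial, the size of each stratum in the result is only approximately `fraction * stratum_size`; passing the same seed reproduces the same sample.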