SPARK-5270 [CORE] Elegantly check if RDD is empty #4074

Closed
wants to merge 4 commits into from


srowen (Member) commented Jan 16, 2015

Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know.

SparkQA commented Jan 16, 2015

Test build #25667 has started for PR 4074 at commit de6b95e.

  • This patch merges cleanly.

srowen (Member Author) commented Jan 16, 2015

(Oh of course, if this looks good I can add this to Java / Python too)

SparkQA commented Jan 16, 2015

Test build #25667 has finished for PR 4074 at commit de6b95e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25667/

ksakellis

LGTM. What is the use case? Is this part of a bigger PR?

srowen (Member Author) commented Jan 16, 2015

This is all there is to it: a convenience method that implements the check efficiently. Judging by several questions on the mailing list, people do want to test for an empty RDD, and there hasn't been an accepted way to do it that is faster than count() == 0:

http://apache-spark-user-list.1001560.n3.nabble.com/Testing-if-an-RDD-is-empty-td1678.html#a1679
... and of course
http://issues.apache.org/jira/browse/SPARK-5270
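To see why a dedicated check can beat count() == 0, here is a small, Spark-free Scala model. Everything in it is hypothetical (EmptinessModel, partition, countIsZero, takeOneIsEmpty are not Spark APIs): partitions are modeled as lazily evaluated iterators, and a counter records how many elements actually get computed. A count()-based check has to drain every partition, while a take(1)-style check stops at the first element it finds. As far as I know, the merged RDD.isEmpty() follows the second pattern (roughly partitions.length == 0 || take(1).length == 0), but treat this as an illustration, not Spark's code.

```scala
// Hypothetical, Spark-free model of two ways to test an "RDD" for emptiness.
// An "RDD" here is just a sequence of partitions that produce elements lazily.
object EmptinessModel {
  // Number of elements actually computed so far (reset between checks).
  var produced: Int = 0

  // Build a partition whose elements are only computed when pulled.
  def partition(xs: Int*): () => Iterator[Int] =
    () => xs.iterator.map { x => produced += 1; x }

  // Analogue of rdd.count() == 0: every element of every partition is computed.
  def countIsZero(parts: Seq[() => Iterator[Int]]): Boolean =
    parts.map(p => p().size).sum == 0

  // Analogue of partitions.length == 0 || take(1).length == 0:
  // scan partitions in order and stop as soon as one element is found.
  def takeOneIsEmpty(parts: Seq[() => Iterator[Int]]): Boolean =
    parts.isEmpty || parts.iterator.flatMap(p => p()).take(1).toList.isEmpty
}
```

With two partitions holding 1, 2, 3 and 4, 5, 6, both checks answer false, but countIsZero computes all 6 elements while takeOneIsEmpty computes only 1. On a real cluster the gap is larger, because count() also launches tasks on every partition.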

pwendell (Contributor)

Seems reasonable to have, since it's non-obvious how to do it. @srowen, could you add this in Java and Python?

SparkQA commented Jan 16, 2015

Test build #25682 has started for PR 4074 at commit d76f8e3.

  • This patch merges cleanly.

SparkQA commented Jan 16, 2015

Test build #25682 has finished for PR 4074 at commit d76f8e3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25682/

srowen (Member Author) commented Jan 17, 2015

Jenkins, retest this please.

SparkQA commented Jan 17, 2015

Test build #25701 has started for PR 4074 at commit d76f8e3.

  • This patch merges cleanly.

SparkQA commented Jan 17, 2015

Test build #25701 has finished for PR 4074 at commit d76f8e3.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25701/

test("isEmpty") {
  assert(sc.emptyRDD.isEmpty())
  assert(sc.parallelize(Seq[Int]()).isEmpty())
  assert(!sc.parallelize(Seq(1)).isEmpty())
}
Contributor


I don't think this tests the case where there are multiple partitions but no data in any of the partitions. Maybe add something like:

assert(sc.parallelize(Seq(1,2,3), 3).filter(_ < 0).isEmpty())

Member Author


I think the sc.parallelize(Seq[Int]()) case actually has multiple partitions, but I'll add this too. I'll also check the case where the first partition is empty but the others aren't.
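The edge cases raised in this review (several partitions that are all empty, and an empty first partition followed by non-empty ones) can be reasoned about without a cluster. The sketch below is a hypothetical, Spark-free model; IncrementalScan, scanIsEmpty and firstPartitionOnly are made-up names. Partitions are plain Scala sequences, and the check tries one partition first, then widens the scan while nothing has been found, in the spirit of how take(1) proceeds. The scale-up factor of 4 is an assumption chosen to resemble RDD.take of that era, not a quote of Spark's code.

```scala
// Hypothetical model of an incremental, take(1)-style emptiness check.
// Partitions are plain sequences; this is not Spark code.
object IncrementalScan {
  // Whether the data is empty, and how many scan rounds ("jobs") it took.
  final case class Result(empty: Boolean, rounds: Int)

  def scanIsEmpty[T](partitions: IndexedSeq[Seq[T]], scaleUp: Int = 4): Result = {
    var scanned = 0
    var rounds = 0
    var found = false
    while (!found && scanned < partitions.length) {
      // First round tries one partition; later rounds try scaleUp times
      // as many partitions as have already been scanned (assumed policy).
      val toTry = if (scanned == 0) 1 else scanned * scaleUp
      val upTo = math.min(scanned + toTry, partitions.length)
      found = partitions.slice(scanned, upTo).exists(_.nonEmpty)
      scanned = upTo
      rounds += 1
    }
    Result(!found, rounds)
  }

  // Naive check that inspects only the first partition. It is wrong when
  // the first partition is empty but a later one is not.
  def firstPartitionOnly[T](partitions: IndexedSeq[Seq[T]]): Boolean =
    partitions.headOption.forall(_.isEmpty)
}
```

Modeling Seq(1,2,3) in 3 partitions as IndexedSeq(Seq(1), Seq(2), Seq(3)): filtering with _ < 0 leaves every partition empty, and the scan reports empty after two rounds. Filtering with _ > 1 empties only the first partition; the second round finds an element, while firstPartitionOnly wrongly answers true.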

SparkQA commented Jan 18, 2015

Test build #25730 has started for PR 4074 at commit 191bb9f.

  • This patch does not merge cleanly.

SparkQA commented Jan 18, 2015

Test build #25730 has finished for PR 4074 at commit 191bb9f.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25730/

SparkQA commented Jan 18, 2015

Test build #25731 has started for PR 4074 at commit 66885b8.

  • This patch merges cleanly.

SparkQA commented Jan 18, 2015

Test build #25731 has finished for PR 4074 at commit 66885b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25731/

pwendell (Contributor)

LGTM, @srowen. Are you still working on it, or is it good from your end? I'll leave a bit of time for others to comment as well.

srowen (Member Author) commented Jan 19, 2015

@pwendell No more changes from my side.

pwendell (Contributor)

@srowen Thanks, Sean. I committed this with a minor rewording of the title.

asfgit closed this in 306ff18 on Jan 20, 2015
bomeng pushed a commit to Huawei-Spark/spark that referenced this pull request Jan 21, 2015
Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#4074 from srowen/SPARK-5270 and squashes the following commits:

66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code
2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike
28395ff [Sean Owen] Add isEmpty to Java, Python
7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()
srowen deleted the SPARK-5270 branch on January 22, 2015 14:44

8 participants