Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-2684: Update ExternalAppendOnlyMap to take an iterator as input #1607

Closed
wants to merge 2 commits into from

Conversation

mateiz
Copy link
Contributor

@mateiz mateiz commented Jul 27, 2014

This will decrease object allocation from the "update" closure used in map.changeValue.

@SparkQA
Copy link

SparkQA commented Jul 27, 2014

QA tests have started for PR 1607. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17235/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 27, 2014

QA results for PR 1607:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17235/consoleFull

}
currentMap.changeValue(key, update)
numPairsInMemory += 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it makes sense to include a more collection-oriented interface, more like scala.collection.mutable.Buffer#insertAll:

def insertAll(coll: Iterable[Product2[K, V]]): Unit = insertAll(coll.iterator)

...so that you can do things like externalAppendOnlyMap.insertAll(aMap) instead of externalAppendOnlyMap.insertAll(aMap.iterator)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that much of the time we compute data in an Iterator, and Iterator is not Iterable AFAIK (you can't iterate over it more than once). We could add a second interface though if you'd like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I've now added this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I saw. Thanks. I'm not really sure how useful it is, because you are correct that we usually have an iterator anyway, but it was simple and clean to add the Iterable interface.

@SparkQA
Copy link

SparkQA commented Jul 27, 2014

QA tests have started for PR 1607. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17238/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 27, 2014

QA results for PR 1607:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17238/consoleFull

@aarondav
Copy link
Contributor

LGTM, merging into master.

@asfgit asfgit closed this in 9857053 Jul 27, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This will decrease object allocation from the "update" closure used in map.changeValue.

Author: Matei Zaharia <matei@databricks.com>

Closes apache#1607 from mateiz/spark-2684 and squashes the following commits:

b7d89e6 [Matei Zaharia] Add insertAll for Iterables too, and fix some code style
561fc97 [Matei Zaharia] Update ExternalAppendOnlyMap to take an iterator as input
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants