SPARK-2684: Update ExternalAppendOnlyMap to take an iterator as input #1607
QA tests have started for PR 1607. This patch merges cleanly.
QA results for PR 1607:
    }
    currentMap.changeValue(key, update)
    numPairsInMemory += 1
  }
Does it make sense to include a more collection-oriented interface, more like scala.collection.mutable.Buffer#insertAll:
def insertAll(coll: Iterable[Product2[K, V]]): Unit = insertAll(coll.iterator)
...so that you can do things like externalAppendOnlyMap.insertAll(aMap) instead of externalAppendOnlyMap.insertAll(aMap.iterator)?
The problem is that much of the time we compute data in an Iterator, and Iterator is not Iterable AFAIK (you can't iterate over it more than once). We could add a second interface though if you'd like that.
BTW I've now added this.
Yeah, I saw. Thanks. I'm not really sure how useful it is, because you are correct that we usually have an iterator anyway, but it was simple and clean to add the Iterable interface.
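For readers following the thread, here is a minimal sketch of the two overloads discussed above. `SimpleCombiningMap` is a hypothetical in-memory stand-in (no spilling to disk), not the actual `ExternalAppendOnlyMap`; only the `insertAll` signatures mirror the ones quoted in the review.

```scala
import scala.collection.mutable

// Hypothetical stand-in for illustration only: keeps everything in memory.
class SimpleCombiningMap[K, V, C](
    createCombiner: V => C,
    mergeValue: (C, V) => C) {

  private val currentMap = mutable.HashMap.empty[K, C]

  /** Primary entry point: consumes a single-pass Iterator exactly once. */
  def insertAll(entries: Iterator[Product2[K, V]]): Unit = {
    while (entries.hasNext) {
      val kv = entries.next()
      val combined = currentMap.get(kv._1) match {
        case Some(old) => mergeValue(old, kv._2)
        case None => createCombiner(kv._2)
      }
      currentMap(kv._1) = combined
    }
  }

  /** Convenience overload: any Iterable can hand out a fresh iterator. */
  def insertAll(entries: Iterable[Product2[K, V]]): Unit =
    insertAll(entries.iterator)

  /** Immutable snapshot of the combined values. */
  def result: Map[K, C] = currentMap.toMap
}
```

With both overloads in place, a caller holding a collection can write `m.insertAll(aMap)`, while a caller that only has an `Iterator` (the common case mentioned above) passes it directly.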
LGTM, merging into master.
This will decrease object allocation from the "update" closure used in map.changeValue.

Author: Matei Zaharia <matei@databricks.com>

Closes apache#1607 from mateiz/spark-2684 and squashes the following commits:

b7d89e6 [Matei Zaharia] Add insertAll for Iterables too, and fix some code style
561fc97 [Matei Zaharia] Update ExternalAppendOnlyMap to take an iterator as input
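A rough sketch of why taking an iterator can reduce allocation from the "update" closure: a per-record insert has to build a closure that captures the current value on every call, whereas a batch insert can build one closure up front and point it at the current entry through a mutable local. `TinyMap` and `ClosureReuse` are hypothetical names for illustration, and the `changeValue(key, (hadValue, oldValue) => newValue)` shape is assumed here rather than taken from the patch itself.

```scala
import scala.collection.mutable

// Hypothetical minimal map exposing a changeValue-style update hook.
class TinyMap[K, C] {
  private val data = mutable.HashMap.empty[K, C]

  // update receives (hadValue, oldValue) and returns the new value.
  def changeValue(key: K, update: (Boolean, C) => C): C = {
    val newValue = data.get(key) match {
      case Some(old) => update(true, old)
      case None => update(false, null.asInstanceOf[C])
    }
    data(key) = newValue
    newValue
  }

  def toMap: Map[K, C] = data.toMap
}

object ClosureReuse {
  // Per-record style: every call allocates a new closure capturing `value`.
  def insertOne(map: TinyMap[String, Int], key: String, value: Int): Unit = {
    val update: (Boolean, Int) => Int =
      (hadVal, old) => if (hadVal) old + value else value
    map.changeValue(key, update)
  }

  // Batch style: one closure for the whole iterator; it reads the current
  // entry through the mutable local `cur`, which the loop reassigns.
  def insertAll(map: TinyMap[String, Int], entries: Iterator[(String, Int)]): Unit = {
    var cur: (String, Int) = null
    val update: (Boolean, Int) => Int =
      (hadVal, old) => if (hadVal) old + cur._2 else cur._2
    while (entries.hasNext) {
      cur = entries.next()
      map.changeValue(cur._1, update)
    }
  }
}
```

Both paths produce the same map contents; the difference is only in how many closure objects are created along the way.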