SPARK-2684: Update ExternalAppendOnlyMap to take an iterator as input #1607

mateiz · 2014-07-27T01:51:13Z

This will decrease object allocation from the "update" closure used in map.changeValue.
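
To illustrate the allocation difference described above, here is a minimal sketch. It is not the patch itself: `TinyCombinerMap`, `InsertSketch`, and the summing merge logic are hypothetical stand-ins, and only the `changeValue(key, update)` call shape is taken from the diff in this PR. The per-record `insert` builds a new closure capturing its value on every call, while the iterator-based `insertAll` builds one closure per batch and lets it read the current record from a local variable.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the in-memory map; this is NOT Spark's implementation.
final class TinyCombinerMap[K, C] {
  private val data = mutable.HashMap.empty[K, C]

  // The update function receives whether a value already existed and, if so,
  // the old value; its result is stored under the key.
  def changeValue(key: K, update: (Boolean, C) => C): C = {
    val newValue = data.get(key) match {
      case Some(old) => update(true, old)
      case None      => update(false, null.asInstanceOf[C])
    }
    data(key) = newValue
    newValue
  }

  def toMap: Map[K, C] = data.toMap
}

object InsertSketch {
  // Per-record style: a fresh closure capturing `value` is allocated on every call.
  def insert(map: TinyCombinerMap[String, Int], key: String, value: Int): Unit = {
    val update: (Boolean, Int) => Int =
      (hadValue, old) => if (hadValue) old + value else value
    map.changeValue(key, update)
  }

  // Iterator style: one closure is allocated per call to insertAll. It reads the
  // current record from `cur`, which is reassigned for each element.
  def insertAll(map: TinyCombinerMap[String, Int], entries: Iterator[(String, Int)]): Unit = {
    var cur: (String, Int) = null
    val update: (Boolean, Int) => Int =
      (hadValue, old) => if (hadValue) old + cur._2 else cur._2
    while (entries.hasNext) {
      cur = entries.next()
      map.changeValue(cur._1, update)
    }
  }
}
```

Both styles produce the same map contents. Under these assumptions, the difference is only in how many function objects are created along the way: one per record versus one per batch.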

SparkQA · 2014-07-27T01:53:47Z

QA tests have started for PR 1607. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17235/consoleFull

SparkQA · 2014-07-27T02:39:07Z

QA results for PR 1607:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17235/consoleFull

markhamstra · 2014-07-27T03:00:28Z

core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala

    }
-    currentMap.changeValue(key, update)
-    numPairsInMemory += 1
  }


Would it make sense to include a more collection-oriented interface, similar to scala.collection.mutable.Buffer#insertAll:

def insertAll(coll: Iterable[Product2[K, V]]): Unit = insertAll(coll.iterator)

...so that you can do things like externalAppendOnlyMap.insertAll(aMap) instead of externalAppendOnlyMap.insertAll(aMap.iterator)?

mateiz · 2014-07-27

The problem is that most of the time we compute data in an Iterator, and an Iterator is not an Iterable, AFAIK (you can't iterate over it more than once). We could add a second interface, though, if you'd like that.

mateiz · 2014-07-27

BTW I've now added this.

markhamstra · 2014-07-27

Yeah, I saw. Thanks. I'm not really sure how useful it is, because you are correct that we usually have an iterator anyway, but it was simple and clean to add the Iterable interface.
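
The exchange above can be sketched as follows. This is an illustration under stated assumptions rather than the merged code: `Sink` is a hypothetical stand-in for ExternalAppendOnlyMap that simply buffers records, and `traverseTwice` is a helper invented here to show why an Iterator cannot be treated as an Iterable. The `Iterable` overload delegates to the `Iterator` one, as in the one-liner suggested in the review.

```scala
import scala.collection.mutable.ArrayBuffer

object IterableOverloadSketch {
  // Hypothetical stand-in for ExternalAppendOnlyMap; it only buffers records.
  final class Sink[K, V] {
    private val buf = ArrayBuffer.empty[(K, V)]

    // Primary entry point: consumes the iterator exactly once.
    def insertAll(entries: Iterator[Product2[K, V]]): Unit = {
      while (entries.hasNext) {
        val e = entries.next()
        buf += ((e._1, e._2))
      }
    }

    // Convenience overload for collections; delegates to the Iterator version.
    def insertAll(entries: Iterable[Product2[K, V]]): Unit =
      insertAll(entries.iterator)

    def size: Int = buf.size
  }

  // An Iterator is exhausted after one pass, so a second traversal sees nothing.
  def traverseTwice(): (Int, Int) = {
    val it = Iterator(1, 2, 3)
    (it.size, it.size)
  }
}
```

With both overloads in place, a caller holding a collection can write `sink.insertAll(aMap)`, while a caller that produces records lazily still passes an iterator directly.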

SparkQA · 2014-07-27T04:43:36Z

QA tests have started for PR 1607. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17238/consoleFull

SparkQA · 2014-07-27T05:29:03Z

QA results for PR 1607:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17238/consoleFull

aarondav · 2014-07-27T18:19:11Z

LGTM, merging into master.

This will decrease object allocation from the "update" closure used in map.changeValue.

Author: Matei Zaharia <matei@databricks.com>

Closes apache#1607 from mateiz/spark-2684 and squashes the following commits:

b7d89e6 [Matei Zaharia] Add insertAll for Iterables too, and fix some code style
561fc97 [Matei Zaharia] Update ExternalAppendOnlyMap to take an iterator as input

Update ExternalAppendOnlyMap to take an iterator as input

561fc97

markhamstra reviewed Jul 27, 2014

Add insertAll for Iterables too, and fix some code style

b7d89e6

asfgit closed this in 9857053 Jul 27, 2014
