Skip to content

Commit

Permalink
[SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten …
Browse files Browse the repository at this point in the history
…the lineage

The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672

Iterative GraphX applications always have long lineage, while checkpoint() on EdgeRDD and VertexRDD themselves cannot shorten the lineage. In contrast, if we perform checkpoint() on their ParitionsRDD, the long lineage can be cut off. Moreover, the existing operations such as cache() in this code is performed on the PartitionsRDD, so checkpoint() should do the same way. More details and explanation can be found in the JIRA.

Author: JerryLead <JerryLead@163.com>
Author: Lijie Xu <csxulijie@gmail.com>

Closes #3549 from JerryLead/my_graphX_checkpoint and squashes the following commits:

d1aa8d8 [JerryLead] Perform checkpoint() on PartitionsRDD not VertexRDD and EdgeRDD themselves
ff08ed4 [JerryLead] Merge branch 'master' of https://github.com/apache/spark
c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
52799e3 [Lijie Xu] Merge pull request #1 from apache/master

(cherry picked from commit fc0a147)
Signed-off-by: Ankur Dave <ankurdave@gmail.com>
  • Loading branch information
JerryLead authored and ankurdave committed Dec 3, 2014
1 parent 5e026a3 commit f1859fc
Show file tree
Hide file tree
Showing 2 changed files with 8 additions and 0 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,10 @@ class EdgeRDDImpl[ED: ClassTag, VD: ClassTag] private[graphx] (
this
}

override def checkpoint() = {
partitionsRDD.checkpoint()
}

/** The number of edges in the RDD. */
override def count(): Long = {
partitionsRDD.map(_._2.size.toLong).reduce(_ + _)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,10 @@ class VertexRDDImpl[VD] private[graphx] (
this
}

override def checkpoint() = {
partitionsRDD.checkpoint()
}

/** The number of vertices in the RDD. */
override def count(): Long = {
partitionsRDD.map(_.size).reduce(_ + _)
Expand Down

0 comments on commit f1859fc

Please sign in to comment.