Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing an Improved Pregel API #1217

Closed
wants to merge 12 commits into from

Conversation

jegonzal
Copy link
Contributor

The initial Pregel API coupled voting to halt with message reception. In this revised the vertex program receives a PregelContext which enables the user to signal whether or not to halt as well as access the current iteration number.

@jegonzal
Copy link
Contributor Author

@ankurdave unfortunately to full accept this change we will need to break compatibility with the current Pregel API. I cannot seem to overload the apply method.

@jegonzal
Copy link
Contributor Author

I spent some time verifying the math behind the PageRank (in particular starting values) to ensure that the delta formulation behaves identically to the static formulation which matches other reference implementations of PageRank. One of the key changes is I have added an extra normalization step at the end of the calculation to address a discrepancy in how we handle dangling vertices.


// Unpersist the RDDs hidden by newly-materialized RDDs
// prevG.unpersistVertices(blocking=false)
// prevG.edges.unpersist(blocking=false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncommenting lines 279 and 280 leads to a substantial slow down in later iterations indicating that there is still an issue with unpersist. @ankurdave any thoughts?

@jegonzal
Copy link
Contributor Author

@ankurdave and @rxin there is an issue with the current API. The sendMessage function pull the active field out of the vertex value here:
https://github.com/apache/spark/pull/1217/files#diff-e399679417ffa6eeedf26a7630baca16R243

def sendMessageWrapper(triplet: EdgeTriplet[(VD, Boolean),ED]): Iterator[(VertexId, A)] = {
  val simpleTriplet = new EdgeTriplet[VD, ED]()
   simpleTriplet.set(triplet)
   simpleTriplet.srcAttr = triplet.srcAttr._1
   simpleTriplet.dstAttr = triplet.dstAttr._1
   val ctx = new EdgeContext(i, triplet.srcAttr._2, triplet.dstAttr._2)
   sendMsg(simpleTriplet, ctx)
 }
// Compute the messages for all the active vertices
val messages = g.mapReduceTriplets(sendMessageWrapper, mergeMsg, Some((activeVertices, activeDirection)))

thereby allowing the user a simple sendMsg interface:

sendMsg: (EdgeTriplet[VD, ED], EdgeContext) => Iterator[(VertexId, A)]

However because we access the source and destination vertex attributes the byte code inspection will force a full 3-way join even if the user doesn't actually read the fields.

The simplest solution would be to change the send message interface to operate on the extended vertex attribute (containing the active field).

sendMsg: (EdgeTriplet[(VD, Boolean), ED], EdgeContext) => Iterator[(VertexId, A)]

* vertex properties in terms of the properties of neighboring vertices. These
* recursive properties are then computed through iterative fixed-point
* computations. For example, the PageRank of a web-page can be defined as a
* weighted some of the PageRank of web-pages that link to that page and is

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"weighted sum" :)

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have started for PR 1217 at commit 2116e8a.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have finished for PR 1217 at commit 2116e8a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@jegonzal
Copy link
Contributor Author

This is still work in progress and we need to discuss these API changes.

@jegonzal
Copy link
Contributor Author

@ankurdave is this already covered in your latest PR?

@jegonzal
Copy link
Contributor Author

@ankurdave should I try and update this with your latest changes or do you want to create a new one?

@ankurdave
Copy link
Contributor

@jegonzal I'm going to look at this over the weekend to try to get it into 1.2.

@nchammas
Copy link
Contributor

@jegonzal @ankurdave - This PR has gone stale. Do we want to update it or close it for later?

@ankurdave
Copy link
Contributor

Let's close this issue for later.

@srowen
Copy link
Member

srowen commented Jul 17, 2015

(For Github close script:) Do you mind closing this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants