Error in Page Rank Computation in PageRank.scala #2100
Can one of the admins verify this patch? |
Please take a look. I noticed that with this patch, dangling nodes no longer have page ranks equal to their reset probabilities, so either there was no error to begin with or my solution is not the right one. Here is a small test that I ran. Edge Table:
Node Table:
Page Ranks Before Patch: (4,0.29503124999999997)
Page Ranks After Patch: (4,0.2488125)
Note that node 10 is not in the edge table. |
ok to test |
QA tests have started for PR 2100 at commit
|
QA tests have finished for PR 2100 at commit
|
It looks like I may be mistaken (since the unit tests failed), so I am closing this pull request. |
To answer your concern, the As a result, we have to add in |
Thank you for taking the time to explain the code. I did another test and noticed that the two methods seem to give different outputs for a test case. See Issue: https://issues.apache.org/jira/browse/SPARK-3206?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20priority%20%3D%20Major%20ORDER%20BY%20key%20DESC |
I saw something strange in the PageRank computation for runUntilConvergence() in PageRank.scala. It uses oldPR instead of resetProb. Note that the run() method in PageRank.scala uses resetProb, as my correction does here (see lines 95–96 of PageRank.scala).
Here is the diff that I see (in case it is hidden later):
This might not be the right correction, but I opened a pull request to find out whether I have found an error. If it is not correct, just close the pull request.
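For reference, here is a minimal sketch of how the two update rules mentioned above (one built on resetProb, one built on oldPR) can be compared. It uses plain Scala collections rather than GraphX, and the object name, toy graph, and helper functions are all hypothetical. It assumes that the oldPR-based rule propagates only the change in rank each round and that no tolerance cutoff is applied; it is not a copy of the code in PageRank.scala.

```scala
object PageRankSketch {
  // Hypothetical toy graph: directed edges (src, dst); every node has an out-edge.
  val edges: Seq[(Int, Int)] = Seq((1, 2), (2, 3), (3, 1), (1, 3))
  val nodes: Seq[Int] = Seq(1, 2, 3)
  val resetProb = 0.15

  // Out-degree per source node; each edge is weighted by 1 / outDeg(src).
  val outDeg: Map[Int, Int] =
    edges.groupBy(_._1).map { case (src, es) => src -> es.size }

  // For every node, sum the weighted values arriving along its in-edges.
  def messageSums(values: Map[Int, Double]): Map[Int, Double] = {
    val zero = nodes.map(_ -> 0.0).toMap
    edges.foldLeft(zero) { case (acc, (src, dst)) =>
      acc.updated(dst, acc(dst) + values(src) / outDeg(src))
    }
  }

  // resetProb-based rule: each round recomputes the rank from scratch.
  def staticRanks(iters: Int): Map[Int, Double] = {
    var ranks = nodes.map(_ -> 0.0).toMap
    for (_ <- 1 to iters) {
      val sums = messageSums(ranks)
      ranks = nodes.map(n => n -> (resetProb + (1.0 - resetProb) * sums(n))).toMap
    }
    ranks
  }

  // oldPR-based rule (assumed delta style): messages carry only the change
  // in rank, so each round adds the damped sum of deltas to the old rank.
  def deltaRanks(iters: Int): Map[Int, Double] = {
    // Round 1: every rank moves from 0 to resetProb, so the first delta is resetProb.
    var ranks = nodes.map(_ -> resetProb).toMap
    var deltas = nodes.map(_ -> resetProb).toMap
    for (_ <- 2 to iters) {
      val sums = messageSums(deltas)
      val newRanks =
        nodes.map(n => n -> (ranks(n) + (1.0 - resetProb) * sums(n))).toMap
      deltas = nodes.map(n => n -> (newRanks(n) - ranks(n))).toMap
      ranks = newRanks
    }
    ranks
  }
}
```

Under these assumptions, the two functions produce the same ranks after the same number of rounds, up to floating-point rounding, because adding the damped deltas to oldPR is the resetProb-based recurrence rewritten in incremental form. Once a tolerance cutoff stops some nodes from sending messages, the two can drift apart.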
Best Wishes