Pagerank: Remove duplicate entries from grouping output #294

Closed
wants to merge 2 commits into from

njayaram2
Contributor

JIRA: MADLIB-1229
JIRA: MADLIB-1253

Also fixes the bug where output was missing for complete graphs.

Co-authored-by: Nandish Jayaram <njayaram@apache.org>

@asfgit

asfgit commented Jul 16, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/562/

@jingyimei

jingyimei commented Jul 16, 2018

The current master has another issue:
When you run

DROP TABLE IF EXISTS vertex, "EDGE";
CREATE TABLE vertex(
    id INTEGER
);
CREATE TABLE "EDGE"(
    src INTEGER,
    dest INTEGER,
    user_id INTEGER
);
INSERT INTO vertex VALUES
    (0),
    (1),
    (2);
INSERT INTO "EDGE" VALUES
    (0, 1, 1),
    (0, 2, 1),
    (1, 2, 1),
    (2, 1, 1),
    (0, 1, 2);


DROP TABLE IF EXISTS pagerank_ppr_grp_out;
DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
SELECT pagerank(
    'vertex',                  -- Vertex table
    'id',                      -- Vertex id column
    '"EDGE"',                  -- "EDGE" table
    'src=src, dest=dest',      -- "EDGE" args
    'pagerank_ppr_grp_out',    -- Output table of PageRank
    NULL,                      -- Default damping factor (0.85)
    NULL,                      -- Default max iters (100)
    NULL,                      -- Default threshold
    'user_id');                -- Grouping column

you will get the following result:

madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
 user_id | id |     pagerank
---------+----+-------------------
       1 |  0 |              0.05
       1 |  0 |              0.05
       1 |  1 | 0.614906399170753
       1 |  2 | 0.614906399170753
       2 |  0 |             0.075
       2 |  1 |           0.13875
(6 rows)

For user_id=1, vertex 0 appears twice and the PageRank scores do not sum to 1, as they should. This PR fixes the issue and gives the correct numbers. However, dev check did not have a case that would catch this issue. I suggest adding this corner case to dev check so that future changes are tested against it.
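One way such a corner case could be checked is with a per-group aggregate over the output table. The sketch below is only an illustration, not the dev-check test that was added in this PR: it loads the rows posted above into an in-memory SQLite table (an assumption made so the example runs without a MADlib installation) and reports, for each user_id, how many duplicate vertex rows exist and what the scores sum to. The column aliases `duplicate_rows` and `total` are hypothetical names.

```python
import sqlite3

# Rows copied from the output posted above: (user_id, id, pagerank).
rows = [
    (1, 0, 0.05),
    (1, 0, 0.05),
    (1, 1, 0.614906399170753),
    (1, 2, 0.614906399170753),
    (2, 0, 0.075),
    (2, 1, 0.13875),
]

# In-memory stand-in for the real output table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pagerank_ppr_grp_out "
    "(user_id INTEGER, id INTEGER, pagerank REAL)"
)
conn.executemany("INSERT INTO pagerank_ppr_grp_out VALUES (?, ?, ?)", rows)

# Per group: rows beyond one per vertex id, and the sum of the scores.
check_sql = """
SELECT user_id,
       COUNT(*) - COUNT(DISTINCT id) AS duplicate_rows,
       SUM(pagerank) AS total
FROM pagerank_ppr_grp_out
GROUP BY user_id
ORDER BY user_id
"""
report = conn.execute(check_sql).fetchall()

for user_id, duplicate_rows, total in report:
    print(user_id, duplicate_rows, round(total, 6))
```

The aggregate query itself uses only standard SQL, so it should also run against the real output table; a test could then fail whenever `duplicate_rows` is nonzero for any group.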


@jingyimei jingyimei left a comment

See comment.

@njayaram2
Contributor Author

Thank you for the comments, @jingyimei. I have pushed a commit with a new dev-check test.

@asfgit

asfgit commented Jul 17, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/564/

@asfgit asfgit closed this in 62b53dc Jul 17, 2018
4 participants