Changes for Personalized Page Rank : Jira:1084 #244

Closed
wants to merge 1 commit into
base: master
from

5 participants
@hpandeycodeit
Contributor

hpandeycodeit commented Mar 16, 2018

Jira : 1084
This PR contains changes for Personalized Page Rank.

  • Added an extra parameter, nodes_of_interest, to the main pagerank function.
  • Added a new function, get_query_params_for_ppr, in pagerank.py_in to calculate random_jump_probabilty based on the user-provided input nodes.
  • Added a condition: when user-provided nodes are present, Personalized Page Rank is executed; otherwise regular Page Rank runs.
  • Added an example function in pagerank.sql_in.
  • Added the extra parameter nodes_of_interest to the calling functions in pagerank.sql_in.

asfgit commented Mar 16, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/379/

@jingyimei

When I tried to run pagerank with grouping and nodes_of_interest, I got the following error message:

ERROR:  plpy.SPIError: column "user_id" named in DISTRIBUTED BY clause does not exist
CONTEXT:  Traceback (most recent call last):
  PL/Python function "pagerank", line 23, in <module>
    return pagerank.pagerank(**globals())
  PL/Python function "pagerank", line 184, in pagerank
  PL/Python function "pagerank", line 607, in get_query_params_for_ppr
PL/Python function "pagerank"

Other comments are all minor suggestions.

asfgit commented Mar 21, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/387/

Contributor

kaknikhil commented Mar 24, 2018

@hpandeycodeit
Can you add more description to the final commit (maybe reuse the description from the PR)?
This is the convention we use for our commit messages:

Commit Title

JIRA: MADLIB-1084

Detailed description

See more here : https://cwiki.apache.org/confluence/display/MADLIB/Contribution+Guidelines

asfgit commented Mar 26, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/404/

@jingyimei

This code works well with grouping. Great job! My concern now is more about performance. Here are some test results from my local machine, using the test cases in the install check SQL file:

In a CentOS 6 Docker container with GP5, normal pagerank takes ~40s to run install check, while the PPR code takes ~160s.

For a single query run, a query with PPR takes around 3X as long:
Without PPR, without grouping: ~3400ms
Without PPR, with grouping: ~35500ms
With PPR (2 special nodes), without grouping: ~10500ms
With PPR (2 special nodes), with grouping: ~109000ms

I would suggest looking at the new queries that were added or modified to see if we can do anything to reduce the run time.
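
As a quick check of the "around 3X" figure, the single-query timings quoted in this comment can be divided out. This is only arithmetic on the reported numbers; the dictionary layout and the `slowdown` helper are illustrative names, not part of the PR.

```python
# Timings in milliseconds, copied from the single-query runs reported above.
timings_ms = {
    "without grouping": {"pagerank": 3400, "ppr": 10500},
    "with grouping": {"pagerank": 35500, "ppr": 109000},
}


def slowdown(pagerank_ms, ppr_ms):
    """Return how many times slower the PPR run is than the regular run."""
    return ppr_ms / pagerank_ms


slowdowns = {
    case: slowdown(t["pagerank"], t["ppr"]) for case, t in timings_ms.items()
}

for case, ratio in slowdowns.items():
    print(f"{case}: {ratio:.2f}x")
# prints:
# without grouping: 3.09x
# with grouping: 3.07x
```

The install check figures (~40s versus ~160s) work out to roughly 4X by the same division.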

asfgit commented Apr 2, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/419/

asfgit commented Apr 6, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/426/

asfgit commented Apr 6, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/427/

asfgit commented Apr 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/434/

asfgit commented Apr 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/435/

asfgit commented Apr 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/437/

asfgit commented Apr 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/438/

asfgit commented Apr 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/440/

asfgit commented Apr 12, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/441/

asfgit commented Apr 12, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/442/

asfgit commented Apr 13, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/446/

asfgit commented Apr 13, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/447/

asfgit commented Apr 16, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/451/

Pagerank: Add Personalized Page Rank option
JIRA: MADLIB-1084

This commit introduces Personalized Page Rank by adding a new optional input
parameter to the pagerank interface.

Personalization vertices from the input parameter have a higher jump
probability than other vertices, so the random surfer is more likely to jump
to them. These personalization vertices are initialized with a probability
of 1/N, where N is the total number of vertices in the graph, and the rest
of the vertices in the graph are assigned an initial probability of 0. The
pagerank calculated is biased toward these vertices because a random jump
probability, equal to 1 - damping factor, is assigned only to them during
the pagerank calculation.

Also updated:
1. Group support
2. Install check test cases
3. Design doc

Co-authored-by: Himanshu Pandey <hpandey@pivotal.io>
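
The update rule described in the commit message can be sketched in plain Python. This is a toy in-memory illustration under stated assumptions, not MADlib's SQL-based implementation: the function name, the edge-list input, and the convergence check are all hypothetical, each personalization vertex is assumed to receive the full 1 - damping factor jump term, the fallback to a uniform jump when no vertices are given is the textbook PageRank form, and vertices with no outgoing edges are assumed to pass on no rank.

```python
def personalized_pagerank(edges, vertices, nodes_of_interest,
                          damping_factor=0.85, max_iter=100, threshold=1e-9):
    """Toy PageRank over (src, dest) edges; hypothetical helper, not MADlib code."""
    n = len(vertices)
    special = set(nodes_of_interest)

    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1

    if special:
        # Personalized case: special vertices start at 1/N, all others at 0,
        # and only special vertices receive the (1 - damping) jump term.
        rank = {v: (1.0 / n if v in special else 0.0) for v in vertices}
        jump = {v: (1.0 - damping_factor if v in special else 0.0)
                for v in vertices}
    else:
        # Regular case (assumed textbook form): uniform start and uniform jump.
        rank = {v: 1.0 / n for v in vertices}
        jump = {v: (1.0 - damping_factor) / n for v in vertices}

    for _ in range(max_iter):
        # Each vertex splits its current rank evenly across its out-edges.
        incoming = {v: 0.0 for v in vertices}
        for src, dest in edges:
            incoming[dest] += rank[src] / out_degree[src]

        new_rank = {v: jump[v] + damping_factor * incoming[v] for v in vertices}
        delta = max(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < threshold:
            break
    return rank
```

For example, `personalized_pagerank([(0, 1), (1, 2), (2, 0), (0, 2)], [0, 1, 2], [0])` biases the scores toward vertex 0, while passing an empty list for `nodes_of_interest` falls back to the regular calculation.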

asfgit commented Apr 17, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/452/

@asfgit asfgit closed this in b5c641a Apr 17, 2018

@hpandeycodeit hpandeycodeit deleted the hpandeycodeit:graph_1084 branch Aug 8, 2018
