[PinSAGE samper] Adjust the APIs for PinSAGESamper #3529

Merged
merged 21 commits into from Nov 29, 2021

@lixiaobai09
Contributor

lixiaobai09 commented Nov 21, 2021

Description

Combine the "to_simple", "select_topk" and "gather_row" steps into a single API, "randomwalk_topk", in class RandomWalkNeighborSampler. The new API is about 2x faster than the three separate calls it replaces.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Collaborator

dgl-bot commented Nov 21, 2021

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@lixiaobai09
Contributor Author

lixiaobai09 commented Nov 22, 2021

A simple performance test script is as follows:

import dgl
import torch
import scipy.sparse
import time

if __name__ == '__main__':
    g = scipy.sparse.random(6000, 8000, 0.3)
    G = dgl.heterograph({
            ('A', 'AB', 'B'): g.nonzero(),
            ('B', 'BA', 'A'): g.T.nonzero()
        })
    print('edge size AB and BA: {}, {}'.format(G.num_edges('AB'), G.num_edges('BA')))
    sampler = dgl.sampling.PinSAGESampler(G, 'A', 'B', 3, 0.5, 200, 10)
    seeds = torch.arange(1000)  # int64 seed node IDs 0..999
    repeat = 10
    print('graph edges: {}'.format(G.num_edges()))
    t1 = time.time()
    for _ in range(repeat):
        frontier = sampler(seeds)
        print('frontier edges: {}'.format(frontier.num_edges()))
    time_cost = (time.time() - t1)
    # frontier.all_edges(form='uv')
    print('time cost: {:.5f}'.format(time_cost / repeat))
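The loop in the script above also times the print call inside it and has no warm-up iteration, so the reported average may be slightly inflated. A minimal sketch of a timing helper that separates those concerns follows. The name `benchmark` and its `repeat` and `warmup` parameters are hypothetical and are not part of DGL or of this PR, and the workload shown is a stand-in.

```python
import time


def benchmark(fn, repeat=10, warmup=1):
    """Return the mean wall-clock seconds per call of fn().

    Hypothetical helper for illustration; not part of DGL.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat


# Stand-in workload. With DGL installed, one might instead pass
# lambda: sampler(seeds) using the sampler and seeds built in the script.
mean_seconds = benchmark(lambda: sum(range(10_000)), repeat=5)
print('time cost: {:.5f}'.format(mean_seconds))
```

Any printing of the sampled frontier would then happen outside the timed region.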

@@ -209,6 +210,47 @@ def random_walk(g, nodes, *, metapath=None, length=None, prob=None, restart_prob
    eids = F.from_dgl_nd(eids)
    return (traces, eids, types) if return_eids else (traces, types)

def randomwalk_topk(src, dst, num_samples_per_node, k):
    """Select the top-k nodes in src for each node in dst(dedup).
Collaborator

From what I understand, this is fusing to_simple, select_topk, and counting the number of occurrences together. The docstring did not show that. Could you rewrite the docstring? Also, I don't think randomwalk_topk is a suitable name since this function is not related to random walk.
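As a rough illustration of the fusion described in this comment, the sketch below counts how often each (src, dst) pair occurs, collapses duplicates, and keeps the k most frequent src nodes per dst node in a single pass over the data. It is a pure-Python reference under stated assumptions, not the implementation in this PR: the function name `count_dedup_topk`, the list-based inputs, and the tie-breaking rule (smaller node ID first) are all hypothetical.

```python
from collections import Counter, defaultdict


def count_dedup_topk(src, dst, k):
    """Hypothetical reference version, not DGL's implementation.

    src[i] -> dst[i] is assumed to record that a walk started from
    dst[i] visited src[i]. Returns parallel lists (src, dst, count)
    with duplicates removed and at most k sources per destination.
    """
    # Deduplicate edges and count occurrences of each (src, dst) pair.
    pair_counts = Counter(zip(src, dst))

    # Group the deduplicated edges by destination node.
    by_dst = defaultdict(list)
    for (u, v), c in pair_counts.items():
        by_dst[v].append((c, u))

    # Keep the k most frequently visited sources for each destination.
    out_src, out_dst, out_cnt = [], [], []
    for v in sorted(by_dst):
        # Highest count first; ties broken by smaller node ID (assumption).
        top = sorted(by_dst[v], key=lambda cu: (-cu[0], cu[1]))[:k]
        for c, u in top:
            out_src.append(u)
            out_dst.append(v)
            out_cnt.append(c)
    return out_src, out_dst, out_cnt


src = [1, 1, 2, 3, 3, 3, 4]
dst = [0, 0, 0, 0, 0, 0, 5]
top_src, top_dst, counts = count_dedup_topk(src, dst, k=2)
print(top_src, top_dst, counts)  # [3, 1, 4] [0, 0, 5] [3, 2, 1]
```

Doing the three steps in one function avoids materializing the intermediate deduplicated graph, which is the kind of saving the PR description attributes to the fused API.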

Contributor Author

Hello, thank you for your reply. I have updated the docstring as you requested. I originally wrote this code to optimize the random walk algorithm, which is why I named the API "randomwalk_topk". I tried to think of a better name but could not come up with one, partly because of my limited English. Could you suggest a name?

Collaborator

I think this function is only used for PinSAGE. I'll change it to an internal function instead.

BarclayII self-assigned this Nov 29, 2021
BarclayII merged commit 44f0b5f into dmlc:master Nov 29, 2021