[Example] GCMC with sampling #1296

Merged

jermainewang merged 27 commits into dmlc:master from classicsong:gcmc-sample

Apr 13, 2020

Contributor

classicsong commented Feb 29, 2020

Description

This PR contains a sampling-based GCMC using new sample APIs.

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change
Related issue is referred in this PR

Changes


          gcmc example

10669cb

classicsong requested a review from jermainewang

February 29, 2020 01:51

classicsong changed the title ~~[Example] GCMC with sampling~~ [WIP][Example] GCMC with sampling


          Update Readme

fecb928

classicsong changed the title ~~[WIP][Example] GCMC with sampling~~ [Example] GCMC with sampling


          Add multiprocess support

Member

jermainewang commented Mar 16, 2020

@classicsong The new RGCN example has been merged. You may check it out and adjust this PR accordingly.


          Merge branch 'master' into gcmc-sample

db7b4e4

Contributor Author

classicsong commented Mar 16, 2020

I see. I will modify the GCMC implementation accordingly.

Ubuntu and others added 6 commits

March 16, 2020 14:32

Fix

ce93563


          Multigpu + dataloader

7cedece


          Merge branch 'gcmc-sample' of https://github.com/classicsong/dgl into…

f5ae56d

… gcmc-sample


          Merge branch 'master' into gcmc-sample

53ca25f


          Delete some dead code

83205ca


          Delete more

fc2c036

ZhishengNi commented Mar 17, 2020

@classicsong Hi, I tried your sampling-based GCMC model, but got an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl-4.0.2, but the error still occurs.

Contributor Author

classicsong commented Mar 17, 2020

You should use the nightly build of DGL.

ZhishengNi commented Mar 18, 2020

Thanks. I reinstalled the CPU nightly build of dgl, but it still didn't work. Does only the GPU nightly build work?

upd

2d09359

Contributor Author

classicsong commented Mar 18, 2020

Did you install dgl with "pip install --pre dgl"?
And what is your platform? If it is Mac, there is no nightly-build package, so you need to build from source.


          Merge branch 'master' into gcmc-sample

34e3b38

ZhishengNi commented Mar 18, 2020

I installed dgl with the command "pip install --pre dgl", but as you said, my platform is Mac. I will reinstall dgl from source. Thanks for your help.

ZhishengNi commented Mar 18, 2020

@classicsong I reinstalled dgl from source and it worked. Thanks.
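
The check behind this exchange can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name module_has_attr is hypothetical and not part of DGL or this PR. It reports whether an installed module exposes a given attribute, which is a quick way to tell whether the installed dgl build is new enough to provide in_subgraph.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True if `module_name` imports and exposes `attr`, else False.

    Hypothetical helper for illustration; not part of DGL.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    if not module_has_attr("dgl", "in_subgraph"):
        print("dgl.in_subgraph is unavailable; "
              "install a nightly build or build dgl from source.")
```

If the check fails after "pip install --pre dgl", the thread above suggests the platform may have no nightly package, in which case building from source is the remaining option.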

jermainewang and others added 5 commits

March 20, 2020 15:11


          Merge branch 'master' into gcmc-sample

9f7bf6b


          Add README

cf311e6


          Merge branch 'gcmc-sample' of https://github.com/classicsong/dgl into…

fe84f6b

… gcmc-sample

upd

e1a38a7

Upd

e291c8f

jermainewang reviewed

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                  def __init__(self, num_edges, minibatch_size, dataset):
+                      self.num_edges = num_edges
+                      self.minibatch_size = minibatch_size
+                      self.seed = th.randperm(num_edges)

Member

jermainewang Mar 25, 2020

remove?

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                  """
+                  def __init__(self, num_edges, minibatch_size, dataset):
+                      self.num_edges = num_edges
+                      self.minibatch_size = minibatch_size

Member

jermainewang Mar 25, 2020

remove

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                      self.minibatch_size = minibatch_size
+                      self.seed = th.randperm(num_edges)
+                      self.dataset = dataset
+                      self.sample_idx = 0

Member

jermainewang Mar 25, 2020

remove

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                                                      etype=str(possible_rating_value),
+                                                      vtype=vtype,
+                                                      card=(dataset.train_enc_graph.number_of_nodes(utype),
+                                                            dataset.train_enc_graph.number_of_nodes(vtype))))

Member

jermainewang Mar 25, 2020

card -> num_nodes

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                          seed_nodes[ntype] = g.nodes[ntype].data[dgl.NID]
+                      frontiner = dgl.in_subgraph(dataset.train_enc_graph, seed_nodes)
+                      frontiner = dgl.to_block(frontiner, seed_nodes)

Member

jermainewang Mar 25, 2020

frontier
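
For readers unfamiliar with the frontier step in the hunk above: dgl.in_subgraph keeps the edges that point into the given seed nodes. The following is a pure-Python sketch of that idea only. The edge list and the function name in_edges_of are hypothetical, it does not use DGL's data structures, and it does not model the node relabeling that dgl.to_block performs afterwards.

```python
def in_edges_of(edges, seed_nodes):
    """Keep only the (src, dst) edges whose destination is a seed node."""
    seeds = set(seed_nodes)
    return [(src, dst) for src, dst in edges if dst in seeds]


# Hypothetical user -> movie rating edges as (user_id, movie_id) pairs.
edges = [(0, 10), (1, 10), (1, 11), (2, 12)]

# The frontier for seed movies 10 and 12 is every edge that ends at one of them.
frontier = in_edges_of(edges, seed_nodes=[10, 12])
print(frontier)  # [(0, 10), (1, 10), (2, 12)]
```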

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                      frontiner.srcnodes['user'].data['cj'] = \
+                          dataset.train_enc_graph.nodes['user'].data['cj'][frontiner.srcnodes['user'].data[dgl.NID]]
+                      frontiner.dstnodes['movie'].data['ci'] = \
+                          dataset.train_enc_graph.nodes['movie'].data['ci'][frontiner.dstnodes['movie'].data[dgl.NID]]

Member

jermainewang Mar 25, 2020

The ci and cj will be calculated automatically by the GraphConv module if you plan to follow the new HeteroGraphConv approach. However, this means these normalizers will be calculated from each sample rather than from the whole graph. We need some field tests to see how this affects accuracy.
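
A small sketch of why the two choices can differ, assuming the normalizer is 1/sqrt(degree). That formula, the edge lists, and the helper name inv_sqrt_degree are assumptions made for illustration and are not taken from this PR. A node's degree inside a sample is at most its degree in the full graph, so the per-sample normalizer is never smaller than the full-graph one.

```python
import math
from collections import Counter


def inv_sqrt_degree(edges, side):
    """Map node -> 1/sqrt(degree), counting the src (side=0) or dst (side=1) end of each edge."""
    degree = Counter(edge[side] for edge in edges)
    return {node: 1.0 / math.sqrt(d) for node, d in degree.items()}


# Hypothetical (user, movie) edges; the sample is a subset of the full graph.
full_edges = [(0, 10), (0, 11), (0, 12), (0, 13), (1, 10)]
sampled_edges = [(0, 10), (1, 10)]

cj_full = inv_sqrt_degree(full_edges, side=0)       # user normalizers, whole graph
cj_sample = inv_sqrt_degree(sampled_edges, side=0)  # user normalizers, one sample

# User 0 has degree 4 in the full graph but degree 1 in the sample.
print(cj_full[0], cj_sample[0])  # 0.5 1.0
```

Whether this shift matters for accuracy is exactly the open question raised in the comment; the sketch only shows that the values differ.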

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                                                     args.gcn_agg_accum,
+                                                     agg_act=self._act,
+                                                     share_user_item_param=args.share_param,
+                                                     device=dev_id)

Member

jermainewang Mar 25, 2020

You mention it's hard to unify the full graph and mini-batch versions. What is the problem here?

Contributor Author

classicsong Mar 26, 2020

For mini-batch training, I assume we always use the mix_cpu_gpu implementation, which is more relevant to large datasets. We could merge the two code paths, but I think the result would be harder to understand. Since this is a baseline model that some industry users may borrow, I want to keep the multi-GPU/mix_cpu_gpu example code simple.

examples/pytorch/gcmc/gcmc_mp_sampling.py Outdated

+                                                            enc_graph.number_of_nodes(vtype))))
+                      g = dgl.hetero_from_relations(subg)
+                      g = dgl.compact_graphs(g)

Member

jermainewang Mar 25, 2020

Reuse the sampler code?

Contributor Author

classicsong Mar 26, 2020

Yes. For eval, we can also use the dataloader.


          Merge branch 'master' into gcmc-sample

8ba5da1

classicsong and others added 9 commits

March 29, 2020 21:30


          Merge branch 'master' into gcmc-sample

8d52be6


          combine full batch and sample GCMCLayer, use HeteroCov

7040cf8

Fix

62c7818


          Update Readme

bb9cadc

udp

6d2f825


          Fix typo

0a74197


          Add cpu run

2b47232


          Merge branch 'master' into gcmc-sample


          some fix and docstring

b591da2

jermainewang merged commit 28117cd into dmlc:master

classicsong deleted the gcmc-sample branch

April 14, 2020 01:06

BarclayII reviewed

examples/pytorch/gcmc/model.py

		self.reset_parameters()

		def partial_to(self, device):

Collaborator

BarclayII Apr 14, 2020

Why not put all parameters on the GPU?
