Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Example] GCMC with sampling #1296

Merged
merged 27 commits into from
Apr 13, 2020
Merged

Conversation

classicsong
Copy link
Contributor

@classicsong classicsong commented Feb 29, 2020

Description

This PR contains a sampling-based GCMC using new sample APIs.

#957

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the my best knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred in this PR

Changes

@classicsong classicsong changed the title [Example] GCMC with sampling [WIP][Example] GCMC with sampling Feb 29, 2020
@classicsong classicsong changed the title [WIP][Example] GCMC with sampling [Example] GCMC with sampling Mar 4, 2020
@jermainewang
Copy link
Member

@classicsong The new RGCN example has been merged. You may check it out and adjust this PR accordingly.

@classicsong
Copy link
Contributor Author

@classicsong The new RGCN example has been merged. You may check it out and adjust this PR accordingly.

I c. will modify GCMC implementation accordingly.

@ZhishengNi
Copy link

@classicsong Hi, I tried your sampling-based GCMC model, yet there was an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl-4.0.2, but the error still exists

@classicsong
Copy link
Contributor Author

@classicsong Hi, I tried your sampling-based GCMC model, yet there was an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl-4.0.2, but the error still exists

You should use the nightly-build DGL

@ZhishengNi
Copy link

@classicsong Hi, I tried your sampling-based GCMC model, yet there was an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl-4.0.2, but the error still exists

You should use the nightly-build DGL

Thanks. I reinstalled the nightly-build dgl used for CPU, but it didn't work. Does only the nightly-build dgl for GPU work?

@classicsong
Copy link
Contributor Author

@classicsong Hi, I tried your sampling-based GCMC model, yet there was an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl-4.0.2, but the error still exists

You should use the nightly-build DGL

Thanks. I reinstalled the nightly-build dgl used for CPU, but it didn't work. Does only the nightly-build dgl for GPU work?

Did you install the dgl like: pip install --pre dgl
And what is your platform? If it is mac. there is no nightly-build package. You need to build from source.

@ZhishengNi
Copy link

@classicsong Hi, I tried your sampling-based GCMC model, yet there was an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl-4.0.2, but the error still exists

You should use the nightly-build DGL

Thanks. I reinstalled the nightly-build dgl used for CPU, but it didn't work. Does only the nightly-build dgl for GPU work?

Did you install the dgl like: pip install --pre dgl
And what is your platform? If it is mac. there is no nightly-build package. You need to build from source.

I installed the dgl via command: pip install --pre dgl, but as you said , my platform is mac. I would reinstall the dgl from source. Thanks for your help

@ZhishengNi
Copy link

@classicsong i renstalled dgl from source and it worked. Thanks

def __init__(self, num_edges, minibatch_size, dataset):
self.num_edges = num_edges
self.minibatch_size = minibatch_size
self.seed = th.randperm(num_edges)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

"""
def __init__(self, num_edges, minibatch_size, dataset):
self.num_edges = num_edges
self.minibatch_size = minibatch_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

self.minibatch_size = minibatch_size
self.seed = th.randperm(num_edges)
self.dataset = dataset
self.sample_idx = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

etype=str(possible_rating_value),
vtype=vtype,
card=(dataset.train_enc_graph.number_of_nodes(utype),
dataset.train_enc_graph.number_of_nodes(vtype))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

card -> num_nodes

seed_nodes[ntype] = g.nodes[ntype].data[dgl.NID]

frontiner = dgl.in_subgraph(dataset.train_enc_graph, seed_nodes)
frontiner = dgl.to_block(frontiner, seed_nodes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frontier

frontiner.srcnodes['user'].data['cj'] = \
dataset.train_enc_graph.nodes['user'].data['cj'][frontiner.srcnodes['user'].data[dgl.NID]]
frontiner.dstnodes['movie'].data['ci'] = \
dataset.train_enc_graph.nodes['movie'].data['ci'][frontiner.dstnodes['movie'].data[dgl.NID]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ci and cj will be calculated automatically by the GraphConv module if you plan to follow the new HeteroGraphConv approach. However, this means these normalizers will be calculated based on each sample rather than from the whole graph. Need some field tests to see the influence on the accuracy.

args.gcn_agg_accum,
agg_act=self._act,
share_user_item_param=args.share_param,
device=dev_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention it's hard to unify the full graph and mini-batch versions. What is the problem here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mini-batch, I suppose we always use mix_cpu_gpu implementation that is more relevant to large dataset. We can merge the two code maybe, but I think the code will be more complex to understand. As this is a baseline model, and some users from industry may borrow it, I want to make the multi-gpu/mix_cpu_gpu example code simple.

enc_graph.number_of_nodes(vtype))))

g = dgl.hetero_from_relations(subg)
g = dgl.compact_graphs(g)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse the sampler code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For eval, we also can use dataloader.

@jermainewang jermainewang merged commit 28117cd into dmlc:master Apr 13, 2020
@classicsong classicsong deleted the gcmc-sample branch April 14, 2020 01:06
self.reset_parameters()

def partial_to(self, device):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put all parameters into GPU?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants