
Runtime error R-GCN: Expected argument Long; but got CUDAType #672

Closed
abkds opened this issue Jun 19, 2019 · 2 comments · Fixed by #674
@abkds commented Jun 19, 2019

🐛 Bug

While running the R-GCN example given in the PyTorch examples section, I am getting this runtime error:
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAType instead (while checking arguments for embedding)

To Reproduce

Steps to reproduce the behavior:

  1. Clone the dgl repository
  2. Go into examples\pytorch\rgcn
  3. Run python3 link_predict.py -d FB15k-237 --gpu 0

This is the error stack:

Namespace(dataset='FB15k-237', dropout=0.2, eval_batch_size=500, evaluate_every=500, gpu=0, grad_norm=1.0, graph_batch_size=30000, graph_split_size=0.5, lr=0.01, n_bases=100, n_epochs=6000, n_hidden=500, n_layers=2, negative_sample=10, regularization=0.01)
# entities: 14541
# relations: 237
# edges: 272115
Test graph:
C:\Users\t-kadas\dgl\examples\pytorch\rgcn\utils.py:112: RuntimeWarning: divide by zero encountered in true_divide
  norm = 1.0 / in_deg
# nodes: 14541, # edges: 544230
start training...
# sampled nodes: 9466
# sampled edges: 30000
# nodes: 9466, # edges: 30000
Done edge sampling
Traceback (most recent call last):
  File "link_predict.py", line 249, in <module>
    main(args)
  File "link_predict.py", line 164, in main
    loss = model.get_loss(g, data, labels)
  File "link_predict.py", line 75, in get_loss
    embedding = self.forward(g)
  File "link_predict.py", line 62, in forward
    return self.rgcn.forward(g)
  File "C:\Users\t-kadas\dgl\examples\pytorch\rgcn\model.py", line 54, in forward
    layer(g)
  File "C:\ProgramData\Anaconda3\envs\nl\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "link_predict.py", line 31, in forward
    g.ndata['h'] = self.embedding(node_id)
  File "C:\ProgramData\Anaconda3\envs\nl\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\nl\lib\site-packages\torch\nn\modules\sparse.py", line 117, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\ProgramData\Anaconda3\envs\nl\lib\site-packages\torch\nn\functional.py", line 1506, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAType instead (while checking arguments for embedding)

Expected behavior

As mentioned in the docs, I expected it to run on the GPU and reach an MRR of 0.158.

Environment

  • DGL Version (e.g., 1.0): 0.3
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch==1.1.0
  • OS (e.g., Linux): Windows 10
  • How you installed DGL (conda, pip, source): Tried both pip and conda
  • Build command you used (if compiling from source):
  • Python version: 3.6.8
  • CUDA/cuDNN version (if applicable): 10.0
  • GPU models and configuration (e.g. V100): Tesla K80
  • Any other relevant information:

Additional context

When I run without the GPU, it gives the following error: RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CPUType instead (while checking arguments for embedding)

Here is the other issue which was closed a few days ago. I didn't do proper research before opening this issue; I am sorry for that.
That issue was closed because it was inactive (but I couldn't find the solution there), hence I opened this one.

@lingfanyu (Collaborator) commented Jun 19, 2019

@BarclayII Can you try to see if you can reproduce the problem on a Windows machine? @zheng-da and I both tried using Docker and a fresh AWS instance, but we cannot reproduce the problem.

@BarclayII (Collaborator) commented Jun 20, 2019

node_id is apparently an int32 tensor instead of int64 on Windows.

The problem is that np.array([1,2,3]) on Linux returns an int64 array, while on Windows it returns int32 even on an x64 machine. This is because the default numpy int inherits C long, which is 64-bit on Linux and 32-bit on Windows.
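The platform difference described above is easy to check directly. A minimal sketch (the platform-dependent default is exactly what the comment describes; pinning the dtype sidesteps it):

```python
import numpy as np

# The default integer dtype of np.array follows C long:
# 64-bit on Linux/macOS, 32-bit on Windows (even on an x64 machine).
default_arr = np.array([1, 2, 3])
print(default_arr.dtype)  # int64 on Linux, int32 on Windows

# Pinning the dtype makes the result platform-independent.
pinned_arr = np.array([1, 2, 3], dtype=np.int64)
assert pinned_arr.dtype == np.int64
assert pinned_arr.itemsize == 8  # always 8 bytes per element
```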

Specifying dtype=np.int64 in any of these places (in the data loader, returning an int64 array for uniq_v in generate_sampled_graph_and_labels, or converting node_id to long) would fix the problem.
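For the last option, a hedged sketch of the conversion (the tensor values and embedding sizes here are placeholders, not taken from the example code; only the .long() cast is the actual fix):

```python
import torch
import torch.nn as nn

# Sizes loosely follow the log above (# entities: 14541, n_hidden: 500).
embedding = nn.Embedding(14541, 500)

# On Windows, node_id ends up as an int32 tensor, but the embedding
# lookup expects int64 (Long) indices, hence the RuntimeError.
node_id = torch.tensor([0, 1, 2], dtype=torch.int32)

# .long() casts the indices to int64 before the lookup.
h = embedding(node_id.long())
print(h.shape)  # torch.Size([3, 500])
```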

P.S. I guess from now on the best practice would be to always specify the numpy dtype whenever we call np.array(), just in case this mess happens again. This whole thing is just ridiculous.
