Got 'InvalidArgumentError: indices[0] = 1875 is not in [0, 1795)' in running RSGAN #196

Closed
HanlynnKe opened this issue Nov 19, 2020 · 11 comments

Comments

@HanlynnKe

When I run the RSGAN algorithm, the error below appears so frequently that I can hardly get the prediction phase to complete. Is the "Process Process-2" line expected? Could this be a multiprocessing problem?

Predicting [1]...
RSGAN [1] progress:0/1753
RSGAN [1] progress:100/1753
RSGAN [1] progress:200/1753
RSGAN [1] progress:300/1753
RSGAN [1] progress:400/1753
Process Process-2:
Traceback (most recent call last):
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/azure/RecQ-master/main/RecQ.py", line 116, in run
    measure[order] = algor.execute()
  File "../baseclass/Recommender.py", line 257, in execute
    self.evalRanking()
  File "../baseclass/Recommender.py", line 153, in evalRanking
    predictedItems = self.predictForRanking(user)
  File "../algorithm/ranking/RSGAN.py", line 448, in predictForRanking
    res = self.sess.run(self.d_output, {self.u_idx:[u]})
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: indices[0] = 1875 is not in [0, 1795)
         [[node discriminator/u_e (defined at ../algorithm/ranking/RSGAN.py:334) ]]

Errors may have originated from an input operation.
Input Source operations connected to node discriminator/u_e:
 U/read (defined at ../baseclass/DeepRecommender.py:24) 
 u_idx (defined at ../baseclass/DeepRecommender.py:19)
@HanlynnKe
Author

My Azure machine uses Miniconda with python=2.7 and tensorflow=1.14.

@Coder-Yu
Owner

I cannot reproduce this problem. I suggest running the library in your local experimental environment.

You can also change the evaluation setting '-cv 5' to '-ap 0.2' in the config file to see if this problem is related to multi-processing.
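For context, here is a rough sketch of what the two evaluation settings mean, assuming `-cv 5` denotes 5-fold cross-validation and `-ap 0.2` a single automatic partition that holds out 20% of the ratings. The function names and the toy data are hypothetical; this is not RecQ's splitting code.

```python
import random


def holdout_split(ratings, test_ratio, seed=0):
    """Single random split: the idea behind an '-ap 0.2'-style setting (assumption)."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(ratings, k, seed=0):
    """Yield one (train, test) pair per fold: the idea behind a '-cv 5'-style setting (assumption)."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


# Toy (user, item, rating) triples, purely illustrative.
ratings = [("u%d" % (n % 7), "i%d" % n, 1.0) for n in range(50)]

train, test = holdout_split(ratings, test_ratio=0.2)
folds = list(k_fold_splits(ratings, k=5))
print(len(train), len(test), len(folds))
```

With a single split there is only one train/test pair to inspect, so an error that persists under that setting is unlikely to come from running several folds in separate worker processes.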

@HanlynnKe
Author

The problem is not related to multi-processing. Both my local machine and the remote server produce the same error, shown below.
Could you please give more details about how you set up the virtual environment for RecQ? I am not sure whether my steps are correct.

My steps are:
① conda create -n py2 python=2.7
② conda activate py2
③ conda install tensorflow=1.14
④ conda install gensim
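A small sketch for confirming what the activated environment actually contains after steps like these. `describe_environment` is a hypothetical helper, not part of RecQ; it only reports the interpreter version and the `__version__` attribute of each module that can be imported.

```python
import importlib
import sys


def describe_environment(module_names):
    """Return the interpreter version and the version of each importable module."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed in this environment
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


# Print one "name: version" line per entry; None means the import failed.
for name, version in sorted(describe_environment(["tensorflow", "gensim"]).items()):
    print("%s: %s" % (name, version))
```

Running this inside the `py2` environment on both machines makes it easy to compare the two setups line by line.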

Predicting [1]...
RSGAN [1] progress:0/1725
RSGAN [1] progress:100/1725
RSGAN [1] progress:200/1725
RSGAN [1] progress:300/1725
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    recSys.execute()
  File "/Users/kehanlin/VSCodeProjects/RecQ-master/main/RecQ.py", line 113, in execute
    eval(recommender).execute()
  File "../baseclass/Recommender.py", line 257, in execute
    self.evalRanking()
  File "../baseclass/Recommender.py", line 153, in evalRanking
    predictedItems = self.predictForRanking(user)
  File "../algorithm/ranking/RSGAN.py", line 448, in predictForRanking
    res = self.sess.run(self.d_output, {self.u_idx:[u]})
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 1877 is not in [0, 1794)
         [[node discriminator/u_e (defined at ../algorithm/ranking/RSGAN.py:334) ]]

Errors may have originated from an input operation.
Input Source operations connected to node discriminator/u_e:
 U/read (defined at ../baseclass/DeepRecommender.py:24)
 u_idx (defined at ../baseclass/DeepRecommender.py:19)

Original stack trace for u'discriminator/u_e':
  File "main.py", line 68, in <module>
    recSys.execute()
  File "/Users/kehanlin/VSCodeProjects/RecQ-master/main/RecQ.py", line 113, in execute
    eval(recommender).execute()
  File "../baseclass/Recommender.py", line 243, in execute
    self.initModel()
  File "../algorithm/ranking/RSGAN.py", line 407, in initModel
    self.build_graph()
  File "../algorithm/ranking/RSGAN.py", line 334, in build_graph
    self.u_embedding = tf.nn.embedding_lookup(self.user_embeddings, self.u_idx,name='u_e')
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 315, in embedding_lookup
    transform_fn=None)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 133, in _embedding_lookup_and_transform
    array_ops.gather(params[0], ids, name=name), ids, max_norm)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 3475, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4097, in gather_v2
    batch_dims=batch_dims, name=name)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

@Coder-Yu
Owner

I have downloaded the project and deployed it on my server. The algorithm works well.
I use the GPU version of TensorFlow. I am not sure whether that explains the difference.
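One documented difference that may be relevant here: the `tf.gather` documentation notes that on CPU an out-of-bound index produces an error, whereas on GPU a 0 is stored in the corresponding output value. The sketch below imitates both behaviours in plain Python for an index that exceeds the table size, as in `indices[0] = 1877 is not in [0, 1794)`. `gather_rows` and its `strict` flag are hypothetical and are not TensorFlow APIs.

```python
def gather_rows(table, indices, strict=True):
    """Look up table rows by index.

    strict=True  imitates the CPU behaviour (raise on an out-of-range index).
    strict=False imitates the GPU behaviour (return a zero row instead).
    This is an illustration only, not TensorFlow code.
    """
    n_rows = len(table)
    width = len(table[0]) if table else 0
    rows = []
    for pos, idx in enumerate(indices):
        if 0 <= idx < n_rows:
            rows.append(table[idx])
        elif strict:
            raise IndexError(
                "indices[%d] = %d is not in [0, %d)" % (pos, idx, n_rows))
        else:
            rows.append([0.0] * width)
    return rows


# Toy embedding table: 1794 rows, one 2-dimensional vector per known user.
user_embeddings = [[1.0, 1.0] for _ in range(1794)]

try:
    gather_rows(user_embeddings, [1877], strict=True)
    message = None
except IndexError as err:
    message = str(err)
print(message)

# The lenient variant silently yields a zero vector for the unknown index.
print(gather_rows(user_embeddings, [1877], strict=False))
```

If this difference applies, a GPU run could finish without any error while still scoring unknown users with an all-zero embedding.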

@Coder-Yu
Owner

image

@Coder-Yu
Owner

If you still cannot solve it, I suggest trying other algorithms to see whether the same problem occurs. I have tried my best, and nobody else has reported the same problem.

@HanlynnKe
Author

Thank you for your help! I am trying to debug the code on my CPU version of TensorFlow. I will ask a group member to try the algorithm on the GPU version. Thanks again!

@HanlynnKe reopened this Nov 20, 2020
@HanlynnKe
Author

To be more specific, when I used the lastFM dataset, the algorithm produced the following output:
image
Then, when the algorithm began predicting on the test set, the error occurred, as in #195:
image

@HanlynnKe
Author

I have tried other algorithms (e.g. IRGAN) and the error above did not show up. I also noticed that [0, 1790) should correspond to the number of users in the training set, whereas 1795 might be a user ID sampled from the test set.
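This hypothesis (a test-set user with no row in the embedding table) can be checked with a sketch like the following. It assumes the model maps each user seen in training to a contiguous row index; `build_user_index`, `unseen_users`, and the toy data are hypothetical and do not come from RecQ.

```python
def build_user_index(train_ratings):
    """Map each user seen in training to a contiguous row index, in order of first appearance."""
    index = {}
    for user, _item, _rating in train_ratings:
        if user not in index:
            index[user] = len(index)
    return index


def unseen_users(test_ratings, user_index):
    """Return, sorted, the test-set users that have no training index."""
    return sorted({user for user, _item, _rating in test_ratings
                   if user not in user_index})


# Toy (user, item, rating) triples, purely illustrative.
train = [("alice", "i1", 1.0), ("bob", "i2", 1.0), ("alice", "i3", 1.0)]
test = [("bob", "i4", 1.0), ("carol", "i1", 1.0)]

user_index = build_user_index(train)
missing = unseen_users(test, user_index)
print(missing)  # ['carol']

# One possible guard: only rank users that own an embedding row.
rankable = [user for user, _item, _rating in test if user in user_index]
```

Whether such users should be skipped or given a fallback depends on the evaluation protocol; the thread does not say which change the later refactoring made.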

@Coder-Yu
Owner

I have refactored some code segments and found a possible mistake that could lead to such an error. Although I cannot reproduce the error, I believe the problem has been solved. Please try the new version.

@HanlynnKe
Author

Thank you very much! Problem solved! RSGAN now runs successfully on my CPU version of TensorFlow ;-)
