Got 'InvalidArgumentError: indices[0] = 1875 is not in [0, 1795)' in running RSGAN #196

HanlynnKe · 2020-11-19T04:29:13Z

When running the RSGAN algorithm, the error below appears so frequently that I can hardly get the prediction process to complete. I was wondering whether the "Process Process-2" line is expected. Is this a multi-processing problem?

Predicting [1]...
RSGAN [1] progress:0/1753
RSGAN [1] progress:100/1753
RSGAN [1] progress:200/1753
RSGAN [1] progress:300/1753
RSGAN [1] progress:400/1753
Process Process-2:
Traceback (most recent call last):
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/azure/RecQ-master/main/RecQ.py", line 116, in run
    measure[order] = algor.execute()
  File "../baseclass/Recommender.py", line 257, in execute
    self.evalRanking()
  File "../baseclass/Recommender.py", line 153, in evalRanking
    predictedItems = self.predictForRanking(user)
  File "../algorithm/ranking/RSGAN.py", line 448, in predictForRanking
    res = self.sess.run(self.d_output, {self.u_idx:[u]})
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/azure/miniconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: indices[0] = 1875 is not in [0, 1795)
         [[node discriminator/u_e (defined at ../algorithm/ranking/RSGAN.py:334) ]]

Errors may have originated from an input operation.
Input Source operations connected to node discriminator/u_e:
 U/read (defined at ../baseclass/DeepRecommender.py:24) 
 u_idx (defined at ../baseclass/DeepRecommender.py:19)


HanlynnKe · 2020-11-19T04:30:22Z

My Azure machine uses Miniconda with python=2.7 and tensorflow=1.14.

Coder-Yu · 2020-11-19T05:39:06Z

I cannot reproduce this problem. I suggest you run this lib in your local experimental environments.

You can also change the evaluation setting '-cv 5' to '-ap 0.2' in the config file to see if this problem is related to multi-processing.

HanlynnKe · 2020-11-19T08:37:43Z

The problem is not related to multi-processing. Both my local machine and the remote server produce the same error, shown below.
Could you please give more details about how you set up the virtual environment for RecQ? I am not sure whether my procedure is correct.

My steps are:
① conda create -n py2 python=2.7
② conda activate py2
③ conda install tensorflow=1.14
④ conda install gensim

Predicting [1]...
RSGAN [1] progress:0/1725
RSGAN [1] progress:100/1725
RSGAN [1] progress:200/1725
RSGAN [1] progress:300/1725
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    recSys.execute()
  File "/Users/kehanlin/VSCodeProjects/RecQ-master/main/RecQ.py", line 113, in execute
    eval(recommender).execute()
  File "../baseclass/Recommender.py", line 257, in execute
    self.evalRanking()
  File "../baseclass/Recommender.py", line 153, in evalRanking
    predictedItems = self.predictForRanking(user)
  File "../algorithm/ranking/RSGAN.py", line 448, in predictForRanking
    res = self.sess.run(self.d_output, {self.u_idx:[u]})
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 1877 is not in [0, 1794)
         [[node discriminator/u_e (defined at ../algorithm/ranking/RSGAN.py:334) ]]

Errors may have originated from an input operation.
Input Source operations connected to node discriminator/u_e:
 U/read (defined at ../baseclass/DeepRecommender.py:24)
 u_idx (defined at ../baseclass/DeepRecommender.py:19)

Original stack trace for u'discriminator/u_e':
  File "main.py", line 68, in <module>
    recSys.execute()
  File "/Users/kehanlin/VSCodeProjects/RecQ-master/main/RecQ.py", line 113, in execute
    eval(recommender).execute()
  File "../baseclass/Recommender.py", line 243, in execute
    self.initModel()
  File "../algorithm/ranking/RSGAN.py", line 407, in initModel
    self.build_graph()
  File "../algorithm/ranking/RSGAN.py", line 334, in build_graph
    self.u_embedding = tf.nn.embedding_lookup(self.user_embeddings, self.u_idx,name='u_e')
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 315, in embedding_lookup
    transform_fn=None)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 133, in _embedding_lookup_and_transform
    array_ops.gather(params[0], ids, name=name), ids, max_norm)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 3475, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4097, in gather_v2
    batch_dims=batch_dims, name=name)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/Users/kehanlin/opt/anaconda3/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Coder-Yu · 2020-11-19T11:38:48Z

I have downloaded the project and deployed it on my server. The algorithm works well.
I use the GPU version of TensorFlow. I am not sure whether that could be the cause of the difference.

Coder-Yu · 2020-11-19T11:39:54Z

Coder-Yu · 2020-11-19T11:43:15Z

If you still cannot solve it, I suggest you try other algorithms to see whether the same problem occurs. Anyway, I have tried my best, and no one else has reported the same problem.

HanlynnKe · 2020-11-19T11:45:19Z

Thank you for your help! I am trying to debug the code on the CPU version of TensorFlow. I will ask a group member to try the algorithm on the GPU version. Thanks again!

HanlynnKe · 2020-11-20T07:16:55Z

To be more specific, I found that when I used the lastFM dataset, the output of the algorithm is:

Then, when the algorithm began to predict on the test set, the error occurred, like #195

HanlynnKe · 2020-11-20T07:24:55Z

I have tried other algorithms (i.e. IRGAN) and the error above did not show up. I also noticed that the upper bound of [0, 1790) should be the user count in the training set, whereas 1795 might be a user ID sampled from the test set.
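
To make the suspicion above concrete, here is a minimal sketch in plain Python (no TensorFlow). It assumes the embedding table has one row per user seen in training, so any index fed to the lookup must come from the same mapping that sized the table. The names `build_user_index` and `split_test_users` are hypothetical and not taken from RecQ; whether this is the actual cause in RSGAN.py is not confirmed here.

```python
# Hypothetical illustration; function and variable names are NOT from RecQ.

def build_user_index(training_pairs):
    """Map each user seen in training to a dense row index 0..n-1."""
    index = {}
    for user, _item in training_pairs:
        if user not in index:
            index[user] = len(index)
    return index


def split_test_users(test_users, user_index):
    """Separate test users into those with an embedding row and those without."""
    known, unknown = [], []
    for user in test_users:
        if user in user_index:
            known.append(user)
        else:
            unknown.append(user)
    return known, unknown


# Toy data: "u3" appears only in the test set.
training_pairs = [("u1", "i1"), ("u2", "i3"), ("u1", "i2")]
test_users = ["u1", "u3"]

user_index = build_user_index(training_pairs)
num_rows = len(user_index)  # the embedding table would have this many rows

known, unknown = split_test_users(test_users, user_index)

# Only indices in [0, num_rows) are safe to feed to an embedding lookup.
safe_indices = [user_index[u] for u in known]
```

If `unknown` is non-empty, or if an index is built from a different mapping than the one used to size the table, a lookup would fail in the same way as the `indices[0] = ... is not in [0, ...)` message.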

Coder-Yu · 2020-11-21T11:54:53Z

I have refactored some code segments and found a possible mistake that could lead to such an error. Although I cannot reproduce the error, I believe the problem has been solved. Please try the new version.

HanlynnKe · 2020-11-22T09:22:36Z

Thank you very much! Problem solved! I can successfully run RSGAN on my CPU ver. tensorflow now ;-)

HanlynnKe closed this as completed Nov 19, 2020

HanlynnKe reopened this Nov 20, 2020

HanlynnKe closed this as completed Nov 22, 2020
