
how to solve the huge matrix (>2G) #7

Open
fjssharpsword opened this issue Jul 19, 2019 · 4 comments


@fjssharpsword

Problem:
When I run an experiment on the Pinterest-20 dataset, TensorFlow raises `ValueError: Cannot create a tensor proto whose content is larger than 2GB.`
The error occurs on the line `self.user_item_embedding = tf.convert_to_tensor(self.dataSet.getEmbedding())`.

For now, I have worked around it by initializing a `tf.Variable` from a placeholder.
See: https://blog.csdn.net/fjssharpsword/article/details/96431553

The changed code is:

```python
def add_embedding_matrix(self):
    # Feed the big matrix at initialization time instead of embedding it
    # in the graph proto, which is what triggers the 2GB error.
    self.matrix_init = tf.placeholder(tf.float32, shape=(self.shape[0], self.shape[1]))
    matrix = tf.Variable(self.matrix_init)
    self.user_item_embedding = tf.convert_to_tensor(matrix)
    # self.user_item_embedding = tf.convert_to_tensor(self.dataSet.getEmbedding())
    self.item_user_embedding = tf.transpose(self.user_item_embedding)

def init_sess(self):
    self.config = tf.ConfigProto()
    self.config.gpu_options.allow_growth = True
    self.config.allow_soft_placement = True
    self.sess = tf.Session(config=self.config)
    # self.sess.run(tf.global_variables_initializer())
    # Supply the matrix through feed_dict when running the initializer.
    self.sess.run(tf.global_variables_initializer(),
                  feed_dict={self.matrix_init: self.dataSet.getEmbedding()})
```

Does this work? I am not sure. Can you suggest other effective solutions? Thanks!

@www6130911

Hello, have you solved this problem?

@fjssharpsword
Author

https://blog.csdn.net/fjssharpsword/article/details/96431553
This is my solution, but it runs slowly.

@www6130911

Hi, I used your method to handle the oversized matrix and then trained on the MovieLens 10M dataset (User Num: 71567, Item Num: 65133). The error below occurred, and I don't know the cause. Please advise!

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[71567,65133] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
[[node Variable/Adam_1/Assign (defined at D:/PycharmProjects/Basic-DMF-Model/main.py:139) = Assign[T=DT_FLOAT, _class=["loc:@Variable/Assign"], _grappler_relax_allocator_constraints=true, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Variable/Adam_1, Variable/Adam/Initializer/zeros)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
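A back-of-envelope calculation (my own arithmetic, not from the thread) suggests why this OOMs regardless of the 2GB workaround: a dense float32 matrix of this shape is roughly 17 GB on its own, and the Adam optimizer allocates two extra slot tensors of the same shape per variable.

```python
# Rough memory estimate for a dense float32 matrix of shape [71567, 65133].
users, items = 71567, 65133
bytes_per_float32 = 4

one_copy_gb = users * items * bytes_per_float32 / 1024**3
# Adam keeps two slot tensors (m and v) per variable, so roughly 3 copies.
total_gb = 3 * one_copy_gb

print(f"one copy: {one_copy_gb:.1f} GB, with Adam slots: {total_gb:.1f} GB")
```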

@0215Arthur

I ran into a similar problem. It feels like the author never considered large rating matrices at all. DSSM at least applies hashing; this model feeds the one-hot vectors in directly. Leaving aside whether it can even run, the storage alone is more than an ordinary machine can handle.
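One standard way around materializing the dense rating matrix (a sketch of my own, not something this repo implements) is a sparse representation that stores only the observed ratings, since real user-item matrices are overwhelmingly zero:

```python
import numpy as np

# Toy user-item rating matrix; real ones are overwhelmingly zero.
dense = np.zeros((4, 5), dtype=np.float32)
dense[0, 1] = 5.0
dense[2, 3] = 3.0
dense[3, 0] = 4.0

# COO (coordinate) format: keep only (row, col, value) for nonzero entries.
rows, cols = np.nonzero(dense)
vals = dense[rows, cols]

# Memory is proportional to the number of ratings, not users * items.
coo_bytes = rows.nbytes + cols.nbytes + vals.nbytes
print(coo_bytes < dense.nbytes)
```

In TF1 terms, the same idea would mean feeding a `tf.SparseTensor` instead of the dense one-hot rows, but that requires restructuring the model's input layer.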
