(External) Cuda error(9), invalid configuration argument. #34561

Closed
zhangpu1211 opened this issue Aug 2, 2021 · 9 comments
Labels
status/close (closed)

Comments

@zhangpu1211

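Version and environment information:
Official AI Studio kernel
Python version: Python 3.7
Framework version: PaddlePaddle 2.1.0
Judging from the traceback, the error is raised in the mean op. The code runs fine on CPU but fails on GPU, and changing batch_size does not help.
Code: https://aistudio.baidu.com/aistudio/projectdetail/2246354
The error log is as follows: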
6 args = parser.parse_args(args=[])
7 log.info(args)
----> 8 pred = main(args)
in main(args)
23 train_loss, train_acc = train(train_index[batch_num*batch_size:], train_label[batch_num*batch_size:], gnn_model,graph, criterion, optim)
24 else:
---> 25 train_loss, train_acc = train(train_index[batch_num*batch_size:(batch_num+1)*batch_size], train_label[batch_num*batch_size:(batch_num+1)*batch_size], gnn_model,graph, criterion, optim)
26 writer.add_scalar(tag="train/loss", step=epoch, value=train_loss)
27 writer.add_scalar(tag="train/acc", step=epoch, value=train_acc)
in train(node_index, node_label, gnn_model, graph, criterion, optim)
3 pred = gnn_model(graph, graph.node_feat["words"])
4 pred = paddle.gather(pred, node_index)
----> 5 loss = criterion(pred, node_label)
6 loss.backward()
7 acc = paddle.metric.accuracy(input=pred, label=node_label, k=1)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in __call__(self, *inputs, **kwargs)
896 self._built = True
897
--> 898 outputs = self.forward(*inputs, **kwargs)
899
900 for forward_post_hook in self._forward_post_hooks.values():
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/loss.py in forward(self, input, label)
403 axis=self.axis,
404 use_softmax=self.use_softmax,
--> 405 name=self.name)
406
407 return ret
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/functional/loss.py in cross_entropy(input, label, weight, ignore_index, reduction, soft_label, axis, use_softmax, name)
1462 return out_sum / (total_weight + (total_weight == 0.0))
1463 else:
-> 1464 return core.ops.mean(out)
1465
1466 else:
OSError: (External) Cuda error(9), invalid configuration argument.
[Advise: This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requestingmore shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks.See cudaDeviceProp for more device limitations.] (at /paddle/paddle/fluid/operators/mean_op.cu:75)
[operator < mean > error]
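
The advice text in the log refers to the limits that a CUDA kernel launch must respect. The sketch below, written in plain Python rather than CUDA, illustrates that kind of check. The `DeviceLimits` values are assumed typical figures, not values read from the AI Studio GPU, and `check_launch` and `blocks_for` are hypothetical helpers. Nothing here is taken from Paddle's `mean_op.cu`, and it is not a diagnosis of the failure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceLimits:
    """Assumed limits; real values come from cudaDeviceProp on the target GPU."""
    max_threads_per_block: int = 1024
    max_grid_dim_x: int = 2**31 - 1
    max_shared_mem_per_block: int = 48 * 1024  # bytes, assumed default


def check_launch(grid_x, block_x, shared_mem_bytes=0, limits=DeviceLimits()):
    """Return the reasons a 1-D launch would be rejected (empty list if none)."""
    problems = []
    if grid_x < 1:
        problems.append("grid size must be at least 1")
    elif grid_x > limits.max_grid_dim_x:
        problems.append("grid size exceeds the device limit")
    if block_x < 1:
        problems.append("block size must be at least 1")
    elif block_x > limits.max_threads_per_block:
        problems.append("too many threads per block")
    if shared_mem_bytes > limits.max_shared_mem_per_block:
        problems.append("too much shared memory per block")
    return problems


def blocks_for(num_elements, block_x):
    """Ceiling division commonly used to size a 1-D grid."""
    return (num_elements + block_x - 1) // block_x


if __name__ == "__main__":
    print(check_launch(blocks_for(1000, 256), 256))  # a launch within the limits
    print(check_launch(blocks_for(0, 256), 256))     # zero elements give zero blocks
    print(check_launch(4, 2048))                     # block larger than the limit
```

A launch that fails any of these checks is the kind of request the message describes as one that "can never be satisfied by the current device".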

@paddle-bot-old

paddle-bot-old bot commented Aug 2, 2021

Hi! We have received your issue and will arrange for technical staff to answer it as soon as possible, so please be patient. Please check again that you have provided a clear problem description, code to reproduce it, environment and version details, and the error message. You can also look for an answer in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

@zhangpu1211
Author

Is no one going to respond?

@DannyIsFunny
Contributor

Is this a problem with using AI Studio?

@zhangpu1211
Author

> Is this a problem with using AI Studio?

I think it is a Paddle problem.

@DannyIsFunny
Contributor

This error may be because the AI Studio container has only 64 MB of shared memory, which cannot be increased. Try setting the DataLoader's use_shared_memory to False.
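
For code that does use a DataLoader, the suggestion above could look roughly like the following sketch. `use_shared_memory` is an existing parameter of `paddle.io.DataLoader` and only matters when `num_workers` is greater than 0; the other values and the `make_loader` helper are illustrative assumptions, not part of the reporter's project.

```python
# Keyword arguments for paddle.io.DataLoader. use_shared_memory=False is the
# setting suggested above; the remaining values are illustrative assumptions.
LOADER_KWARGS = {
    "batch_size": 32,
    "shuffle": True,
    "num_workers": 2,            # shared memory is only used with worker processes
    "use_shared_memory": False,  # avoid the container's small shared-memory area
}


def make_loader(dataset):
    """Hypothetical helper: build a DataLoader that does not use shared memory."""
    # Imported here so the settings can be inspected without Paddle installed.
    import paddle

    return paddle.io.DataLoader(dataset, **LOADER_KWARGS)
```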

@zhangpu1211
Author

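> This error may be because the AI Studio container has only 64 MB of shared memory, which cannot be increased. Try setting the DataLoader's use_shared_memory to False.

I am not using a DataLoader, so how should I set this?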

@zhangpu1211
Author

zhangpu1211 commented Aug 4, 2021

It runs on my local GPU, but the results differ a lot from the CPU results; see PaddlePaddle/PGL#303.

@weiexcelpro

The AI Studio container has only 64 MB of shared memory. AI Studio will expand this soon, probably in about a month. It is expected to increase to roughly 512 MB or 1 GB.

@shangzhizhou
Member

Hello, this issue no longer reproduces on the current version of AI Studio (Paddle 2.1.2). Please give it a try.
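
One way to confirm which Paddle version a kernel is running is sketched below. `paddle.__version__` is the installed version string; `version_tuple` and `at_least` are hypothetical helpers, and the 2.1.2 threshold is simply the version mentioned in this comment, not a confirmed fix version.

```python
import re


def version_tuple(version):
    """Return the first three numeric components of a version string."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def at_least(installed, minimum="2.1.2"):
    """True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("PaddlePaddle is not installed")
    else:
        print(paddle.__version__, at_least(paddle.__version__))
```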
