(External) Cuda error(9), invalid configuration argument. #34561
Comments
Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version information, and the error message. You can also consult the official API documentation, the FAQ, past GitHub issues, and the AI community for answers. Have a nice day!
Is nobody going to respond?
Is this a problem with using AI Studio?
It is probably a Paddle problem.
This error may occur because the AI Studio container has only 64 MB of shared memory, which cannot be set any higher. Try setting the DataLoader's use_shared_memory to False.
I am not using DataLoader, so how should I set this?
It runs on a local GPU, but the results differ a lot from the CPU results; see PaddlePaddle/PGL#303.
The AI Studio container has only 64 MB of shared memory. AI Studio will expand this soon, probably in about a month, to an expected 512 MB or 1 GB.
Hello, the current version of AI Studio (Paddle 2.1.2) no longer reproduces this problem; please try it.
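Version and environment information:
Official AI Studio kernel
Python version: Python 3.7
Framework version: PaddlePaddle 2.1.0
According to the traceback, the error is raised in the mean op. The code runs fine on CPU but fails on GPU, and changing batch_size does not help.
Code: https://aistudio.baidu.com/aistudio/projectdetail/2246354
The error log is as follows: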
6 args = parser.parse_args(args=[])
7 log.info(args)
----> 8 pred = main(args)
in main(args)
23 train_loss, train_acc = train(train_index[batch_num*batch_size:], train_label[batch_num*batch_size:], gnn_model,graph, criterion, optim)
24 else:
---> 25 train_loss, train_acc = train(train_index[batch_num*batch_size:(batch_num+1)*batch_size], train_label[batch_num*batch_size:(batch_num+1)*batch_size], gnn_model,graph, criterion, optim)
26 writer.add_scalar(tag="train/loss", step=epoch, value=train_loss)
27 writer.add_scalar(tag="train/acc", step=epoch, value=train_acc)
in train(node_index, node_label, gnn_model, graph, criterion, optim)
3 pred = gnn_model(graph, graph.node_feat["words"])
4 pred = paddle.gather(pred, node_index)
----> 5 loss = criterion(pred, node_label)
6 loss.backward()
7 acc = paddle.metric.accuracy(input=pred, label=node_label, k=1)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in __call__(self, *inputs, **kwargs)
896 self._built = True
897
--> 898 outputs = self.forward(*inputs, **kwargs)
899
900 for forward_post_hook in self._forward_post_hooks.values():
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/loss.py in forward(self, input, label)
403 axis=self.axis,
404 use_softmax=self.use_softmax,
--> 405 name=self.name)
406
407 return ret
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/functional/loss.py in cross_entropy(input, label, weight, ignore_index, reduction, soft_label, axis, use_softmax, name)
1462 return out_sum / (total_weight + (total_weight == 0.0))
1463 else:
-> 1464 return core.ops.mean(out)
1465
1466 else:
OSError: (External) Cuda error(9), invalid configuration argument.
[Advise: This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.] (at /paddle/paddle/fluid/operators/mean_op.cu:75)
[operator < mean > error]
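The advice text in the error names three ways a kernel launch can be rejected: too much shared memory per block, too many threads, or an invalid number of blocks. The sketch below is only an illustration of those checks in plain Python; it does not reproduce the launch logic in Paddle's mean_op.cu. The names `DeviceLimits`, `blocks_for`, and `launch_config_problems` are hypothetical, and the limit values are assumptions typical of many NVIDIA GPUs; the real values come from cudaDeviceProp.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeviceLimits:
    # Assumed values for illustration only; query cudaDeviceProp for the real ones.
    max_threads_per_block: int = 1024
    max_grid_dim_x: int = 2**31 - 1
    max_shared_mem_per_block: int = 48 * 1024  # bytes


def blocks_for(numel: int, threads_per_block: int) -> int:
    """Ceiling division: how many blocks are needed to cover numel elements."""
    return (numel + threads_per_block - 1) // threads_per_block


def launch_config_problems(
    grid_x: int,
    block_x: int,
    shared_mem_bytes: int = 0,
    limits: DeviceLimits = DeviceLimits(),
) -> List[str]:
    """Return the reasons a 1-D launch configuration would be rejected."""
    problems = []
    if grid_x < 1 or grid_x > limits.max_grid_dim_x:
        problems.append(f"grid size {grid_x} is outside 1..{limits.max_grid_dim_x}")
    if block_x < 1 or block_x > limits.max_threads_per_block:
        problems.append(
            f"block size {block_x} is outside 1..{limits.max_threads_per_block}"
        )
    if shared_mem_bytes > limits.max_shared_mem_per_block:
        problems.append(
            f"shared memory {shared_mem_bytes} B exceeds "
            f"{limits.max_shared_mem_per_block} B per block"
        )
    return problems


if __name__ == "__main__":
    # A normal reduction over 4096 elements with 256 threads per block.
    print(launch_config_problems(blocks_for(4096, 256), 256))
    # An empty input yields zero blocks, which is not a valid grid size.
    print(launch_config_problems(blocks_for(0, 256), 256))
    # Asking for more shared memory than the assumed per-block limit.
    print(launch_config_problems(1, 256, shared_mem_bytes=64 * 1024))
```

Printing the shape of the tensor passed to the failing op, and comparing it against checks like these, is one way to narrow down which limit a launch is hitting.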