Running the book example 02.recognize_digits on a configured GPU fails with CUDA error: invalid device function #5629
Comments
@shiyazhou121 This is probably a problem with your GPU. Could you post a screenshot of the nvidia-smi output?
@shiyazhou121
@kuke Thanks for your continued help with this. I tried use_gpu=with_gpu. On the CPU it runs normally, and the results are as expected.
I forgot to ask earlier: have you set the CUDA_VISIBLE_DEVICES environment variable? It can be set as follows:
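The snippet originally attached to this comment is not preserved here. As a minimal sketch of one way to set the variable from Python, assuming the standard CUDA behavior that CUDA_VISIBLE_DEVICES must be in place before the CUDA runtime is initialized: `select_gpus` is a hypothetical helper, not part of PaddlePaddle, and the commented-out `paddle.init` call mirrors the `--use_gpu=1 --trainer_count=1` command line in the log and is shown only for orientation.

```python
import os


def select_gpus(device_ids):
    """Expose only the given GPU indices to CUDA for this process.

    Hypothetical helper. Call it before anything initializes the CUDA
    runtime, i.e. before importing and initializing paddle.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Make only GPU 0 visible to this process.
select_gpus([0])

# Assumed usage, left commented out so the sketch runs without paddle:
# import paddle.v2 as paddle
# paddle.init(use_gpu=True, trainer_count=1)
```

Setting the variable in the shell before launching Python (`export CUDA_VISIBLE_DEVICES=0`) has the same effect and avoids any ordering concerns inside the script.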
@shiyazhou121 I suggest trying the Docker image first.
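@kuke At https://stackoverflow.com/questions/39850309/how-to-resolve-cudasuccess-err-0-vs-8-error-on-paddle-v0-8-0b I found someone with the same problem I am seeing. It looks like Paddle needs to be compiled from source. That question is from 2016, though, so things have probably changed since then?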
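@shiyazhou121 Judging from your screenshot, you are using [...]. To see which CUDA architectures we build for, check https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/flags.cmake and search for [...] (Lines 198 to 203 in 56f28a0).
I think there may be a bug here, so you may need to recompile. Also, PR #5713 added [...]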
How to check the GPU architecture:
cd /usr/local/cuda/samples/1_Utilities/deviceQuery/
make
./deviceQuery
Then look at [...]
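The deviceQuery sample prints a line of the form `CUDA Capability Major/Minor version number:    X.Y` for each device. The following is a rough sketch of how that value could be extracted and compared with the architectures a binary was built for. The sample text, the GPU name, and `HYPOTHETICAL_COMPILED_ARCHS` are made up for illustration and are not the contents of Paddle's flags.cmake.

```python
import re

CAPABILITY_RE = re.compile(
    r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)"
)


def parse_compute_capability(device_query_output):
    """Return one (major, minor) tuple per GPU listed in deviceQuery output."""
    return [(int(major), int(minor))
            for major, minor in CAPABILITY_RE.findall(device_query_output)]


def has_exact_arch(capability, compiled_archs):
    """Check whether e.g. (6, 1) appears as "61" in the compiled arch list."""
    major, minor = capability
    return "{}{}".format(major, minor) in compiled_archs


# Illustrative deviceQuery excerpt (not taken from the reporter's machine).
SAMPLE_OUTPUT = """
Device 0: "Hypothetical GPU"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
"""

# Assumed build targets, for illustration only.
HYPOTHETICAL_COMPILED_ARCHS = ["30", "35", "50", "52"]

for cap in parse_compute_capability(SAMPLE_OUTPUT):
    print("{} compiled: {}".format(
        cap, has_exact_arch(cap, HYPOTHETICAL_COMPILED_ARCHS)))
```

The exact-match check is deliberately simplistic: CUDA can also run code built for a lower minor version within the same major version, or JIT-compile embedded PTX, so a missing exact match is a hint to inspect the build flags rather than proof of the cause.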
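This issue has not been updated for a long time, so I am closing it for now. If you have any further questions, feel free to reopen it at any time.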
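Problem description:
I followed the method for installing the GPU version of Paddle on CentOS 6.3 from "[AI Learning] PaddlePaddle Deep Learning in Practice: Installing PaddlePaddle on Different Platforms" (http://learn.baidu.com/pages/index.html#/courseInfo/13655?courseId=13655&_k=usdv7x). I first installed python27-gcc482, then configured the GPU as shown in the video.
Below are the environment variables I configured for cuDNN and CUDA: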
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/home/work/cudnn/cudnn_v5/cuda/lib64:/usr/local/ganglia/lib64:/usr/local/apr/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/ganglia/lib64:/usr/local/apr/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib::/home/work/cuda-8.0/lib64:/home/work/cuda-8.0/lib:/home/HGCP_Program/software-install/hadoop-v2/hadoop/lib:/home/HGCP_Program/software-install/hadoop-v2/hadoop/libhce:/home/HGCP_Program/software-install/hadoop-v2/hadoop/libhdfs:/home/HGCP_Program/software-install/openmpi-1.8.5/lib:/home/work/cuda-8.0/lib64:/home/work/cuda-8.0/lib:/home/HGCP_Program/software-install/hadoop-v2/hadoop/lib:/home/HGCP_Program/software-install/hadoop-v2/hadoop/libhce:/home/HGCP_Program/software-install/hadoop-v2/hadoop/libhdfs:/home/HGCP_Program/software-install/openmpi-1.8.5/lib
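The value above repeats several directories, contains an empty entry (`::`), and lists both /usr/local/cuda and /home/work/cuda-8.0, so it may be worth checking which CUDA runtime the loader would find first. A small sketch for inspecting the search order follows; `unique_path_entries` and `find_library` are hypothetical helpers written for this illustration.

```python
import os


def unique_path_entries(path_value):
    """Split a colon-separated path, dropping empty and repeated entries.

    First-seen order is kept, since the loader searches left to right.
    """
    seen = set()
    entries = []
    for entry in path_value.split(":"):
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


def find_library(entries, prefix):
    """Return (directory, filename) pairs for files whose name starts with prefix."""
    hits = []
    for directory in entries:
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            # Directory is missing or unreadable; skip it.
            continue
        hits.extend((directory, name) for name in names
                    if name.startswith(prefix))
    return hits


entries = unique_path_entries(os.environ.get("LD_LIBRARY_PATH", ""))
for directory, name in find_library(entries, "libcudart"):
    print("{}/{}".format(directory, name))
```

The first directory printed is the one the dynamic loader would normally pick for that library. The sketch drops the empty entry only for display; the loader itself treats an empty entry as the current working directory.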
After this configuration, running 02.recognize_digits from the book fails with an error. The full log follows:
I1114 14:56:59.516850 4275 Util.cpp:166] commandline: --use_gpu=1 --trainer_count=1
W1114 14:57:08.683694 4275 CpuId.h:112] PaddlePaddle wasn't compiled to use avx instructions, but these are available on your machine and could speed up CPU computations via CMAKE .. -DWITH_AVX=ON
[INFO 2017-11-14 14:57:08,688 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 24, w = 24, size = 11520
[INFO 2017-11-14 14:57:08,689 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 12, w = 12, size = 2880
[INFO 2017-11-14 14:57:08,690 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 8, w = 8, size = 3200
[INFO 2017-11-14 14:57:08,691 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 4, w = 4, size = 800
F1114 14:57:08.697180 4275 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
@ 0x7fe360c605ed google::LogMessage::Fail()
@ 0x7fe360c6409c google::LogMessage::SendToLog()
@ 0x7fe360c600e3 google::LogMessage::Flush()
@ 0x7fe360c655ae google::LogMessageFatal::~LogMessageFatal()
@ 0x7fe360aeaec4 hl_gpu_apply_unary_op<>()
@ 0x7fe360aeb205 paddle::BaseMatrixT<>::applyUnary<>()
@ 0x7fe360aeb433 paddle::BaseMatrixT<>::zero()
@ 0x7fe3609868d1 paddle::Parameter::enableType()
@ 0x7fe3609821cc paddle::parameterInitNN()
@ 0x7fe36098491a paddle::NeuralNetwork::init()
@ 0x7fe3609ad491 paddle::GradientMachine::create()
@ 0x7fe360c3d3b3 GradientMachine::createFromPaddleModelPtr()
@ 0x7fe360c3d58f GradientMachine::createByConfigProtoStr()
@ 0x7fe36084c4cd _wrap_GradientMachine_createByConfigProtoStr
@ 0x4b4cb9 PyEval_EvalFrameEx
@ 0x4b6b28 PyEval_EvalCodeEx
@ 0x4b5d10 PyEval_EvalFrameEx
@ 0x4b6b28 PyEval_EvalCodeEx
@ 0x4b5d10 PyEval_EvalFrameEx
@ 0x4b6b28 PyEval_EvalCodeEx
@ 0x52940f function_call
@ 0x422cba PyObject_Call
@ 0x4271ad instancemethod_call
@ 0x422cba PyObject_Call
@ 0x48121f slot_tp_init
@ 0x47eb1a type_call
@ 0x422cba PyObject_Call
@ 0x4b31dd PyEval_EvalFrameEx
@ 0x4b6b28 PyEval_EvalCodeEx
@ 0x4b5d10 PyEval_EvalFrameEx
@ 0x4b6b28 PyEval_EvalCodeEx
@ 0x4b6c52 PyEval_EvalCode
Aborted
I then tried the other book examples and they all fail with the same error. What causes this, and how can it be fixed?