Training on GPU fails with cudaSuccess == cudaStat (0 vs. 38) Cuda Error: no CUDA-capable device is detected #7804

Closed
HljWafer opened this issue Jan 24, 2018 · 11 comments
Labels
User (used to tag user questions)


HljWafer commented Jan 24, 2018

Installed with pip install paddlepaddle-gpu
cuda-7.5 cudnn5
The example from the official site runs fine on CPU, but fails as soon as I switch to GPU:

I0124 08:39:10.899049  1337 Util.cpp:166] commandline:  --use_gpu=True
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.13.0-31-generic
F0124 08:39:10.926606  1337 hl_cuda_device.cc:453] Check failed: cudaSuccess == cudaStat (0 vs. 38) Cuda Error: no CUDA-capable device is detected

I checked, and CUDA does appear to be installed correctly:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

This is the official example I am running:

import paddle.v2 as paddle

paddle.init(use_gpu=True, trainer_count=1)

x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(13))
y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear())

probs = paddle.infer(
    output_layer=y_predict,
    parameters=paddle.dataset.uci_housing.model(),
    input=[item for item in paddle.dataset.uci_housing.test()()])

for i in xrange(len(probs)):
    print 'Predicted price: ${:,.2f}'.format(probs[i][0] * 1000)
@helinwang
Contributor

What does nvidia-smi print?
nvcc -V only reports the compiler version; it does not check whether the GPU driver was installed successfully.
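
A minimal sketch of checking the driver itself rather than the CUDA toolkit, assuming Linux and Python 3. The function name `driver_status` is hypothetical. It looks for `/proc/driver/nvidia/version`, which is present while the NVIDIA kernel module is loaded, and checks whether `nvidia-smi` exits cleanly. It is a rough diagnostic, not a definitive health check.

```python
import os
import shutil
import subprocess


def driver_status(proc_path="/proc/driver/nvidia/version", smi="nvidia-smi"):
    """Report whether the NVIDIA kernel driver looks usable (hypothetical helper)."""
    status = {
        # This file exists only while the nvidia kernel module is loaded.
        "kernel_module_loaded": os.path.exists(proc_path),
        # Whether the nvidia-smi executable can be found on PATH.
        "smi_found": shutil.which(smi) is not None,
        # Whether nvidia-smi ran and returned exit code 0.
        "smi_ok": False,
    }
    if status["smi_found"]:
        try:
            result = subprocess.run(
                [smi],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=30,
            )
            status["smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # Leave smi_ok as False if the tool cannot be executed.
            pass
    return status


if __name__ == "__main__":
    print(driver_status())
```

If `kernel_module_loaded` is false, the situation matches the `modprobe: FATAL: Module nvidia not found` line in the log above: the toolkit (nvcc) is present but no driver module is loaded for the running kernel.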

@HljWafer
Author

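@helinwang It fails: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." Before installing CUDA I could see the GPU status. While installing CUDA I uninstalled the official driver I had installed earlier, and then installed the driver bundled with CUDA 7.5.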

@helinwang
Contributor

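The driver was most likely not installed successfully. nvidia-smi only checks whether the GPU is working and should not depend on CUDA (the reverse does hold, though: CUDA needs a working driver to run).
Try rebooting. If that does not help, install the latest driver version; it should support CUDA 7.5 as well.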

Yancey1989 added the User (used to tag user questions) label Jan 24, 2018
@HljWafer
Author

HljWafer commented Jan 24, 2018

@helinwang After a lot of fiddling, the environment is now set up, but I am getting a different error:
hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
@ 0x7f5e5341fbcd google::LogMessage::Fail()
@ 0x7f5e5342367c google::LogMessage::SendToLog()
@ 0x7f5e5341f6f3 google::LogMessage::Flush()
@ 0x7f5e53424b8e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f5e532943eb hl_gpu_apply_unary_op<>()
@ 0x7f5e5329475d paddle::BaseMatrixT<>::applyUnary<>()
@ 0x7f5e532949a3 paddle::BaseMatrixT<>::zero()
@ 0x7f5e53152b99 ZNSt17_Function_handlerIFviPN6paddle9ParameterEEZNS0_15GradientMachine6createERKNS0_11ModelConfigEiRKSt6vectorINS0_19enumeration_wrapper13ParameterTypeESaISA_EEEUliS2_E_E9_M_invokeERKSt9_Any_dataiS2
@ 0x7f5e5312958e paddle::NeuralNetwork::init()
@ 0x7f5e53152eaf paddle::GradientMachine::create()
@ 0x7f5e533fc495 GradientMachine::createFromPaddleModelPtr()
@ 0x7f5e533fc67f GradientMachine::createByConfigProtoStr()
@ 0x7f5e52fd9717 _wrap_GradientMachine_createByConfigProtoStr
@ 0x558df22e399e PyEval_EvalFrameEx
@ 0x558df22dab3a PyEval_EvalCodeEx
@ 0x558df22e2e1f PyEval_EvalFrameEx
@ 0x558df22dab3a PyEval_EvalCodeEx
@ 0x558df22e282e PyEval_EvalFrameEx
@ 0x558df22dab3a PyEval_EvalCodeEx

nvidia-smi
Wed Jan 24 15:43:37 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 24%   44C    P0   ERR! /  75W |    160MiB /  4030MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       761      G   /usr/lib/xorg/Xorg                           158MiB |
+-----------------------------------------------------------------------------+

@typhoonzero
Contributor

Please refer to:
#5629
#4439

@HljWafer
Author

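@typhoonzero I read both of those. In one of them the fix was to switch to CUDA 8, but the Paddle documentation says 7.5. Does the version installed directly from pip support CUDA 8? The other one is a GPU architecture bug. Neither is quite the same as my case.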

@Yancey1989
Contributor

@HljWafer
Author

@Yancey1989 @typhoonzero @helinwang To get past this, I installed the latest Paddle 0.11 with CUDA 8 and cuDNN 5, but now there is a new error:
hl_cuda_cudnn.cc:171] Check failed: (cudnn_cuh_major < 4 && cudnn_dso_major < 4) || (cudnn_cuh_major == cudnn_dso_major) [cudnn init] libcudnn v5 with header v7 unmatched!
PaddlePaddle Requirement: (header v[2-3] with libcudnn v[2-3]) Or (header v4 with libcudnn v4) Or (header v5 with libcudnn v5) Or(header v6 with libcudnn v6).
What is the "header"? Does the message mean I need to downgrade the header to the same version as cuDNN?
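
The condition in the failed check can be restated as a small predicate. This is a sketch for illustration only: the function name `cudnn_versions_compatible` is hypothetical, and it rests on the assumption (suggested by the variable names, not confirmed in this thread) that `cudnn_cuh_major` is the major version of the cuDNN header the Paddle binary was built against, and `cudnn_dso_major` is the major version of the `libcudnn` shared library loaded at run time.

```python
def cudnn_versions_compatible(header_major, dso_major):
    """Mirror the condition printed in the failed check (hypothetical helper).

    header_major: assumed to be the cuDNN major version the binary was built with.
    dso_major:    assumed to be the major version of the libcudnn loaded at run time.
    """
    # Versions below 4 are treated as interchangeable; otherwise majors must match.
    both_old = header_major < 4 and dso_major < 4
    return both_old or header_major == dso_major


if __name__ == "__main__":
    # The case from the error message: header v7 with libcudnn v5.
    print(cudnn_versions_compatible(7, 5))
    # Matching major versions pass the check.
    print(cudnn_versions_compatible(7, 7))
```

Under that assumption, the message says the installed library (v5) does not match what the binary expects (v7), so the library version is the side that has to change.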

@HljWafer
Author

HljWafer commented Jan 25, 2018

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
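
The same check can be scripted. The sketch below, assuming Python 3, extracts the three version macros from the text of a `cudnn.h` file and applies the `CUDNN_VERSION` formula shown in the header. The function names `parse_cudnn_header` and `cudnn_version_number` are hypothetical, and the default path is simply the one used in the `cat` command above.

```python
import re


def parse_cudnn_header(text):
    """Return (major, minor, patchlevel) parsed from cudnn.h text (hypothetical helper)."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        # Accept "#define CUDNN_MAJOR 5" and also a line whose '#' was lost in pasting.
        pattern = r"^\s*#?\s*define\s+CUDNN_%s\s+(\d+)" % name
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError("CUDNN_%s not found in header text" % name)
        fields[name] = int(match.group(1))
    return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"]


def cudnn_version_number(major, minor, patchlevel):
    """Apply the CUDNN_VERSION formula from the header."""
    return major * 1000 + minor * 100 + patchlevel


if __name__ == "__main__":
    # Path taken from the command in this comment; adjust for other installs.
    with open("/usr/local/cuda/include/cudnn.h") as handle:
        version = parse_cudnn_header(handle.read())
    print(version, cudnn_version_number(*version))
```

For the values printed above (5, 0, 5), the formula gives 5 * 1000 + 0 * 100 + 5 = 5005.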

@HljWafer
Author

Upgraded cuDNN to 7; problem solved.

@jchua127

"Upgraded cuDNN to 7; problem solved."

Hi, will upgrading cuDNN to 7 conflict with CUDA 8? From what I see on the official site, cuDNN 7 is meant for CUDA 10.
