C-API library reports "cublas status: not initialized" during multi-threaded GPU inference #5669

Closed
QingshuChen opened this issue Nov 15, 2017 · 5 comments
Labels
User (used to tag user questions)

Comments

@QingshuChen
Contributor

QingshuChen commented Nov 15, 2017

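I downloaded the CUDA 8.0 build of libpaddle_capi_shared.so from the website. With a network consisting of a single fully connected layer, the following error occurs during forward: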

F1115 15:36:56.346094 17370 hl_cuda_cublas.cc:307] Check failed: stat == CUBLAS_STATUS_SUCCESS (1 vs. 0) [cublas status]: not initialized
*** Check failure stack trace: ***
    @     0x7f8ddce75bcd  google::LogMessage::Fail()
    @     0x7f8ddce7967c  google::LogMessage::SendToLog()
    @     0x7f8ddce756f3  google::LogMessage::Flush()
    @     0x7f8ddce7ab8e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f8ddd265147  hl_matrix_mul()
    @     0x7f8ddce7ab8e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f8ddd265147  hl_matrix_mul()
    @     0x7f8ddd0c3987  paddle::GpuMatrix::mul()
    @     0x7f8ddd0c3987  paddle::GpuMatrix::mul()
    @     0x7f8ddd0c3fe1  paddle::GpuMatrix::mul()
    @     0x7f8ddcfc5ffb  paddle::FullyConnectedLayer::forward()
    @     0x7f8ddd0c3fe1  paddle::GpuMatrix::mul()
    @     0x7f8ddcfc5ffb  paddle::FullyConnectedLayer::forward()
    @     0x7f8ddcedf95d  paddle::NeuralNetwork::forward()
    @     0x7f8ddce6f6b6  paddle_gradient_machine_forward
    @     0x7f8ddcedf95d  paddle::NeuralNetwork::forward()
    @           0xc52da7  recarch::paddle::PaddlePredictor::predict()
    @     0x7f8ddce6f6b6  paddle_gradient_machine_forward
    @           0x7e20f1  rec::predictor::CtrDnnEngineV4::calc_quality_callback()
    @           0xc52da7  recarch::paddle::PaddlePredictor::predict()
    @           0x7998a1  rec::predictor::BaseEngine::handle()
    @           0x79e646  rec::predictor::PredictTask::run()
    @           0x7e20f1  rec::predictor::CtrDnnEngineV4::calc_quality_callback()
    @           0x7a63cd  rec::predictor::WorkerThread::run()
    @           0x7998a1  rec::predictor::BaseEngine::handle()
    @           0x79e646  rec::predictor::PredictTask::run()
    @           0xf4c6aa  thread_proxy
    @           0x7a63cd  rec::predictor::WorkerThread::run()
    @           0xf4c6aa  thread_proxy
    @     0x7f8de08021c3  start_thread
    @     0x7f8de08021c3  start_thread
    @     0x7f8ddc0da12d  __clone
    @     0x7f8ddc0da12d  __clone
    @              (nil)  (unknown)

What causes this, and how can it be fixed?

The failing call in hl_matrix_mul is:
    stat = CUBLAS_GEMM(t_resource.handle,
                       CUBLAS_OP_N,
                       CUBLAS_OP_N,
                       dimN,
                       dimM,
                       dimK,
                       &alpha,
                       B_d,
                       ldb,
                       A_d,
                       lda,
                       &beta,
                       C_d,
                       ldc);
Here t_resource.handle is a null pointer, because t_resource is a thread_local variable that has not been initialized.
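
To illustrate the diagnosis above, here is a minimal, self-contained C++ sketch of how a thread_local variable behaves across threads. It is not Paddle's code: `ThreadResource`, `init_this_thread`, `worker_sees_handle`, and `g_dummy_handle` are hypothetical names, and the `void*` handle only stands in for a cuBLAS handle. Each thread gets its own copy of a thread_local object, so initializing it on the main thread does not initialize it for a worker.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a per-thread GPU resource (not Paddle's real type).
struct ThreadResource {
  void* handle = nullptr;  // stands in for a cuBLAS handle
};

// Every thread that touches this variable gets its own, separate instance.
thread_local ThreadResource t_resource;

static int g_dummy_handle = 0;  // just something non-null to point at

// Hypothetical initializer: sets up the calling thread's copy only.
void init_this_thread() { t_resource.handle = &g_dummy_handle; }

// Starts a worker thread and reports whether that thread sees a non-null
// handle. If init_in_worker is false, the worker relies on whatever
// initialization happened elsewhere (for example, on the main thread).
bool worker_sees_handle(bool init_in_worker) {
  bool seen = false;
  std::thread worker([&] {
    if (init_in_worker) init_this_thread();
    seen = (t_resource.handle != nullptr);
  });
  worker.join();
  assert(seen == init_in_worker);  // only the worker's own init counts
  return seen;
}
```

Calling `init_this_thread()` on the main thread and then `worker_sees_handle(false)` returns false, which mirrors the null `t_resource.handle` reported here; `worker_sees_handle(true)` returns true.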

@guoshengCS guoshengCS added the User (used to tag user questions) label Nov 15, 2017
@QingshuChen QingshuChen changed the title from "C-API library reports cublas status: not initialized during inference" to "C-API library reports cublas status: not initialized during multi-threaded GPU inference" Nov 16, 2017
@Xreki
Contributor

Xreki commented Nov 16, 2017

How are you calling paddle_init?

@QingshuChen
Contributor Author

@Xreki
char* command[] = {"--use_gpu=True", "--gpu_id=0"};
paddle_init(2, command);

@Xreki
Contributor

Xreki commented Nov 16, 2017

The paddle_init call looks fine. This may be a multi-threading bug: GPU resources are not initialized for threads other than the main thread.

@hedaoyuan
Contributor

We need to add a paddle_init_cuda interface to inference API.

@Xreki
Contributor

Xreki commented Nov 20, 2017

@QingshuChen I created PR #5773 to fix it. Could you help check it? Thanks!


4 participants