Paddle inference fails on a P4 machine #3206

Closed
xieshufu opened this issue Aug 3, 2017 · 6 comments
Labels
User (used to tag user questions)

Comments

@xieshufu

xieshufu commented Aug 3, 2017

I wanted to measure how fast our program runs on the GPU of a P4 machine, but it fails at runtime. The error output is as follows:
terminate called after throwing an instance of 'char const*'
*** Aborted at 1501747892 (unix time) try "date -d @1501747892" if you are using GNU date ***
PC: @ 0x7f1a5272a3f7 __GI_raise
*** SIGABRT (@0x1f80000357a) received by PID 13690 (TID 0x7f1a55c04900) from PID 13690; stack trace: ***
@ 0x7f1a555dd160 (unknown)
@ 0x7f1a5272a3f7 __GI_raise
@ 0x7f1a5272b7d8 __GI_abort
@ 0x7f1a53228c65 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f1a53226e06 __cxxabiv1::__terminate()
@ 0x7f1a53225ec9 __cxa_call_terminate
@ 0x7f1a53226a7a __gxx_personality_v0
@ 0x7f1a52ab1853 _Unwind_RaiseException_Phase2
@ 0x7f1a52ab1d87 _Unwind_Resume
@ 0xe1efc5 google::LogMessageFatal::~LogMessageFatal()
@ 0xcaf9ce hl_gpu_apply_unary_op<>()
@ 0xcafd3d paddle::BaseMatrixT<>::applyUnary<>()
@ 0xcaffb3 paddle::BaseMatrixT<>::zero()
@ 0xbe6fe8 ZNSt17_Function_handlerIFviPN6paddle9ParameterEEZNS0_17PredictorInternal4initEPKcS6_bEUliS2_E_E9_M_invokeERKSt9_Any_dataiS2
@ 0xb0dc16 paddle::NeuralNetwork::init()
@ 0xbe6173 paddle::PredictorInternal::init()
@ 0xbe6cdc paddle::PredictorInternal::init()
@ 0xbe83e9 paddle::Predictor::init()
@ 0x7f90f1 inference_seq::init_paddle()
@ 0x76ce65 CSeqLEngWordRegModel::seqLCDNNModelTableInitUTF8()
@ 0x7a6769 ns_seq_ocr_model::CSeqModel::init_chn_recog_model()
@ 0x78c5ef nsseqocrchn::init_model_paddle_deepcnn()
@ 0x48205a main
@ 0x7f1a52716bd5 __libc_start_main
@ 0x499451 (unknown)

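The P4 machine has CUDA 7.5 and cuDNN v3. I have already tried building the Paddle inference library on that machine, and it compiles successfully.
Do I need to rebuild the executable against the library built on that machine, libpaddle_predictor_cuda75_cudnn3_P4.a?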

@typhoonzero typhoonzero added GPU Driver User (used to tag user questions) labels Aug 3, 2017
@typhoonzero
Contributor

I checked with @hedaoyuan: the P4 should require CUDA 8. Try building with CUDA 8.
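
For context on the CUDA 8 suggestion: the Tesla P4 is a Pascal GPU with compute capability 6.1, and CUDA 8.0 is the first toolkit release that can generate native code for Pascal. The sketch below shows the comparison in isolation. `encode_cuda_version` and `toolkit_targets_pascal` are hypothetical helper names, not part of CUDA or Paddle, and the block deliberately avoids calling the CUDA runtime so that it compiles without a toolkit installed.

```cpp
#include <cassert>

// CUDA encodes toolkit versions as 1000 * major + 10 * minor,
// e.g. 7050 for CUDA 7.5 and 8000 for CUDA 8.0.
constexpr int encode_cuda_version(int major, int minor) {
    return 1000 * major + 10 * minor;
}

// Hypothetical helper: true when a toolkit with the given encoded version
// can generate native code for a Pascal device (compute capability 6.x).
// Only Pascal is covered because that is the architecture in this thread.
bool toolkit_targets_pascal(int toolkit_version) {
    assert(toolkit_version > 0);  // encoded versions are always positive
    return toolkit_version >= encode_cuda_version(8, 0);
}
```

In a real check, the toolkit version would come from `cudaRuntimeGetVersion()` and the device's compute capability from the `major` and `minor` fields filled in by `cudaGetDeviceProperties()`. A binary built with an older toolkit can sometimes still run through PTX JIT compilation if PTX was embedded at build time, so a failed check here is a hint rather than proof.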

@xieshufu
Author

xieshufu commented Aug 4, 2017

I built a Paddle version with Cuda8.0_cudnnv5 and ran inference on the P4 machine, but it still fails:
terminate called after throwing an instance of 'char const*'
*** Aborted at 1501814869 (unix time) try "date -d @1501814869" if you are using GNU date ***
PC: @ 0x7f5d0d01b3f7 __GI_raise
*** SIGABRT (@0x1f800002e64) received by PID 11876 (TID 0x7f5d104f5900) from PID 11876; stack trace: ***
@ 0x7f5d0fece160 (unknown)
@ 0x7f5d0d01b3f7 __GI_raise
@ 0x7f5d0d01c7d8 __GI_abort
@ 0x7f5d0db19c65 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f5d0db17e06 __cxxabiv1::__terminate()
@ 0x7f5d0db16ec9 __cxa_call_terminate
@ 0x7f5d0db17a7a __gxx_personality_v0
@ 0x7f5d0d3a2853 _Unwind_RaiseException_Phase2
@ 0x7f5d0d3a2d87 _Unwind_Resume
@ 0xe216d5 google::LogMessageFatal::~LogMessageFatal()
@ 0xa829a2 hl_cudnn_init()
@ 0xa8c815 hl_create_global_resources()
@ 0xa8d1d1 hl_specify_devices_start()
@ 0xa8d63d hl_start()
@ 0xbe601a paddle::PredictorInternal::initHppl()
@ 0xbe861d paddle::Predictor::init()
@ 0x7f9281 inference_seq::init_paddle()
@ 0x76cd05 CSeqLEngWordRegModel::seqLCDNNModelTableInitUTF8()
@ 0x7a6969 ns_seq_ocr_model::CSeqModel::init_chn_recog_model()
@ 0x78c48f nsseqocrchn::init_model_paddle_deepcnn()
@ 0x48205a main
@ 0x7f5d0d007bd5 __libc_start_main
@ 0x499421 (unknown)

@typhoonzero
Contributor

Maybe related to #158 and #34

Oddly, with CUDA 8 it now crashes in hl_cudnn_init(), whereas the CUDA 7.5 failure looks similar to the two issues above.

@xieshufu
Author

xieshufu commented Aug 4, 2017

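What I am testing is a detection and recognition module. Detection uses the Caffe library (built with Cuda8.0_Cudnnv3) and recognition uses the Paddle library (built with Cuda8.0_Cudnnv3). After aligning the versions I tested the program again, and it still fails: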

*** SIGABRT (@0x1f800000a48) received by PID 2632 (TID 0x7f6d25841900) from PID 2632; stack trace: ***
@ 0x7f6d2521a160 (unknown)
@ 0x7f6d223813f7 __GI_raise
@ 0x7f6d223827d8 __GI_abort
@ 0x7f6d22e7fc65 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f6d22e7de06 __cxxabiv1::__terminate()
@ 0x7f6d22e7cec9 __cxa_call_terminate
@ 0x7f6d22e7da7a __gxx_personality_v0
@ 0x7f6d22708853 _Unwind_RaiseException_Phase2
@ 0x7f6d22708d87 _Unwind_Resume
@ 0xe22b85 google::LogMessageFatal::~LogMessageFatal()
@ 0xa827f2 hl_cudnn_init()
@ 0xa8de15 hl_create_global_resources()
@ 0xa8e7d1 hl_specify_devices_start()
@ 0xa8ec3d hl_start()
@ 0xbe761a paddle::PredictorInternal::initHppl()
@ 0xbe9c1d paddle::Predictor::init()
@ 0x7f9261 inference_seq::init_paddle()
@ 0x76cce5 CSeqLEngWordRegModel::seqLCDNNModelTableInitUTF8()
@ 0x7a6949 ns_seq_ocr_model::CSeqModel::init_chn_recog_model()
@ 0x78c46f nsseqocrchn::init_model_paddle_deepcnn()
@ 0x48203a main
@ 0x7f6d2236dbd5 __libc_start_main
@ 0x499401 (unknown)

This is the same as the second stack trace.

@hedaoyuan
Contributor

@xieshufu This call stack does not tell us much beyond hl_cudnn_init. Try building a Debug version (add -DCMAKE_BUILD_TYPE=Debug to the cmake command). Also, try use_gpu=false to check whether the CPU path works correctly.
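
A note on the first line of each crash: `terminate called after throwing an instance of 'char const*'` is what libstdc++ prints when a thrown C string is never caught, so the message it carried is lost. The sketch below shows how such an exception can be captured at the call site. `capture_cstring_exception` and `failing_init` are hypothetical names used only for illustration, and the thrown text is made up rather than taken from Paddle. It also assumes the exception can propagate to the caller, which may not hold here, since the traces show `terminate` being reached during unwinding inside `~LogMessageFatal()`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Runs `init` and returns the message if it throws a C string.
// Returns an empty string if `init` completes normally.
std::string capture_cstring_exception(const std::function<void()>& init) {
    assert(init);  // precondition: a callable must be supplied
    try {
        init();
    } catch (const char* msg) {
        // A thrown string literal decays to const char* and lands here.
        return msg ? std::string(msg) : std::string("(null)");
    }
    return std::string();
}

// Hypothetical stand-in for a library initialisation that fails.
void failing_init() {
    throw "initialisation failed";  // illustrative message only
}
```

Wrapping the initialisation call this way and logging the returned string can be a cheaper first step than a full Debug rebuild, when the exception does reach the caller.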

@xieshufu
Author

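I later built the Paddle inference library on the P4 machine itself and linked that library into the inference program. The program now runs inference correctly.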
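
The thread does not identify the root cause. One plausible explanation, not confirmed here, is that the library built elsewhere was compiled against a different cuDNN release than the one installed on the P4 machine. A minimal sketch of a build-time versus run-time comparison follows. `CudnnVersion`, `decode_cudnn_version`, and `same_cudnn_major` are hypothetical names, and the block takes plain integers so that it compiles without cuDNN installed.

```cpp
#include <cassert>
#include <cstddef>

struct CudnnVersion {
    int major;
    int minor;
    int patch;
};

// cuDNN headers of that era encode the version as
// major * 1000 + minor * 100 + patch, e.g. 5110 for cuDNN 5.1.10.
CudnnVersion decode_cudnn_version(std::size_t encoded) {
    assert(encoded >= 1000);  // anything smaller has no major version
    CudnnVersion v;
    v.major = static_cast<int>(encoded / 1000);
    v.minor = static_cast<int>((encoded % 1000) / 100);
    v.patch = static_cast<int>(encoded % 100);
    return v;
}

// True when the headers used at build time and the library loaded at run
// time share a major version, the minimum one would expect to need.
bool same_cudnn_major(std::size_t compiled, std::size_t loaded) {
    return decode_cudnn_version(compiled).major ==
           decode_cudnn_version(loaded).major;
}
```

In a real program, `compiled` would be the `CUDNN_VERSION` macro from `cudnn.h` and `loaded` would be the value returned by `cudnnGetVersion()`. Logging both at startup makes this kind of mismatch visible before any handle is created.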
