Core dump inside a brpc interface, but trouble analyzing it with gdb #165

Closed
adanteng opened this Issue Dec 21, 2017 · 20 comments

adanteng commented Dec 21, 2017

#include <string>
std::stof("a");

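Calling this inside a brpc interface makes the service core dump. Is there a way to pinpoint the offending line directly from the core file?

Opening the core file in gdb and running bt gives the following: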

(gdb) bt
#0  0x00007f98eb2b11f7 in raise () from /lib64/libc.so.6
#1  0x00007f98eb2b28e8 in abort () from /lib64/libc.so.6
#2  0x0000000000a7a6b5 in __gnu_cxx::__verbose_terminate_handler() ()
#3  0x0000000000a23406 in __cxxabiv1::__terminate(void (*)()) ()
#4  0x0000000000a7a159 in __cxa_call_terminate ()
#5  0x0000000000a232f4 in __gxx_personality_v0 ()
#6  0x0000000000a7eeb3 in _Unwind_RaiseException_Phase2 ()
#7  0x0000000000a7f6a7 in _Unwind_Resume ()
#8  0x00000000007bdd2a in operator() (this=<optimized out>, obj=<optimized out>) at ./src/brpc/destroyable.h:33
#9  ~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#10 ~DestroyingPtr (this=<synthetic pointer>, __in_chrg=<optimized out>) at ./src/brpc/destroyable.h:39
#11 brpc::policy::ProcessRpcRequest (msg_base=<optimized out>) at src/brpc/policy/baidu_rpc_protocol.cpp:508
#12 0x00000000008df7fa in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7f98a8021930) at src/brpc/input_messenger.cpp:132
#13 0x00000000008e07e4 in operator() (this=<optimized out>, last_msg=0x7f98a8021930) at src/brpc/input_messenger.cpp:138
#14 brpc::InputMessenger::OnNewMessages (m=0x7f987401ac80) at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#15 0x00000000008d04ed in brpc::Socket::ProcessEvent (arg=0x7f987401ac80) at src/brpc/socket.cpp:1049
#16 0x000000000071c1f4 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at src/bthread/task_group.cpp:291
#17 0x0000000000834791 in bthread_make_fcontext ()
#18 0x0000000000000000 in ?? ()


adanteng commented Dec 21, 2017

I experimented with the following code, which also triggers a core dump:

+char *str;
+str = "GfG";
+*(str+1) = 'n';

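In that case the backtrace points at the business code. I also tried the stof case, which throws std::invalid_argument, outside of brpc, and there the core reports the exact location as expected.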

jamesge commented Dec 22, 2017

You could run ASan.

adanteng commented Dec 22, 2017

ASan is a memory error detector. Do you mean it can catch a case like stof, where the core is caused by the string passed in?

I'll give it a try first.

jamesge commented Dec 22, 2017

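If you don't know why it cores, ASan helps find places that may have memory problems. If you know for certain that a statement crashes but the core dump shows an inaccurate location, that is usually because optimization is enabled.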

adanteng commented Dec 22, 2017

In the function below, I called std::stof("a") directly before CallMethod, and the location reported in the core file is the same as in the first backtrace above.

#11 brpc::policy::ProcessRpcRequest (msg_base=<optimized out>) at src/brpc/policy/baidu_rpc_protocol.cpp:508

So perhaps optimization flags were added when brpc was compiled?

CXXFLAGS=$(CPPFLAGS) -O2 -g -rdynamic -pipe -Wall -W -fPIC -fstrict-aliasing -Wno-invalid-offsetof -Wno-unused-parameter -fno-omit-frame-pointer -std=c++0x

I'll try removing -O2, recompile, and test again.

adanteng commented Dec 22, 2017

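It is not an -O2 problem. After adding -g, the call stack of brpc's internal functions is printed and can be debugged.

However, when business code inside a brpc interface throws an exception and the service cores, the backtrace from that core file loses the business-code frames. What causes this?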

jamesge commented Dec 22, 2017

-O2 affects the accuracy of bt.

adanteng commented Dec 22, 2017

I removed -O2, but the call stack inside the RPC interface still does not show up in the core file. Why is that?

jamesge commented Dec 22, 2017

That's impossible; it means you didn't remove it everywhere.

adanteng commented Dec 22, 2017

This is the only stack information I get:

(gdb) bt
#0  0x00007fc23f2261f7 in raise () from /lib64/libc.so.6
#1  0x00007fc23f2278e8 in abort () from /lib64/libc.so.6
#2  0x00000000010e0235 in __gnu_cxx::__verbose_terminate_handler() ()
#3  0x0000000001089e56 in __cxxabiv1::__terminate(void (*)()) ()
#4  0x00000000010dfcd9 in __cxa_call_terminate ()
#5  0x0000000001089d44 in __gxx_personality_v0 ()
#6  0x00000000010e58e3 in _Unwind_RaiseException_Phase2 ()
#7  0x00000000010e60d7 in _Unwind_Resume ()
#8  0x0000000000da5d75 in brpc::policy::ProcessRpcRequest (msg_base=0x7fc208022700) at src/brpc/policy/baidu_rpc_protocol.cpp:508

If my own non-RPC service cores because of a thrown exception, the core file clearly shows where the problem is.

adanteng commented Dec 22, 2017

How should I modify brpc's Makefile so that the business-code stack is preserved in the output?

adanteng commented Dec 22, 2017

I also constructed a segmentation-fault error inside the RPC interface, similar to what I described earlier:

+char *str;
+str = "GfG";
+*(str+1) = 'n';

This kind of error shows up clearly in the core file, and the location of the problem can be pinpointed directly.

stale bot commented Mar 20, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Mar 20, 2018

@stale stale bot closed this Mar 27, 2018

kenshinxf commented Apr 10, 2018

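I have a similar problem, except that it occurs in my own implementation of the thrift protocol: if the upstream sends malformed protocol data, the server throws an exception when parsing fails, and the resulting state looks basically the same as this one.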

adanteng commented Apr 15, 2018

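In an application built on brpc, each request runs in a bthread. When the application method called by the bthread throws an exception that is not caught, the exception propagates out, and stack unwinding then runs the destructors of various brpc objects. My guess is that brpc does not handle this part well.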
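
One way to keep an exception from ever reaching brpc's frames is to catch it at the boundary of the service method. The following is a minimal sketch under that assumption; `RunGuarded` is a hypothetical helper name, not part of brpc, and it only converts the exception into an error string.

```cpp
#include <exception>
#include <string>
#include <utility>

// Hypothetical helper (not a brpc API): runs `fn` and converts any escaping
// exception into an error message, so nothing propagates into the caller.
template <typename Fn>
bool RunGuarded(Fn&& fn, std::string* error) {
    try {
        std::forward<Fn>(fn)();
        return true;
    } catch (const std::exception& e) {
        if (error) *error = e.what();  // e.g. the message from std::invalid_argument
    } catch (...) {
        if (error) *error = "unknown exception";
    }
    return false;
}
```

Inside a service method, the throwing call (for example `std::stof(input)`) would go in the lambda passed to `RunGuarded`, and a `false` return would be reported through whatever error path the service already uses. This does not make the core dump more accurate; it avoids the abort in the first place.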

d0ngjun commented Aug 14, 2018

@adanteng I ran into the same problem with the same call stack; in fact the crash is caused in the business layer.

jamesge commented Aug 15, 2018

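Both Google's and Baidu's coding standards forbid exceptions, so throwing an exception from a user callback is unsupported by default. Later, special support was added for thrift, where throwing exceptions is the norm.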
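
For the specific `std::stof("a")` case from the original report, an exception-free style would use a non-throwing parser instead. The sketch below assumes `std::strtof` is an acceptable substitute; `SafeParseFloat` is a hypothetical name chosen for illustration.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a float without throwing.
// Returns false on empty or non-numeric input, trailing characters,
// or out-of-range values; `*out` is written only on success.
bool SafeParseFloat(const char* text, float* out) {
    if (text == nullptr || out == nullptr) return false;
    errno = 0;
    char* end = nullptr;
    const float value = std::strtof(text, &end);
    if (end == text) return false;      // nothing consumed, e.g. "a"
    if (*end != '\0') return false;     // trailing characters, e.g. "1.5x"
    if (errno == ERANGE) return false;  // overflow or underflow
    *out = value;
    return true;
}
```

Note that this is stricter than `std::stof`, which accepts trailing characters after a valid number; drop the trailing-character check if that behavior is wanted.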

d0ngjun commented Aug 15, 2018

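@jamesge I agree that exceptions should be avoided where possible, but when using third-party libraries, uncaught exceptions are hard to rule out, and in that case the core dump should show the exact location that caused the crash. Does brpc do some special handling here?

Also, what does "throwing an exception from a user callback is unsupported by default" mean?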

scottzzq commented Mar 21, 2019

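I ran into a similar problem and traced through the gcc 5.2 source.
[image]
At this point the fs.personality function pointer is NULL, so the process does not die when the business code throws. But once the stack unwinds into brpc, fs.personality points to the terminate function, which ends up calling abort and kills the process.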

The code that sets the personality pointer is as follows:
[image]

@jamesge, could you please take a look? Thanks!

scottzzq commented Mar 21, 2019

It dies once execution reaches brpc internals:
#0 0x00007f0621a605f7 in raise () from /usr/lib64/libc.so.6
#1 0x00007f0621a61ce8 in abort () from /usr/lib64/libc.so.6
#2 0x0000000000489f11 in myterminate () at brpc/example/echo_c++/server.cpp:322
#3 0x0000000000b651e6 in __cxxabiv1::__terminate (handler=) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x0000000000bf0c09 in __cxa_call_terminate (ue_header=ue_header@entry=0x7f060c047570) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#5 0x0000000000b64a05 in __cxxabiv1::__gxx_personality_v0 (version=, actions=, exception_class=5138137972254386944,
ue_header=, context=0x7f05fdfea8e0) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:676
#6 0x0000000000bf9023 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7f060c047570, context=context@entry=0x7f05fdfea8e0) at ../../../libgcc/unwind.inc:62
#7 0x0000000000bf9877 in _Unwind_Resume (exc=exc@entry=0x7f060c047570) at ../../../libgcc/unwind.inc:230
#8 0x0000000000508036 in operator() (this=, obj=) at brpc/src/brpc/destroyable.h:33
#9 ~unique_ptr (this=, __in_chrg=) at /usr/include/c++/5.2.0/bits/unique_ptr.h:236
#10 ~DestroyingPtr (this=, __in_chrg=) at brpc/src/brpc/destroyable.h:39
#11 brpc::policy::ProcessRpcRequest (msg_base=) at brpc/src/brpc/policy/baidu_rpc_protocol.cpp:333
#12 0x000000000055b887 in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7f060c036b20) at brpc/src/brpc/input_messenger.cpp:133
#13 0x000000000055c7c8 in operator() (this=, last_msg=0x7f060c036b20) at brpc/src/brpc/input_messenger.cpp:139
#14 brpc::InputMessenger::OnNewMessages (m=0x7f05ec01ac80) at /usr/include/c++/5.2.0/bits/unique_ptr.h:236
#15 0x00000000004a6fed in brpc::Socket::ProcessEvent (arg=0x7f05ec01ac80) at brpc/src/brpc/socket.cpp:1079
#16 0x0000000000609694 in bthread::TaskGroup::task_runner (skip_remained=) at brpc/src/bthread/task_group.cpp:293
#17 0x00000000005f10e1 in bthread_make_fcontext ()
#18 0x00010102464c457f in ?? ()
#19 0x0000000000000000 in ?? ()

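The business code calls map.at, which throws. The unwinding itself proceeds completely normally: the process should have died when fs.personality was executed in _Unwind_RaiseException_Phase2, but that pointer is NULL.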

(gdb) bt
#0 _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7fffd003d5b0, context=context@entry=0x7fffcb3ec780) at ../../../libgcc/unwind.inc:40
#1 0x0000000000bf9877 in _Unwind_Resume (exc=exc@entry=0x7fffd003d5b0) at ../../../libgcc/unwind.inc:230
#2 0x000000000048c0d4 in ~_Rb_tree (this=0x7fffcb3ec940, __in_chrg=) at /usr/include/c++/5.2.0/bits/stl_tree.h:858
#3 ~map (this=0x7fffcb3ec940, __in_chrg=) at /usr/include/c++/5.2.0/bits/stl_map.h:96
#4 example::EchoServiceImpl::Echo (this=, cntl_base=, request=0x7fffd0039ae0, response=0x7fffd0039c38, done=0x7fffd003d4d0)
at brpc/example/echo_c++/server.cpp:254
#5 0x000000000043a075 in example::EchoService::CallMethod (this=, method=, controller=, request=,
response=, done=) at build64_release/brpc/example/echo_c++/echo.pb.cc:675
#6 0x0000000000507d39 in brpc::policy::ProcessRpcRequest (msg_base=0x7fffd002cb20) at brpc/src/brpc/policy/baidu_rpc_protocol.cpp:553
#7 0x000000000055b887 in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7fffd002cb20) at brpc/src/brpc/input_messenger.cpp:133
#8 0x000000000055c7c8 in operator() (this=, last_msg=0x7fffd002cb20) at brpc/src/brpc/input_messenger.cpp:139
#9 brpc::InputMessenger::OnNewMessages (m=0x7fffcc01ac80) at /usr/include/c++/5.2.0/bits/unique_ptr.h:236
#10 0x00000000004a6fed in brpc::Socket::ProcessEvent (arg=0x7fffcc01ac80) at brpc/src/brpc/socket.cpp:1079
#11 0x0000000000609694 in bthread::TaskGroup::task_runner (skip_remained=) at brpc/src/bthread/task_group.cpp:293
#12 0x00000000005f10e1 in bthread_make_fcontext ()
#13 0x00010102464c457f in ?? ()
#14 0x0000000000000000 in ?? ()
