TSeerServer crashes intermittently with core dumps #12

Open
troycheng opened this issue Jun 20, 2018 · 2 comments

Comments

@troycheng
Contributor

troycheng commented Jun 20, 2018

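While trialing TSeer we still occasionally see TSeerServer crash and dump core. We have not yet found a reliable way to reproduce it and will keep trying.
The backtraces from the core files are posted below in the hope that they yield some useful clues toward a fix: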

Core file 1:

#0  0x00007f95ce5ccec5 in _IO_vfscanf_internal () from /lib64/libc.so.6
#1  0x00007f95ce5e1685 in vsscanf () from /lib64/libc.so.6
#2  0x00007f95ce5db6e8 in sscanf () from /lib64/libc.so.6
#3  0x00007f95ce615152 in __tzset_parse_tz () from /lib64/libc.so.6
#4  0x00007f95ce61632e in __tzfile_compute () from /lib64/libc.so.6
#5  0x00007f95ce615cb7 in __tz_convert () from /lib64/libc.so.6
#6  0x000000000046eba1 in tars::TC_Logger<tars::RollWriteT, tars::TC_RollBySize>::stream(int) ()
    at /home/tcheng/tools/TSeer/thirdparty/tars/include/util/tc_logger.h:727
#7  0x000000000046ed3e in tars::TC_Logger<tars::RollWriteT, tars::TC_RollBySize>::debug() ()
    at /home/tcheng/tools/TSeer/thirdparty/tars/include/util/tc_logger.h:689
#8  0x00000000004957ec in RequestEtcdCallback::responseClient(int, long, std::vector<Tseer::RouterData, std::allocator<Tseer::RouterData> > const&) () at /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.cpp:1299
#9  0x00000000004a1b8c in RequestEtcdCallback::onResponse(bool, tars::TC_HttpResponse&) ()
    at /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.cpp:168
#10 0x0000000000609a81 in tars::TC_HttpAsync::AsyncRequest::doReceive() ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_http_async.cpp:264
#11 0x0000000000609be1 in tars::TC_HttpAsync::process(tars::TC_AutoPtr<tars::TC_HttpAsync::AsyncRequest>&, int) ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_http_async.cpp:462
#12 0x0000000000609d2d in tars::TC_HttpAsync::run() () at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_http_async.cpp:505
#13 0x0000000000610acf in tars::TC_ThreadPool::ThreadWorker::run() ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_thread_pool.cpp:60
#14 0x00000000005f7e7a in tars::TC_Thread::threadEntry(tars::TC_Thread*) ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_thread.cpp:93
#15 0x00007f95cf0b6aa1 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f95ce660aad in clone () from /lib64/libc.so.6

Core file 2:

#0  std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
    at /usr/local/include/c++/4.8.3/bits/basic_string.h:716
#1  0x00000000004a314a in EtcdReqStr(EtcdReqestInfo const&) () at /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.h:284
#2  0x00000000004957af in RequestEtcdCallback::responseClient(int, long, std::vector<Tseer::RouterData, std::allocator<Tseer::RouterData> > const&) () at /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.cpp:1299
#3  0x0000000000499291 in RequestEtcdCallback::doGetSeerAgentResponse(rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator> const&) ()
    at /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.cpp:316
#4  0x00000000004a207b in RequestEtcdCallback::onResponse(bool, tars::TC_HttpResponse&) ()
    at /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.cpp:120
#5  0x0000000000609a81 in tars::TC_HttpAsync::AsyncRequest::doReceive() ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_http_async.cpp:264
#6  0x0000000000609be1 in tars::TC_HttpAsync::process(tars::TC_AutoPtr<tars::TC_HttpAsync::AsyncRequest>&, int) ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_http_async.cpp:462
#7  0x0000000000609d2d in tars::TC_HttpAsync::run() () at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_http_async.cpp:505
#8  0x0000000000610acf in tars::TC_ThreadPool::ThreadWorker::run() ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_thread_pool.cpp:60
#9  0x00000000005f7e7a in tars::TC_Thread::threadEntry(tars::TC_Thread*) ()
    at /home/tcheng/tools/TSeer/build/Tars/cpp/util/src/tc_thread.cpp:93
#10 0x00007f0febe71aa1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f0feb41baad in clone () from /lib64/libc.so.6

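Both traces enter through tars::TC_HttpAsync::AsyncRequest::doReceive(), and the paths in between differ only slightly. Something seems to go wrong in the string handling, but a quick debugging session did not reveal the cause.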

@troycheng
Contributor Author

Line 1299 of /home/tcheng/tools/TSeer/TseerServer/src/RequestEtcdCallback.cpp is just a log statement:

ETCDPROC_LOG << ETCDFILE_FUN << "|response,ret= " << ret << "|retryTime= " << retryTime << endl;

ETCDFILE_FUN is a macro defined as:

#define ETCDFILE_FUN FILE_FUN <<EtcdReqStr(_etcdReqInfo)<<"|"

and EtcdReqStr is nothing more than a string concatenation:

273 inline string EtcdReqStr(const EtcdReqestInfo& etcdReqInfo)
274 {
275     string client;
276     if (etcdReqInfo.current)
277     {
278         client = etcdReqInfo.current->getIp();
279     }
280     else
281     {
282         client = "NULL";
283     }
284     return "client=" + client + "|" + \
285             ActionStr(etcdReqInfo.etcdAction) + "|" +\
286             etcdReqInfo.moduletype + "." + \
287             etcdReqInfo.application + "." + \
288             etcdReqInfo.service_name + "|" + \
289             etcdReqInfo.node_name + "_" + \
290             etcdReqInfo.container_name + "|etcdhost=" + \
291             etcdReqInfo.etcdHost + ":" + \
292             TC_Common::tostr(etcdReqInfo.etcdPort) + "|" + \
293             MSTIMEINSTR(etcdReqInfo.startTime);
294 }
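With the macro substituted, line 1299 is a single full-expression in which EtcdReqStr(_etcdReqInfo) reads the callback's own _etcdReqInfo member and produces a temporary std::string that lives until the end of the log statement. The standalone sketch below mimics that expansion so it can be compiled and stepped through outside the server. Everything in it is a hypothetical stand-in: the trimmed EtcdReqestInfo, the shortened EtcdReqStr, the assumed FILE_FUN definition (the real Tars macro is not quoted in this thread), and a std::ostringstream in place of ETCDPROC_LOG.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical, trimmed stand-in for the real request-info struct.
struct EtcdReqestInfo {
    std::string etcdHost;
    int etcdPort;
};

// Hypothetical, shortened stand-in for the real EtcdReqStr().
inline std::string EtcdReqStr(const EtcdReqestInfo& info) {
    return "etcdhost=" + info.etcdHost + ":" + std::to_string(info.etcdPort);
}

// ASSUMED FILE_FUN (here it only prints the function name);
// ETCDFILE_FUN is the definition quoted in the issue.
#define FILE_FUN __func__ << "|"
#define ETCDFILE_FUN FILE_FUN << EtcdReqStr(_etcdReqInfo) << "|"

struct CallbackSketch {
    EtcdReqestInfo _etcdReqInfo;

    std::string responseClient(int ret, long retryTime) {
        std::ostringstream log;  // stands in for ETCDPROC_LOG
        // After preprocessing, the next statement reads:
        // log << __func__ << "|" << EtcdReqStr(_etcdReqInfo) << "|"
        //     << "|response,ret= " << ret << "|retryTime= " << retryTime;
        log << ETCDFILE_FUN << "|response,ret= " << ret << "|retryTime= " << retryTime;
        return log.str();  // the temporary from EtcdReqStr is already gone here
    }
};
```

Because the member is read inside the log statement itself, the statement can only be as safe as the callback object and its _etcdReqInfo strings are at that moment.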

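In EtcdReqStr, etcdReqInfo.current->getIp() also just returns the member string _ip, whose initial value is likewise "NULL", so nothing along this path looks wrong. For now I have rewritten this function and added some extra logging, in the hope that the next core dump reveals something new.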
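The rewritten function is not included in the thread. One possible shape for such an instrumented version, offered only as a sketch, appends the fields one group at a time and emits a checkpoint before each group, so the last checkpoint written before a crash names the field that was being read. EtcdReqStrTraced, CurrentSketch, the trace parameter, and the plain string and integer fields standing in for ActionStr(...) and MSTIMEINSTR(...) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the object behind etcdReqInfo.current.
struct CurrentSketch {
    std::string _ip = "NULL";  // same initial value as described in the issue
    const std::string& getIp() const { return _ip; }
};

// Hypothetical, simplified EtcdReqestInfo using the field names from the listing.
struct EtcdReqestInfo {
    std::shared_ptr<CurrentSketch> current;
    std::string action;  // stands in for ActionStr(etcdReqInfo.etcdAction)
    std::string moduletype, application, service_name;
    std::string node_name, container_name;
    std::string etcdHost;
    int etcdPort;
    std::int64_t startTime;  // stands in for MSTIMEINSTR(etcdReqInfo.startTime)
};

// Builds the same text layout as EtcdReqStr, one field group at a time.
// When trace is non-null, a checkpoint line is written before each group is read.
inline std::string EtcdReqStrTraced(const EtcdReqestInfo& info, std::ostream* trace = nullptr) {
    auto checkpoint = [trace](const char* stage) {
        if (trace) *trace << "EtcdReqStr:" << stage << '\n';
    };
    std::string out;
    checkpoint("client");
    out += "client=";
    out += info.current ? info.current->getIp() : std::string("NULL");
    checkpoint("action");
    out += '|';
    out += info.action;
    out += '|';
    checkpoint("service");
    out += info.moduletype + "." + info.application + "." + info.service_name + "|";
    checkpoint("node");
    out += info.node_name + "_" + info.container_name;
    checkpoint("etcdhost");
    out += "|etcdhost=" + info.etcdHost + ":" + std::to_string(info.etcdPort) + "|";
    checkpoint("startTime");
    out += std::to_string(info.startTime);
    return out;
}
```

Passing &std::cerr as trace sends the checkpoints to an unbuffered stream, so they are less likely to be lost when the process dies; on a busy multithreaded server the output is noisy and interleaved, so it is probably best enabled only while chasing this crash.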

@troycheng
Contributor Author

troycheng commented Mar 7, 2019

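Coming back to this issue: I analyzed the most recent few core dumps. The code has changed in the meantime, so the exact locations differ, but every crash still ends up inside the statement
ETCDPROC_LOG << ETCDFILE_FUN << "|response,ret= " << ret << "|retryTime= " << retryTime << endl;
and only the crash point within it varies from dump to dump. Since everything else in that statement is plain string concatenation, the logger is the most likely culprit. In one of the dumps the crash point was inside the LoggerStream destructor. The trigger condition is still unclear, but the investigation seems to be heading in the right direction: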

#0  0x00007fe20422aa1c in free () from /lib64/libc.so.6
#1  0x000000000063c7eb in std::ios_base::~ios_base() () at ../../../.././libstdc++-v3/src/c++98/ios.cc:93
#2  0x00000000004b6f56 in tars::LoggerStream::~LoggerStream() () at /mnt/homework/gcc/gcc/include/c++/4.8.2/bits/basic_ios.h:276
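The ~LoggerStream frame fits the usual lifetime of a stream-style log statement: the statement creates a temporary stream object, every << appends to a buffer that object owns, and at the end of the full-expression the temporary is destroyed, which hands the finished line to a sink and then destroys the owned stream; that std::ios_base destructor is where free() is reached in this trace. The simplified LogStreamSketch below illustrates only that lifetime. It is hypothetical and is not Tars's actual LoggerStream, whose members are not shown in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a stream-style log object.
class LogStreamSketch {
public:
    explicit LogStreamSketch(std::vector<std::string>& sink) : sink_(sink) {}
    LogStreamSketch(const LogStreamSketch&) = delete;
    LogStreamSketch& operator=(const LogStreamSketch&) = delete;

    // Runs at the end of the full-expression that created the temporary.
    // After this body, the buf_ member (a std::ostringstream, and therefore
    // a std::ios_base) is destroyed and its storage is released.
    ~LogStreamSketch() { sink_.push_back(buf_.str()); }

    template <typename T>
    LogStreamSketch& operator<<(const T& value) {
        buf_ << value;
        return *this;
    }

private:
    std::vector<std::string>& sink_;
    std::ostringstream buf_;
};

// Mirrors the shape of the statement at RequestEtcdCallback.cpp:1299,
// without the ETCDFILE_FUN prefix.
inline void logResponse(std::vector<std::string>& sink, int ret, long retryTime) {
    LogStreamSketch{sink} << "|response,ret= " << ret << "|retryTime= " << retryTime;
    // The temporary has already been destroyed here, so the line is in sink.
}
```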

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant