Segmentation fault with performance logger on multi-threaded environment #580

Closed
Forbinn opened this issue Oct 24, 2017 · 6 comments

Forbinn commented Oct 24, 2017

Hello,

I have a multi-threaded program that uses easylogging++ for both normal and performance logging. My program is compiled with ELPP_THREAD_SAFE defined. Sometimes it crashes randomly, and the GDB backtrace always shows an access violation inside the el::base::DefaultLogDispatchCallback::dispatch function.

Here is the backtrace:

#0  0x0000000000419002 in el::base::DefaultLogDispatchCallback::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (this=0x674f10, 
    logLine=<unknown type in /home/forbinn/test/build-test-Desktop-Debug/test, CU 0x31f1b, DIE 0x75520>) at ../test/easylogging++.cpp:2132
#1  0x0000000000418dc4 in el::base::DefaultLogDispatchCallback::handle (this=0x674f10, data=0x7fffdf7fd090) at ../test/easylogging++.cpp:2105
#2  0x000000000041a211 in el::base::LogDispatcher::dispatch (this=0x7fffdf7fd2b0) at ../test/easylogging++.cpp:2385
#3  0x000000000041af3c in el::base::Writer::triggerDispatch (this=0x7fffdf7fd810) at ../test/easylogging++.cpp:2516
#4  0x000000000041ad08 in el::base::Writer::processDispatch (this=0x7fffdf7fd810) at ../test/easylogging++.cpp:2499
#5  0x0000000000407430 in el::base::Writer::~Writer (this=0x7fffdf7fd810, __in_chrg=<optimized out>) at ../test/easylogging++.h:3246
#6  0x0000000000420c16 in el::base::DefaultPerformanceTrackingCallback::handle (this=0x674e70, data=0x7fffdf7fd8e0) at ../test/easylogging++.h:3579
#7  0x000000000041b7d7 in el::base::PerformanceTracker::~PerformanceTracker (this=0x7fffd00008c0, __in_chrg=<optimized out>) at ../test/easylogging++.cpp:2585
#8  0x000000000041b8a0 in el::base::PerformanceTracker::~PerformanceTracker (this=0x7fffd00008c0, __in_chrg=<optimized out>) at ../test/easylogging++.cpp:2591
#9  0x000000000040b272 in std::default_delete<el::base::PerformanceTracker>::operator() (this=0x7fffdf7fda58, __ptr=0x7fffd00008c0) at /usr/include/c++/6.4.1/bits/unique_ptr.h:76
#10 0x000000000040ab19 in std::unique_ptr<el::base::PerformanceTracker, std::default_delete<el::base::PerformanceTracker> >::~unique_ptr (this=0x7fffdf7fda58, __in_chrg=<optimized out>)
    at /usr/include/c++/6.4.1/bits/unique_ptr.h:239
#11 0x0000000000405eef in <lambda()>::operator()(void) const (__closure=0x7fffdf7fdc7c) at ../test/main.cpp:21
#12 0x0000000000406c5c in boost::asio::asio_handler_invoke<main()::<lambda()> >(<lambda()> &, ...) (function=...) at /usr/include/boost/asio/handler_invoke_hook.hpp:69
#13 0x0000000000406a68 in boost_asio_handler_invoke_helpers::invoke<main()::<lambda()>, main()::<lambda()> >(<lambda()> &, <lambda()> &) (function=..., context=...)
    at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#14 0x000000000040690b in boost::asio::detail::completion_handler<main()::<lambda()> >::do_complete(boost::asio::detail::io_service_impl *, boost::asio::detail::operation *, const boost::system::error_code &, std::size_t) (owner=0x675c90, base=0x676ca0) at /usr/include/boost/asio/detail/completion_handler.hpp:68
#15 0x000000000040880a in boost::asio::detail::task_io_service_operation::complete (this=0x676ca0, owner=..., ec=..., bytes_transferred=0) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:38
#16 0x0000000000409706 in boost::asio::detail::task_io_service::do_run_one (this=0x675c90, lock=..., this_thread=..., ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:372
#17 0x0000000000409370 in boost::asio::detail::task_io_service::run (this=0x675c90, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:149
#18 0x0000000000409991 in boost::asio::io_service::run (this=0x7fffffffe3c0) at /usr/include/boost/asio/impl/io_service.ipp:59
#19 0x0000000000405fab in <lambda()>::operator()(void) const (__closure=0x6eb498) at ../test/main.cpp:27
#20 0x000000000040703e in std::_Bind_simple<main()::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x6eb498) at /usr/include/c++/6.4.1/functional:1391
#21 0x0000000000406fdb in std::_Bind_simple<main()::<lambda()>()>::operator()(void) (this=0x6eb498) at /usr/include/c++/6.4.1/functional:1380
#22 0x0000000000406fba in std::thread::_State_impl<std::_Bind_simple<main()::<lambda()>()> >::_M_run(void) (this=0x6eb490) at /usr/include/c++/6.4.1/thread:197
#23 0x00007ffff790676f in std::execute_native_thread_routine (__p=0x6eb490) at ../../../../../libstdc++-v3/src/c++11/thread.cc:83
#24 0x00007ffff711473a in start_thread (arg=0x7fffdf7fe700) at pthread_create.c:333
#25 0x00007ffff6e4ee7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

This backtrace has been generated by the following program:

#include "easylogging++.h"

#include <vector>
#include <thread>

#include <boost/asio.hpp>
#include <boost/range/irange.hpp>

int main()
{
    constexpr auto count = 10000;
    const auto nbThread = std::thread::hardware_concurrency();

    std::vector<std::thread> threads;
    boost::asio::io_service service;

    for (auto i : boost::irange(0, count))
    {
        service.post([i]
        {
            TIMED_FUNC(blk);
            LOG(INFO) << "Log line " << i;
        });
    }

    for (std::size_t i = 0; i < nbThread; ++i)
        threads.emplace_back([&service]{ service.run(); });

    service.run();
    for (std::size_t i = 0; i < nbThread; ++i)
        threads[i].join();

    return 0;
}

This program creates a thread pool and pushes 10,000 tasks into it. It then starts N threads (the number depends on your platform) that output logs as fast as possible.

FYI: it works perfectly if I remove the TIMED_FUNC(blk) line.

Also this program is compiled using the following flags:
-std=c++11 -pthread -O0 -g -Wall -W -fPIC -DELPP_THREAD_SAFE -DELPP_DISABLE_LOGGING_FLAGS_FROM_ARG -DELPP_DISABLE_LOG_FILE_FROM_ARG -DELPP_CUSTOM_COUT=std::cerr -DELPP_STL_LOGGING -DELPP_LOG_STD_ARRAY -DELPP_LOG_UNORDERED_MAP -DELPP_LOG_UNORDERED_SET -DELPP_DISABLE_VERBOSE_LOGS -DELPP_PERFORMANCE_MICROSECONDS -DELPP_NO_DEFAULT_LOG_FILE -DELPP_FEATURE_PERFORMANCE_TRACKING -DAUTO_INITIALIZE_EASYLOGGINGPP
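
For reference, here is a stripped-down sketch of the same pattern without boost::asio (illustrative only, not part of the original report): each thread runs TIMED_FUNC followed by LOG(INFO) in a tight loop. It assumes the same compile flags as above (notably -DELPP_THREAD_SAFE and -DELPP_FEATURE_PERFORMANCE_TRACKING) but initializes explicitly with INITIALIZE_EASYLOGGINGPP instead of AUTO_INITIALIZE_EASYLOGGINGPP; the loop counts are arbitrary.

#include "easylogging++.h"

#include <thread>
#include <vector>

INITIALIZE_EASYLOGGINGPP

int main()
{
    constexpr auto perThread = 1000;
    const auto nbThread = std::thread::hardware_concurrency();

    std::vector<std::thread> threads;
    for (unsigned int t = 0; t < nbThread; ++t)
    {
        threads.emplace_back([t]
        {
            for (auto i = 0; i < perThread; ++i)
            {
                TIMED_FUNC(blk);                 // performance tracker, dispatches a log on scope exit
                LOG(INFO) << "Log line " << t << ":" << i;
            }
        });
    }

    for (auto& thread : threads)
        thread.join();

    return 0;
}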


HomeLH commented Jan 8, 2018

I ran into the same problem after updating easyloggingpp to the latest version.

GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./uart_server...done.
[New LWP 5698]
[New LWP 5702]
[New LWP 5699]
[New LWP 5700]
[New LWP 5701]
[New LWP 5706]
[New LWP 5694]
[New LWP 5704]
[New LWP 5705]
[New LWP 5696]
[New LWP 5697]
[New LWP 5703]
[New LWP 5695]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Core was generated by `./uart_server'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  el::base::DefaultLogDispatchCallback::dispatch(std::string&&) (this=0xfb40f0, 
    logLine=...) at ../uart_server/lib/easyloggingpp/src/easylogging++.cc:2132
2132	../uart_server/lib/easyloggingpp/src/easylogging++.cc: No such file or directory.
[Current thread is 1 (Thread 0x750ff450 (LWP 5698))]


BigTony commented Jan 8, 2018

Same problem on the same line. I also noticed that sometimes the default logger adds its output to the log file reserved only for performance logs. The DEBUG log below shouldn't be there:
2018-01-08 10:24:28,213055| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [7 ms]"
2018-01-08 10:24:28,223172| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [10 ms]"
2018-01-08 10:24:28,228628| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [5 ms]"
2018-01-08 10:24:28,239767| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [11 ms]"
2018-01-08 10:24:28,244195| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [5 ms]"
2018-01-08 10:24:28,252380| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [8 ms]"
2018-01-08 10:24:28,252380| [INFO]| stackWorkMachine0| | Executed [] in [46 ms]"
2018-01-08 10:24:28,259879| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [7 ms]"
2018-01-08 10:24:28,269| [DEBUG]| default| main| const std::map<std::unique_ptr<example::TaskInterface>, boost::any>& example::UniversalWorkMachine::exec()| | Notifying all working threads
2018-01-08 10:24:28,252380| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [8 ms]"
2018-01-08 10:24:28,252380| [INFO]| stackWorkMachine0| | Executed [] in [46 ms]"
2018-01-08 10:24:28,259879| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [7 ms]"
2018-01-08 10:24:28,269881| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [10 ms]"
2018-01-08 10:24:28,275396| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [6 ms]"
2018-01-08 10:24:28,286656| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [11 ms]"
2018-01-08 10:24:28,291085| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::hard_work(example::TimeDuration)] in [5 ms]"
2018-01-08 10:24:28,299131| [INFO]| stackWorkMachine0| | Executed [example::PrecisionTimeDuration example::RandomGeneratedTask::soft_work(example::TimeDuration)] in [8 ms]"

I'm using these defines: "ELPP_THREAD_SAFE" "ELPP_FORCE_USE_STD_THREAD" "ELPP_FEATURE_PERFORMANCE_TRACKING" "ELPP_NO_DEFAULT_LOG_FILE"

@milos1290

Same stacktrace here; it's getting annoying. Any tips?


lnagl commented Feb 7, 2018

I've found a similar issue on Windows when logging a massive amount of information from multiple threads using a non-"default" logger. It shows up in a couple of different places. I can't make a pretty stack trace, but in one example it's:
Writer::~Writer
Writer::processDispatch()
Writer::triggerDispatch()
LogDispatcher::dispatch()
DefaultLogDispatchCallback::handle
DefaultLogDispatchCallback::dispatch
LogMessage::level

I've tracked down the issue to the change made in #571 regarding the global lock. When I revert those specific changes, the issue goes away. I don't know exactly what's going on, but it seems that DefaultLogDispatchCallback::m_data is being corrupted.
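
To illustrate the kind of race this describes, here is a toy sketch (not easylogging++ code; the class, member, and messages below are invented for illustration): a callback that stashes per-call state in a shared member is only safe if handle() and dispatch() are serialized by an external lock, which is what the global lock from #571 provided.

#include <iostream>
#include <mutex>
#include <string>
#include <thread>

struct ToyDispatchCallback
{
    const std::string* m_data = nullptr;    // shared per-callback state, as described above

    void handle(const std::string* data)
    {
        m_data = data;                      // thread A stores its message here...
        dispatch();                         // ...and without a lock, thread B can overwrite
    }                                       //    m_data before A reaches dispatch()

    void dispatch()
    {
        std::cout << *m_data << '\n';       // may read another thread's (or a destroyed) message
    }
};

int main()
{
    ToyDispatchCallback callback;           // one callback instance shared by all threads
    std::mutex globalLock;                  // stands in for the global lock discussed above

    const auto worker = [&](const char* tag)
    {
        for (int i = 0; i < 10000; ++i)
        {
            const std::string msg = std::string(tag) + " line " + std::to_string(i);
            std::lock_guard<std::mutex> guard(globalLock);  // remove this and m_data races
            callback.handle(&msg);
        }
    };

    std::thread a(worker, "A");
    std::thread b(worker, "B");
    a.join();
    b.join();
    return 0;
}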


BigTony commented Feb 7, 2018

> I've tracked down the issue to the change made in #571 regarding the global lock. When I revert those specific changes, the issue goes away. I don't know exactly what's going on, but it seems that DefaultLogDispatchCallback::m_data is being corrupted.

Good job. I also tested reverting commit 699c12e and it helped as well. No more crashes and no more mixed output!

So from my point of view it seems that something in dispatch() makes a mess if it's not synchronized, but I have no clue what it is. Maybe the author of that change, @abumusamq, could help us?

@abumq abumq closed this as completed in 094e4ab Apr 3, 2018
abumq added a commit that referenced this issue Apr 3, 2018
closes #580 conditional global lock

abumq commented Apr 3, 2018

This issue is fixed in v9.96.4

For those affected by this change, you can define ELPP_NO_GLOBAL_LOCK.
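
A minimal sketch, assuming ELPP_NO_GLOBAL_LOCK behaves like the other ELPP_* compile-time flags used in this thread (i.e. it should be visible to every translation unit that builds easylogging++, typically by adding it next to -DELPP_THREAD_SAFE on the compiler command line):

// Assumption: ELPP_NO_GLOBAL_LOCK is a compile-time flag like ELPP_THREAD_SAFE.
// Prefer passing -DELPP_NO_GLOBAL_LOCK to the compiler so easylogging++.cc sees
// it too; defining it before the include only covers this translation unit.
#define ELPP_NO_GLOBAL_LOCK
#include "easylogging++.h"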
