Boost ASIO multithreaded program which shares a single IO context hangs #1353

Open
rohitpai opened this issue Sep 7, 2023 · 1 comment

rohitpai commented Sep 7, 2023

I am using Boost 1.79 on an embedded Linux 5.15 platform running on dual-core Cortex-A7 hardware.
I am testing the code below to check sharing a single io_context among multiple threads (each of which calls io_context::run()).

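#include <boost/asio.hpp>
#include <atomic>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
#include <unistd.h> // sleep()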

namespace asio = boost::asio;
using namespace std::chrono_literals;
using boost::system::error_code;

static std::atomic_int tid_gen = 0;
thread_local int const tid     = [] { return ++tid_gen; }();
static constexpr auto  now = std::chrono::steady_clock::now;
static auto const      start   = now();
static std::mutex console_mx;

void trace(auto const&... msg) {
    std::lock_guard lk(console_mx);
    std::cerr << "at " << std::setw(8) << (now() - start)/1ms << "ms - tid:" << tid << " ";
    (std::cerr << ... << msg) << std::endl;
}

void worker(asio::io_context& ioContext) {
    trace("Worker thread enter");
    ioContext.run(); // Run the io_context to handle asynchronous operations
    trace("Worker thread exit");
}

int main() {
    try {
        asio::io_context ioContext;

        asio::steady_timer task1(ioContext, 100ms);
        asio::steady_timer task2(ioContext, 200ms);

        task1.async_wait([](error_code ec) {
            trace("Start Task1: ", ec.message());
            if (!ec)
                sleep(6);
        });

        task2.async_wait([](error_code ec) {
            trace("Start Task2: ", ec.message());
            if (!ec)
                sleep(12);
        });

        // Create a work object to prevent ioContext.run() from returning immediately
        auto work = make_work_guard(ioContext);

        // Create multiple worker threads
        std::vector<std::thread> threads;
        for (int i = 0; i < 2; ++i) {
            threads.emplace_back(worker, std::ref(ioContext));
        }


        trace("App started :", std::thread::hardware_concurrency());

        work.reset();
        // Join the worker threads
        for (auto& thread : threads) {
            thread.join();
        }

        trace("All worker threads joined.");
    } catch (std::exception const& e) {
        trace("Exception: ", std::quoted(e.what()));
    }
}

Output

root@hgx:~# /usr/bin/progress-code
at        0ms - tid:1 Worker thread enter
at        1ms - tid:2 App started :2
at        2ms - tid:3 Worker thread enter
at      100ms - tid:1 Start Task1: Success
at     6100ms - tid:1 Start Task2: Success
at    18100ms - tid:1 Worker thread exit

The program never terminates and has to be killed. One worker thread seems to hang, and the other worker thread picks up both timer callbacks serially.
I am not able to understand this behavior. As per this article, both threads should run concurrently and pick up the handlers.
Why does one worker thread executing ioContext.run() always hang in the above example? I need your help to understand whether it is a bug in the library or something wrong in the test code.

GDB backtrace of the thread that hangs

#0  0x76cf5ce4 in pause () from /lib/libc.so.6
#1  0x0040a0c0 in boost::asio::detail::null_event::do_wait () at /home/ropai/SDK_GROUP/sysroots/armv7ahf-vfpv4d16-openbmc-linux-gnueabi/usr/include/boost/asio/detail/impl/null_event.ipp:47
#2  boost::asio::detail::null_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock> (this=<optimized out>) at /home/ropai/SDK_GROUP/sysroots/armv7ahf-vfpv4d16-openbmc-linux-gnueabi/usr/include/boost/asio/detail/null_event.hpp:82
#3  boost::asio::detail::conditionally_enabled_event::wait (this=<optimized out>, lock=...) at /home/ropai/SDK_GROUP/sysroots/armv7ahf-vfpv4d16-openbmc-linux-gnueabi/usr/include/boost/asio/detail/conditionally_enabled_event.hpp:97
#4  boost::asio::detail::scheduler::do_run_one (this=this@entry=0x42d128, lock=..., this_thread=..., ec=...) at /home/ropai/SDK_GROUP/sysroots/armv7ahf-vfpv4d16-openbmc-linux-gnueabi/usr/include/boost/asio/detail/impl/scheduler.ipp:501
#5  0x004056c8 in boost::asio::detail::scheduler::run (ec=..., this=0x42d128) at /home/ropai/SDK_GROUP/sysroots/armv7ahf-vfpv4d16-openbmc-linux-gnueabi/usr/include/boost/asio/detail/impl/scheduler.ipp:210
#6  boost::asio::io_context::run (this=<optimized out>) at /home/ropai/SDK_GROUP/sysroots/armv7ahf-vfpv4d16-openbmc-linux-gnueabi/usr/include/boost/asio/impl/io_context.ipp:63
#7  worker (ioContext=...) at ../progress_code_main.cpp:182

justend29 commented Jan 23, 2024

@rohitpai
I'm not sure what the issue is on your platform. Running your reproduction program on Linux with Boost.Asio 1.83 terminates after 12.2 seconds, as you expect:

at        0ms - tid:1 Worker thread enter
at        0ms - tid:2 Worker thread enter
at        0ms - tid:3 App started :8
at      100ms - tid:1 Start Task1: Success
at      200ms - tid:2 Start Task2: Success
at    12200ms - tid:2 Worker thread exit
at    12200ms - tid:1 Worker thread exit
at    12200ms - tid:3 All worker threads joined.

It's a bit odd to see sleep() in an async program. These calls could be replaced with additional steady timers so that other work can be completed while those handlers wait. Maybe replacing them will give better results.

Sorry that I don't have anything more to help with. Maybe try updating Asio, since I'm testing with a newer version than yours. The classic phrase applies: "it works on my machine".
