Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add Way to Initialize Threads Before Workers Receive Work #58

Closed
NewbiZ opened this issue Apr 3, 2024 · 7 comments · Fixed by #63
Closed

[FEATURE] Add Way to Initialize Threads Before Workers Receive Work #58

NewbiZ opened this issue Apr 3, 2024 · 7 comments · Fixed by #63
Labels
enhancement New feature or request
Milestone

Comments

@NewbiZ
Copy link

NewbiZ commented Apr 3, 2024

It would be good to be able to run an init function on all threads of the pool to be executed prior to any other task.

In my use case, that would be to pin each worker affinity and set a thread name, but other use cases could be setting a specific stack size, etc.

@DeveloperPaul123
Copy link
Owner

Hi there 👋

Interesting suggestion! Would you mind sharing some example code (or pseudo code) of how you would imagine this working or what the API would look like?

@DeveloperPaul123 DeveloperPaul123 changed the title Thread init Thread Initialization Function Apr 3, 2024
@DeveloperPaul123 DeveloperPaul123 changed the title Thread Initialization Function Add Way to Initialize Threads Before Workers Receive Work Apr 3, 2024
@DeveloperPaul123 DeveloperPaul123 added the enhancement New feature or request label Apr 3, 2024
@DeveloperPaul123 DeveloperPaul123 changed the title Add Way to Initialize Threads Before Workers Receive Work [FEATURE] Add Way to Initialize Threads Before Workers Receive Work Apr 3, 2024
@DeveloperPaul123
Copy link
Owner

Here's an idea:

class thread_pool {
      public:
        explicit thread_pool(
            const FunctionType& init_function,
            const unsigned int &number_of_threads = std::thread::hardware_concurrency())
            : tasks_(number_of_threads) {
            std::size_t current_id = 0;
            for (std::size_t i = 0; i < number_of_threads; ++i) {
                priority_queue_.push_back(size_t(current_id));
                threads_.emplace_back([&, id = current_id,  init = init_function](const std::stop_token &stop_tok) {
                    std::invoke(init);
                   // other stuff
                });
            }
        }
};

@NewbiZ
Copy link
Author

NewbiZ commented Apr 4, 2024

Interesting suggestion! Would you mind sharing some example code (or pseudo code) of how you would imagine this working or what the API would look like?

To be fair, a simple lambda (packaged_task / std::function / etc) should suffice. This is how e.g. the BS thread pool implements it, and it's enough for 99.99% of use cases. Typically I would just need to set a per worker thread seed, name, affinity and some TLS data (e.g. memory pool caches). No context should be necessary strictly speaking, though having access to the worker index may be nice for debugging / logging purposes. It wouldn't need to be passed in explicitely to the init function though, some implementations (e.g. DPDK) store worker thread accounting information in the TLS and expose some functions to retrieve it.

See usage from BS thread pool:

Or DPDK:

The DPDK case is interesting because they also allow storing a deinit callback to be executed when the worker thread (lcore in their nomenclature) is removed from the pool (in case of dynamic resizing, or the pool is destroyed). Use cases for deinit are much less frequent than init though, and trickier to get right (e.g. you need to ensure the deinit is called if an exception is thrown, so there ideally has to be some form of scoped guard), so I think it can be skipped.

I would imagine the API could look like (pseudo code):

dp::thread_pool pool(4, [](std::size_t id){
    // This thread pool limits workers to physical cores 1-4
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(0, &cpuset);
    CPU_SET(1, &cpuset);
    CPU_SET(2, &cpuset);
    CPU_SET(3, &cpuset);
    pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);

    // Set thread name, with pool-relative index
    pthread_setname_np(pthread_self(), std::format("worker-{}", id));
    // It is always useful to be able to expose accounting information to worker threads: their index in the pool, but
    // also the number of workers in the current pool, access to the pool itself, etc. (e.g. to allow worker thread to grow
    // the pool dynamically if more work is required). Providing this accounting information through parameters to the init
    // function is a possible way, but it may be more flexible to just store them all in TLS and expose some functions to
    // retrieve them, for instance dp::this_worker::id(), dp::this_worker::current_pool(). Anyway this is just nice to have, as
    // it can be implemented by users themselves as long as they have access to an init function.
});

@DeveloperPaul123
Copy link
Owner

DeveloperPaul123 commented Apr 4, 2024

Awesome thanks!
This should be doable and I think will also address #36

@DeveloperPaul123 DeveloperPaul123 added this to the 0.7.0 milestone Apr 4, 2024
@DeveloperPaul123
Copy link
Owner

Actually, this will not fully address #36 since setting the stack size of the thread has to be done before the thread is created, which makes sense. I still think the init function is a good idea and will still add it.

@DeveloperPaul123
Copy link
Owner

@NewbiZ Could you take a look at #63 and see if that would meet your needs?

@NewbiZ
Copy link
Author

NewbiZ commented Apr 26, 2024

That looks perfect :-) thanks a lot !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants