Problem
Normal-priority tasks (the default for hpx::async) go through the staged queue in thread_queue.hpp, which allocates and frees a task_description object for every task:
```cpp
// thread_queue.hpp:774
task_description* td = task_description_alloc_.allocate(1);

// thread_queue.hpp:244
task_description_alloc_.deallocate(task, 1);
```
which uses plain malloc and free.
shared_priority_queue_scheduler avoids this entirely by storing task_description by value in a moodycamel::ConcurrentQueue (thread_queue_mc.hpp). The schedulers currently using thread_queue.hpp are:
- local_priority_queue_scheduler
- local_queue_scheduler
- background_scheduler
- local_workrequesting_scheduler
Solution
Replace the boost::lockfree::queue<task_description*> used for the staged queue in thread_queue.hpp with moodycamel::ConcurrentQueue<task_description>. Storing task_description by value eliminates the allocate(1)/deallocate(1) pair entirely; thread_queue_mc.hpp already uses this pattern.
Benchmarks
Ran the future_overhead.cpp benchmark.
There is a clear performance degradation when using local-priority-fifo.