thread_queue: per-task heap allocation in staged queue could be further optimised #7050

@toxicteddy00077

Problem

Normal-priority tasks (the default for hpx::async) go through the staged queue in thread_queue.hpp, which allocates and frees a task_description object per task:

 // thread_queue.hpp:774
 task_description* td = task_description_alloc_.allocate(1);  
 // thread_queue.hpp:244
 task_description_alloc_.deallocate(task, 1);                 

The allocator behind these calls uses plain malloc and free.
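
To make the per-task cost concrete, here is a minimal single-threaded sketch of the pattern. Everything in it is an assumption for illustration: `task_description` is a stand-in struct, `counting_allocator` and `run_pointer_queue` are hypothetical names, and `std::queue` replaces the lock-free queue. It is not HPX's actual code; it only shows that a pointer-based staged queue pays one allocation and one deallocation per task.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <queue>

// Hypothetical stand-in for HPX's task_description (the real type differs).
struct task_description
{
    int payload;
};

// Illustrative allocator wrapper that counts calls and forwards to
// std::allocator. Not part of HPX.
struct counting_allocator
{
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
    std::allocator<task_description> inner;

    task_description* allocate(std::size_t n)
    {
        ++allocations;
        return inner.allocate(n);
    }

    void deallocate(task_description* p, std::size_t n)
    {
        ++deallocations;
        inner.deallocate(p, n);
    }
};

// Stage `count` tasks through a queue of pointers and drain it again,
// mirroring allocate(1) on enqueue and deallocate(1) on dequeue.
counting_allocator run_pointer_queue(int count)
{
    counting_allocator alloc;
    std::queue<task_description*> staged;

    for (int i = 0; i < count; ++i)
    {
        task_description* td = alloc.allocate(1);
        ::new (td) task_description{i};    // construct in the raw storage
        staged.push(td);
    }

    while (!staged.empty())
    {
        task_description* td = staged.front();
        staged.pop();
        assert(td != nullptr);
        td->~task_description();           // destroy before releasing storage
        alloc.deallocate(td, 1);
    }
    return alloc;
}
```

Calling `run_pointer_queue(1000)` reports 1000 allocations and 1000 deallocations for the descriptors alone, on top of whatever the queue itself allocates.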

shared_priority_queue_scheduler avoids this entirely by storing task_description by value in a moodycamel::ConcurrentQueue (thread_queue_mc.hpp). The schedulers that currently use thread_queue.hpp are:

  1. local_priority_queue_scheduler
  2. local_queue_scheduler
  3. background_scheduler
  4. local_workrequesting_scheduler

Solution

Replace boost::lockfree::queue<task_description*> in the staged queue of thread_queue.hpp with moodycamel::ConcurrentQueue<task_description>. This stores task_description by value, which eliminates the allocate(1)/deallocate(1) pair entirely. thread_queue_mc.hpp already uses a similar pattern.
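
As a rough illustration of why by-value storage removes the per-task allocation, here is a small single-threaded ring buffer that keeps descriptors in preallocated slots. This is only a sketch under stated assumptions: `value_ring` is a hypothetical class, `task_description` is a stand-in struct, and nothing here is thread-safe or taken from moodycamel::ConcurrentQueue or HPX. The method names `try_enqueue` and `try_dequeue` merely echo the naming used by that library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for HPX's task_description (the real type differs).
struct task_description
{
    int payload;
};

// Illustrative fixed-capacity FIFO that stores elements by value.
// Storage is allocated once in the constructor; enqueue and dequeue
// only move descriptors in and out of existing slots.
class value_ring
{
public:
    explicit value_ring(std::size_t capacity)
      : slots_(capacity)
    {
    }

    // Returns false when the ring is full.
    bool try_enqueue(task_description td)
    {
        if (size_ == slots_.size())
            return false;
        slots_[(head_ + size_) % slots_.size()] = std::move(td);
        ++size_;
        return true;
    }

    // Returns false when the ring is empty.
    bool try_dequeue(task_description& out)
    {
        if (size_ == 0)
            return false;
        out = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --size_;
        return true;
    }

private:
    std::vector<task_description> slots_;
    std::size_t head_ = 0;    // index of the oldest element
    std::size_t size_ = 0;    // number of stored elements
};
```

A real concurrent queue grows its storage in blocks rather than using one fixed buffer, so allocations still happen occasionally, but they are amortised over many tasks instead of occurring once per task.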

Benchmarks

I ran the future_overhead.cpp benchmark.

Image

There is a clear performance degradation when I use the local-priority-fifo scheduler.
