Problem
Normal-priority tasks (the default for hpx::async) go through the staged queue in thread_queue.hpp, which allocates and frees a task_description object for every task:
```cpp
// thread_queue.hpp:774
task_description* td = task_description_alloc_.allocate(1);

// thread_queue.hpp:244
task_description_alloc_.deallocate(task, 1);
```
which uses plain malloc and free.
shared_priority_queue_scheduler avoids this entirely by storing task_description by value in a moodycamel::ConcurrentQueue (thread_queue_mc.hpp). The schedulers currently using thread_queue.hpp are:
- local_priority_queue_scheduler
- local_queue_scheduler
- background_scheduler
- local_workrequesting_scheduler
Solution
Replace the boost::lockfree::queue<task_description*> used for the staged queue in thread_queue.hpp with moodycamel::ConcurrentQueue<task_description>. Storing task_description by value eliminates the allocate(1)/deallocate(1) pair entirely; thread_queue_mc.hpp already uses this pattern.
Benchmarks
Ran the future_overhead.cpp benchmark.
There is a clear performance degradation when using local-priority-fifo.