Convert ink_queue implementation to std::atomic #13170

Draft
JosiahWI wants to merge 10 commits into apache:master from JosiahWI:refactor/std-atomic-ink-queue

Contributor

@JosiahWI JosiahWI commented May 18, 2026

The PR title is fairly self-explanatory, but the design choices here deserve explicit mention.

  • The head_p type is no longer a union with a {pointer, version} field

This eliminates the type punning through the union, which was done all over the implementation and is UB. The pointer and version are now always accessed through the macros FREELIST_POINTER, FREELIST_VERSION, and SET_FREELIST_POINTER_VERSION, which use a separate {pointer, version} struct type and memcpy on platforms where this is appropriate (see the preprocessor definitions for the list).
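For readers unfamiliar with the memcpy idiom, here is a minimal sketch of the idea, not the PR's actual macros. The names `HeadView`, `HeadWord`, `pack`, and `unpack` are hypothetical, and two 32-bit fields stand in for the real pointer and version so that the sketch stays portable.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins: the PR packs a real pointer and a version; two
// 32-bit fields keep this sketch portable across platforms.
struct HeadView {
  std::uint32_t slot;    // stands in for the pointer field
  std::uint32_t version; // ABA counter
};

using HeadWord = std::uint64_t; // the integer that would live in a std::atomic

static_assert(sizeof(HeadView) == sizeof(HeadWord), "view must cover the word exactly");
static_assert(std::is_trivially_copyable<HeadView>::value, "memcpy requires a trivially copyable view");

// Copy the view's bytes into the integer word (no union, no reinterpret_cast).
inline HeadWord pack(HeadView view)
{
  HeadWord word;
  std::memcpy(&word, &view, sizeof word);
  return word;
}

// Copy the word's bytes back into a view.
inline HeadView unpack(HeadWord word)
{
  HeadView view;
  std::memcpy(&view, &word, sizeof view);
  return view;
}
```

Because both types are trivially copyable and the same size, the round trip `unpack(pack(v))` yields the original fields without reading an inactive union member.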

  • The head_p type has been changed into a type alias of the data type

This is necessary so that the head of the list is an atomic integer type rather than an atomic class type, which ensures it can use 128-bit atomic hardware instructions on platforms that support them.

  • Freelist alignment is now adjusted to satisfy head_p alignment requirements

This is actually a bug in master, and no issue has been created for it. Allocating underaligned head_p objects happens to work on our supported platforms, but it is UB and could very realistically fail. To drive this point home, the included unit tests segfault with the original implementation for this exact reason (they pass if the alignment is adjusted).
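As a rough illustration of the adjustment, assuming power-of-two alignments: the sketch below raises a requested alignment to at least that of the list head and rounds the element size to match. `HeadWord`, `adjusted_alignment`, `round_up`, and `is_aligned_for_head` are hypothetical names, and the 16-byte alignment only models a 128-bit head word.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for head_p; 16-byte alignment models a 128-bit word.
struct alignas(16) HeadWord {
  unsigned char bytes[16];
};

// Raise the caller's alignment to at least alignof(HeadWord). Both values are
// assumed to be powers of two, so taking the maximum satisfies both.
constexpr std::size_t adjusted_alignment(std::size_t requested)
{
  return std::max(requested, alignof(HeadWord));
}

// Round size up to a multiple of a power-of-two alignment, so that every
// element in a contiguous chunk starts on an aligned address.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment)
{
  return (size + alignment - 1) & ~(alignment - 1);
}

// Runtime check of the kind an alignment assertion would perform.
inline bool is_aligned_for_head(const void *p)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignof(HeadWord) == 0;
}
```

With these helpers, a request for 8-byte alignment becomes 16, while a request for 64 is left unchanged.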

  • The freelist and atomiclist pop operations (called freelist_new for the freelist) are now locked to provide mutual exclusion

This one is annoying and might need to be reverted from this PR and considered separately. The change was made to fully fix #11640. Without it there is a minor data race: once one thread has popped an item, the allocator's placement new can overwrite the second pointer from the list head while another thread in freelist_new is still reading it without synchronization (a similar argument applies to atomiclist_push). The reading thread will subsequently find that the list head is stale and retry, so the garbage pointer is never dereferenced, but this is still UB.
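To make the shape of the change concrete, here is a simplified sketch of a versioned stack with a lock-free push and a mutex-guarded pop. It is not the PR's code: `LockedPopStack` and `kNil` are hypothetical, and slot indices stand in for item pointers. The links are deliberately plain (non-atomic) integers, mirroring links stored in memory that the allocator later hands out.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

constexpr std::uint32_t kNil = 0xFFFFFFFFu; // hypothetical "empty list" marker

class LockedPopStack
{
public:
  explicit LockedPopStack(std::size_t slots) : next_(slots, kNil), head_(make(kNil, 0)) {}

  // Lock-free push: the caller owns `slot`, so writing its link is private
  // until the release CAS publishes it.
  void push(std::uint32_t slot)
  {
    std::uint64_t old_head = head_.load(std::memory_order_relaxed);
    std::uint64_t new_head;
    do {
      next_[slot] = index_of(old_head);
      new_head    = make(slot, version_of(old_head) + 1);
    } while (!head_.compare_exchange_weak(old_head, new_head, std::memory_order_release, std::memory_order_relaxed));
  }

  // Locked pop: only one thread at a time reads the head item's link, so no
  // other thread can pop that item and start overwriting it during the read.
  std::uint32_t pop()
  {
    std::lock_guard<std::mutex> guard(pop_mutex_);
    std::uint64_t old_head = head_.load(std::memory_order_acquire);
    for (;;) {
      std::uint32_t slot = index_of(old_head);
      if (slot == kNil) {
        return kNil;
      }
      std::uint64_t new_head = make(next_[slot], version_of(old_head) + 1);
      if (head_.compare_exchange_weak(old_head, new_head, std::memory_order_acquire, std::memory_order_acquire)) {
        return slot;
      }
    }
  }

private:
  static std::uint64_t make(std::uint32_t index, std::uint32_t version)
  {
    return (static_cast<std::uint64_t>(version) << 32) | index;
  }
  static std::uint32_t index_of(std::uint64_t word) { return static_cast<std::uint32_t>(word); }
  static std::uint32_t version_of(std::uint64_t word) { return static_cast<std::uint32_t>(word >> 32); }

  std::vector<std::uint32_t> next_; // plain links, like links stored in freed memory
  std::atomic<std::uint64_t> head_; // low 32 bits: head slot, high 32 bits: version
  std::mutex pop_mutex_;
};
```

The pop loop can still retry, because pushes remain lock-free and may change the head between the load and the CAS. The lock only rules out a second popper taking the item whose link is being read.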

I have been thinking about approaches here. One approach is to add an atomic flag to the list head that is set by any thread popping from the list. A thread that has successfully popped can spin on that flag to wait for the completion of any other threads still reading the memory it is about to return. My intuition is that this will be better, but I don't know without benchmarking, and it's a lot more complex than the lock.

Fixes #5398
Fixes #11640 in release mode only (dummy_forced_read calls still race)

Previous Work

See #7382. This PR is only a step in the direction of #7382; it retains a lot of the old code structure along with most of its design flaws. If this change is accepted, it should thereafter be possible to apply other design improvements from #7382, such as the fleshed out versioned pointer type, with greater confidence.

A Few Comments About Assertions

This PR adds a host of assertions that check alignment requirements. Most of them are debug assertions; the alignment check on the pointer passed to freelist_push is a release assert for now, because it would almost certainly indicate a major issue if it triggered. According to the comment from @bryancall, this assertion was in fact failing before (it was previously a debug assert, and he commented it out). I am hopeful that the issue is now resolved.

I have commented out the assertions in atomiclist, because the alignment requirements for atomiclist are established elsewhere in ATS - and established incorrectly. The head_p objects are misaligned all over the place. That is bad. Unfortunately, it's not trivial to fix, because the head_p objects have to be aligned at an offset from the base pointer, which also has to be aligned, but for a different object type (e.g. Event).
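The constraint can be stated as a small predicate, sketched here under the assumption that all alignments are powers of two. `Link`, `GoodNode`, `BadNode`, and `link_is_aligned` are hypothetical names for illustration and are not ATS types.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for head_p with a 16-byte alignment requirement.
struct alignas(16) Link {
  unsigned char bytes[16];
};

// A link stored at base + offset is aligned for every possible base only if
// the base alignment is a multiple of the link alignment and the offset is too.
constexpr bool link_is_aligned(std::size_t base_alignment, std::size_t offset, std::size_t link_alignment)
{
  return base_alignment % link_alignment == 0 && offset % link_alignment == 0;
}

// The link member carries its own alignment, so the enclosing type inherits it.
struct GoodNode {
  Link link;
  int payload;
};

// Raw storage at offset 8 inside a less strictly aligned type: a Link placed
// there would be misaligned.
struct BadNode {
  std::uint64_t payload;
  unsigned char link_storage[16];
};
```

A check such as `link_is_aligned(alignof(GoodNode), offsetof(GoodNode, link), alignof(Link))` could back a static_assert wherever a list is declared with an offset into an object type.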

Performance Implications

I ran the following benchmarks in WSL running on an AMD Ryzen 5 5500U processor.

Freelist - Single Threaded Release Performance - Before

benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single threaded alloc                          100           587      2.348 ms
                                        36.2374 ns    35.5824 ns    37.6799 ns
                                        4.69932 ns    2.48233 ns    9.03906 ns

Single threaded free                           100          1412     2.2592 ms
                                        15.2373 ns    14.9109 ns    16.1355 ns
                                        2.52519 ns    1.00718 ns    5.32402 ns

Freelist - Single Threaded Release Performance - After

Notice that the allocation takes a performance hit because of the added lock.

benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single threaded alloc                          100           600       2.34 ms
                                        40.3439 ns    38.0766 ns    46.6134 ns
                                        17.5734 ns    7.42366 ns    35.7456 ns

Single threaded free                           100           346     2.3528 ms
                                        12.0705 ns      11.37 ns    15.4749 ns
                                        6.78033 ns   0.198969 ns    16.1671 ns

Freelist - Single Threaded Release Performance - After Without Lock

This represents the potential performance of allocation without the mutex guarding the allocation routine. This case represents similar behavior to the original code, which did not have the lock, either.

benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single threaded alloc                          100           672      2.352 ms
                                        30.9758 ns    29.5555 ns    36.3515 ns
                                        12.3705 ns    2.25337 ns    28.5114 ns

Single threaded free                           100           489     2.3472 ms
                                        13.0952 ns    11.6199 ns    16.8676 ns
                                        10.5817 ns     2.0172 ns     19.303 ns

Multithreaded Performance (Contention)

I don't know the best way to test this. I would appreciate input from @bryancall or anyone who knows how to benchmark overall ATS performance. I'm concerned that the added lock will have unacceptable performance impacts, but I don't know how I should confirm this.

@JosiahWI JosiahWI self-assigned this May 18, 2026
@JosiahWI JosiahWI added this to the 11.0.0 milestone May 18, 2026
@JosiahWI JosiahWI force-pushed the refactor/std-atomic-ink-queue branch from e507f16 to caf1333 Compare May 18, 2026 16:54
Comment thread include/tscore/ink_queue.h Outdated
Comment on lines +107 to +114
#elif defined(__x86_64__) || defined(__ia64__) || defined(__powerpc64__) || defined(__mips64)

struct head_p_view {
  int vaddr : 48;
  int version : 15;
  int vaddr_mode : 1;
};
#endif
Contributor Author

I put this here as documentation but didn't use it anywhere. Should I remove it?

Comment thread src/tscore/ink_queue.cc
SET_FREELIST_POINTER_VERSION(next, FROM_PTR(nullptr), FREELIST_VERSION(item) + 1);
result = ink_atomic_cas(&l->head.data, item.data, next.data);
} while (result == 0);
{
Contributor Author

I removed this unnecessary block nesting.

Comment thread src/tscore/ink_queue.cc
Comment on lines +627 to +631
while (e) {
  head_p *e_ = to_head_p(e, l->offset);
  void *n = TO_PTR(FREELIST_POINTER(*e_));
  SET_FREELIST_POINTER_VERSION(*e_, n, FREELIST_VERSION(*e_));
  e = n;
Contributor Author

@JosiahWI JosiahWI May 18, 2026

If the offset is applied before checking for nullptr, it is possible to pass the tscore tests but fail a bunch of other unit tests (event system and cache tests).
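The pitfall is that adding an offset to a null item produces a non-null address, so any later null check passes by mistake. A minimal sketch of a helper that checks first (`link_slot` is a hypothetical name, not the PR's to_head_p):

```cpp
#include <cstddef>

// Return the address of the link stored `offset` bytes into `item`, or
// nullptr when `item` itself is null. The null check comes before the pointer
// arithmetic so that a null item never turns into a non-null link address.
inline unsigned char *link_slot(void *item, std::size_t offset)
{
  if (item == nullptr) {
    return nullptr;
  }
  return static_cast<unsigned char *>(item) + offset;
}
```

A loop that walks the list can then test either the item or the returned slot for null and terminate correctly.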

@JosiahWI JosiahWI force-pushed the refactor/std-atomic-ink-queue branch from caf1333 to a106ab4 Compare May 18, 2026 17:03
@bryancall bryancall requested a review from Copilot May 18, 2026 22:01
@bryancall bryancall requested a review from cmcfarlen May 18, 2026 22:01
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the ink_queue freelist / atomic list implementation to use std::atomic-based state (including a revised head_p representation) and adds unit tests/benchmarks to validate and measure the behavior. The goal is to eliminate UB from type punning and improve correctness around alignment and concurrency.

Changes:

  • Refactor head_p to an integral type and introduce memcpy-based view/load/store helpers for pointer+version packing.
  • Update freelist/atomiclist operations to use std::atomic (and add mutex-based mutual exclusion for pop paths).
  • Add Catch2 unit tests/benchmarks for freelist and atomic list behavior, and update build configuration accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| include/tscore/ink_queue.h | Refactors head_p, adds atomic/mutex members to list types, and updates pointer/version access macros. |
| src/tscore/ink_queue.cc | Migrates freelist/atomiclist logic to std::atomic + new packing helpers; adds alignment checks and mutexes. |
| src/tscore/unit_tests/test_ink_queue.cc | New Catch2 unit tests and benchmarks for freelist/atomic list behavior. |
| src/tscore/CMakeLists.txt | Adds the new unit test and links atomic. |
| src/proxy/logging/LogObject.cc | Adapts CAS usage / version typing to the new head_p API. |

Comment thread include/tscore/ink_queue.h
Comment thread include/tscore/ink_queue.h Outdated
Comment thread include/tscore/ink_queue.h Outdated
Comment thread src/tscore/ink_queue.cc Outdated
Comment thread src/tscore/ink_queue.cc Outdated
Comment thread src/tscore/ink_queue.cc
Comment on lines +652 to +656
h = FREELIST_POINTER(head);
ink_assert(item != TO_PTR(h));

recovered_item = new (reinterpret_cast<unsigned char *>(item) + l->offset) head_p{};
SET_FREELIST_POINTER_VERSION(*recovered_item, FREELIST_POINTER(head), 0);
Comment thread src/tscore/unit_tests/test_ink_queue.cc Outdated
Comment thread src/tscore/unit_tests/test_ink_queue.cc
Comment thread src/tscore/CMakeLists.txt Outdated
Comment thread include/tscore/ink_queue.h
JosiahWI added 9 commits May 19, 2026 06:45
* Fix macro definitions for all platforms

The definitions still used the `data` member of the old `head_p` struct.

* Clean up `head_p_view` definitions

* Remove unused `<iostream>` include

* Make `freelist_init` alignment check consistent

* Test atomiclist with offset

* Add @file description to test_ink_queue.cc

* Add missing `<vector>` include to unit tests

* Construct `InkFreeList` with placement new
* Only link libatomic when using 128bit atomics and lib exists

Development

Successfully merging this pull request may close these issues.

TSan: ink atomic queue not so atomic

2 participants