Convert ink_queue implementation to std::atomic by JosiahWI · Pull Request #13170 · apache/trafficserver

JosiahWI · 2026-05-18T16:50:17Z

The PR title is fairly self-explanatory, but the design choices here deserve explicit mention.

The head_p type is no longer a union with a {pointer, version} field

This eliminates the union-based type punning that was done all over the implementation and is UB. The pointer and version are now always read and written through the macros FREELIST_POINTER, FREELIST_VERSION, and SET_FREELIST_POINTER_VERSION, which use a separate {pointer, version} struct type and memcpy on platforms where this is appropriate (see the preprocessor defs for the list).
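To make the memcpy approach concrete, here is a minimal sketch of packing and unpacking a {pointer, version} view without a union. The names `packed_head`, `head_view`, `load_view`, and `store_view` are hypothetical and the field widths are illustrative; the PR itself exposes this through the macros named above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the integral head_p (assumption: 64 bits here).
using packed_head = std::uint64_t;

// Hypothetical view type; the 32/32 split is illustrative, not the PR's layout.
struct head_view {
  std::uint32_t pointer_bits;
  std::uint32_t version;
};
static_assert(sizeof(head_view) == sizeof(packed_head), "view must cover the whole word");

// Copy the bytes of the packed word into the view.
// memcpy is well-defined here, unlike reading an inactive union member.
inline head_view
load_view(packed_head h)
{
  head_view v;
  std::memcpy(&v, &h, sizeof v);
  return v;
}

// Copy the bytes of the view back into a packed word.
inline packed_head
store_view(head_view v)
{
  packed_head h;
  std::memcpy(&h, &v, sizeof h);
  return h;
}
```

Compilers typically reduce a fixed-size memcpy like this to plain register moves, so the round trip should not cost more than the union access it replaces.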

The head_p type has been changed into a type alias of the data type

This is necessary so that the head of the list is an atomic integer type rather than an atomic class type, which makes sure it can use 128-bit atomic hardware instructions on platforms that support them.
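As a rough illustration of what an integral head looks like in use, the sketch below stores the head in a `std::atomic` over an integer and swings it with a single compare-and-swap. It assumes a 64-bit word so that it builds everywhere; `head_word`, `head_is_lock_free`, and `try_replace_head` are hypothetical names, not part of the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption: a 64-bit word stands in for head_p so the sketch builds
// everywhere; the PR uses a wider integer where 128-bit CAS is available.
using head_word = std::uint64_t;

// Reports whether this platform implements the atomic without a hidden lock.
inline bool
head_is_lock_free()
{
  std::atomic<head_word> probe{0};
  return probe.is_lock_free();
}

// Hypothetical helper: swing the head from `expected` to `desired` in one CAS.
// Returns false, leaving the head untouched, if the head no longer equals `expected`.
inline bool
try_replace_head(std::atomic<head_word> &head, head_word expected, head_word desired)
{
  return head.compare_exchange_strong(expected, desired, std::memory_order_acq_rel, std::memory_order_acquire);
}
```

Querying `is_lock_free()` at startup is one way to confirm on a given platform whether the atomic maps to hardware instructions or falls back to a lock.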

Freelist alignment is now adjusted to satisfy head_p alignment requirements

This is actually a bug in master, and there is no issue created for it. Allocating underaligned head_p objects happens to work on our supported platforms, but it is UB and could very realistically fail. To drive this point home, the included unit tests segfault with the original implementation for this exact reason (they pass if the alignment is adjusted).
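The adjustment can be pictured with a small sketch. The helpers `effective_alignment` and `is_aligned` are hypothetical and only illustrate the idea of raising a requested alignment to at least the head's requirement; they assume, as is conventional, that alignments are powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: never hand out memory aligned more weakly than the
// list head requires. With power-of-two alignments, the larger value is a
// multiple of the smaller, so taking the maximum satisfies both.
constexpr std::size_t
effective_alignment(std::size_t requested, std::size_t head_alignment)
{
  return requested < head_alignment ? head_alignment : requested;
}

// Hypothetical helper: check an address against an alignment requirement.
inline bool
is_aligned(const void *p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A check like `is_aligned` is the kind of condition the alignment assertions in this PR guard.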

The freelist and atomiclist pop operations (called freelist_new for the freelist) are now locked to provide mutual exclusion

This one is annoying and might need to be reverted from this PR and considered separately. The change was made to fully fix #11640. Without it there is a minor data race: after one thread pops an item, an allocator's placement new can overwrite the second pointer from the list head while another thread in freelist_new is still reading it without synchronization (a similar argument applies to atomiclist_push). The reading thread subsequently finds out that the list head is stale and retries, so the garbage pointer is never dereferenced, but the unsynchronized read is still UB.
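The locking idea can be sketched on a simplified stack. This is not the PR's code: it uses a plain pointer head instead of the versioned head_p, and `node` and `guarded_stack` are hypothetical names. Push stays lock-free, while pop takes a mutex so that no other popper can hand a node back to its owner while its `next` field is being read.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical intrusive node.
struct node {
  node *next = nullptr;
};

class guarded_stack
{
public:
  // Lock-free push: link the node in front of the current head with a CAS loop.
  void
  push(node *n)
  {
    node *old = head_.load(std::memory_order_relaxed);
    do {
      n->next = old;
    } while (!head_.compare_exchange_weak(old, n, std::memory_order_release, std::memory_order_relaxed));
  }

  // Pop under a mutex: only one thread at a time can remove nodes, so the
  // node read here cannot be popped and reused by someone else meanwhile.
  node *
  pop()
  {
    std::lock_guard<std::mutex> lock(pop_mutex_);
    node *old = head_.load(std::memory_order_acquire);
    while (old != nullptr) {
      node *next = old->next;
      // The CAS can still fail if a concurrent push changed the head;
      // on failure `old` is refreshed and the loop retries.
      if (head_.compare_exchange_weak(old, next, std::memory_order_acquire, std::memory_order_acquire)) {
        break;
      }
    }
    return old;
  }

private:
  std::atomic<node *> head_{nullptr};
  std::mutex          pop_mutex_;
};
```

The cost of this design is that every pop pays for the mutex even when uncontended.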

I have been thinking about approaches here. One approach is to add an atomic flag to the list head that is set by any thread popping from the list. A thread that has successfully popped can spin on that flag to wait for the completion of any other threads still reading the memory it is about to return. My intuition is that this will be better, but I don't know without benchmarking, and it's a lot more complex than the lock.

Fixes #5398
Fixes #11640 in release mode only (dummy_forced_read calls still race)

Previous Work

See #7382. This PR is only a step in the direction of #7382; it retains a lot of the old code structure along with most of its design flaws. If this change is accepted, it should thereafter be possible to apply other design improvements from #7382, such as the fleshed out versioned pointer type, with greater confidence.

A Few Comments About Assertions

This PR adds a horde of assertions that check alignment requirements. Most of them are debug assertions. The alignment check on the pointer passed to freelist_push is a release assert for now, because it would almost certainly indicate a major issue if it triggered. According to the comment from @bryancall, this assertion was in fact failing before (it was previously a debug assert, and he commented it out). I am hopeful that that issue is now resolved.

I have commented out the assertions in atomiclist, because the alignment requirements for atomiclist are established elsewhere in ATS - and established incorrectly. The head_p objects are misaligned all over the place. That is bad. Unfortunately, it's not trivial to fix, because the head_p objects have to be aligned at an offset from the base pointer, which also has to be aligned, but for a different object type (e.g. Event).
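The two-sided constraint can be stated as a small check. The function below is a hypothetical sketch, not code from ATS: it assumes power-of-two alignments and gives a sufficient condition for an embedded head to be aligned in every object of the containing type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: an embedded member at `offset` inside an object whose
// base is aligned to `base_alignment` is guaranteed to meet `member_alignment`
// when both the base alignment and the offset are multiples of it.
constexpr bool
member_is_aligned(std::size_t base_alignment, std::size_t offset, std::size_t member_alignment)
{
  return base_alignment % member_alignment == 0 && offset % member_alignment == 0;
}
```

For example, a 16-byte-aligned head at offset 24 fails the check even if the containing object itself is 16-byte aligned, which is the shape of the problem described above.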

Performance Implications

I ran the following benchmarks in WSL running on an AMD Ryzen 5 5500U processor.

Freelist - Single Threaded Release Performance - Before

benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single threaded alloc                          100           587      2.348 ms
                                        36.2374 ns    35.5824 ns    37.6799 ns
                                        4.69932 ns    2.48233 ns    9.03906 ns

Single threaded free                           100          1412     2.2592 ms
                                        15.2373 ns    14.9109 ns    16.1355 ns
                                        2.52519 ns    1.00718 ns    5.32402 ns

Freelist - Single Threaded Release Performance - After

Notice that allocation takes a performance hit because of the added lock: the mean rises from about 36.2 ns to about 40.3 ns.

benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single threaded alloc                          100           600       2.34 ms
                                        40.3439 ns    38.0766 ns    46.6134 ns
                                        17.5734 ns    7.42366 ns    35.7456 ns

Single threaded free                           100           346     2.3528 ms
                                        12.0705 ns      11.37 ns    15.4749 ns
                                        6.78033 ns   0.198969 ns    16.1671 ns

Freelist - Single Threaded Release Performance - After Without Lock

This represents the potential performance of allocation without the mutex guarding the allocation routine. This case represents similar behavior to the original code, which did not have the lock, either.

benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single threaded alloc                          100           672      2.352 ms
                                        30.9758 ns    29.5555 ns    36.3515 ns
                                        12.3705 ns    2.25337 ns    28.5114 ns

Single threaded free                           100           489     2.3472 ms
                                        13.0952 ns    11.6199 ns    16.8676 ns
                                        10.5817 ns     2.0172 ns     19.303 ns

Multithreaded Performance (Contention)

I don't know the best way to test this. I would appreciate input from @bryancall or anyone who knows how to benchmark overall ATS performance. I'm concerned that the added lock will have unacceptable performance impacts, but I don't know how I should confirm this.

JosiahWI · 2026-05-18T16:55:04Z

+#elif defined(__x86_64__) || defined(__ia64__) || defined(__powerpc64__) || defined(__mips64)
+
+struct head_p_view {
+  int vaddr      : 48;
+  int version    : 15;
+  int vaddr_mode : 1;
+};
+#endif


I put this here as documentation but didn't use it anywhere. Should I remove it?

JosiahWI · 2026-05-18T16:57:39Z

    SET_FREELIST_POINTER_VERSION(next, FROM_PTR(nullptr), FREELIST_VERSION(item) + 1);
-    result = ink_atomic_cas(&l->head.data, item.data, next.data);
-  } while (result == 0);
-  {


I removed this unnecessary block nesting.

JosiahWI · 2026-05-18T17:00:02Z

+  while (e) {
+    head_p *e_ = to_head_p(e, l->offset);
+    void   *n  = TO_PTR(FREELIST_POINTER(*e_));
+    SET_FREELIST_POINTER_VERSION(*e_, n, FREELIST_VERSION(*e_));
+    e = n;


If the offset is applied before checking for nullptr, it is possible to pass the tscore tests but fail a bunch of other unit tests (event system and cache tests).
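The ordering matters because adding a nonzero offset to a null item yields a non-null address, so a null check placed after the offset never fires. A minimal sketch of the safe ordering, using hypothetical names (`list_link`, `to_link`, `count_items`) rather than the PR's `to_head_p` and macros:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive link stored at some offset inside each item.
struct list_link {
  void *next;
};

// Apply the offset to reach the link inside an item. The caller must have
// checked the item for null first; offsetting null would hide the sentinel.
inline list_link *
to_link(void *item, std::size_t offset)
{
  assert(item != nullptr);
  return reinterpret_cast<list_link *>(static_cast<unsigned char *>(item) + offset);
}

// Walk the list, testing the raw item pointer for null before offsetting it.
inline std::size_t
count_items(void *first, std::size_t offset)
{
  std::size_t n = 0;
  for (void *e = first; e != nullptr; e = to_link(e, offset)->next) {
    ++n;
  }
  return n;
}
```

With an offset of zero the two orderings behave the same, which may be why tests that only use offset zero would not catch the mistake.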

Copilot

Pull request overview

This PR refactors the ink_queue freelist / atomic list implementation to use std::atomic-based state (including a revised head_p representation) and adds unit tests/benchmarks to validate and measure the behavior. The goal is to eliminate UB from type punning and improve correctness around alignment and concurrency.

Changes:

* Refactor head_p to an integral type and introduce memcpy-based view/load/store helpers for pointer+version packing.
* Update freelist/atomiclist operations to use std::atomic (and add mutex-based mutual exclusion for pop paths).
* Add Catch2 unit tests/benchmarks for freelist and atomic list behavior, and update build configuration accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `include/tscore/ink_queue.h` | Refactors `head_p`, adds atomic/mutex members to list types, and updates pointer/version access macros. |
| `src/tscore/ink_queue.cc` | Migrates freelist/atomiclist logic to `std::atomic` + new packing helpers; adds alignment checks and mutexes. |
| `src/tscore/unit_tests/test_ink_queue.cc` | New Catch2 unit tests and benchmarks for freelist/atomic list behavior. |
| `src/tscore/CMakeLists.txt` | Adds the new unit test and links `atomic`. |
| `src/proxy/logging/LogObject.cc` | Adapts CAS usage / version typing to the new `head_p` API. |

+    h = FREELIST_POINTER(head);
    ink_assert(item != TO_PTR(h));
+
+    recovered_item = new (reinterpret_cast<unsigned char *>(item) + l->offset) head_p{};
+    SET_FREELIST_POINTER_VERSION(*recovered_item, FREELIST_POINTER(head), 0);


* Fix macro definitions for all platforms. The definitions still used the `data` member of the old `head_p` struct.
* Clean up `head_p_view` definitions
* Remove unused `<iostream>` include
* Make `freelist_init` alignment check consistent
* Test atomiclist with offset
* Add @file description to test_ink_queue.cc
* Add missing `<vector>` include to unit tests
* Construct `InkFreeList` with placement new
* Only link libatomic when using 128bit atomics and lib exists

JosiahWI self-assigned this May 18, 2026

JosiahWI added Core Tests Cleanup labels May 18, 2026

JosiahWI added this to the 11.0.0 milestone May 18, 2026

JosiahWI force-pushed the refactor/std-atomic-ink-queue branch from e507f16 to caf1333 May 18, 2026 16:54

JosiahWI commented May 18, 2026

Convert ink_queue implementation to std::atomic

a106ab4

JosiahWI commented May 18, 2026

JosiahWI force-pushed the refactor/std-atomic-ink-queue branch from caf1333 to a106ab4 May 18, 2026 17:03

bryancall requested a review from Copilot May 18, 2026 22:01

Copilot started reviewing on behalf of bryancall May 18, 2026 22:01

bryancall requested a review from cmcfarlen May 18, 2026 22:01

Copilot AI reviewed May 18, 2026

JosiahWI added 9 commits May 19, 2026 06:45

Make changes suggested by Copilot

5eb6d1d

* Only link libatomic when using 128bit atomics and lib exists

Fix race condition with assertion

a508b04

Link libatomic on all platforms but Apple

6f285b3

Fix incorrect check for non-Apple platforms

bdd3c5d

Implement complete cross-platform libatomic check

5d6ed37

Disable std::atomic<__int128> on lacking platforms

7254197

Do not link libatomic when not available

be6996e

Base TS_HAS_128BIT_CAS on __atomic hardware

020b82c

JosiahWI wants to merge 10 commits into apache:master from JosiahWI:refactor/std-atomic-ink-queue