Convert ink_queue implementation to std::atomic#13170
Conversation
e507f16 to
caf1333
Compare
| #elif defined(__x86_64__) || defined(__ia64__) || defined(__powerpc64__) || defined(__mips64) | ||
|
|
||
| struct head_p_view { | ||
| int vaddr : 48; | ||
| int version : 15; | ||
| int vaddr_mode : 1; | ||
| }; | ||
| #endif |
There was a problem hiding this comment.
I put this here as documentation but didn't use it anywhere. Should I remove it?
| SET_FREELIST_POINTER_VERSION(next, FROM_PTR(nullptr), FREELIST_VERSION(item) + 1); | ||
| result = ink_atomic_cas(&l->head.data, item.data, next.data); | ||
| } while (result == 0); | ||
| { |
There was a problem hiding this comment.
I removed this unnecessary block nesting.
| while (e) { | ||
| head_p *e_ = to_head_p(e, l->offset); | ||
| void *n = TO_PTR(FREELIST_POINTER(*e_)); | ||
| SET_FREELIST_POINTER_VERSION(*e_, n, FREELIST_VERSION(*e_)); | ||
| e = n; |
There was a problem hiding this comment.
If the offset is applied before checking for nullptr, it is possible to pass the tscore tests but fail a bunch of other unit tests (event system and cache tests).
caf1333 to
a106ab4
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the ink_queue freelist / atomic list implementation to use std::atomic-based state (including a revised head_p representation) and adds unit tests/benchmarks to validate and measure the behavior. The goal is to eliminate UB from type punning and improve correctness around alignment and concurrency.
Changes:
- Refactor
head_pto an integral type and introducememcpy-based view/load/store helpers for pointer+version packing. - Update freelist/atomiclist operations to use
std::atomic(and add mutex-based mutual exclusion for pop paths). - Add Catch2 unit tests/benchmarks for freelist and atomic list behavior, and update build configuration accordingly.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
include/tscore/ink_queue.h |
Refactors head_p, adds atomic/mutex members to list types, and updates pointer/version access macros. |
src/tscore/ink_queue.cc |
Migrates freelist/atomiclist logic to std::atomic + new packing helpers; adds alignment checks and mutexes. |
src/tscore/unit_tests/test_ink_queue.cc |
New Catch2 unit tests and benchmarks for freelist/atomic list behavior. |
src/tscore/CMakeLists.txt |
Adds the new unit test and links atomic. |
src/proxy/logging/LogObject.cc |
Adapts CAS usage / version typing to the new head_p API. |
| h = FREELIST_POINTER(head); | ||
| ink_assert(item != TO_PTR(h)); | ||
|
|
||
| recovered_item = new (reinterpret_cast<unsigned char *>(item) + l->offset) head_p{}; | ||
| SET_FREELIST_POINTER_VERSION(*recovered_item, FREELIST_POINTER(head), 0); |
* Fix macro definitions for all platforms The definitions still used the `data` member of the old `head_p` struct. * Clean up `head_p_view` definitions * Remove unused `<iostream>` include * Make `freelist_init` alignment check consistent * Test atomiclist with offset * Add @file description to test_ink_queue.cc * Add missing `<vector>` include to unit tests * Construct `InkFreeList` with placement new
* Only link libatomic when using 128bit atomics and lib exists
The PR title is fairly self-explanatory, but the design choices here deserve explicit mention.
This has been done to eliminate type punning, which was done all over the implementation, and is UB. The pointer and version are now always set through the macros
FREELIST_POINTER,FREELIST_VERSION, andSET_FREELIST_POINTER_VERSION, which use a separate {pointer, version} struct type andmemcpyon platforms where this is appropriate (see preprocessor defs for the list).This is necessary so that the head of the list will be an atomic integer type instead of an atomic class type to be sure it can use 128bit atomic hardware instructions on platforms that support them.
head_palignment requirementsThis is actually a bug in master. There is no issue created for it. It happens to work to allocate
head_pobjects that are underaligned on our supported platforms, but it is UB and could very realistically fail... to drive this point home, the included unit tests segfault with the original implementation for this exact reason (they pass if the alignment is adjusted).This one is annoying and might need to be reverted from this PR and considered separately. This particular change was made to fully fix #11640 - there is a minor data race without it in that the second pointer from the list head can be overwritten by an allocator's placement new before it is read without synchronization in
freelist_new(a similar argument applies toatomiclist_push) by another thread, which is going to subsequently find out the list head is stale and retry. Thus, the garbage pointer is not dereferenced, but this is still UB.I have been thinking about approaches here. One approach is to add an atomic flag to the list head that is set by any thread popping from the list. A thread that has successfully popped can spin on that flag to wait for the completion of any other threads still reading the memory it is about to return. My intuition is that this will be better, but I don't know without benchmarking, and it's a lot more complex than the lock.
Fixes #5398
Fixes #11640 in release mode only (dummy_forced_read calls still race)
Previous Work
See #7382. This PR is only a step in the direction of #7382; it retains a lot of the old code structure along with most of its design flaws. If this change is accepted, it should thereafter be possible to apply other design improvements from #7382, such as the fleshed out versioned pointer type, with greater confidence.
A Few Comments About Assertions
This PR adds a hoard of assertions that check alignment requirements. Most of them are debug assertions - the alignment check on the pointer passed to
freelist_pushis a release assert for now, because it would almost certainly indicate a major issue if it triggered. According to the comment from @bryancall, this assertion was in fact failing before (it was previously a debug assert, and he commented it out). I am hopeful that that issue is now resolved.I have commented out the assertions in atomiclist, because the alignment requirements for atomiclist are established elsewhere in ATS - and established incorrectly. The head_p objects are misaligned all over the place. That is bad. Unfortunately, it's not trivial to fix, because the head_p objects have to be aligned at an offset from the base pointer, which also has to be aligned, but for a different object type (e.g.
Event).Performance Implications
I ran the following benchmarks in WSL running on an AMD Ryzen 5 5500U processor.
Freelist - Single Threaded Release Performance - Before
Freelist - Single Threaded Release Performance - After
Notice that the allocation takes a performance hit because of the added lock.
Freelist - Single Threaded Release Performance - After Without Lock
This represents the potential performance of allocation without the mutex guarding the allocation routine. This case represents similar behavior to the original code, which did not have the lock, either.
Multithreaded Performance (Contention)
I don't know the best way to test this. I would appreciate input from @bryancall or anyone who knows how to benchmark overall ATS performance. I'm concerned that the added lock will have unacceptable performance impacts, but I don't know how I should confirm this.