Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests: add more debugging, and a bugfix for dequeue #6

Open
wants to merge 10 commits into
base: HP
Choose a base branch
from

Conversation

pcordes
Copy link
Contributor

@pcordes pcordes commented Sep 6, 2018

This doesn't fix everything: there's still a use-after-free bug in here somewhere, even with mfence instead of lfence or sfence in case even stronger barriers made a difference at the locations you had them.

continue goes to the end of the loop body, not the top. So if(we lose the race) continue; attempts a CAS with uninitialized (or NULL) pn. That's a bug.

The rest of this is some cleanups and improvements to the Makefile and tests.

By setting p->next to (void*)-1 right before free()ing, we can detect use-after free: we'll never see that pointer value normally (because it's not aligned, and other reasons), so if we ever read that from memory we know we shouldn't have read that memory, and it could have faulted.

I didn't commit all these changes perfectly the first time, so a couple later ones fix mistakes in earlier ones. But after the final one, it compiles with minimal warnings, and the tests run, only sometimes hitting the assertion failure. (Revealing a bug that already existed, but wasn't being detected before.)

make test  will now rebuild the library and/or test_multithread.c if needed

Remove the @ so we can see which output goes with which test or build-command
…add XCHG

An actual lfence/sfence instruction isn't needed for acquire semantics,
just a barrier against compile-time reordering.

cygwin and mingw already imply __GNUC__.
… until end of test.

Only check consumer exit condition when dequeue fails,
so we don't need ctx->count to be updated.

This allows us to drop that point of contention between readers and writers.

Using a local counter and only doing an atomic add to the shared variable
at the end also reduces contention between threads in the test harness itself.
reduces contention between readers and writers, which both have to CAS
or XCHG their respective pointers.
This lets us detect how many nodes we never freed and left on the free-list

With some later changes, the first 3 tests (p1c1, p4c4, p100c10) all show freelist=1
while p10c100 shows Total push 5000000 elements, pop 5000000 elements. freelist=1987802, clean = 0
All the contention between consumers is a problem.

---

But with this version with lots of contention:
p1c1:
Total push 500000 elements, pop 500000 elements. freelist=1, clean = 0
p4c4:
Total push 2000000 elements, pop 2000000 elements. freelist=753408, clean = 0
bin/test_p100c10
Total push 50000000 elements, pop 50000000 elements. freelist=3536085, clean = 0
bin/test_p10c100
Total push 5000000 elements, pop 5000000 elements. freelist=1665580, clean = 0
compile with make CPPFLAGS=-DNDEBUG to disable assert()

make TESTWRAPPER="perf stat -d" test  to profile the tests
so they can inline even with -fPIC.

TODO: weak aliases for internal calls between lfq_ functions.
This triggers quickly in tests other than p1c1; the
complicated HP[tid] scheme apparently isn't safe.

Not surprising; lock-free linked lists are *HARD* without garbage collection.
http://www.yebangyu.org/LockFreeProgrammingPractice.pdf collects multiple papers,
including Lock-Free Linked Lists Using CAS

Reclaim in general for atomic dynamic data structures is hard.

Moving nodes to a free-list without trying to free() them would probably be easier,
and work reasonably as long as producers actually allocate nodes from the free list.
(another branch used a union for these, hiding this bug.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant