fix(concurrency): fix ThreadPool shutdown race causing TSan hang#316
Merged
Conversation
The root cause: shutdown() stored to shutdown_requested_ outside the queue_mutex_ lock, then called notify_all(). Under TSan's timing overhead a worker thread could finish work, re-acquire the lock, evaluate the condition predicate while shutdown_requested_ was still false (store not yet observed), go back to sleep, then miss the notify_all() — causing an indefinite hang. Fix: acquire queue_mutex_ before storing shutdown_requested_ = true so that the store and the condition variable notification are serialised under the same lock. Workers evaluating the predicate must hold the same lock, guaranteeing they either see the flag before sleeping or are woken by notify_all() — the classic condition-variable happens-before requirement. Additional changes: - Re-enable ThreadPoolTest.NoWorkAfterShutdown (was DISABLED_) - Fix spdlog linkage: change PRIVATE -> PUBLIC on keystone_concurrency so that public headers including spdlog/* propagate include paths to consumers (fixes concurrency_unit_tests compilation) - Add tsan Conan profile and CMake tsan preset (cmake --preset tsan) - Add deps-tsan, build-tsan, test-tsan justfile recipes; deps-tsan strips the Conan-generated CMakePresets.json from the tsan output folder to avoid the duplicate conan-debug preset collision Closes #145 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI Summary
|
Security Scan Results
Recommendations
Workflow: Security Scanning |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ThreadPool::shutdown()whereshutdown_requested_was stored outsidequeue_mutex_, allowing workers to miss thenotify_all()and hang indefinitely under TSanThreadPoolTest.NoWorkAfterShutdown(wasDISABLED_NoWorkAfterShutdown)spdloglinkage visibility (PRIVATE→PUBLIC) so public headers propagate include paths to consumerstsanConan profile and CMake preset (cmake --preset tsan,just test-tsan)Root Cause
shutdown()calledshutdown_requested_.store(true)outsidequeue_mutex_, thencondition_.notify_all(). Under TSan's timing overhead a worker could:shutdown_requested_ == false(store not yet observed)notify_all()that already firedThe fix acquires
queue_mutex_before setting the flag. The C++ condition variable contract requires the predicate-modifying store to be serialised under the same mutex that the waiting thread holds during predicate evaluation — this is the standard missed-wakeup prevention pattern.Test plan
ThreadPoolTest.NoWorkAfterShutdownre-enabled and passes in debug buildThreadPoolTest.*tests passconcurrency_unit_testsbuilds without errors (spdlog linkage fix)just test-tsan(blocked by upstream concurrentqueue incompatibility with-fsanitize=thread— pre-existing issue, not introduced here; ThreadPool itself has no concurrentqueue dependency)Closes #145
🤖 Generated with Claude Code