Fix heap-use-after-free in ~AgentCombiner#3291
Merged
chenBright merged 1 commit intoapache:masterfrom May 10, 2026
Merged
Conversation
After PR apache#2949 changed Agent::combiner from a raw pointer to a weak_ptr, MultiDimensionTest.shared started to fail under ASAN with a heap-use-after-free in clear_all_agents() called from ~AgentCombiner. Root cause ---------- std::shared_ptr makes its weak_ptr expire the instant use_count drops to 0, while ~AgentCombiner only starts running after that. The two events therefore form a race window between two threads: * thread A: last shared_ptr<AgentCombiner> released use_count -> 0 -> weak_ptr expired ~AgentCombiner -> clear_all_agents() takes _lock and starts walking _agents * thread B: thread exits, _destroy_tls_blocks() deletes ThreadBlock, each ~Agent calls combiner.lock() which now returns null (weak_ptr already expired), so commit_and_erase() is skipped and the Agent is freed while still linked into A's _agents list * thread A: dereferences a freed LinkNode in clear_all_agents() -> heap-use-after-free Fix --- Don't traverse _agents from ~AgentCombiner. Letting _agents go out of scope is safe because: * butil::LinkedList and butil::LinkNode have trivial destructors and never traverse on destruction, so tearing down _agents does not dereference any agent node. * Surviving Agents simply observe combiner.expired() == true in ~Agent and skip commit_and_erase, leaving prev_/next_ dangling but never read. * If the freed agent_id is later reused by a new combiner and the same TLS slot is taken, get_or_create_tls_agent calls Agent::reset and Append's it into the new combiner's _agents. LinkNode::InsertBefore only writes prev_/next_ (it never reads their stale values), so the dangling pointers are safely overwritten. * Agent::element is destroyed together with its ThreadBlock, so any non-POD resource it holds is still released; if the agent slot is reused, Agent::reset rewrites the element value before it is observed again. The body of clear_all_agents() is kept (commented out) for reference, together with a NOTE explaining why it must not be re-enabled from ~AgentCombiner.
Contributor
|
Will this cause memory leak? |
Contributor
Author
There is no memory leak. Agent is destroyed together with its ThreadBlock at thread exit. |
Contributor
|
LGTM |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: regression introduced by #2949
Problem Summary:
After PR #2949 ("Fix thread safety of AgentCombiner") changed
Agent::combinerfrom a raw pointer to aweak_ptr, theMultiDimensionTest.sharedUT reproducibly fails underAddressSanitizer with a heap-use-after-free in
clear_all_agents()called from
~AgentCombiner:Root cause: std::shared_ptr makes its weak_ptr expire the instant use_count drops to 0,
while ~AgentCombiner only starts running after that. The two events therefore form a race window between two threads:
What is changed and the side effects?
Changed:
Don't traverse _agents from ~AgentCombiner. Letting _agents go out of scope is safe because:
so tearing down _agents does not dereference any agent node.
leaving prev_/next_ dangling but never read.
get_or_create_tls_agent calls Agent::reset and Append's it into the new combiner's agents.
LinkNode::InsertBefore only writes prev/next_ (it never reads their stale values), so the
dangling pointers are safely overwritten.
is still released; if the agent slot is reused, Agent::reset rewrites the element value before it is
observed again.
The body of clear_all_agents() is kept (commented out) for reference, together with a NOTE explaining
why it must not be re-enabled from ~AgentCombiner.
Side effects:
Performance effects:
Breaking backward compatibility:
Check List: