Race Condition Between GC Chunk Reset and Background Expired ReplReq Cleanup #401
Open
Description
During GC task processing, a move_to_chunk is selected from m_reserved_chunk_queue and reset via purge_reserved_chunk() to ensure the chunk is completely clean. This operation resets the old append blk allocator and creates a new one to replace it.
At the time of the chunk reset, stale rreqs may still be associated with the move_to_chunk. When the background gc_repl_reqs() later cleans up these expired rreqs, it frees their blocks on the chunk, which creates two potential race conditions:
Risk 1: Free on New Allocator
- If the expired rreq is cleaned up AFTER the new allocator is created, the free operation targets the new allocator instead of the old one
- Currently there is no immediate impact, because the append allocator's free() only increments the m_freeable_nblks counter
- Future risk: if real free operations are implemented, this will cause an unexpected free on the wrong allocator
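The ordering behind Risk 1 can be reproduced in a small single-threaded model. This is only a sketch under stated assumptions: ToyAppendAllocator, ToyChunk, ToyStaleReq, cleanup_stale_req and run_risk1_scenario are hypothetical names invented for illustration and are not the project's types or functions. The one behavior taken from the description above is that free() only bumps a freeable-block counter.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the append blk allocator.
// As in the issue description, free() only bumps a counter.
struct ToyAppendAllocator {
    uint64_t freeable_nblks = 0;
    void free(uint64_t nblks) { freeable_nblks += nblks; }
};

// Hypothetical chunk that owns its current allocator.
struct ToyChunk {
    std::unique_ptr<ToyAppendAllocator> allocator = std::make_unique<ToyAppendAllocator>();

    // Models the chunk reset: the old allocator is dropped and a new one replaces it.
    void reset() { allocator = std::make_unique<ToyAppendAllocator>(); }
};

// Hypothetical stale request that only remembers which chunk its blocks live on.
struct ToyStaleReq {
    ToyChunk* chunk;
    uint64_t nblks;
};

// Models the background cleanup: the allocator is resolved at cleanup time,
// so the free lands on whichever allocator the chunk holds at that moment.
inline void cleanup_stale_req(const ToyStaleReq& req) {
    req.chunk->allocator->free(req.nblks);
}

// Replays "reset first, cleanup later" and returns the counter of the NEW allocator.
inline uint64_t run_risk1_scenario() {
    ToyChunk chunk;
    ToyStaleReq req{&chunk, 4};  // blocks were allocated from the old allocator
    chunk.reset();               // GC resets the chunk
    cleanup_stale_req(req);      // expired request is cleaned up afterwards
    return chunk.allocator->freeable_nblks;
}
```

In this model the freshly created allocator ends up with a non-zero freeable count even though nothing was ever allocated from it, which is harmless for a counter but would be wrong for any allocator that really releases blocks.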
Risk 2: Free on Destroyed Allocator
- If the expired rreq is cleaned up AFTER the old allocator is reset but BEFORE the new allocator is created, the free operation accesses a destroyed allocator
- This causes crashes due to accessing freed memory (e.g., a destroyed superblock). Here is the timeline of an observed crash:
T1: cp_flush obtains pointer to old allocator
T2: GC resets allocator (destroys old superblock) and sets m_is_dirty=false on old allocator
T3: gc_repl_reqs frees expired rreq → sets m_is_dirty=true on old allocator pointer
T4: cp_flush executes on old allocator → enters the m_sb write path because m_is_dirty is true → accesses destroyed superblock → crash
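The T1 to T4 interleaving can be replayed deterministically in a toy model. This is a sketch under stated assumptions, not the real code path: ToySuperblock, ToyAllocator and replay_timeline are hypothetical names, and the destroyed superblock is represented by a boolean flag so that the example reports the hazard instead of actually touching freed memory.

```cpp
#include <cassert>

// Hypothetical superblock; "destroyed" is a flag so the model never touches freed memory.
struct ToySuperblock {
    bool destroyed = false;
};

// Hypothetical allocator carrying the two pieces of state named in the timeline.
struct ToyAllocator {
    ToySuperblock sb;
    bool is_dirty = false;

    // T2: reset destroys the superblock and clears the dirty flag.
    void reset() {
        sb.destroyed = true;
        is_dirty = false;
    }

    // T3: a free marks the allocator dirty again.
    void free_blk() { is_dirty = true; }

    // T4: flush only writes the superblock when dirty; report whether that
    // write would hit a destroyed superblock.
    bool flush_would_touch_destroyed_sb() const { return is_dirty && sb.destroyed; }
};

// Replays the timeline. late_free selects whether the stale free (T3) happens
// after the reset. Returns true if the flush would access a destroyed superblock.
inline bool replay_timeline(bool late_free) {
    ToyAllocator old_alloc;
    const ToyAllocator* flush_ptr = &old_alloc;  // T1: flush path captures the old allocator
    old_alloc.reset();                           // T2: GC resets the allocator
    if (late_free) {
        old_alloc.free_blk();                    // T3: stale request cleanup frees a block
    }
    return flush_ptr->flush_would_touch_destroyed_sb();  // T4: flush runs on the old pointer
}
```

Without the late free the dirty flag stays false and the flush skips the superblock write; with it, the flag is set again after the reset and the flush walks into the destroyed superblock, which matches the crash described above.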