Skip to content

storage: merge reclaimed tail pages with deserialized FSM state#401

Merged
adsharma merged 1 commit into
mainfrom
fsm_merge_fix
Apr 21, 2026
Merged

storage: merge reclaimed tail pages with deserialized FSM state#401
adsharma merged 1 commit into
mainfrom
fsm_merge_fix

Conversation

@adsharma

Copy link
Copy Markdown
Contributor

Fixes: #400

Summary

Fix recovery-time free-space manager (FSM) inconsistency where tail reclamation after checkpoint deserialization
could leave overlapping free entries, potentially enabling double allocation of the same page after reopen.

Problem

During startup, we:

  1. deserialize FSM entries from checkpoint metadata, then
  2. reclaim post-checkpoint tail pages via PageManager::reclaimTailPagesIfNeeded().

Previously, tail reclaim used evictAndAddFreePages(), which inserts directly into freeLists and does not merge
overlaps with existing deserialized entries (it only avoids exact duplicate (start,numPages) tuples).
This could produce overlapping free ranges in reopened state.

Root cause

reclaimTailPagesIfNeeded() bypassed the merge/coalescing path used elsewhere by FSM.

Fix

1) Reclaim tail through merge path

In src/storage/page_manager.cpp:

  • replaced:
    • freeSpaceManager->evictAndAddFreePages(fileHandle, tail);
  • with:
    • freeSpaceManager->addUncheckpointedFreePages(tail);
    • freeSpaceManager->mergeFreePages(fileHandle);

This guarantees tail reclaim is merged with deserialized FSM state (overlap + adjacency coalescing).

2) Explicitly clear moved uncheckpointed list

In src/storage/free_space_manager.cpp:

  • after mergePageRanges(std::move(uncheckpointedFreePageRanges), ...), explicitly call:
    • uncheckpointedFreePageRanges.clear();

This mirrors finalizeCheckpoint() behavior and avoids relying on moved-from container state.

Test

Added regression test in test/transaction/checkpoint_test.cpp:

  • ReviewFixesTest.ReclaimTailMergesWithDeserializedFSMEntries

Test flow:

  • create checkpointed DB state,
  • inject a synthetic free range beyond EOF,
  • checkpoint again,
  • append raw pages to force reclaim-tail-on-reopen,
  • reopen and query CALL FSM_INFO(),
  • assert recovered free ranges are non-overlapping.

Behavior:

  • pre-fix: overlapping entries observed.
  • post-fix: no overlaps.

Validation

  • make test-build-release
  • build/release/test/transaction/transaction_test
    --gtest_filter="ReviewFixesTest.ReclaimTailMergesWithDeserializedFSMEntries"
  • sanity: ... --gtest_filter="ReviewFixesTest.HashIndexBasicRecoveryAfterCheckpoint"

All passed.

Problem
- On startup we deserialize the checkpointed FreeSpaceManager (FSM), then reclaim any
  post-checkpoint file tail via PageManager::reclaimTailPagesIfNeeded().
- The previous reclaim path inserted the tail directly into freeLists using
  evictAndAddFreePages().
- Direct insertion only deduplicates exact (start,numPages) pairs and does not merge overlaps.
  If a deserialized free range already overlapped the reclaimed tail, reopened FSM state could
  contain overlapping entries.
- Overlapping free entries allow the same physical page to be handed out twice, which can corrupt
  page-type expectations and crash lookup paths after reopen.

Root cause
- reclaimTailPagesIfNeeded() bypassed the FSM merge path and therefore skipped overlap/adjacency
  coalescing against deserialized state.

Fix
1) Reclaim tail through the merge path
- In PageManager::reclaimTailPagesIfNeeded(), replace:
    freeSpaceManager->evictAndAddFreePages(fileHandle, tail)
  with:
    freeSpaceManager->addUncheckpointedFreePages(tail)
    freeSpaceManager->mergeFreePages(fileHandle)
- This reuses the same coalescing logic as checkpoint-finalization/rollback merge and guarantees
  reclaimed tail pages are merged with existing FSM entries.

2) Clear moved uncheckpointed ranges after merge
- In FreeSpaceManager::mergeFreePages(), explicitly clear
  uncheckpointedFreePageRanges after mergePageRanges(std::move(...)).
- This mirrors finalizeCheckpoint() behavior and avoids relying on moved-from vector state.

Test
- Added ReviewFixesTest.ReclaimTailMergesWithDeserializedFSMEntries in
  test/transaction/checkpoint_test.cpp.
- The test deterministically seeds an existing free entry at checkpointNumPages and then invokes
  reclaimTailPagesIfNeeded(checkpointNumPages).
- Assertion: resulting FSM contains no overlapping [start, end) intervals.
- Behavior:
  * old code (direct evictAndAddFreePages): test fails with overlap
  * fixed code (addUncheckpointedFreePages + mergeFreePages): test passes

Notes
- This is a recovery-time FSM consistency fix only; allocation semantics outside recovery are
  unchanged.
- The explicit clear in mergeFreePages() is a safety/maintainability improvement.
@adsharma adsharma merged commit 9083131 into main Apr 21, 2026
5 checks passed
@adsharma adsharma deleted the fsm_merge_fix branch April 21, 2026 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: free space manager not properly merge with FSM state, causing mem pollution and triggering segment fault

1 participant