Skip to content

Blob read fails due to stale blob_id→location index when GC runs before log replay #388

@Besroy

Description

@Besroy

SH test found a read_verify failed with "Invalid shard_id".
The reason is that the blob_id→location index was stale: during crash recovery, GC moved the blob’s data to a new chunk before log replay completed, and the subsequent replay updated the index back to the old location. Reads then targeted the wrong block, finding a header with a different shard_id.

For example:

  • T1: shard=1, blob=100 committed at LSN 100.
  • T2: SM crashed with cp_lsn=90 and dc_lsn=110.
  • T3: On restart, GC manager started and run before replication svc start completed.
  • T4: GC moved the blob’s chunk (location changed, e.g., from chunk=40 to chunk=95).
  • T5: Log replay applied LSN 100 and reset the blob’s index to the old location (chunk=40), resulting in reads targeting stale blocks that no longer contain the correct shard’s data.

Relevant Logs:

  • Blob get failure (invalid shard_id):
[02/05/26 23:51:59.708] [storage_mgr] [debug] [402416] [hs_blob_manager.cpp:322:_get_blob] [traceID=8132044346085171088,shardID=0x48,pg=0,shard=0x48,blob=3099880] Blob Get request: pg=0, group=b9af911a-6c1b-4ff4-94fa-c10aa39f0d0e, shard=0x48, blob=3099880, offset=0, len=0
[02/05/26 23:51:59.708] [storage_mgr] [debug] [402416] [hs_blob_manager.cpp:360:_get_blob_data] [traceID=8132044346085171088,shardID=0x48,pg=0,shard=0x48,blob=3099880] Reading from blkid=[{blk#=287643 count=2 chunk=40},] to buf=0x582e379e2000
[02/05/26 23:51:59.718] [storage_mgr] [error] [18] [hs_blob_manager.cpp:708:do_verify_blob] Invalid shard_id in header: [header=magic=0x21fdffdba8d68fc6 version=2 shard=0x57 blob_size=4096 user_size=32 algo=1 hash=65 2a 49 ae ..., user_key=storage_hammer:nuobject2-sh-1007]
  • Blob put and commit to chunk 40:
[02/02/26 09:18:47.222] [storage_mgr] [debug] [98] [hs_blob_manager.cpp:118:_put_blob] [traceID=15106702424960586969,shardID=0x48,pg=0,shard=0x48,blob=3099880] Blob Put request: pg=0, group=b9af911a-6c1b-4ff4-94fa-c10aa39f0d0e, shard=0x48, length=4096
 [02/02/26 09:18:47.222] [storage_mgr] [trace] [98] [hs_blob_manager.cpp:193:_put_blob] ... header=... shard=0x48 ...
 [02/02/26 09:18:47.222] [storage_mgr] [debug] [98] [hs_blob_manager.cpp:497:blob_put_get_blk_alloc_hints] ... Picked p_chunk_id=40 ...
 [02/02/26 09:18:49.269] [storage_mgr] [trace] [79] [hs_blob_manager.cpp:257:on_blob_put_commit] blob put commit lsn=4920918, pbas=[{blk#=287643 count=2 chunk=40},]
 [02/02/26 09:18:49.269] [storage_mgr] [debug] [79] ... Put blob success blkid=[{blk#=287643 count=2 chunk=40},]
  • Checkpoint flush during put:
    [02/02/26 09:18:48.923] [storage_mgr] [debug] [21] [raft_repl_dev.cpp:2456:cp_flush] ... cp flush ... lsn=4920910, clsn=3910000 ...

  • Shard sealed:

[02/02/26 09:18:56.311] [storage_mgr] [debug] [99] [hs_shard_manager.cpp:256:_seal_shard] ... Seal shard request: is_open=true
[02/02/26 09:18:56.313] [storage_mgr] [debug] [79] ... Shard sealed success, is_open=false
  • Restart order: GC starts before log replay completes:
[02/02/26 09:21:44.725] [storage_mgr] [info] [10] [raft_repl_dev.cpp:88:RaftReplDev] ... commited_lsn=4923664, checkpoint_lsn:4920910 ...
[02/02/26 09:21:44.728] [storage_mgr] [info] [10] [hs_homeobject.cpp:358:operator()] Starting GC manager
[02/02/26 09:21:44.728] [storage_mgr] [info] [10] [raft_repl_service.cpp:171:start] Starting LogStore service, fist_boot = false
  • GC moves data from chunk 40 to 95:
[02/02/26 09:22:44.728] [storage_mgr] [debug] [69] [gc_manager.cpp:1206:process_gc_task] ... start process gc task for move_from_chunk=40 ...
 [02/02/26 09:24:50.015] [storage_mgr] [info] [69] [gc_manager.cpp:1307:process_gc_task] ... task for move_from_chunk=40 to move_to_chunk=95 ... completed!
  • Log replay re-applies old location (chunk=40):
  [02/02/26 09:28:04.506] [storage_mgr] [trace] [10] [raft_repl_dev.cpp:2583:on_log_found] ... Applying ... lsn=4920918 ...
  [02/02/26 09:28:04.506] [storage_mgr] [debug] [10] [raft_repl_dev.cpp:2638:on_log_found] Replay log on restart, ... local_blkid=[[{blk#=287643 count=2 chunk=40},] ]
  [02/02/26 09:28:04.506] [storage_mgr] [debug] [10] [raft_repl_dev.cpp:1631:handle_commit] ... Commit ... local_blkid=[[{blk#=287643 count=2 chunk=40},] ]
  [02/02/26 09:28:04.506] [storage_mgr] [trace] [10] [hs_blob_manager.cpp:222:local_add_blob_info] ... blob put commit ... pbas=[{blk#=287643 count=2 chunk=40},]
...
  [02/02/26 09:28:04.834] [storage_mgr] [info] [10] [raft_repl_service.cpp:173:start] Started LogStore service, log replay should already done till this point

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions