[RESILIENCE] Data integrity scrubber — background verification of SSTable checksums #225

@ElioNeto

Description

Bit rot, media errors, and other forms of silent corruption can damage SSTable files without being noticed until the data is read. ApexStore needs a background scrubber that periodically reads every SSTable and verifies its data against the stored checksums.

Implementation

  1. Background thread with configurable schedule (default: daily)
  2. For each SSTable:
    • Read all blocks
    • Verify CRC32 checksums
    • Verify bloom filter integrity
    • Verify key ordering
  3. When corruption is detected:
    • Log the affected keys/ranges
    • Attempt repair from the WAL archive or a replica
    • If repair is impossible, isolate the corrupted file
  4. Expose scrub progress and findings in /metrics
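
To make step 2 concrete, here is a minimal sketch of the per-SSTable check in Python. It is illustrative only: the `Block` type and `scrub_blocks` function are hypothetical names, the block layout is an assumption rather than ApexStore's actual on-disk format, and bloom filter verification is left out.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical in-memory view of one SSTable block (not the real format)."""
    keys: list[bytes]   # keys stored in the block, in on-disk order
    payload: bytes      # raw block bytes covered by the checksum
    stored_crc: int     # CRC32 recorded when the block was written


def scrub_blocks(blocks: list[Block]) -> list[str]:
    """Return human-readable findings; an empty list means the table is clean."""
    findings = []
    last_key = None
    for index, block in enumerate(blocks):
        # The CRC32 of the payload must match the checksum stored with the block.
        if zlib.crc32(block.payload) != block.stored_crc:
            findings.append(f"block {index}: checksum mismatch")
        # Keys must be strictly ascending within a block and across blocks.
        for key in block.keys:
            if last_key is not None and key <= last_key:
                findings.append(f"block {index}: key {key!r} out of order")
            last_key = key
    return findings
```

The function collects findings instead of stopping at the first error, so a single pass can report every affected block for the logging step in 3.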

Configuration

resilience:
  scrubber:
    interval_secs: 86400  # daily
    repair_from_wal: true
    report_only: false  # if true, don't auto-repair
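
One possible reading of how these options interact, sketched in Python. The `ScrubberConfig` class, `choose_action` function, and the action names are hypothetical, and the policy itself is an assumption: in particular, treating `report_only` as "log only, do not isolate" is a guess that the issue does not confirm, and repair from a replica is not modelled.

```python
from dataclasses import dataclass


@dataclass
class ScrubberConfig:
    """Mirrors the resilience.scrubber keys above (defaults taken from the issue)."""
    interval_secs: int = 86400    # daily
    repair_from_wal: bool = True
    report_only: bool = False     # if true, don't auto-repair


def choose_action(cfg: ScrubberConfig, wal_has_range: bool) -> str:
    """Pick what to do with a corrupted SSTable (assumed policy, not confirmed)."""
    if cfg.report_only:
        # Assumption: report-only mode logs the finding and touches nothing.
        return "report"
    if cfg.repair_from_wal and wal_has_range:
        # The WAL archive still covers the damaged key range, so try a repair.
        return "repair_from_wal"
    # No repair source is available or allowed: isolate the file.
    return "quarantine"
```

With the defaults, `choose_action(ScrubberConfig(), wal_has_range=False)` falls through to isolating the file, which matches the "if repair impossible" branch in the implementation list.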
