Skip to content

rawdb: recover witness size from filesystem metadata miss#2200

Merged
pratikspatil024 merged 1 commit into
0xPolygon:developfrom
AaronAjose:witness-size-fs-metadata-fallback
May 12, 2026
Merged

rawdb: recover witness size from filesystem metadata miss#2200
pratikspatil024 merged 1 commit into
0xPolygon:developfrom
AaronAjose:witness-size-fs-metadata-fallback

Conversation

@AaronAjose
Copy link
Copy Markdown
Contributor

Description

This PR fixes a data-consistency bug in filesystem-backed witness storage.

fsWitnessStore.WriteWitness writes witness blobs to disk first (tmp + rename) and writes witnessSizeKey to DB afterward. If the file write succeeds but DB Put fails, the process exits via log.Crit. After restart, the file may exist while DB size metadata is missing.

Before this change, ReadWitnessSize only read DB metadata and returned nil when missing. In WIT pagination/metadata paths, that becomes 0/unavailable, which can silently fail witness page serving for data that is actually present on disk.

This PR adds a fallback in ReadWitnessSize: when DB size metadata is missing and witness storage is filesystem-backed, size is recovered from os.Stat on the witness file. A unit test was added to validate this crash-residue scenario (file exists + size key missing).

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

None.

Nodes audience

This primarily affects nodes using filesystem witness storage (--witness.filestore) and only on recovery paths where DB size metadata is missing while witness files exist on disk.

No new flags or config behavior changes were introduced.

Checklist

  • I have added at least 2 reviewer or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    Heimdall PR link: N/A
  • This PR requires changes to matic-cli
    matic-cli PR link: N/A

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Not executed in this cycle.
Note: full Go test/build validation is currently blocked unless Go >=1.26.2 toolchain is available (project go.mod requirement).

Additional comments

Added unit test:

  • core/rawdb/database_witness_test.go
  • TestOpen_FSWitnessStore_ReadWitnessSizeFallbackToFile

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@adamdossa adamdossa requested a review from pratikspatil024 May 5, 2026 15:25
@pratikspatil024
Copy link
Copy Markdown
Member

@claude - review once

@claude
Copy link
Copy Markdown

claude Bot commented May 6, 2026

Claude finished @pratikspatil024's task in 1m 48s —— View job


Code Review: rawdb: recover witness size from filesystem metadata miss

  • Read changed files and understand the diff
  • Review core/rawdb/accessors_state.go changes
  • Review core/rawdb/witness_store_fs.go changes
  • Review core/rawdb/database_witness_test.go changes
  • Post review findings

Overall: This is a well-scoped bugfix that addresses a real crash-residue scenario (file write succeeds → DB put fails → log.Crit exits → restart finds file but no size metadata). The approach — falling back to os.Stat when DB metadata is missing — is sound. A few observations below.


Suggestion: Consider repairing the DB metadata on recovery

core/rawdb/accessors_state.go:319-327 — When the fallback successfully recovers the size from the filesystem, it returns the value but does not write it back to the DB. This means every subsequent ReadWitnessSize call for this hash will re-hit os.Stat instead of the fast DB path. This matters because ReadWitnessSize is called in the P2P WIT handler (eth/handler_wit.go:126, :195) on peer requests — repeated filesystem stat calls on a hot path could add latency.

Since readWitnessSizeFromFile already has access to s.db (which is ethdb.Database, a writer), the repair could be done inside the fallback method itself:

func (s *fsWitnessStore) readWitnessSizeFromFile(hash common.Hash) *uint64 {
    info, err := os.Stat(witnessFilePath(s.dir, hash))
    if err != nil || !info.Mode().IsRegular() {
        return nil
    }
    size := uint64(info.Size())

    // Repair missing DB metadata so future reads hit the fast path.
    var buf [8]byte
    binary.BigEndian.PutUint64(buf[:], size)
    if err := s.db.Put(witnessSizeKey(hash), buf[:]); err != nil {
        log.Warn("Failed to repair witness size metadata", "hash", hash, "err", err)
    }

    return &size
}

This is a self-healing pattern — the first recovery call fixes the inconsistency, and all subsequent calls use the DB. Not a blocker, but recommended. Fix this →


Minor: Concrete type assertion couples accessor to implementation

core/rawdb/accessors_state.go:322 — The nested assertion wsdb.WitnessStore().(*fsWitnessStore) couples the generic ReadWitnessSize function to the concrete fsWitnessStore type. This is acceptable given the current codebase (only two store implementations), but if more WitnessStore backends are added, each would need its own branch here. An alternative would be an optional interface:

type witnessSizeRecoverer interface {
    readWitnessSizeFromFile(hash common.Hash) *uint64
}

Not a blocker — the current approach is pragmatic and correct.


Test review

core/rawdb/database_witness_test.go:78-108 — The test is well-structured:

  • Creates a real fsWitnessStore via Open
  • Writes through the store (creating both file and DB metadata)
  • Deletes only the DB size key to simulate crash residue
  • Verifies the fallback returns the correct file size

One minor gap: the test does not verify that the original (non-fallback) path also works — i.e., that ReadWitnessSize returns the correct value before the delete. Adding a check before line 97 would make the test more robust:

// Verify normal path works first.
size := ReadWitnessSize(db, hash)
if size == nil || *size != uint64(len(payload)) {
    t.Fatal("expected witness size from DB metadata")
}

Security notes

  • No new external attack surface: The os.Stat fallback only triggers when the DB has no size key. An external peer cannot force this path — it only applies to local crash residue.
  • No unbounded allocation: os.Stat returns metadata, not file contents.
  • TOCTOU: There's a theoretical race between os.Stat and a concurrent DeleteWitness, but since this is a recovery path for rare crash residue, it's not a practical concern.

Verdict

Approve with suggestion — the fix is correct and well-tested. The self-repair suggestion (writing recovered size back to DB) would make it more robust in production by avoiding repeated os.Stat calls on the P2P serving path.

Copy link
Copy Markdown
Member

@pratikspatil024 pratikspatil024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pratikspatil024
Copy link
Copy Markdown
Member

@AaronAjose - is it possible for you to sign this commit? You can rebase locally to rewrite history with signed commits and force-push.
If not, I can do it from my end.

@AaronAjose AaronAjose force-pushed the witness-size-fs-metadata-fallback branch from c827015 to 628eec1 Compare May 8, 2026 02:14
@AaronAjose AaronAjose force-pushed the witness-size-fs-metadata-fallback branch from 628eec1 to 2c39eb5 Compare May 8, 2026 02:44
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 8, 2026

@AaronAjose
Copy link
Copy Markdown
Contributor Author

@pratikspatil024 Done, pls re-check.

@pratikspatil024
Copy link
Copy Markdown
Member

Thanks @AaronAjose!

@pratikspatil024 pratikspatil024 merged commit 199cd50 into 0xPolygon:develop May 12, 2026
13 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants