Skip to content

fix(archiver): skip descendants of invalid-attestations checkpoints#23502

Merged
spalladino merged 2 commits into
merge-train/spartanfrom
spl/test/archiver-stuck-non-consecutive-checkpoint
May 27, 2026
Merged

fix(archiver): skip descendants of invalid-attestations checkpoints#23502
spalladino merged 2 commits into
merge-train/spartanfrom
spl/test/archiver-stuck-non-consecutive-checkpoint

Conversation

@spalladino
Copy link
Copy Markdown
Contributor

@spalladino spalladino commented May 22, 2026

Motivation

archiver/src/modules/l1_synchronizer.ts skipped checkpoints with insufficient/invalid attestations under the assumption that the next proposer would invalidate them before publishing. When that assumption was violated — i.e., proposer P2 published a valid-attestations checkpoint that extended P1's invalid one — the archiver hit InitialCheckpointNumberNotSequentialError in block_store.addCheckpoints, the catch handler rolled back the L1 sync point, and the next poll re-fetched the same range and re-threw. The archiver looped indefinitely. The protocol already defines OffenseType.PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS for exactly this case but the slasher couldn't see valid-attestations descendants because the archiver threw before emitting any event.

Human Note

This is particularly relevant under pipelining. Attestors now attest to a checkpoint before the previous one is pushed to L1, so they can be inadvertently attesting to a checkpoint built on top of one that became invalid as it was published to the rollup the contract with wrong attestations. So an honest attestor could get slashed if the proposer was malicious.

Approach

In the synchronizer, persist rejected ancestors in the block store keyed by archive root. On each new checkpoint, before attestation validation, compare its header.lastArchiveRoot against the persisted set — if it matches, skip the checkpoint as a descendant of an invalid ancestor and emit a new L2BlockSourceEvents.CheckpointBuiltOnInvalidAncestorDetected event with enough metadata to resolve the proposer. The slasher's AttestationsBlockWatcher is fixed to slash the proposer (not the attestors) under the new event.

Fixes A-1072

@spalladino spalladino added ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure ci-draft Run CI on draft PRs. labels May 22, 2026
@spalladino spalladino changed the title test(e2e): repro archiver stuck on invalid-attestations descendant fix(archiver): skip descendants of invalid-attestations checkpoints May 22, 2026
Adds an e2e test that deterministically reproduces the archiver retry
loop documented at l1_synchronizer.ts:905-908: when a checkpoint with
insufficient attestations lands and the next proposer publishes a valid
descendant without first invalidating it, block_store.addCheckpoints
throws InitialCheckpointNumberNotSequentialError on every poll.

Gated by a new test-only sequencer flag skipWaitForValidParentCheckpointOnL1
that bypasses the parent-validity check inside the checkpoint proposal job.
@spalladino spalladino force-pushed the spl/test/archiver-stuck-non-consecutive-checkpoint branch 2 times, most recently from a78bb63 to 9b97cbf Compare May 26, 2026 20:24
@spalladino spalladino force-pushed the spl/test/archiver-stuck-non-consecutive-checkpoint branch from 9b97cbf to ab6e131 Compare May 26, 2026 21:15
@spalladino spalladino marked this pull request as ready for review May 26, 2026 21:15
@spalladino spalladino removed the ci-draft Run CI on draft PRs. label May 26, 2026
@spalladino spalladino enabled auto-merge (squash) May 26, 2026 21:52

// Drop any rejected entry for this archive root: a checkpoint that was previously rejected
// (e.g. invalid attestations) is now being ingested as valid, so its descendants are allowed.
await this.removeRejectedCheckpointByArchiveRoot(checkpoint.checkpoint.archive.root);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid checkpoints that aren't later replaced by valid ones with an exact archive root, will they just accumulate in the store?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Though I don't expect many invalid checkpoints out there, and the invalid checkpoints are smaller than the valid ones, which we store forever.

@spalladino spalladino merged commit 983038e into merge-train/spartan May 27, 2026
20 of 26 checks passed
@spalladino spalladino deleted the spl/test/archiver-stuck-non-consecutive-checkpoint branch May 27, 2026 11:18
PhilWindle pushed a commit that referenced this pull request May 27, 2026
Fixes the invalid checkpoint descendant e2e timing by keeping sequencers
stopped until the test has selected adjacent target proposers, installed
listeners, applied malicious configs, and warped to the intended
pipelined build window.

This avoids applying malicious config to an earlier slot owned by the
same validator, which is what caused the CI run for PR #23502 to miss
the intended P1/P2 checkpoint pair.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants