Skip to content

Gloas lookup sync#68

Open
dapplion wants to merge 24 commits intounstablefrom
gloas-lookup-sync-fixes
Open

Gloas lookup sync#68
dapplion wants to merge 24 commits intounstablefrom
gloas-lookup-sync-fixes

Conversation

@dapplion
Copy link
Copy Markdown
Owner

@dapplion dapplion commented Mar 9, 2026

Summary

Fixes for Gloas lookup sync + PayloadEnvelopesByRoot RPC wiring.

Rebased on current sigp/unstable.

Changes

  • Rewrite single block lookup state machine for Gloas (AwaitingParent, PayloadRequest states)
  • Fix genesis parent check + infinite retry loop
  • Address full parent missing payload (envelope_is_known_to_fork_choice)
  • Fix on_completed_request no-op + clear all peer sets on disconnect
  • Wire PayloadEnvelopesByRoot RPC: replaces the NoRequestNeeded placeholder with actual request sending, response routing, and state machine updates

Test plan

  • Compiles on current unstable
  • Test on local Kurtosis ePBS devnet
  • Verify envelope downloads during single block lookups at head

Comment thread .ai/plans/gloas-lookup-levels.md Outdated
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from f551bd3 to bae92bd Compare March 10, 2026 02:23
@dapplion dapplion requested a review from michaelsproul as a code owner March 10, 2026 02:23
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch 6 times, most recently from a6a37aa to f28fed4 Compare March 16, 2026 17:09
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from f28fed4 to 8f4a5f0 Compare April 2, 2026 04:34
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 8f4a5f0 to d7af091 Compare April 7, 2026 04:28
macladson and others added 17 commits April 7, 2026 06:23
sigp#8926


  Add a step to CI which runs `cargo check` across all combinations of features for certain crates using `cargo-hack`


Co-Authored-By: Mac L <mjladson@pm.me>
…igp#9092)

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
sigp#3768


  Made the --validator-dir flag global so that it can be specified in any order


Co-Authored-By: Mike Jerred <mjerred.work@gmail.com>

Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>
Co-Authored-By: Mark Liu <mark@prove.com.au>

Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
…p#8454)

Addresses sigp#5403


  - Added `check_fee_recipient()` method to validate individual validators
- Added `check_all_fee_recipients()` to validate all validators on startup
- Validator client now fails to start if any enabled validator lacks a fee recipient and no global flag is used.
- Added Clear error messages to guide users on how to fix the issue
- Added unit tests


Co-Authored-By: AbolareRoheemah <roheemahabo@gmail.com>
Co-Authored-By: CATS <dev@yaksha3.ai>

Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>
Co-Authored-By: Weixie Cui <cuiweixie@gmail.com>
sigp#9077


  Where possible replaces all instances of `validator_monitor::timestamp_now` with `chain.slot_clock.now_duration().unwrap_or_default()`.
Where chain/slot_clock is not available, instead replace it with a convenience function `slot_clock::timestamp_now`.
Remove the `validator_monitor::timestamp_now` function.


Co-Authored-By: Mac L <mjladson@pm.me>
- sigp#6689


  The intention is to only modify the INFO logs that's emitted regularly to reduce the verbosity. But I understand that this change will affect other display in the logs too that uses the `ExecutionBlockHash` display. So would love some feedbacks about the change.


Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io>

Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Barnabas Busa <busa.barnabas@gmail.com>
Gossip verify and cache bids and proposer preferences. This PR also ensures we subscribe to new fork topics one epoch early instead of two slots early. This is required for proposer preferences.


  


Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Closes sigp#8949


  Implements peer penalties and REJECT/IGNORE message propagation for `SignedExecutionPayloadEnvelope` gossip handling, completing follow-up work from sigp#8806.

Feedback on the error classification would be appreciated.

### Key Implementation Details

- Maps all 15 `EnvelopeError` variants to REJECT/IGNORE based on [Gloas p2p spec](https://github.com/ethereum/consensus-specs/blob/master/specs/gloas/p2p-interface.md#execution_payload)
- Follows `ExecutionPayloadError` handling pattern from block gossip (`penalize_peer()` method)
- Uses explicit variant matching (rather than catch-all `_`) for type safety
- Applies `LowToleranceError` penalty for protocol violations (invalid signatures, mismatches, etc.)
- Ignores without penalty for spec-defined cases (unknown block root, prior to finalization) and internal errors


Co-Authored-By: 0u-Y <yyw1000@naver.com>

Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
New audit failure from `RUSTSEC-2026-0098`


  Bump `rustls-webpki` to an unaffected version, add an ignore for the old version used by `warp` 0.3


Co-Authored-By: Mac L <mjladson@pm.me>

Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
…nd (sigp#9129)

The tracing exporter uses a `PrefixBasedSampler` that only samples root spans whose name starts with `lh_`. Rename the VC root spans to include the prefix so their traces are exported.

Thanks @lmnzx for pointing this out!


  


Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: shane-moore <skm1790@gmail.com>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
N/A


  Adds lints for rust 1.95. Mostly cosmetic.
1. .zip(a.into_iter()) -> .zip(a) . Also a few more places where into_iter is not required
2. replace sort_by with sort_by_key
3. move if statements inside match block.
4. use checked_div instead of if statements. I think this is debatable in terms of being better, happy to remove it if others also feel its unnecessary


Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-Authored-By: shane-moore <skm1790@gmail.com>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Copy link
Copy Markdown

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of the new approach and the removal of the Request state trait.
First pass, will continue tomorrow.
Also, curious if you want to wire up the payload processing in this PR or leave it to a future PR. I'm fine with both.

peers,
peer_type,
cx.next_id(),
awaiting_parent.map(AwaitingParent::PreGloas),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing.

Could AwaitingParent be a struct instead of an enum?

struct AwaitingParent {
    parent_hash: Hash256,
    parent_root: Option<ExecutionBlockHash>
}

I find the pattern of creating an AwaitingParent::PreGloas variant by default and adjusting it later very confusing. a gloas parent_hash not containing a parent_root is conceptually confusing as well.

With the struct approach, adding a parent root later on as we figure out its gloas seems clearer to me.

"Block download response"
);

let result = lookup.on_block_download_response(id.req_id, response.map_err(|_| ()), cx);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we are swallowing the errors here and in the other on_*download_response functions.

peer: PeerId,
},
/// Block processing complete
Done {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete instead of Done maybe to be consistent with naming in other parts of the codebase?

if let BlockRequest::Downloading { state, .. } = self {
state.insert_verified_response(result)
} else {
false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log something here? Might be hard to debug in a later refactor if we accidentally modify the state and the invalid state error gets swallowed

peer_group: PeerGroup,
},
/// Data sent for processing, awaiting result
Processing {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: re review why we need DataDownloadKind here?

match &mut self.data_request {
DataRequest::WaitingForBlock => {
if let Some(block) = self.block_request.peek_block() {
let block = block.clone();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary clone here?

Duration::ZERO,
)
.map_err(LookupRequestError::SendFailedProcessor)?;
self.block_request = BlockRequest::Processing { block, peer };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we break after this too? processing needs async trigger

}
let kind = data.kind();
let peer_group = peer_group.clone();
self.data_request = DataRequest::Processing { kind, peer_group };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can break after this too?

match &mut self.payload_request {
PayloadRequest::WaitingForBlock => {
if let Some(block) = self.block_request.peek_block() {
let block = block.clone();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary clone again?

state.on_download_start(req_id)?;
}
Ok(LookupRequestResult::NoRequestNeeded(_reason)) => {
// Payload RPC not wired yet — skip download, mark as done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have it now right? Can add it in maybe? I'm also fine with deferring that to another PR though

michaelsproul and others added 5 commits April 20, 2026 00:59
Fix a **consensus fault** in `PreEpochCache` 🙀

Fortunately it's only reachable on a network with `total_active_balance=0`, i.e. a network that's already completely dead. As such this PR is not time-sensitive in any way.


  Add the floor on `total_effective_balance` when converting from `PreEpochCache` to `EpochCache`. An alternative would be to add the floor inside `PreEpochCache::get_total_active_balance`, however that would be redundant, as the only place this function is called outside this file is in single-pass epoch processing:

https://github.com/sigp/lighthouse/blob/176cce585c1ba979a6210ed79b6b6528596cdb8c/consensus/state_processing/src/per_epoch_processing/single_pass.rs#L461-L462

The `set_total_active_balance` call already handles the floor.

A regression test is included.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Fix a vulnerability in the slasher whereby it would OOM upon processing an invalid attestation with an artificially high `validator_index`. This fix has already been made available to affected users on the `slasher-fix` branch.


  - Prevent attestations from being passed to the slasher prior to signature verification. This was unnecessary, as they would later be passed on successful validation as well.
- Add a defensive cap on the maximum validator index processable by the slasher. The cap is high enough that it shouldn't be reached for several years, and will quickly result in warning logs if forgotten.
- Add a regression test that confirms that the issue is fixed.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Fix database pruning post-Gloas


  - Fix DB pruning logic (and state summaries DAG)
- Get the `beacon_chain` tests running with `FORK_NAME=gloas` 🎉


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
This reverts commit 2749e18, from:

- sigp#9092

We no longer need those changes since the abolition of pending/full states.


  


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 5533da0 to c23b1af Compare April 21, 2026 22:38
@dapplion
Copy link
Copy Markdown
Owner Author

Rebased onto sigp/unstable and squashed to a single commit. Pushed c23b1af56.

Addressed @pawanjay176's comments from 2026-04-18 (17/18):

  • AwaitingParent enum → struct with Option<ExecutionBlockHash> parent_hash + pre_gloas()/post_gloas() constructors (mod.rs:489)
  • DoneComplete across BlockRequest / DataRequest / PayloadRequest; is_done()is_complete() (sbl.rs:118)
  • on_*_download_response handlers now debug! log the error before collapsing to Err(()) (mod.rs:550)
  • insert_verified_response logs before dropping on bad state (sbl.rs:184)
  • Docstring added to DataDownloadKind explaining why it caches the request kind (sbl.rs:203)
  • Docstring on the Arc<RwLock<..>> peer sets explaining sharing with long-running ActiveCustodyRequest (sbl.rs:412)
  • Docstring on is_full_payload for intended usage; conservative false + log when block not downloaded (sbl.rs:470, :475)
  • Docstring on continue_requests loop explaining per-stream semantics (sbl.rs:567)
  • Pass block_root directly; drop the local br (sbl.rs:572)
  • if false dead branch → if state.is_completed() (sbl.rs:574)
  • "Single peer" comment added to block download peer handling (sbl.rs:580)
  • let Some(_) = ... else {}.is_some() + bool flag (sbl.rs:600)
  • break after transitioning to Processing (sbl.rs:673, :733) — "processing needs async trigger"
  • .clone() of Arc<Block> eliminated by extracting (slot, expected_blobs) before the &mut self helper call (sbl.rs:684, :744); helpers renamed create_data_request/create_payload_request

Deferred (per your "fine with deferring that to another PR" note):

  • Wiring payload-envelope processing (sbl.rs:759). Adds a send_payload_for_processing in SyncNetworkContext plus a beacon_processor Work variant; keeping this PR focused on the download path.

Test status (FORK_NAME=gloas): 59/59 lookup tests pass.

Will open the upstream PR against sigp:unstable next.

@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch 3 times, most recently from eb40ff0 to 3ff8491 Compare April 22, 2026 06:56
Rewrites the single block lookup state machine for Gloas, where block, data
(blobs/columns), and execution payload envelope are independent components
that can arrive and import out of order.

- Three additive-only sub-state-machines for block / data / payload streams.
  Peer sets start empty for data/payload and grow as children arrive — the
  parent lookup's completion requirement can widen over time without
  mutating any state machine.
- `AwaitingParent` becomes a struct carrying the child's `parent_block_hash`
  so the parent can be classified empty/full from the child's bid reference.
- Wires `PayloadEnvelopesByRoot` RPC end-to-end through `SyncNetworkContext`:
  request sending, response routing (`SingleLookupReqId::SinglePayloadEnvelope`),
  and integration into `PayloadRequest`. Envelope *processing* is still a TODO;
  only the download path is wired.
- Test rig: serves envelopes from a `network_envelopes_by_root` cache
  populated from the external harness; bumps test validator count to 8 so
  `proposer_lookahead` can populate at the Fulu → Gloas upgrade.
- Enables gloas in `TEST_NETWORK_FORKS`.
- Fixes: genesis parent check, infinite retry loop on repeated download
  failure, no-op in `on_completed_request`, and peer sets not being cleared
  on disconnect.
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 3ff8491 to ebe9fe2 Compare April 22, 2026 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.