Missed attestation after v1.1.0 #4600

Closed
twoeths opened this issue Sep 27, 2022 · 6 comments · Fixed by #4649 or #4788
Labels
prio-high Resolve issues as soon as possible. scope-profitability Issues to directly improve validator performance and its profitability.

Comments

@twoeths
Contributor

twoeths commented Sep 27, 2022

Describe the bug

With v1.1.0 we see ~2% wrong-head attestations and ~1% missed attestations.

Screen Shot 2022-09-27 at 14 48 44

This happens because v1.1.0 improved gossipsub, so the node now keeps many more mesh peers, which introduced an I/O lag issue. On the beacon node side:

Local validator published unaggregated attestation validatorIndex=429161, slot=4786296, committeeIndex=5, subnet=5, sentPeers=17, delaySec=2.5950000286102295

On the validator side:

Sep-27 06:19:39.000[]                debug: HttpClient request routeId=submitPoolAttestations
Sep-27 06:19:43.733[]                debug: HttpClient response routeId=submitPoolAttestations
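
To watch for this kind of lag over time, the `key=value` tail of the beacon node's "Local validator published unaggregated attestation" line can be parsed and `delaySec` compared against a threshold. The following is only a sketch: `parseLogFields` and the 1-second threshold are hypothetical names and values chosen for illustration, not part of Lodestar.

```typescript
// Hypothetical helper: collect the "key=value" pairs of a log message.
function parseLogFields(message: string): Record<string, string> {
  const result: Record<string, string> = {};
  // A value runs until the next comma or whitespace character.
  const pattern = /(\w+)=([^,\s]+)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(message)) !== null) {
    result[m[1]] = m[2];
  }
  return result;
}

// Message copied from the beacon node log in this issue.
const vmonMessage =
  "Local validator published unaggregated attestation validatorIndex=429161, " +
  "slot=4786296, committeeIndex=5, subnet=5, sentPeers=17, delaySec=2.5950000286102295";

const vmonFields = parseLogFields(vmonMessage);
const publishDelaySec = Number(vmonFields["delaySec"]);

// Illustration-only threshold (assumption), not a Lodestar setting.
const SLOW_PUBLISH_SEC = 1;
const isSlowPublish = publishDelaySec > SLOW_PUBLISH_SEC;

console.log(vmonFields["validatorIndex"], publishDelaySec.toFixed(3), isSlowPublish);
```
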

Screen Shot 2022-09-27 at 14 55 05

Expected behavior

  • Improve the validator client (vc) REST API request time and the beacon node REST API response time for submitPoolAttestations
@twoeths
Contributor Author

twoeths commented Sep 27, 2022

I hope the libp2p upgrade helps here, since we improved the gossipsub heartbeat performance there; see ChainSafe/js-libp2p-gossipsub#256 (comment)

cc @wemeetagain

@twoeths
Contributor Author

twoeths commented Sep 27, 2022

A sample missed-attestation journey:

  • validator log:
Sep-27 06:19:37.867[]              verbose: Found new chain head slot=4786296, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, previouDuty=0xcbc54f5a7eea4a658893901025f8d435af83973aa2e5dd4da5548cd1cddb5ec2, currentDuty=0xc001899c93d5584fbcfb5b34dfb84f841dc079866f051842f413c36325ab60ee
Sep-27 06:19:37.868[]                debug: HttpClient request routeId=produceAttestationData
Sep-27 06:19:37.871[]                debug: HttpClient response routeId=produceAttestationData
Sep-27 06:19:37.881[]                debug: Signed attestation slot=4786296, index=58, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, validatorIndex=426103
Sep-27 06:19:37.883[]                debug: Signed attestation slot=4786296, index=53, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, validatorIndex=429043
Sep-27 06:19:37.885[]                debug: Signed attestation slot=4786296, index=5, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, validatorIndex=429161
Sep-27 06:19:37.886[]                debug: Signed attestation slot=4786296, index=49, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, validatorIndex=429626
Sep-27 06:19:37.888[]                debug: Signed attestation slot=4786296, index=41, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, validatorIndex=429628
Sep-27 06:19:37.890[]                debug: Signed attestation slot=4786296, index=29, head=0x233d47e151d7cdc12b5c6507c093c4173603cc435f4e59cd940f9b3f09073552, validatorIndex=426188
Sep-27 06:19:39.000[]                debug: HttpClient request routeId=submitPoolAttestations
Sep-27 06:19:43.733[]                debug: HttpClient response routeId=submitPoolAttestations
Sep-27 06:19:43.733[]                 info: Published attestations slot=4786296, count=6
  • beacon node log:
Sep-27 06:19:35.000[CHAIN]         verbose: Clock slot slot=4786296
Sep-27 06:19:41.595[API]             debug: Req req-aun 65.109.3.117 submitPoolAttestations

Sep-27 06:19:43.724[VMON]            debug: Local validator published unaggregated attestation validatorIndex=429161, slot=4786296, committeeIndex=5, subnet=5, sentPeers=17, delaySec=2.5950000286102295

Sep-27 06:27:31.309[VMON]            debug: Failed attestation in previous epoch validatorIndex=429161, prevEpoch=149571, isPrevSourceAttester=false, isPrevHeadAttester=false, isPrevTargetAttester=false, inclusionDistance=null

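==> The I/O lag caused a 2.6s delay on the submitPoolAttestations HTTP request (sent by the vc at 06:19:39.000, received by the beacon node at 06:19:41.595) and another 2.1s on validation + gossip publish (published at 06:19:43.724). That makes a 4.7s delay in total, so the attestation could not be included in any AggregateAndProof.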
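
The breakdown above can be reproduced directly from the three log timestamps. This is a small sketch for checking the arithmetic; `toMs` and the variable names are illustrative helpers, not Lodestar code.

```typescript
// Convert an "HH:MM:SS.mmm" log timestamp to milliseconds since midnight.
function toMs(clock: string): number {
  const m = /^(\d{2}):(\d{2}):(\d{2})\.(\d{3})$/.exec(clock);
  if (m === null) {
    throw new Error("unexpected timestamp format: " + clock);
  }
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  const seconds = Number(m[3]);
  const millis = Number(m[4]);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Timestamps copied from the validator and beacon node logs above.
const vcRequestSentMs = toMs("06:19:39.000"); // vc: HttpClient request
const bnRequestSeenMs = toMs("06:19:41.595"); // bn: Req ... submitPoolAttestations
const bnPublishedMs = toMs("06:19:43.724"); // bn: published unaggregated attestation

const httpDelayMs = bnRequestSeenMs - vcRequestSentMs;
const publishDelayMs = bnPublishedMs - bnRequestSeenMs;
const totalDelayMs = httpDelayMs + publishDelayMs;

console.log({ httpDelayMs, publishDelayMs, totalDelayMs });
// -> { httpDelayMs: 2595, publishDelayMs: 2129, totalDelayMs: 4724 }
```
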

@philknows
Member

So the best way to fix this issue is just to merge libp2p upgrade and push this for testing ASAP? I don't suppose there's anything else we can do at this point for 1.1.x. It seems to only be a larger issue for lower end hardware with a lot (100+) validators?

@philknows philknows added prio-high Resolve issues as soon as possible. scope-profitability Issues to directly improve validator performance and its profitability. labels Oct 4, 2022
@philknows
Member

Related issues to #4002 and #3694 ?

@twoeths
Contributor Author

twoeths commented Oct 4, 2022

So the best way to fix this issue is just to merge libp2p upgrade and push this for testing ASAP?

I don't think the new libp2p would help

It seems to only be a larger issue for lower end hardware with a lot (100+) validators?

Yes, the issue shows up on any node with >= 64 validators, since that many validators cause the node to subscribe to all subnets.
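
As a rough illustration of why 64 is the threshold: the consensus spec defines 64 attestation subnets, so if each validator adds one long-lived subnet subscription, the node is on every subnet once it runs 64 validators. The sketch below assumes exactly one distinct subnet per validator (`SUBNETS_PER_VALIDATOR` is an assumption for illustration) and should be read as an upper bound, not as Lodestar's actual subscription logic.

```typescript
// Number of attestation subnets defined by the consensus spec.
const ATTESTATION_SUBNET_COUNT = 64;
// Assumption for this sketch: one long-lived subnet per validator.
const SUBNETS_PER_VALIDATOR = 1;

// Hypothetical upper bound on long-lived subnet subscriptions for a node.
function longLivedSubnetCount(validatorCount: number): number {
  return Math.min(ATTESTATION_SUBNET_COUNT, validatorCount * SUBNETS_PER_VALIDATOR);
}

console.log(longLivedSubnetCount(16)); // 16
console.log(longLivedSubnetCount(64)); // 64: subscribed to all subnets
console.log(longLivedSubnetCount(100)); // 64: still capped at all subnets
```
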

Related issues to #4002 and #3694 ?

I think it's mainly #4002

@philknows afterBlockDelaySlotFraction (how long the vc holds an attestation when the block arrives early) is configurable. Right now it's 2s. We'll see if reducing that value helps, since submitting right at 1/3 of the slot could add a big delay to the attestation submission on the http/api side.
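
To make the effect of that setting concrete, here is a simplified timing model. It assumes the vc submits at whichever comes first: the 1/3-slot mark, or the configured delay after the attestations are signed. That assumption is inferred from the logs in this issue (signed at 06:19:37.890, submitted at 06:19:39.000, i.e. exactly 4s into the slot); `submitOffsetMs` is a hypothetical function, not Lodestar's implementation.

```typescript
// Hypothetical model: offset into the slot (ms) at which the vc submits.
function submitOffsetMs(
  signedAtMs: number, // offset into the slot when attestations were signed
  afterBlockDelayMs: number, // configured hold time after an early block
  slotMs: number = 12000 // mainnet slot duration
): number {
  const oneThirdMs = slotMs / 3;
  // Time left until the 1/3-slot mark; zero if it has already passed.
  const msToOneThird = Math.max(0, oneThirdMs - signedAtMs);
  // Submit at whichever comes first: the 1/3 mark or the after-block delay.
  return signedAtMs + Math.min(msToOneThird, afterBlockDelayMs);
}

// Slot 4786296 started at 06:19:35.000 and the last attestation was signed
// at 06:19:37.890, which is 2890 ms into the slot.
console.log(submitOffsetMs(2890, 2000)); // 4000: held until the 1/3 mark
console.log(submitOffsetMs(2890, 0)); // 2890: submitted right after signing
```

Under this model, a delay of 0 moves the submission about 1.1s earlier for this slot, which leaves more room for a slow submitPoolAttestations request.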

@twoeths
Contributor Author

twoeths commented Nov 21, 2022

I haven't seen any missed attestations caused by delayed submitPoolAttestations in the last 2 days (on Lido nodes). In fact, the I/O lag issue was mitigated by the v1.2.1 release together with after_block_delay=0 (deployed within the last 5 days). Below is a 30-day metric from a 100-validator mainnet node:

Screen Shot 2022-11-21 at 11 38 40

Projects
Status: Done
2 participants