Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle errors thrown by EL's executePayload #3545

Merged
merged 20 commits into from
Jan 5, 2022

Conversation

g11tech
Copy link
Contributor

@g11tech g11tech commented Dec 21, 2021

Motivation
Currently in kintsugi devnets it has been observed that EL throws error like internal error, invalid merkles and sometimes connection refused. There could be host of other scenarios where EL is not able to respond. This causes block to be treated as invalid in the current handling, leading to downscoring of the peers.
In a very limited peer scenario, lodestar eventually kicks all peers and goes out of sync unless peerstore is removed.

This PR

  • tracks the EL operation errors separately on executionEngine level and passes the information upstream.
  • Where these errors are treated in optimistic scenario allowing CL to move its consensus forward.
  • The user is also logged a warn for him to check and take appropriate action/investigation.

and unnecessary/invalid down-scoring of the peer is also prevented.

Closes #3537

@codeclimate
Copy link

codeclimate bot commented Dec 21, 2021

Code Climate has analyzed commit 2293644 and detected 3 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 3

View more on Code Climate.

@codecov
Copy link

codecov bot commented Dec 21, 2021

Codecov Report

Merging #3545 (2293644) into master (53f31a7) will decrease coverage by 0.12%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3545      +/-   ##
==========================================
- Coverage   37.56%   37.43%   -0.13%     
==========================================
  Files         310      311       +1     
  Lines        8253     8307      +54     
  Branches     1273     1286      +13     
==========================================
+ Hits         3100     3110      +10     
- Misses       5005     5049      +44     
  Partials      148      148              

@g11tech g11tech changed the title Handle errors thrown by EL's executePayload as optimistic Handle errors thrown by EL's executePayload Dec 23, 2021
@g11tech
Copy link
Contributor Author

g11tech commented Jan 4, 2022

@dapplion reverted the retry mechanism(s). Kindly review.

Would want to tackle the retry mechanism of executionEngine(#3567) in separate PR (along with through review of as how jsonRPC client behaves in various unavailability scenarios) as the aim of this PR is to prevent down-scoring peers because of EL errors.
Local testing (with EL being shutdown) shows that peer count stays good (instead of previous version where it would go to zero)

@g11tech
Copy link
Contributor Author

g11tech commented Jan 4, 2022

Been running locally on kintsugi for an hour+ now with EL shutdown, peer count still strong 25.

Copy link
Contributor

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Excellent work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CL<>EL interop: Peers are getting downscored when EL errors on payload execution
3 participants