
Improve failed duty message #866

Closed
xenowits opened this issue Jul 27, 2022 · 0 comments
Labels
enhancement New feature or request Let's discuss Discussion for general feedback and positive criticism :)


@xenowits
Contributor

xenowits commented Jul 27, 2022

Problem to be solved

We need more specific reasons for why a duty failed. For example, if the duty got stuck at validatorAPI, one possible reason is that the VC didn't successfully submit a signed duty. Or, if it got stuck at consensus, we can say that quorum might not have been reached.

At the same time, we also need to think about how specific we want to be. For example, when quorum was not reached during consensus, was that because of a byzantine node or simply because some peers were down? Note that determining whether a node is byzantine is a difficult problem.

Currently, tracker logs a message like the one below when a duty fails:
10:57:21.000 WARN tracker Duty failed {"component": "parSigDBThreshold", "reason": "12826636/attester failed in parSigDBThreshold component", "duty": "12826636/attester"}

We want something like this:
10:57:21.000 WARN tracker Duty failed {"component": "parSigDBThreshold", "reason": "Not enough partial signatures in parSigDB", "duty": "12826636/attester"}

Proposed solution

Analyze the events for the duty to figure out the probable reason for failure. It's difficult to enumerate ALL the scenarios in which duties can fail, but here are some examples and proposals for handling them:

  • Scheduler:
    • Duty not resolved:
      • It may happen if there are no active DVs.
      • It may also happen if either resolveProDuties() or resolveAttDuties() fails.
  • Fetcher:
    • Beacon node unavailable:
      • The fetcher may be unable to query the BN. We can say "couldn't fetch due to unavailable beacon node".
    • A proposer duty may fail due to a failed randao duty. Randao can fail if there is an insufficient number of partial signatures.
  • Consensus:
    • Quorum not reached:
      • It may happen that some peers are down at the time of consensus. We can say that "quorum not reached due to insufficient peers".
      • Or, it might just happen that a byzantine node is thwarting consensus.
  • ValidatorAPI:
    • validatorAPI may fail to query data from the BN, so it doesn't provide the VC with any data to sign. We can say "validatorAPI is not able to connect to BN".
    • Or, the issue might be entirely on the VC side. For example, the VC is not properly connected to the BN or isn't providing signed data to charon.
  • ParsigDBInternal:
    • We can say "couldn't save partially signed duty data set received from VC".
  • ParSigEx:
    • Peers down:
      • Duties can fail at ParSigEx if not enough peers submit their partial signatures. We can say "not enough peers".
    • Invalid sigs:
      • It may also happen if byzantine peers send invalid signatures that fail at the verification step.
  • ParsigDBThreshold:
    • We can say "could not reach threshold".
  • SigAgg:
    • Not enough partial signatures:
      • SigAgg can't complete if there aren't enough partial signatures. We can say "not enough peers".
  • Bcast:
    • Beacon node unavailable:
      • Broadcast may fail when charon can't connect to the BN, either because the BN is down or because there are connectivity issues on charon's side. We can say "bcast failed as charon can't connect to BN".
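The per-component proposals above amount to a lookup from "where the duty got stuck" plus a little extra evidence to a reason string. The Go sketch below shows one way that mapping could look. It is not charon's implementation: the `component` constants, the `evidence` struct, its fields, and `failureReason` are all hypothetical names chosen for illustration, and the reason strings are taken from the list above. Cases the list leaves open (e.g. a fetcher failure with no BN error recorded) fall back to the current generic message.

```go
package main

import "fmt"

// component names follow the list in this issue; the identifiers charon
// actually uses may differ (hypothetical).
type component string

const (
	scheduler         component = "scheduler"
	fetcher           component = "fetcher"
	consensus         component = "consensus"
	validatorAPI      component = "validatorAPI"
	parSigDBInternal  component = "parSigDBInternal"
	parSigEx          component = "parSigEx"
	parSigDBThreshold component = "parSigDBThreshold"
	sigAgg            component = "sigAgg"
	bcast             component = "bcast"
)

// evidence is a hypothetical summary of facts already present in the
// tracker's events for one duty; nothing here requires probing the BN.
type evidence struct {
	bnUnavailable bool // some step recorded a beacon node error
	randaoFailed  bool // proposer only: the prerequisite randao duty failed
	invalidSigs   int  // partial signatures from peers that failed verification
}

// failureReason turns the component where a duty got stuck, plus the
// evidence, into a human-readable reason for the "Duty failed" log line.
func failureReason(c component, ev evidence) string {
	switch c {
	case scheduler:
		return "duty not resolved: no active DVs or resolving duties failed"
	case fetcher:
		if ev.randaoFailed {
			return "couldn't fetch proposal as randao duty failed"
		}
		if ev.bnUnavailable {
			return "couldn't fetch due to unavailable beacon node"
		}
	case consensus:
		// Peers being down and a byzantine peer look alike from tracker
		// events alone, so the reason names both possibilities.
		return "quorum not reached: insufficient peers or a byzantine peer"
	case validatorAPI:
		if ev.bnUnavailable {
			return "validatorAPI is not able to connect to BN"
		}
		return "VC didn't submit signed duty data to charon"
	case parSigDBInternal:
		return "couldn't save partially signed duty data set received from VC"
	case parSigEx:
		if ev.invalidSigs > 0 {
			return "received invalid partial signatures from peers"
		}
		return "not enough peers"
	case parSigDBThreshold:
		return "could not reach threshold"
	case sigAgg:
		return "not enough partial signatures"
	case bcast:
		if ev.bnUnavailable {
			return "bcast failed as charon can't connect to BN"
		}
	}
	// Fallback mirrors the current, generic message.
	return fmt.Sprintf("duty failed in %s component", c)
}

func main() {
	fmt.Println(failureReason(parSigDBThreshold, evidence{}))               // could not reach threshold
	fmt.Println(failureReason(fetcher, evidence{bnUnavailable: true}))      // couldn't fetch due to unavailable beacon node
}
```

Keeping the consensus reason deliberately vague reflects the point raised earlier in the issue: telling a byzantine node apart from peers that are simply down is hard, so the message names both rather than guessing.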

Out of Scope

Use ONLY the events in the tracker to determine the possible reason. For example, if we suspect that the BN is down, don't ping it to find out.

@dB2510 dB2510 added the enhancement New feature or request label Jul 27, 2022
@Battenfield Battenfield added Let's discuss Discussion for general feedback and positive criticism :) Size: 5 labels Jul 27, 2022
@xenowits xenowits self-assigned this Aug 1, 2022
obol-bulldozer bot pushed a commit that referenced this issue Aug 16, 2022
Improve failed duty message.

category: refactor
ticket: #866