Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getAttesterDuties is being called multiple times in an epoch by 1 validator process #3201

Closed
g11tech opened this issue Sep 18, 2021 · 9 comments
Assignees

Comments

@g11tech
Copy link
Contributor

g11tech commented Sep 18, 2021

Describe the bug
In each epoch the validator calls for getAttesterDuties, which are supposed to be called once an epoch unless the reorg happens, but lodestar is consistently making 3-4 requests doesn't matter if the 1, 16 or 975 validators are attached (but through ONE VALIDATOR PROCESS) , and it seems to be doing calculation in all of the calls (may be in parallel)

Expected behavior
Should be called just once (per validator) or wait till previous request is completed.

Steps to Reproduce
Run beacon node on prater, attached validator(s) and observe the metrics dashboard.

Screenshots
for 1 validator:
image
for 16 validators:
image
for 975 validators:
image

Desktop (please complete the following information):

  • OS:
  • Version:
  • Branch:
  • Commit hash:

v0.30.0/master/+23/6cf7c2c3 (git)

@g11tech g11tech self-assigned this Sep 18, 2021
@dapplion
Copy link
Contributor

@g11tech Note that this chart can be very misleading. A single dot does not mean 1 call. A single dot means that the average value in prometheus db for that metric was X, so it could mean 100, 1000 requests that average to that value. Similarly 3 dots may just represent a single call but because of interval polling between grafana and prometheus it renders as three reads for the same scraped value

@dapplion
Copy link
Contributor

dapplion commented Sep 18, 2021

@g11tech You can review the validator attester duties flow here

private runDutiesTasks = async (epoch: Epoch): Promise<void> => {

duties are requested twice per epoch

  • 1 req for current epoch
  • 1 req for next epoch
  • (maybe) again if new indices are discovered during calls

@g11tech
Copy link
Contributor Author

g11tech commented Sep 18, 2021

fair point, let me look at the counter of the metric:
in 15 second interval (the grafana interval for fetching data) it has gone up by 8-10 (in multiple times/epochs).
as per my understanding of histogram metric, counter for a metric goes each time one observes ( +1 for each observe/startTimer command).

image

so going up by 2 would make sense (as in the last observation in the above graph)

let me figure out the new indices scenario.

@dapplion
Copy link
Contributor

fair point, let me look at the counter of the metric:
in 15 second interval (the grafana interval for fetching data) it has gone up by 8-10 (in multiple times/epochs).
as per my understanding of histogram metric, counter for a metric goes each time one observes ( +1 for each observe/startTimer command).

image

so going up by 2 would make sense (as in the last observation in the above graph)

let me figure out the new indices scenario.

This chart is much better, clearly the validator is calling getAttesterDuties ~8 times during the first 3 epochs or your screen capture

@g11tech
Copy link
Contributor Author

g11tech commented Sep 19, 2021

👍 will investigate further, may be can help us with our memory problem!

@g11tech
Copy link
Contributor Author

g11tech commented Sep 23, 2021

tried to investigate it on a local test setup, nothing went awry so then investigated this across 16vc and 975 vc,
on 16vc each slot there were two calls only, and the delta(lodestar_api_rest_response_time_seconds_count{operationId="getAttesterDuties"}[1m]) was also 3 eevrywhere (grafana shows delta of 3 for 2 api calls registered)

also picked a point "Sep-23 08:01" where api_rest_response was spiking for manual inspection, in both beacon logs and validator logs there were only two calls as expected.

then inspected this on 975vc node, picked the same point, there it was spiking on the same point as expected, but registering a delta of 12.
image

so moved to manually inspect the log at same point "Sep-23 08:01":

image

then realized that this is showing multiple requests but from four different validator ips, inspected docker containers and confirmed the same.
image
4 validator processes with 12 delta i.e. 3 per validator process, so it squares off. hence closing this as FALSE POSITIVE.

@g11tech
Copy link
Contributor Author

g11tech commented Sep 23, 2021

RECOMMENDATION:

  1. may be we can also track in metrics dashboard how many validator processes are connected on the beacon node. this will keep things straightforward and prevent any confusion.
  2. a security check can be done when there are two calls (by different processes) asking for validator duties on some common index on the same epoch. may be there is, I am not sure of this.
  3. further investigation of how these parallel calls may be affecting the performance of the beacon node, some of the call timings show 8 seconds in grafana dashboard, but might need deeper inspection.

But closing this issue, as investigation has concluded this to be of false positive.

@g11tech g11tech closed this as completed Sep 23, 2021
@dapplion
Copy link
Contributor

What would be an issue here is how much regen the calls are causing compounded. If the validator clients call getAttesterDuties multiple times but always result in 1 epoch transition run, then it's okay. The problem would be if multiple epoch transitions are run due to those calls

@g11tech
Copy link
Contributor Author

g11tech commented Sep 24, 2021

yes, this looks a good point to be investigated. i will create an issue for this and carry forward the investigation 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants