I've been collecting some data to investigate delays in REST API responses, in addition to the data we get from metrics like #6691. This is from a Holesky beacon node running in a DVT setup with ~250 connected validators.
The data was collected by running this branch unstable...nflaig/event-loop-delay, which simply creates a log event whenever event loop lag is > 1 second. All data points are collected on the main thread, meaning event loop lag in the network thread is not considered. Lag there might still cause delays on APIs that interact with the network, like getting the peer count or submitting attestations / blocks.
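For illustration, a check like this can be sketched with a drifting timer. This is not the code from the branch, only a minimal sketch under assumptions: `slotSecond`, `lagMs` and `startLagDetector` are hypothetical names, the genesis time is passed in by the caller, the 100 ms check interval is arbitrary, and the lag is attributed to the slot second in which the timer was supposed to fire.

```typescript
const SECONDS_PER_SLOT = 12;

type LagEvent = {lagMs: number; slotSecond: number};

/** Second within the slot (0..11) for a wall-clock time in milliseconds. */
function slotSecond(timeMs: number, genesisTimeSec: number): number {
  const sinceGenesisSec = Math.floor(timeMs / 1000) - genesisTimeSec;
  // Double modulo keeps the result in 0..11 even for times before genesis
  return ((sinceGenesisSec % SECONDS_PER_SLOT) + SECONDS_PER_SLOT) % SECONDS_PER_SLOT;
}

/** How late a timer fired compared to when it was expected, never negative. */
function lagMs(expectedMs: number, actualMs: number): number {
  return Math.max(0, actualMs - expectedMs);
}

/** Reports every timer tick that fires more than `thresholdMs` late. Returns a stop function. */
function startLagDetector(
  genesisTimeSec: number,
  onLag: (event: LagEvent) => void,
  intervalMs = 100, // assumption: arbitrary check interval
  thresholdMs = 1000
): () => void {
  let expectedMs = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const nowMs = Date.now();
    const lag = lagMs(expectedMs, nowMs);
    if (lag > thresholdMs) {
      // Assumption: attribute the lag to the second in which the tick should have run
      onLag({lagMs: lag, slotSecond: slotSecond(expectedMs, genesisTimeSec)});
    }
    expectedMs = nowMs + intervalMs;
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The `onLag` callback is where a log event would be written. Since the timer runs on whichever thread starts it, a detector started on the main thread says nothing about the network thread.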
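The following diagrams were created using data from event-loop-lag-detected.log.
Event Loop Lag: Slot Seconds vs. Delay
This clearly shows the expected lag during second 8 of the slot due to state / epoch transitions. Other seconds of the slot are mostly unaffected by event loop lag, so lag there should only have a marginal effect on API latency (see % distribution below).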
Percentage of Event Loop Lags per Slot Second
Lags of > 1 second also occur mostly in second 8 of the slot.
Percentage of Slots with Event Loop Lag > 1 second
Looking at the percentage of slots over the last few days, the number of slots with an event loop lag is relatively low, especially for slot seconds other than 8.
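The two percentages above can be derived from the logged events roughly as follows. This is only a sketch: the `LagEvent` shape and the function names are assumptions for illustration, not the actual log format.

```typescript
// Hypothetical shape of one parsed log event (assumption, not the real log format)
type LagEvent = {slot: number; slotSecond: number; lagMs: number};

/** Share of all lag events (in %) that fall into each slot second. */
function lagSharePerSlotSecond(events: LagEvent[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const e of events) {
    counts.set(e.slotSecond, (counts.get(e.slotSecond) ?? 0) + 1);
  }
  const shares = new Map<number, number>();
  counts.forEach((n, sec) => shares.set(sec, (100 * n) / events.length));
  return shares;
}

/** Percentage of observed slots that had at least one lag event. */
function percentSlotsWithLag(events: LagEvent[], totalSlots: number): number {
  const slotsWithLag = new Set(events.map((e) => e.slot));
  return totalSlots > 0 ? (100 * slotsWithLag.size) / totalSlots : 0;
}
```

Note that the first number is relative to the number of lag events, while the second is relative to the number of slots observed, so the two are not directly comparable.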
Conclusion
Based on this data, it seems unlikely that event loop lag has a significant impact on API latency. During slot seconds 8-9, the validator client does not send any requests, and the main tasks on the beacon node side are state and epoch transitions. Tasks like polling validator indices and getting duties happen at the beginning of the first slot of the epoch, where event loop lag is relatively low and should not cause request timeouts, even for really short timeouts like 2 seconds.
Next steps
It would be great if we could visualize similar data points in our metrics. One approach could be to look at event loop utilization (ELU) for certain slot seconds. This would also give us more data to look at when we improve state / epoch transition or block processing, as that should reduce the ELU during those slot seconds, see #6820 (comment).
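As a rough idea of what sampling ELU per slot second could look like, here is a minimal sketch based on Node's `performance.eventLoopUtilization()` from `perf_hooks`. The names `EluSample`, `meanEluPerSlotSecond` and `startEluSampler` are hypothetical, the genesis time is passed in by the caller, and each sample is attributed to the slot second in which its measurement window started, which is an assumption and gets imprecise if the timer itself is delayed.

```typescript
import {performance} from "node:perf_hooks";

const SECONDS_PER_SLOT = 12;

type EluSample = {slotSecond: number; utilization: number};

/** Mean ELU per slot second (index 0..11), 0 where no samples exist. */
function meanEluPerSlotSecond(samples: EluSample[]): number[] {
  const sum = new Array<number>(SECONDS_PER_SLOT).fill(0);
  const count = new Array<number>(SECONDS_PER_SLOT).fill(0);
  for (const s of samples) {
    sum[s.slotSecond] += s.utilization;
    count[s.slotSecond] += 1;
  }
  return sum.map((total, i) => (count[i] > 0 ? total / count[i] : 0));
}

/** Pushes one ELU sample per interval into `samples`. Returns a stop function. */
function startEluSampler(genesisTimeSec: number, samples: EluSample[], intervalMs = 1000): () => void {
  let last = performance.eventLoopUtilization();
  let windowStartMs = Date.now();
  const timer = setInterval(() => {
    const current = performance.eventLoopUtilization();
    // Passing two readings returns the utilization between them
    const {utilization} = performance.eventLoopUtilization(current, last);
    const sinceGenesisSec = Math.floor(windowStartMs / 1000) - genesisTimeSec;
    const slotSecond = ((sinceGenesisSec % SECONDS_PER_SLOT) + SECONDS_PER_SLOT) % SECONDS_PER_SLOT;
    samples.push({slotSecond, utilization});
    last = current;
    windowStartMs = Date.now();
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In a real setup the per-second values would more likely be exported as a metric labeled by slot second instead of being kept in an array, but the aggregation idea stays the same.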
New data from the latest release (v1.21.0) looks quite a bit better 🎉
Compared to the previous data, event loop lag in the range of 3-4 seconds is less frequent.
The next one is interesting: while we improved the lag in second 8 of the slot, it looks like we also have far fewer lags > 1 second in other slots of the epoch.
We have ~2% fewer slots with event loop lag > 1 second. The percentage in slot second 8 went up, which is kinda strange, but the lag duration overall is lower, as shown above.