I/O performance issue when subscribing to sync committee #5741

Closed
twoeths opened this issue Jul 10, 2023 · 1 comment · Fixed by #5782
Assignees
Labels
prio-high Resolve issues as soon as possible. scope-performance Performance issue and ideas to improve performance.

@twoeths
Contributor

twoeths commented Jul 10, 2023

Describe the bug

A lot of attestations were submitted late from Jun 29 to Jul 01

Screenshot 2023-07-10 at 11 08 08

This happened when the node subscribed to the sync committee topic (note that the job wait time for sync committee messages was too high)

Screenshot 2023-07-10 at 11 09 58

topic peers spiked because a lot of nodes subscribed to many long-lived subnets connected to us at that time

Screenshot 2023-07-10 at 11 12 04

more messages were sent to us

Screenshot 2023-07-10 at 11 15 11

and we also had to publish to too many peers

Screenshot 2023-07-10 at 11 15 49

this caused the I/O lag issue (see #5740)

Screenshot 2023-07-10 at 11 16 14

the node received too many PRUNE messages

Screenshot 2023-07-10 at 11 17 53

mesh peers were dropped (including those on core topics)

Screenshot 2023-07-10 at 11 18 09

this caused gossip blocks to arrive late, so the node had to process a lot of attestations at the same time, which put pressure on our BLS worker pool (see #5739)

Expected behavior

No I/O performance issue; the network thread (useWorker=true) could resolve this

Steps to reproduce

No response

Additional context

No response

Operating system

Linux

Lodestar version or commit hash

v1.8.0

@philknows philknows added prio-high Resolve issues as soon as possible. scope-performance Performance issue and ideas to improve performance. labels Jul 10, 2023
@twoeths twoeths self-assigned this Jul 20, 2023
@twoeths
Contributor Author

twoeths commented Jul 20, 2023

When the issue happens, most of the peers are actually outbound peers

Screenshot 2023-07-20 at 11 08 34

This is when we have sync committee duty

Screenshot 2023-07-20 at 11 08 53

But most of the time we already had enough subnet peers

Screenshot 2023-07-20 at 11 14 20

When we have enough subnet peers, we should not dial peers
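
To illustrate the idea of not dialing when subnet peers are already sufficient, here is a minimal sketch. It is not Lodestar's actual peer manager code: the names `SubnetDemand`, `subnetsNeedingPeers`, `shouldDialForSubnets` and the `targetPeersPerSubnet` parameter are all hypothetical, chosen only for this example.

```typescript
// Hypothetical types and names for illustration only; not Lodestar's API.
type SubnetId = number;

interface SubnetDemand {
  subnet: SubnetId;
  // Number of currently connected peers subscribed to this subnet
  connectedPeers: number;
}

// Returns, per subnet, how many peers are still missing to reach the target.
// Subnets that already meet the target are omitted from the result.
function subnetsNeedingPeers(
  demands: SubnetDemand[],
  targetPeersPerSubnet: number
): Map<SubnetId, number> {
  const missing = new Map<SubnetId, number>();
  for (const {subnet, connectedPeers} of demands) {
    const deficit = targetPeersPerSubnet - connectedPeers;
    if (deficit > 0) {
      missing.set(subnet, deficit);
    }
  }
  return missing;
}

// Only dial new (outbound) peers when at least one subnet is short of peers.
function shouldDialForSubnets(
  demands: SubnetDemand[],
  targetPeersPerSubnet: number
): boolean {
  return subnetsNeedingPeers(demands, targetPeersPerSubnet).size > 0;
}
```

With a check like this, a node that takes on a sync committee duty while already holding enough peers on the relevant subnets would skip the extra outbound dials.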
