Degraded attestation performance when Teku VC has a secondary BN defined #8180

Closed · rolfyone opened this issue Apr 8, 2024 · 5 comments · Fixed by #8405
@rolfyone (Contributor) commented Apr 8, 2024

From Nash on Discord:

Scenario A. Teku VC has a primary (Teku) and a secondary (Lighthouse) BN. The secondary BN reports libp2p issues and is later taken down; the whole time we observe abnormal attestation issues and extremely poor sync performance. As soon as we remove the secondary BN from the config, performance normalizes! While experiencing the issue, the Teku VC complains with:

Apr 04 09:52:41 ursa-Xset teku[2401221]: 09:52:41.974 ERROR - Validator *** Error while connecting to beacon node event stream com.launchdarkly.eventsource.StreamIOException: java.net.ConnectException: Failed to connect to /127.0.0.1:1752 (See log file for full stack trace)
Apr 04 09:52:44 ursa-Xset teku[2401221]: 09:52:44.000 WARN - Validator *** There are no beacon nodes from the configured ones that are ready to be used as an event stream failover

Scenario B. Teku VC has a primary (Teku) and a secondary (Lighthouse) BN. We observe minor sporadic attestation issues, but no obvious logs on the Teku VC side. We bring down the secondary BN; there are no logs or complaints from the Teku VC, but the attestation issues continue. Once we remove the secondary BN from the VC config, performance normalizes.
We have disabled dual-homing of our Teku VC until the situation is better understood. There are two obvious issues here:
  • overall performance degradation when using two BNs.
  • decreased performance when the failover (!) BN is down (a quick way to probe both BNs directly is sketched below).
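As a quick reachability check (a sketch, assuming the localhost:5052 / localhost:1752 endpoints from the config posted below; /eth/v1/node/health is the standard Beacon API health endpoint, returning 200 when synced and 206 while syncing):

# Probe both configured BNs directly, independent of the VC.
for bn in http://localhost:5052 http://localhost:1752; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$bn/eth/v1/node/health")
  echo "$bn -> HTTP $code"
done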


Will ask to confirm the Lighthouse / Teku versions; we'll need further investigation.

@astrostakers commented Apr 8, 2024

Astro-Stakers team here, additional comment:
When experiencing the issue we swapped Lighthouse for Nimbus (suspecting Teku <> LH incompatibility issues), but the issues continued with Nimbus. We suspect there is a general degradation issue when Teku has two BNs.

Versions:

  • Lighthouse Gul'karna
  • Teku 24.3.1

VC config:

[Unit]
Description=Teku VC Holesky
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=validator
Group=validator
Restart=on-failure
RestartSec=3
KillSignal=SIGINT
TimeoutStopSec=900
ExecStart=/usr/local/bin/teku/bin/teku validator-client \
  --network=holesky \
  --data-path=/var/lib/teku_validator \
  --validator-keys=/var/lib/teku_validator/validator_keys:/var/lib/teku_validator/validator_keys \
  --beacon-node-api-endpoint=http://localhost:5052, http://localhost:1752 \
  --validators-proposer-default-fee-recipient=0x9479837fd33718e294AFe681e50b1a5000D60E91 \
  --validators-builder-registration-default-enabled \
  --validators-proposer-blinded-blocks-enabled=true \
  --validators-graffiti="Astro-Stakers" \
  --metrics-enabled=true \
  --metrics-port=8009 \
  --doppelganger-detection-enabled false \
  --validator-is-local-slashing-protection-synchronized-enabled=false

[Install]
WantedBy=multi-user.target
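One thing worth ruling out here (an observation, not a confirmed cause): the --beacon-node-api-endpoint value above contains a space after the comma. Teku takes multiple failover endpoints as a single comma-separated value, and if that space is present in the actual unit file, systemd's word splitting would pass the endpoints as two separate arguments. A minimal corrected form:

  --beacon-node-api-endpoint=http://localhost:5052,http://localhost:1752 \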

@astrostakers

[Screenshot 2024-04-08 at 3 48 32 PM: attestation performance graph]

The green line represents our node with 2 BNs configured, vs. another node in the same location with the same config and software stack.

@StefanBratanov (Contributor) commented Apr 12, 2024

Hi @astrostakers

The error below also indicates that there was an issue with the primary beacon node and that the VC tried to connect to the failover. Is it possible to share more logs from around that time, from both the primary BN and the VC?

Validator *** There are no beacon nodes from the configured ones that are ready to be used as an event stream failover
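To narrow down which side is failing, it may also help to subscribe to each BN's event stream directly (a sketch using the standard Beacon API events endpoint and the ports from your config; curl's -N disables buffering so SSE events print as they arrive):

# Primary (Teku) and secondary (Lighthouse) from the config above.
curl -N "http://localhost:5052/eth/v1/events?topics=head"
curl -N "http://localhost:1752/eth/v1/events?topics=head"

If both streams emit head events while the VC still logs the failover error, the problem is more likely in the VC's failover handling than in node availability.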

@StefanBratanov (Contributor)

Hi @astrostakers, do you have any update on this? (Following up on my latest comment.)

@jumanzii
Same here. Here is the performance difference (with fallback node vs. without fallback node):
[Two screenshots: attestation performance graphs, with and without the fallback node]

@zilm13 self-assigned this Jun 12, 2024