Load balance resolved bootnodes #786

Closed
corverroos opened this issue Jul 11, 2022 · 1 comment
corverroos commented Jul 11, 2022

Problem to be solved

With public testnet approaching and with direct network connections via NAT traversal being a problem, we are going to rely on Obol hosted bootnodes to act as circuit relays for cluster connectivity (for charon run and charon dkg).

Problems:

  • We currently only have a single bootnode, and this will not scale to support hundreds of nodes and thousands of relay connections.
  • Clusters must also use the same bootnode instance.
  • Bootnodes must be long-lived, since charon doesn't support dynamic bootnodes or recycling of bootnodes.
  • Bootnodes need to be able to handle a large amount of relay connections.

We therefore need a scalable bootnode solution that supports hundreds of long-lived clusters for our public testnet.

Proposed solution

Charon resolves bootnodes on startup. It polls a statically configured HTTP endpoint (http://bootnode.gcp.obol.tech:3640/enr, resolved via DNS), which returns a bootnode ENR. That ENR contains the public key and public addresses (TCP and UDP) of the bootnode. We provide these to discv5 (UDP) and libp2p (TCP), which use them for the rest of the lifetime of the instance.

If we deploy a number of bootnode instances (e.g. 64), each publicly reachable on its own, behind a load balancer using sticky header routing, then each cluster will resolve 1-of-64 bootnodes.

  • Deploy sufficient bootnodes up front, since the number cannot change afterwards: changing it would cause subsequently-resolved bootnodes to mismatch previously-resolved bootnodes.
  • Each bootnode should be long-lived, since its ENR should be static for the lifetime of a cluster. Suggest using static-sets to achieve ENR consistency across restarts/deploys.
  • If 64 small instances are insufficient to handle the load, we can (to a limited extent) vertically scale by increasing the instance size.
  • Deploy an HTTP header-based sticky load balancer at http://bootnode.gcp.obol.tech:3640/enr.
  • Charon will include a Charon-Cluster:<lock_hash> header as part of the bootnode ENR resolution HTTP call.
  • All charon nodes of the same cluster will therefore resolve the same long-lived bootnode.
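The sticky routing described above can be sketched as follows. This is an illustration under assumptions, not the load balancer's actual algorithm: it assumes the balancer hashes the `Charon-Cluster` header value and takes the result modulo the number of bootnodes. `newENRRequest` and `pickBootnode` are hypothetical helper names, and SHA-256 is an arbitrary choice of hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"net/http"
)

// clusterHeader is the header name proposed in this issue.
const clusterHeader = "Charon-Cluster"

// newENRRequest builds the ENR resolution request and attaches the
// cluster lock hash so that a header-based load balancer can route on it.
func newENRRequest(url, lockHash string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(clusterHeader, lockHash)
	return req, nil
}

// pickBootnode maps a lock hash to a bootnode index in [0, n).
// Assumed scheme: hash the header value and reduce it modulo n.
// n must be greater than zero.
func pickBootnode(lockHash string, n int) int {
	sum := sha256.Sum256([]byte(lockHash))
	return int(binary.BigEndian.Uint64(sum[:8]) % uint64(n))
}
```

Under this scheme every node that sends the same lock hash lands on the same index as long as `n` stays fixed. A different `n` can map the same lock hash to a different index, which is why the number of bootnodes cannot be changed after clusters have started resolving.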

Out of Scope

We only support load balancing at ENR resolution time. We do not support dynamic rolling bootnode load balancing.

@corverroos corverroos added the enhancement New feature or request label Jul 11, 2022
@corverroos corverroos self-assigned this Jul 19, 2022
obol-bulldozer bot pushed a commit that referenced this issue Jul 19, 2022
Add the "Charon-Cluster" header to the http request when resolving bootnode ENR on startup.

category: feature
ticket: #786
@corverroos
Contributor Author

Closing this for now since the charon part is done. There is another ticket, ObolNetwork/obol-infrastructure#50, for the infra side of things.
