Improve telemetry of bandwidth usage #1061

Open
1 of 5 tasks
Tracked by #30 ...
lasarojc opened this issue Jun 30, 2023 · 1 comment
Labels
e2e Related to our end-to-end tests P:bandwidth-optimization Priority: Optimize bandwidth usage

Comments

@lasarojc
Contributor

lasarojc commented Jun 30, 2023

Target audience

Operators and consensus developers.

Problem definition

Currently we rely on a small set of metrics, from a small set of setups, to determine how bandwidth is used.
We need to gather more data on how bandwidth is used in real-world scenarios.
The data gathered and the method to do so should be well understood by operators, should not disclose any information the validators wouldn't like to share, and should not burden operators.

Upsides

  • Develop a fine-grained understanding of data/metadata traffic on the network in CometBFT.
  • The approaches developed could later be expanded to collect other kinds of information, for example, storage usage.

Downsides

  • Validators need to vet information before sharing with the team, which adds to their work.
  • Code developed may end up being used only in test environments.

Tasks

Goals (1w)

Stretch goals

Long term goals

Definition of done

  • Metrics to be captured are defined.
  • Running an experiment in the e2e testbed, or over a single administrative domain in a real-world deployment, details bandwidth usage.
  • Details come as a graph where a link between two nodes is weighted by the amount of information that transited between them.
  • This information is broken down by reactor (mempool, consensus, evidence, etc.).
  • If time allows, this information is updated in real time.
  • Jointly with this, we collect samples from real-world deployments to gather additional information about bandwidth usage.
  • To this end, we provide validators with clear guidelines to collect such information and share it with us.
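
As a rough illustration of the graph described in the definition of done, the sketch below aggregates per-message byte counts into links weighted by total bytes, with a per-reactor breakdown on each link. The record layout `(sender, receiver, reactor, bytes)`, the node names, and the numbers are all hypothetical, and treating links as directed is an assumption; nothing here reflects an existing CometBFT API.

```python
from collections import defaultdict

# Hypothetical sample records: (sender, receiver, reactor, bytes).
# Node names and byte counts are made up for illustration.
samples = [
    ("node0", "node1", "mempool", 1200),
    ("node0", "node1", "consensus", 800),
    ("node1", "node0", "consensus", 500),
    ("node1", "node2", "evidence", 100),
    ("node0", "node1", "mempool", 300),
]


def build_bandwidth_graph(records):
    """Aggregate records into {(src, dst): {reactor: total_bytes}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, reactor, nbytes in records:
        graph[(src, dst)][reactor] += nbytes
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {edge: dict(per_reactor) for edge, per_reactor in graph.items()}


def edge_weights(graph):
    """Weight of each directed link: bytes summed over all reactors."""
    return {edge: sum(per_reactor.values()) for edge, per_reactor in graph.items()}


graph = build_bandwidth_graph(samples)
weights = edge_weights(graph)

for (src, dst), total in sorted(weights.items()):
    print(f"{src} -> {dst}: {total} bytes, by reactor: {graph[(src, dst)]}")
```

The same structure could be fed to a graph-drawing tool, with the link weight mapped to line thickness and the per-reactor breakdown shown as a label.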
@lasarojc lasarojc mentioned this issue Jun 30, 2023
3 tasks
@lasarojc lasarojc changed the title from "e2e: Improve telemetry of bandwidth usage" to "e2e: Improve understanding of bandwidth usage" Jun 30, 2023
@lasarojc lasarojc changed the title from "e2e: Improve understanding of bandwidth usage" to "e2e: Improve telemetry of bandwidth usage" Jun 30, 2023
@lasarojc lasarojc added the P:bandwidth-optimization Priority: Optimize bandwidth usage label Jun 30, 2023
@lasarojc lasarojc added this to the 2023-Q3 milestone Jul 4, 2023
@lasarojc
Contributor Author

lasarojc commented Jul 5, 2023

For bandwidth, the obvious candidates are

  • already_received_txs, peer_receive_bytes_total, peer_send_bytes_total (ratios).
  • Other metrics about duplicate txs could be needed.
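
One possible reading of "ratios" is the receive/send byte ratio per peer. The sketch below computes it from a Prometheus-style text scrape. The metric names come from the list above; the `peer_id` label, the absence of any metric prefix or other labels, and every sample value are assumptions made for illustration, so a real scrape would need the pattern adapted.

```python
import re
from collections import defaultdict

# Hypothetical scrape output. The `peer_id` label and all values are assumed.
SCRAPE = """\
peer_receive_bytes_total{peer_id="a"} 4000
peer_send_bytes_total{peer_id="a"} 1000
peer_receive_bytes_total{peer_id="b"} 500
peer_send_bytes_total{peer_id="b"} 2000
already_received_txs 25
"""

# Matches `name value` or `name{peer_id="..."} value`; other lines are skipped.
LINE = re.compile(r'^(\w+)(?:\{peer_id="([^"]*)"\})?\s+([0-9.eE+-]+)$')


def parse(text):
    """Split samples into per-peer metrics and node-wide metrics."""
    per_peer = defaultdict(dict)
    node_wide = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        name, peer, value = match.group(1), match.group(2), float(match.group(3))
        if peer is None:
            node_wide[name] = value
        else:
            per_peer[peer][name] = value
    return dict(per_peer), node_wide


def receive_send_ratio(per_peer):
    """Bytes received from each peer divided by bytes sent to it (None if nothing sent)."""
    ratios = {}
    for peer, metrics in per_peer.items():
        sent = metrics.get("peer_send_bytes_total", 0.0)
        received = metrics.get("peer_receive_bytes_total", 0.0)
        ratios[peer] = received / sent if sent else None
    return ratios


per_peer, node_wide = parse(SCRAPE)
ratios = receive_send_ratio(per_peer)
print(ratios)
print(node_wide)
```

A ratio far from 1 for a given peer would flag an asymmetric link worth a closer look; `already_received_txs` is kept separately because it is not tied to a single peer in this sketch.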

For latency, we can use the script to extract latencies in the QA process

  • A short-term goal is not to substantially change the current latency distribution.
  • We may also want to know the number of connected peers and the internal state of the mempool reactor (e.g. the set of senders for each tx).

@lasarojc lasarojc added the e2e Related to our end-to-end tests label Jul 5, 2023
@lasarojc lasarojc changed the title from "e2e: Improve telemetry of bandwidth usage" to "Improve telemetry of bandwidth usage" Jul 5, 2023
@lasarojc lasarojc assigned lasarojc and unassigned otrack Jul 5, 2023
@lasarojc lasarojc removed this from the 2023-Q3 milestone Oct 10, 2023
Status: Todo
2 participants