
PBTS: experimental values to be evaluated in QA experiments #2323

Closed
1 task
Tracked by #2100 ...
cason opened this issue Feb 13, 2024 · 9 comments
Labels
metrics · pbts · qa (Quality assurance)
Milestone
2024-Q1

Comments

@cason
Contributor

cason commented Feb 13, 2024

Overview

To define good default synchrony parameters for PBTS, collecting some relevant metrics is essential.

Original issue: tendermint/tendermint#7202

Currently planned test cases

  1. QA/PBTS: Run 200-nodes test with v0.38.x's saturation point #2460 [Auxiliary run - not part of regular QA report]
  • Run on v1
  • PBTS enable height: 1
  • Run QA 200-node TC, but only with v0.38.x saturation point (r=200, c=2)
  • Run the experiment (90 secs each) 4-5 times
    • or just run the experiment for 3-4 mins
  • Using latency emulation
  • output: PBTS team can look at ProposalTimestampDifference to set default values
  2. QA/v1: Run 200-nodes test for final report without latency emulation #2461 [Part of regular QA report]
  • Run on v1
  • PBTS enable height: 1
  • Running saturation discovery
  • Duration ~40 mins
  • NOT Using latency emulation
  • output: QA report, 200-node test, with saturation point report section, and metrics report section (for sat point)
  • output: compare metrics between (v0.38.x, bft time) and (v1, PBTS)
  3. QA/v1: Run 200-nodes test for final report with latency emulation #2513 [Part of regular QA report]
  • Run on v1
  • PBTS enable height: 1
  • Running saturation discovery (mainly because with latencies the saturation point is likely to be different)
  • Duration ~40 mins
  • Using latency emulation
  • output: baseline for future releases (200-node without latency emulation is now deprecated)
  • output: compare metrics between (v1, no-lat-emulation) and (v1, lat-emulation)
  4. New test case: X nodes [PBTS specific - not part of the regular QA report]
  • Run on v1
  • PBTS enable height: 1
  • Duration ~Y mins
  • Using latency emulation
  • No Tx load
  • clock_skew enabled on: close to 1/3 of the voting power
  • clock_skew value: +5 seconds (i.e., 5 s in the future; see the timeliness sketch after this list)
  • output: average (or histogram of) number of rounds to decide, and block latency
  • This test case is a DRAFT:
    • Play around with e2e on docker (Daniel volunteered 😄)
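
For context on what these scenarios exercise: PBTS accepts a proposal as timely only if its timestamp is consistent with the receiver's local clock, up to the synchrony parameters (Precision and MessageDelay). The Go sketch below is an illustrative restatement of that check, useful for reasoning about the clock-skew scenario; it is not the code used in the experiments, and the concrete durations in main are made-up values.

```go
package main

import (
	"fmt"
	"time"
)

// isTimely sketches the PBTS timeliness predicate: a proposal with timestamp
// ts, received locally at time recv, is timely iff
//
//	ts - precision <= recv <= ts + messageDelay + precision
func isTimely(ts, recv time.Time, precision, messageDelay time.Duration) bool {
	lowerBound := ts.Add(-precision)               // tolerates proposer clocks ahead of ours
	upperBound := ts.Add(messageDelay + precision) // tolerates network delay plus clock drift
	return !recv.Before(lowerBound) && !recv.After(upperBound)
}

func main() {
	now := time.Now()
	// A proposer skewed +5s into the future (as in test case 4) only produces
	// timely proposals if Precision is large enough to absorb the skew.
	skewed := now.Add(5 * time.Second)
	fmt.Println(isTimely(skewed, now, 500*time.Millisecond, 2*time.Second)) // false
	fmt.Println(isTimely(skewed, now, 6*time.Second, 2*time.Second))        // true
}
```

With close to 1/3 of the voting power skewed, the interesting outputs are exactly the ones listed for test case 4: how many extra rounds untimely proposals cost, and the resulting block latency.
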
@cason cason added the metrics, qa (Quality assurance), and pbts labels Feb 13, 2024
@cason cason assigned hvanz and unassigned hvanz Feb 13, 2024
@cason cason changed the title QA: define relevant metrics for evaluating PBTS in a distributed environment PBTS: define relevant metrics to be evaluated in QA experiments Feb 13, 2024
@hvanz hvanz mentioned this issue Feb 21, 2024
10 tasks
@cason cason linked a pull request Feb 27, 2024 that will close this issue
4 tasks
@cason
Contributor Author

cason commented Feb 27, 2024

Some comments regarding the existing metrics:

  • QuorumPrevoteDelay and FullPrevoteDelay: is anyone using them? What is the value of the data they produce?
  • ProposalTimestampDifference: it is currently enabled only when PBTS is enabled.
    • I wonder why we don't enable it at all times, as it can provide useful data for chains that are considering switching to PBTS.
    • I don't see much point in creating two data sets, one for timely and one for untimely proposals. The bounds for timely proposals are known, as they are determined by consensus params. Moreover, considering the previous item, what value would we put when PBTS is disabled?

@sergio-mena
Contributor

@cason Regarding your comments on ProposalTimestampDifference: I fully agree with both.

@cason
Contributor Author

cason commented Feb 28, 2024

Ok, I will fix that in #2321

@cason
Contributor Author

cason commented Feb 29, 2024

This issue mixes experiments to be performed with metrics to be implemented/reviewed.

Should we break the concerns into different issues?

@cason cason changed the title PBTS: define relevant metrics to be evaluated in QA experiments PBTS: experimental values and metrics to be evaluated in QA experiments Feb 29, 2024
cason added a commit that referenced this issue Feb 29, 2024
Contributes to #2323.

Add several buckets to better track `ProposalTimestampDifference` in QA
experiments.

Buckets: `-Inf, -1.5, -1.0, -0.5, 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5,
4.0, 6.0, 8.0, 10.0, +Inf`

If they are too many, let me know.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
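
For readers following along, here is a minimal, hypothetical sketch of what that bucket layout captures, written directly against prometheus/client_golang rather than CometBFT's own metrics layer; the metric name and wiring are assumptions, and the -Inf/+Inf ends of the list in the commit correspond to the histogram's implicit open-ended buckets. The sign convention assumed is receive time minus proposal timestamp, so negative observations indicate a proposer clock ahead of the local one.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// proposalTimestampDiff is an illustrative stand-in for the
// ProposalTimestampDifference metric (hypothetical name and registration),
// using the finite bucket bounds listed in the commit above.
var proposalTimestampDiff = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "proposal_timestamp_difference_seconds",
	Help:    "Difference in seconds between local receive time and the proposal's timestamp.",
	Buckets: []float64{-1.5, -1.0, -0.5, 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 8.0, 10.0},
})

func init() {
	prometheus.MustRegister(proposalTimestampDiff)
}

// observeProposal records the timestamp difference for a received proposal.
// proposalTime is the timestamp in the proposal; receiveTime is when this
// node saw it. Negative values mean the proposal timestamp is in the future.
func observeProposal(proposalTime, receiveTime time.Time) {
	proposalTimestampDiff.Observe(receiveTime.Sub(proposalTime).Seconds())
}

func main() {
	now := time.Now()
	// Example: a proposal whose timestamp is 0.3 s in the past of our clock.
	observeProposal(now.Add(-300*time.Millisecond), now)
}
```
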
@cason
Contributor Author

cason commented Feb 29, 2024

Metrics for QA experiments for PBTS were updated by #2479.

@cason cason changed the title PBTS: experimental values and metrics to be evaluated in QA experiments PBTS: experimental values to be evaluated in QA experiments Feb 29, 2024
@cason
Contributor Author

cason commented Feb 29, 2024

This issue mixes experiments to be performed with metrics to be implemented/reviewed.

Should we break the concerns into different issues?

Created #2480 to track in-production metrics.

@cason
Contributor Author

cason commented Mar 28, 2024

Are we running scenario 4., namely experiments with clock skew?

@cason cason added this to the 2024-Q1 milestone Mar 28, 2024
@sergio-mena
Contributor

@hvanz and I experimented with (a somewhat similar version of) scenario 4 when we were troubleshooting several problems in our e2e nightlies, in particular in one testnet where the clock-skewed validators amounted to more than 1/3 of the voting power. When troubleshooting those runs, we could see that the adaptive mechanism we put in place (#2432) is doing its job properly.

I think we can close this issue, as the other three cases are being tracked by @hvanz in the Q1 tracking issue.

@cason
Contributor Author

cason commented Mar 28, 2024

OK, closing this issue, as experiment 4 was performed and it worked. We didn't have the goal of publishing its results, as it was a proof of concept.

@cason cason closed this as completed Mar 28, 2024