[forge] long running tests get more storage #13904

Merged
merged 1 commit into from
Jul 3, 2024

Contributor

@rustielin rustielin commented Jul 3, 2024

Description

Long-running Forge tests need larger disks for two reasons: larger disks perform better (higher IOPS), and long runs have the potential to fill a smaller disk. As a first step, this change applies to realistic env tests only: if the test is long-running, give it 1 TiB.

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Triggered an ad hoc Forge run and observed the disk size. The helm chart is already tested, so as long as the Forge Test Runner provides the helm values correctly, the chart should create the spec required to get a larger disk.

Test run: https://github.com/aptos-labs/aptos-core/actions/runs/9783196286/job/27011224899

Disks took a few minutes to provision, so we might need to increase the pod readiness timeout if provisioning takes too long (e.g. for large-scale tests).

PVCs show 1000Gi

image
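
The description asks for 1TiB while the PVCs report 1000Gi; read as binary units, the two are close but not identical. A small sketch of the arithmetic, assuming Kubernetes-style binary suffixes (Gi = 2^30 bytes, Ti = 2^40 bytes). `quantity_to_bytes` is a hypothetical helper written for this illustration and is not part of Forge:

```rust
/// Convert a binary-suffixed quantity such as "1000Gi" or "1Ti" to bytes.
/// Hypothetical helper: only the Ki/Mi/Gi/Ti suffixes are handled here.
pub fn quantity_to_bytes(q: &str) -> Option<u64> {
    // Pick the power-of-two shift that matches the suffix.
    let (digits, shift) = if let Some(n) = q.strip_suffix("Ki") {
        (n, 10)
    } else if let Some(n) = q.strip_suffix("Mi") {
        (n, 20)
    } else if let Some(n) = q.strip_suffix("Gi") {
        (n, 30)
    } else if let Some(n) = q.strip_suffix("Ti") {
        (n, 40)
    } else {
        return None; // unknown or missing suffix
    };
    let value: u64 = digits.parse().ok()?;
    // checked_mul guards against overflow on very large inputs.
    value.checked_mul(1u64 << shift)
}
```

Under that reading, `quantity_to_bytes("1000Gi")` is 24 GiB smaller than `quantity_to_bytes("1Ti")`, which may or may not matter for the disk-fill concern.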

Key Areas to Review

The code in Forge's main.rs that creates the NodeResourceOverride.
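
For orientation, a minimal sketch of the kind of decision this PR describes, not the actual code in main.rs. The `storage_gib` field, the `LONG_RUNNING_THRESHOLD` cutoff, and the `realistic_env_override` helper are all hypothetical; the real `NodeResourceOverride` in aptos-core may look different:

```rust
use std::time::Duration;

/// Hypothetical stand-in for Forge's `NodeResourceOverride`.
/// The real struct may carry different fields.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct NodeResourceOverride {
    /// Requested disk size in GiB; `None` keeps the helm chart default.
    pub storage_gib: Option<usize>,
}

/// Assumed cutoff for "long running"; the PR does not state the real value.
pub const LONG_RUNNING_THRESHOLD: Duration = Duration::from_secs(60 * 60);

/// Assumed size, chosen to match the 1000Gi observed on the PVCs.
pub const LONG_RUNNING_STORAGE_GIB: usize = 1000;

/// Return a storage override only when the test is long running.
pub fn realistic_env_override(test_duration: Duration) -> NodeResourceOverride {
    if test_duration >= LONG_RUNNING_THRESHOLD {
        NodeResourceOverride {
            storage_gib: Some(LONG_RUNNING_STORAGE_GIB),
        }
    } else {
        // Short tests keep whatever the chart provides by default.
        NodeResourceOverride::default()
    }
}
```

Keeping the override optional means short tests are unaffected and only long runs pay the extra disk provisioning time.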

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


trunk-io bot commented Jul 3, 2024

⏱️ 3h 39m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| test-fuzzers | 1h 55m | 🟩🟥🟩🟩 |
| rust-images / rust-all | 24m | 🟩🟩 |
| forge-e2e-test / forge | 15m | 🟩 |
| forge-compat-test / forge | 15m | 🟩 |
| test-target-determinator | 10m | 🟩🟩 |
| general-lints | 7m | 🟩🟩🟩 |
| adhoc-forge-test / forge | 7m | |
| execution-performance / test-target-determinator | 5m | 🟩 |
| check | 4m | 🟩 |
| check-dynamic-deps | 3m | 🟩🟩🟩🟩 |
| rust-move-tests | 3m | 🟩 |
| rust-move-tests | 3m | 🟩 |
| rust-move-tests | 3m | 🟩 |
| semgrep/ci | 2m | 🟩🟩🟩🟩 |
| file_change_determinator | 43s | 🟩🟩🟩🟩 |
| file_change_determinator | 42s | 🟩🟩🟩🟩 |
| file_change_determinator | 35s | 🟩🟩🟩 |
| rust-move-tests | 20s | |
| permission-check | 14s | 🟩🟩🟩🟩 |
| permission-check | 11s | 🟩🟩🟩🟩 |
| permission-check | 10s | 🟩🟩🟩 |
| permission-check | 9s | 🟩🟩🟩🟩 |
| execution-performance / single-node-performance | 9s | 🟩 |
| determine-docker-build-metadata | 9s | 🟩🟩🟩 |
| permission-check | 8s | 🟩🟩🟩 |
| forge-framework-upgrade-test / forge | 7s | 🟩 |
| determine-forge-run-metadata | 2s | 🟩 |

🚨 2 jobs on the last run were significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
| --- | --- | --- | --- |
| forge-framework-upgrade-test / forge | 7s | 13m | -99% |
| execution-performance / single-node-performance | 9s | 18m | -99% |


@rustielin rustielin added the CICD:build-images label (when this label is present, GitHub Actions will start build+push of Rust images from the PR) Jul 3, 2024
@rustielin rustielin enabled auto-merge (squash) July 3, 2024 18:45

Contributor

github-actions bot commented Jul 3, 2024

✅ Forge suite realistic_env_max_load success on 443b8ec8d7920cda659b4c8c41dce6164ac14c65

two traffics test: inner traffic : committed: 8122.410732775274 txn/s, latency: 4808.287313780129 ms, (p50: 4600 ms, p90: 6300 ms, p99: 11000 ms), latency samples: 3524060
two traffics test : committed: 99.92624187773046 txn/s, latency: 2188.0021739130434 ms, (p50: 2000 ms, p90: 2400 ms, p99: 7300 ms), latency samples: 1840
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.224, avg: 0.213", "QsPosToProposal: max: 0.121, avg: 0.108", "ConsensusProposalToOrdered: max: 0.314, avg: 0.294", "ConsensusOrderedToCommit: max: 0.405, avg: 0.371", "ConsensusProposalToCommit: max: 0.702, avg: 0.664"]
Max round gap was 1 [limit 4] at version 1732311. Max no progress secs was 4.964911 [limit 15] at version 1732311.
Test Ok

Contributor

github-actions bot commented Jul 3, 2024

✅ Forge suite compat success on 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 443b8ec8d7920cda659b4c8c41dce6164ac14c65

Compatibility test results for 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 443b8ec8d7920cda659b4c8c41dce6164ac14c65 (PR)
1. Check liveness of validators at old version: 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5
compatibility::simple-validator-upgrade::liveness-check : committed: 7421.369099504519 txn/s, latency: 3829.999556213018 ms, (p50: 2700 ms, p90: 6600 ms, p99: 23200 ms), latency samples: 304200
2. Upgrading first Validator to new version: 443b8ec8d7920cda659b4c8c41dce6164ac14c65
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 3491.094276940928 txn/s, latency: 7468.001682134571 ms, (p50: 9000 ms, p90: 9900 ms, p99: 10200 ms), latency samples: 86200
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 3211.5867999402208 txn/s, latency: 9672.632114768425 ms, (p50: 9600 ms, p90: 14800 ms, p99: 15100 ms), latency samples: 137320
3. Upgrading rest of first batch to new version: 443b8ec8d7920cda659b4c8c41dce6164ac14c65
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 3207.8426645281324 txn/s, latency: 8204.258993255058 ms, (p50: 8800 ms, p90: 9700 ms, p99: 9900 ms), latency samples: 80060
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 3238.1198073931255 txn/s, latency: 9644.169195718212 ms, (p50: 9600 ms, p90: 14800 ms, p99: 15100 ms), latency samples: 138260
4. upgrading second batch to new version: 443b8ec8d7920cda659b4c8c41dce6164ac14c65
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 3675.40719851435 txn/s, latency: 6879.006015118791 ms, (p50: 6600 ms, p90: 12800 ms, p99: 16000 ms), latency samples: 92600
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 5991.445109750072 txn/s, latency: 5418.850074048262 ms, (p50: 5100 ms, p90: 9400 ms, p99: 10300 ms), latency samples: 229580
5. check swarm health
Compatibility test for 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 443b8ec8d7920cda659b4c8c41dce6164ac14c65 passed
Test Ok

@rustielin rustielin merged commit 6b6404d into main Jul 3, 2024
100 checks passed
@rustielin rustielin deleted the rustielin/forge-storage branch July 3, 2024 19:16

3 participants