[consensus] fix the race condition with the order of operations #12251

Merged
merged 1 commit into from
Feb 27, 2024


zekun000

We send out the epoch change notification before spawning the commit task, so there is a race condition: the epoch manager can shut down the buffer manager before the commit request is created, and we lose the final commit task.
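
To make the ordering problem concrete, here is a minimal, deterministic sketch of the two orderings. It is not the aptos-core code: `BufferManager`, `Step`, and `run` are hypothetical names invented for illustration, and the model assumes the worst-case interleaving in which the epoch manager reacts to the notification immediately.

```rust
/// Hypothetical stand-in for the buffer manager (not the real aptos-core type):
/// it accepts commit requests only while it is still running.
struct BufferManager {
    running: bool,
    committed: Vec<u64>,
}

impl BufferManager {
    fn new() -> Self {
        BufferManager { running: true, committed: Vec::new() }
    }

    /// Records the commit request and returns true, or drops it and
    /// returns false if the manager has already been shut down.
    fn submit_commit(&mut self, round: u64) -> bool {
        if !self.running {
            return false;
        }
        self.committed.push(round);
        true
    }

    fn shutdown(&mut self) {
        self.running = false;
    }
}

/// The two operations whose relative order matters (hypothetical names).
enum Step {
    /// Epoch change notification; in this model the epoch manager
    /// reacts at once by shutting down the buffer manager.
    NotifyEpochChange,
    /// Creation of the commit request for the given round.
    CreateCommitRequest(u64),
}

/// Applies the steps in order and returns the rounds that were committed.
fn run(steps: &[Step]) -> Vec<u64> {
    let mut buffer_manager = BufferManager::new();
    for step in steps {
        match step {
            Step::NotifyEpochChange => buffer_manager.shutdown(),
            Step::CreateCommitRequest(round) => {
                // The return value is ignored on purpose: a dropped request
                // simply never shows up in `committed`.
                let _ = buffer_manager.submit_commit(*round);
            }
        }
    }
    buffer_manager.committed
}
```

In this model, `run(&[Step::NotifyEpochChange, Step::CreateCommitRequest(7)])` returns an empty vector because the final commit is dropped, while `run(&[Step::CreateCommitRequest(7), Step::NotifyEpochChange])` returns `[7]`. The real components run concurrently, so the loss there depends on timing rather than happening on every run.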

Description

Test Plan



@zekun000 zekun000 enabled auto-merge (rebase) February 27, 2024 02:32


✅ Forge suite compat success on aptos-node-v1.8.3 ==> 85943e27506be085ef3e8044bf5ede6208fab6cf

Compatibility test results for aptos-node-v1.8.3 ==> 85943e27506be085ef3e8044bf5ede6208fab6cf (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 4526 txn/s, latency: 6382 ms, (p50: 5400 ms, p90: 11300 ms, p99: 17100 ms), latency samples: 190120
2. Upgrading first Validator to new version: 85943e27506be085ef3e8044bf5ede6208fab6cf
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1786 txn/s, latency: 16138 ms, (p50: 19000 ms, p90: 22100 ms, p99: 22600 ms), latency samples: 92900
3. Upgrading rest of first batch to new version: 85943e27506be085ef3e8044bf5ede6208fab6cf
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1242 txn/s, latency: 23050 ms, (p50: 24500 ms, p90: 35000 ms, p99: 37000 ms), latency samples: 60860
4. upgrading second batch to new version: 85943e27506be085ef3e8044bf5ede6208fab6cf
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3054 txn/s, latency: 10121 ms, (p50: 9900 ms, p90: 14900 ms, p99: 18000 ms), latency samples: 128300
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> 85943e27506be085ef3e8044bf5ede6208fab6cf passed
Test Ok


✅ Forge suite realistic_env_max_load success on 85943e27506be085ef3e8044bf5ede6208fab6cf

two traffics test: inner traffic : committed: 8627 txn/s, latency: 4516 ms, (p50: 4200 ms, p90: 5100 ms, p99: 9600 ms), latency samples: 3718260
two traffics test : committed: 100 txn/s, latency: 2126 ms, (p50: 2100 ms, p90: 2400 ms, p99: 2700 ms), latency samples: 1800
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.237, avg: 0.200", "QsPosToProposal: max: 0.162, avg: 0.149", "ConsensusProposalToOrdered: max: 0.558, avg: 0.520", "ConsensusOrderedToCommit: max: 0.463, avg: 0.430", "ConsensusProposalToCommit: max: 0.988, avg: 0.950"]
Max round gap was 1 [limit 4] at version 1615160. Max no progress secs was 4.798673 [limit 10] at version 1615160.
Test Ok

@zekun000 zekun000 merged commit c52fdc5 into aptos-release-v1.9 Feb 27, 2024
88 of 98 checks passed
@zekun000 zekun000 deleted the zekun/cherrypick branch February 27, 2024 03:09

❌ Forge suite framework_upgrade failure on aptos-node-v1.8.3 ==> 85943e27506be085ef3e8044bf5ede6208fab6cf

Compatibility test results for aptos-node-v1.8.3 ==> 85943e27506be085ef3e8044bf5ede6208fab6cf (PR)
Upgrade the nodes to version: 85943e27506be085ef3e8044bf5ede6208fab6cf
Test Failed: API error: Unknown error error sending request for url (http://aptos-node-3-validator.forge-framework-upgrade-pr-12251.svc:8080/v1/estimate_gas_price): error trying to connect: dns error: failed to lookup address information: Name or service not known

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: __libc_start_main
  13: <unknown>
Trailing Log Lines:
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: __libc_start_main
  13: <unknown>


Swarm logs can be found here: See fgi output for more information.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ApiError: namespaces "forge-framework-upgrade-pr-12251" not found: NotFound (ErrorResponse { status: "Failure", message: "namespaces \"forge-framework-upgrade-pr-12251\" not found", reason: "NotFound", code: 404 })

Caused by:
    namespaces "forge-framework-upgrade-pr-12251" not found: NotFound

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: __libc_start_main
  16: <unknown>', testsuite/forge/src/backend/k8s/swarm.rs:676:18
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Debugging output:
