Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

propose system transactions #10971

Merged
merged 24 commits into from
Nov 24, 2023
Merged

propose system transactions #10971

merged 24 commits into from
Nov 24, 2023

Conversation

zjma
Copy link
Contributor

@zjma zjma commented Nov 18, 2023

Description

This PR introduces a new feature ProposeSystemTransactions: when enabled, validator should do the following:

  • When proposing a block, use variant ProposalExt and include SystemTransactions from a list of SystemTransactionProviders it was given.
    • The proposal will be filled by in the following order: system_txn_provider[0] > ... > system_txn_provider[-1] > payload.
    • System transactions and user transactions will share txn_count_limit and block_size_limit.
  • When verifying a proposal, ensure it is of variant ProposalExt.

TODOs

  • Add system transactions into DAG node.

@zjma zjma added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Nov 18, 2023

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

consensus/consensus-types/src/block_data.rs Show resolved Hide resolved
consensus/src/liveness/proposal_generator.rs Outdated Show resolved Hide resolved
consensus/src/round_manager.rs Outdated Show resolved Hide resolved
);
}

let (num_sys_txns, sys_txns_total_bytes) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

proposal.sys_txns().map_or(...)

and for the closure you can use reduce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. i thought nit is fine xD

@@ -261,6 +261,8 @@ impl NodeSetup {
PipelineBackpressureConfig::new_no_backoff(),
ChainHealthBackoffConfig::new_no_backoff(),
false,
vec![],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have some test coverage when it's enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. how do we want to do it: forking every test here with sys_txn_enabled = true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prop test?

testing everything with true is probably also ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grao1991 how to run prop tests? any example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -132,6 +133,7 @@ pub struct EpochManager<P: OnChainConfigProvider> {
commit_state_computer: Arc<dyn StateComputer>,
storage: Arc<dyn PersistentLivenessStorage>,
safety_rules_manager: SafetyRulesManager,
sys_txn_providers: Vec<Arc<dyn SysTxnProvider>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably don't need shared ownership here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. But something might be easier if it's shared ownership.
E.g., at the beginning of every epoch, we need to send a clone of this vec from epoch manager to the newly created proposal manager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can either send a &, or just let proposal manager owns it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epochmgr and roundmgr (owner of proposal manager) each has their own thread, I don't know how to send a & here... maybe you can suggest a sub-PR :)

move it into roundmgr (then proposal manager) is possible, but then epochmgr needs to take it back before shutting down roundmgr... doesn't sound simpler than cloning Arcs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arc is what we've been doing unless we can figure out how to convince compiler the lifetime of round manager is shorter than epoch manager

@@ -93,6 +93,7 @@ pub enum FeatureFlag {
VMBinaryFormatV7,
ResourceGroupsChargeAsSizeSum,
CommissionChangeDelegationPool,
ProposeSystemTransactions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to put in OnChainConsensusConfig instead? @zekun000

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FeatureFlag makes more sense to me. It is not consensus specific feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh this is more like a consensus feature whether to propose system txn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinions

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@@ -633,10 +634,27 @@ impl RoundManager {
.author()
.expect("Proposal should be verified having an author");

if !self.onchain_config.should_propose_system_txns()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add unit test for the new behavior

@zjma zjma enabled auto-merge (squash) November 24, 2023 22:08

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.7.3 ==> fcfd1d395bc0a0b0b36773f933cde09da0ad943b

Compatibility test results for aptos-node-v1.7.3 ==> fcfd1d395bc0a0b0b36773f933cde09da0ad943b (PR)
1. Check liveness of validators at old version: aptos-node-v1.7.3
compatibility::simple-validator-upgrade::liveness-check : committed: 3121 txn/s, latency: 6693 ms, (p50: 6900 ms, p90: 10200 ms, p99: 10500 ms), latency samples: 190420
2. Upgrading first Validator to new version: fcfd1d395bc0a0b0b36773f933cde09da0ad943b
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1850 txn/s, latency: 15622 ms, (p50: 19200 ms, p90: 22000 ms, p99: 22500 ms), latency samples: 92540
3. Upgrading rest of first batch to new version: fcfd1d395bc0a0b0b36773f933cde09da0ad943b
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1617 txn/s, latency: 17559 ms, (p50: 19800 ms, p90: 26800 ms, p99: 27900 ms), latency samples: 77660
4. upgrading second batch to new version: fcfd1d395bc0a0b0b36773f933cde09da0ad943b
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3296 txn/s, latency: 9466 ms, (p50: 9900 ms, p90: 12500 ms, p99: 16400 ms), latency samples: 128580
5. check swarm health
Compatibility test for aptos-node-v1.7.3 ==> fcfd1d395bc0a0b0b36773f933cde09da0ad943b passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on fcfd1d395bc0a0b0b36773f933cde09da0ad943b

two traffics test: inner traffic : committed: 7404 txn/s, latency: 5284 ms, (p50: 5100 ms, p90: 6600 ms, p99: 10500 ms), latency samples: 3213600
two traffics test : committed: 100 txn/s, latency: 2332 ms, (p50: 2300 ms, p90: 2700 ms, p99: 3200 ms), latency samples: 1900
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.204, avg: 0.196", "QsPosToProposal: max: 0.188, avg: 0.173", "ConsensusProposalToOrdered: max: 0.684, avg: 0.633", "ConsensusOrderedToCommit: max: 0.554, avg: 0.530", "ConsensusProposalToCommit: max: 1.237, avg: 1.162"]
Max round gap was 1 [limit 4] at version 1538509. Max no progress secs was 4.191058 [limit 10] at version 1538509.
Test Ok

@zjma zjma merged commit 3328b61 into main Nov 24, 2023
53 of 54 checks passed
@zjma zjma deleted the propose-system-transactions branch November 24, 2023 23:18
Copy link
Contributor

✅ Forge suite framework_upgrade success on aptos-node-v1.7.3 ==> fcfd1d395bc0a0b0b36773f933cde09da0ad943b

Compatibility test results for aptos-node-v1.7.3 ==> fcfd1d395bc0a0b0b36773f933cde09da0ad943b (PR)
Upgrade the nodes to version: fcfd1d395bc0a0b0b36773f933cde09da0ad943b
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 7019 txn/s, latency: 4800 ms, (p50: 4800 ms, p90: 7700 ms, p99: 8300 ms), latency samples: 245680
5. check swarm health
Compatibility test for aptos-node-v1.7.3 ==> fcfd1d395bc0a0b0b36773f933cde09da0ad943b passed
Test Ok

bowenyang007 pushed a commit that referenced this pull request Dec 7, 2023
* propose system transaction, initial

* update feature name

* fmt

* update

* update

* format update

* updapte

* fmt

* filter out pending sys txns

* UTs; resolve comments

* comments

* sys txn pool

* update

* feature flag -> consensus config

* remove feature flag

* nitfix

* dummytopic as test-only

* bugfix

* update existing tests

* uts for receiving limits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants