cache the VK when the VM is created #12975

Merged
merged 3 commits into from
Apr 25, 2024
Contributor

@alinush alinush commented Apr 22, 2024

Description

This optimization defers updating the on-chain Groth16 VK to the end of a (consensus) epoch using @zjma's ConfigBuffer. (Originally, tried a different approach in this PR).

This is done for two reasons:

  1. It is the new correct way of updating on-chain configs, post randomness / DKGs.
  2. This allows us to maintain a fresh view of the deserialized VK in Rust when creating a new VM, and speed up TXN validation.
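The deferral described above can be illustrated with a minimal Rust sketch. This is not the actual ConfigBuffer code: `BufferedConfig`, `propose`, and `on_new_epoch` as written here are hypothetical names that only model the idea that an update is queued mid-epoch and becomes active at the epoch boundary, so a value read at VM creation stays fresh for the whole epoch.

```rust
/// Toy model of a buffered on-chain config (hypothetical; not the Aptos API).
#[derive(Debug, Clone, PartialEq)]
struct BufferedConfig<T> {
    /// Value that transactions observe during the current epoch.
    active: Option<T>,
    /// Update queued mid-epoch, applied only at the next epoch change.
    pending: Option<T>,
}

impl<T> BufferedConfig<T> {
    fn new() -> Self {
        Self { active: None, pending: None }
    }

    /// Queue an update; the active value is left untouched.
    fn propose(&mut self, value: T) {
        self.pending = Some(value);
    }

    /// Called at the epoch boundary: promote the pending value, if any.
    fn on_new_epoch(&mut self) {
        if let Some(value) = self.pending.take() {
            self.active = Some(value);
        }
    }

    /// The value currently in effect.
    fn active(&self) -> Option<&T> {
        self.active.as_ref()
    }
}
```

In this model, calling `propose` never changes what `active` returns until `on_new_epoch` runs, which is the property that makes caching the deserialized VK per VM instance safe.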

This PR also contains a few minor changes:

  1. Small refactoring
  2. Adds a few tests and benchmarks

Authenticator size tests

Tested the byte sizes of a serialized keyless pubkey and signature:

ZKP-less TXN sizes
------------
KeylessSignature BCS size: 1033
KeylessPublicKey BCS size: 61

(Signature size would have been 294 if the extra_field was set to None.)

ZKP-based TXN sizes
------------
With Google as the OIDC provider:
 - KeylessPublicKey BCS size: 61

With extra_field set to "family_name":"Straka",
 - KeylessSignature BCS size: 318

Without extra_field:
 - KeylessSignature BCS size: 294

Keyless TXN verification times

Before this change, for each TXN verification we incurred:

VK deserialization time: 1.92ms
 + If we avoid point validation: 1.00ms
Public inputs hash time: 1.609917ms
Proof deserialization time: 418.5µs
Proof verification time: 1.069708ms
------------------------------------
Total time: 5.01ms

After this change:

VK deserialization time: 0
Public inputs hash time: 1.609917ms
Proof deserialization time: 418.5µs
Proof verification time: 1.069708ms
------------------------------------
Total time: 3.10ms
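
As a quick check of the arithmetic, the sketch below sums the component timings quoted above and computes the relative saving. The struct and function names are illustrative only. Note that the components before the change sum to about 5.018 ms, which the PR reports as 5.01 ms.

```rust
/// Per-transaction verification cost in milliseconds, using the figures quoted in the PR.
struct Timings {
    vk_deserialization_ms: f64,
    public_inputs_hash_ms: f64,
    proof_deserialization_ms: f64,
    proof_verification_ms: f64,
}

impl Timings {
    /// Sum of all four components.
    fn total_ms(&self) -> f64 {
        self.vk_deserialization_ms
            + self.public_inputs_hash_ms
            + self.proof_deserialization_ms
            + self.proof_verification_ms
    }
}

/// Timings before the change: the VK is deserialized for every transaction.
fn before_change() -> Timings {
    Timings {
        vk_deserialization_ms: 1.92,
        public_inputs_hash_ms: 1.609917,
        proof_deserialization_ms: 0.4185,
        proof_verification_ms: 1.069708,
    }
}

/// Timings after the change: the VK is already cached in the VM.
fn after_change() -> Timings {
    Timings { vk_deserialization_ms: 0.0, ..before_change() }
}

/// Fraction of the per-transaction verification time that the cache removes.
fn saving_fraction() -> f64 {
    let before = before_change().total_ms();
    let after = after_change().total_ms();
    (before - after) / before
}
```

With these inputs the saving is 1.92 ms out of roughly 5.02 ms, or about 38% of the per-transaction verification time.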

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

  1. VM tests in aptos-move/e2e-move-tests
  2. Smoke tests in testsuite/smoke

In both test suites, we submit keyless TXNs to test that:

  1. Old proof verifies for old VK
  2. New proof does not verify for old VK
  3. Change the old VK to the new VK via a governance proposal
  4. Old proof does not verify for new VK
  5. New proof verifies for new VK
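
The five steps above can be sketched as a toy scenario. This is only a model of the test logic, not the code in aptos-move/e2e-move-tests or testsuite/smoke: `VkVersion`, `ToyProof`, and `verifies` are hypothetical stand-ins, and real Groth16 verification is replaced by a simple equality check.

```rust
/// Which verification key is meant (hypothetical stand-in for real VK contents).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VkVersion {
    Old,
    New,
}

/// Stand-in for a Groth16 proof: it only remembers which VK it was generated for.
#[derive(Debug, Clone, Copy)]
struct ToyProof {
    made_for: VkVersion,
}

/// Stand-in for proof verification: succeeds only when the proof matches the active VK.
fn verifies(proof: ToyProof, active_vk: VkVersion) -> bool {
    proof.made_for == active_vk
}

/// Runs the five steps and returns the four verification outcomes in order.
fn run_rotation_scenario() -> [bool; 4] {
    let old_proof = ToyProof { made_for: VkVersion::Old };
    let new_proof = ToyProof { made_for: VkVersion::New };

    let mut active = VkVersion::Old;
    let step1 = verifies(old_proof, active); // old proof, old VK
    let step2 = verifies(new_proof, active); // new proof, old VK

    // Step 3: the governance proposal rotates the VK (in the PR, applied at epoch end).
    active = VkVersion::New;

    let step4 = verifies(old_proof, active); // old proof, new VK
    let step5 = verifies(new_proof, active); // new proof, new VK

    [step1, step2, step4, step5]
}
```

The expected outcomes match the list: accept, reject, reject, accept.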

Key Areas to Review

  1. The changes to Aptos VM; i.e., reading the on-chain VK and remembering it inside the VM
  2. The changes to keyless_account.move; i.e., are we properly deferring all on-chain config updates via config_buffer.move?
  3. The early-abort in AptosVM::validate_signed_transaction when no VK is cached in the VM (in aptos-move/aptos-vm/src/aptos_vm.rs)
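
A minimal sketch of the first and third review areas, under stated assumptions: `ToyVm`, `PreparedVk`, `ValidationError`, and `validate_keyless_txn` are hypothetical names, not the types in aptos_vm.rs, and the byte comparison stands in for real Groth16 verification. The point is only the shape: deserialize the VK once when the VM is created, then abort early during validation if nothing was cached.

```rust
/// Placeholder for a deserialized Groth16 verification key (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct PreparedVk(Vec<u8>);

#[derive(Debug, PartialEq)]
enum ValidationError {
    /// No VK was available when the VM was created.
    MissingVk,
    /// The proof did not verify under the cached VK.
    InvalidProof,
}

struct ToyVm {
    /// Deserialized once in `new`, then reused for every transaction.
    cached_vk: Option<PreparedVk>,
}

impl ToyVm {
    /// `onchain_vk_bytes` stands in for reading the VK resource from on-chain state.
    fn new(onchain_vk_bytes: Option<&[u8]>) -> Self {
        let cached_vk = onchain_vk_bytes.map(|bytes| PreparedVk(bytes.to_vec()));
        Self { cached_vk }
    }

    fn validate_keyless_txn(&self, proof: &[u8]) -> Result<(), ValidationError> {
        // Early abort: without a cached VK there is nothing to verify against.
        let vk = self.cached_vk.as_ref().ok_or(ValidationError::MissingVk)?;
        // Toy check standing in for Groth16 proof verification.
        if proof == vk.0.as_slice() {
            Ok(())
        } else {
            Err(ValidationError::InvalidProof)
        }
    }
}
```

Because the VK only changes at an epoch boundary and a new VM is created then, the cached copy does not go stale within an epoch in this model.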

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


trunk-io bot commented Apr 22, 2024

⏱️ 26h 32m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-targeted-unit-tests 5h 43m 🟩🟩🟩 (+15 more)
rust-move-unit-coverage 4h 37m 🟩🟩🟩 (+14 more)
rust-move-tests 2h 58m 🟩🟩🟩🟩 (+15 more)
windows-build 1h 58m 🟩🟩🟩
rust-lints 1h 46m 🟩🟩🟩🟩 (+15 more)
run-tests-main-branch 1h 38m 🟩🟩🟩🟩 (+18 more)
rust-smoke-tests 1h 23m 🟩🟩
rust-unit-tests 1h 22m 🟥🟩
execution-performance / single-node-performance 1h 15m 🟩🟩🟩
rust-images / rust-all 42m 🟩🟩
general-lints 35m 🟩🟩🟩🟩 (+15 more)
forge-e2e-test / forge 34m 🟩🟩
check-dynamic-deps 32m 🟩🟩🟩🟩🟩 (+15 more)
forge-compat-test / forge 24m 🟩🟩
rust-build-cached-packages 14m 🟩🟩🟩
check 14m 🟩🟩🟩
cli-e2e-tests / run-cli-tests 12m 🟩🟩
semgrep/ci 9m 🟩🟩🟩🟩🟩 (+14 more)
file_change_determinator 4m 🟩🟩🟩🟩🟩 (+17 more)
file_change_determinator 4m 🟩🟩🟩🟩🟩 (+14 more)
node-api-compatibility-tests / node-api-compatibility-tests 2m 🟩🟩
permission-check 1m 🟩🟩🟩🟩🟩 (+17 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+16 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+17 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+17 more)
file_change_determinator 32s 🟩🟩🟩
execution-performance / file_change_determinator 25s 🟩🟩🟩
permission-check 9s 🟩🟩🟩
determine-docker-build-metadata 8s 🟩🟩🟩

🚨 4 jobs on the last run were significantly faster/slower than expected

Job                        Duration   vs 7d avg   Delta
rust-targeted-unit-tests   22m        16m         +34%
rust-move-tests            12m        10m         +28%
forge-e2e-test / forge     18m        15m         +24%
windows-build              33m        27m         +21%



codecov bot commented Apr 22, 2024

Codecov Report

Attention: Patch coverage is 0%, with 209 lines in your changes missing coverage. Please review.

Project coverage is 62.2%. Comparing base (3b3edf1) to head (69c706c).
Report is 2 commits behind head on main.

Files                                           Patch %   Lines
types/src/keyless/circuit_testcases.rs          0.0%      80 Missing ⚠️
aptos-move/aptos-vm/src/keyless_validation.rs   0.0%      52 Missing ⚠️
types/src/keyless/test_utils.rs                 0.0%      24 Missing ⚠️
aptos-move/aptos-vm/src/aptos_vm.rs             0.0%      18 Missing ⚠️
types/src/keyless/groth16_sig.rs                0.0%      16 Missing ⚠️
types/src/keyless/groth16_vk.rs                 0.0%      15 Missing ⚠️
types/src/keyless/bn254_circom.rs               0.0%      2 Missing ⚠️
types/src/jwks/rsa/mod.rs                       0.0%      1 Missing ⚠️
types/src/keyless/openid_sig.rs                 0.0%      1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             main   #12975     +/-   ##
=========================================
- Coverage    62.3%    62.2%   -0.1%     
=========================================
  Files         828      828             
  Lines      185844   185996    +152     
=========================================
  Hits       115858   115858             
- Misses      69986    70138    +152     


@alinush alinush force-pushed the alin/keyless-vk-cache branch 7 times, most recently from 1587f53 to 96f2bf2 Compare April 24, 2024 16:28
@alinush alinush marked this pull request as ready for review April 24, 2024 16:30
Contributor

@igor-aptos igor-aptos left a comment

you have a lot of commented out printlns throughout the code, is that intentional?

aptos-move/aptos-vm/src/aptos_vm.rs (resolved)
Contributor

@igor-aptos igor-aptos left a comment

I completely missed the keyless_account.move change. looks good now

@@ -31,7 +44,7 @@ module aptos_framework::keyless_account {
}

#[resource_group_member(group = aptos_framework::keyless_account::Group)]
-struct Configuration has key, store {
+struct Configuration has key, store, drop, copy {
Contributor

i think an initialization func for this resource is missing

Contributor

i guess update_configuration() played the role before but not any more.

Contributor

actually not an issue because on_new_epoch plays this role now.

Contributor

hmm but you will still need an initialize() for genesis...

Contributor Author

@alinush alinush Apr 24, 2024

Indeed. The genesis initialization will merely queue up the changes... A reconfiguration will be needed to apply the buffer.

/// Pre-validate the VK to actively-prevent incorrect VKs from being set on-chain.
fun validate_groth16_vk(vk: &Groth16VerificationKey) {
// Could be leveraged to speed up the VM deserialization of the VK by 2x, since it can assume the points are valid.
assert!(option::is_some(&crypto_algebra::deserialize<bn254_algebra::G1, bn254_algebra::FormatG1Compr>(&vk.alpha_g1)), E_INVALID_BN254_G1_SERIALIZATION);
Contributor

@zjma zjma Apr 24, 2024

if bn254Structures is disabled, we can't update vk?

Contributor Author

Yep. Is that a problem? I figured it would be wise to check validity here & preempt bad VKs from being set.

Cargo.toml (outdated, resolved)
Contributor

@sitalkedia sitalkedia left a comment

minor nits.

aptos-move/aptos-vm/src/aptos_vm.rs (resolved)
aptos-move/aptos-vm/src/aptos_vm.rs (resolved)
aptos-move/aptos-vm/src/keyless_validation.rs (resolved)
aptos-move/aptos-vm/src/keyless_validation.rs (resolved)
aptos-move/aptos-vm/src/keyless_validation.rs (resolved)
let result = info
.client()
.submit_without_serializing_response(&signed_txn)
.await;
Contributor

nit: you can just unwrap here.

Contributor

@zekun000 zekun000 left a comment

there's too much debug code in the PR

config.max_exp_horizon_secs = max_exp_horizon_secs;

update_configuration(fx, config);
Contributor

this is not used in the cache right?

Contributor Author

alinush commented Apr 24, 2024

One problem left in this PR: setting the VK and Configuration in genesis for devnet will no longer work instantly. The updates will only get applied after an epoch change... Might want to fix that to avoid surprises when we redeploy devnet.

@igor-aptos
Contributor

The first two epochs are only 1 transaction long, so that should be fine? Can we confirm that from the forge run? Are there Grafana dashboards where you can see when the VK kicks in?

@alinush alinush force-pushed the alin/keyless-vk-cache branch 2 times, most recently from 0e79cf1 to 1c88448 Compare April 24, 2024 23:20
@@ -63,8 +66,8 @@ pub(crate) static SAMPLE_JWT_PAYLOAD_JSON: Lazy<String> = Lazy::new(|| {
{}
"locale":"en",
"iat":1700255944,
"exp":2700259544,
-"nonce":"{}"
+"nonce":"{}",
Contributor Author

@alinush alinush Apr 24, 2024

cc @heliuchuan, not sure how you succeeded in generating a proof in the past for this since there was no comma after nonce... nor a }

@alinush alinush force-pushed the alin/keyless-vk-cache branch 2 times, most recently from cd98e22 to 0fca756 Compare April 25, 2024 17:05
@alinush alinush enabled auto-merge (squash) April 25, 2024 17:05
Contributor Author

alinush commented Apr 25, 2024

> One problem left in this PR: setting the VK and Configuration in genesis for devnet will no longer work instantly. The updates will only get applied after an epoch change... Might want to fix that to avoid surprises when we redeploy devnet.

Fixed this by directly writing things in genesis.

this optimization defers the update of the VK to the end of an epoch, which allows us to maintain a fresh view of the deserialized VK in Rust when creating a new VM

Contributor

✅ Forge suite compat success on aptos-node-v1.10.1 ==> 69c706c88808ab17bcecd2276be4f8ee16b6417a

Compatibility test results for aptos-node-v1.10.1 ==> 69c706c88808ab17bcecd2276be4f8ee16b6417a (PR)
1. Check liveness of validators at old version: aptos-node-v1.10.1
compatibility::simple-validator-upgrade::liveness-check : committed: 6167 txn/s, latency: 5320 ms, (p50: 4800 ms, p90: 9900 ms, p99: 12100 ms), latency samples: 228180
2. Upgrading first Validator to new version: 69c706c88808ab17bcecd2276be4f8ee16b6417a
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1338 txn/s, latency: 18220 ms, (p50: 19400 ms, p90: 28600 ms, p99: 28900 ms), latency samples: 85640
3. Upgrading rest of first batch to new version: 69c706c88808ab17bcecd2276be4f8ee16b6417a
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1575 txn/s, latency: 18178 ms, (p50: 19000 ms, p90: 28000 ms, p99: 28300 ms), latency samples: 91380
4. upgrading second batch to new version: 69c706c88808ab17bcecd2276be4f8ee16b6417a
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3394 txn/s, latency: 9313 ms, (p50: 9700 ms, p90: 12600 ms, p99: 12900 ms), latency samples: 142560
5. check swarm health
Compatibility test for aptos-node-v1.10.1 ==> 69c706c88808ab17bcecd2276be4f8ee16b6417a passed
Test Ok

Contributor

✅ Forge suite realistic_env_max_load success on 69c706c88808ab17bcecd2276be4f8ee16b6417a

two traffics test: inner traffic : committed: 8117 txn/s, latency: 4830 ms, (p50: 4600 ms, p90: 5700 ms, p99: 10200 ms), latency samples: 3506920
two traffics test : committed: 100 txn/s, latency: 1997 ms, (p50: 1900 ms, p90: 2200 ms, p99: 6300 ms), latency samples: 1840
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.209, avg: 0.205", "QsPosToProposal: max: 0.217, avg: 0.204", "ConsensusProposalToOrdered: max: 0.425, avg: 0.400", "ConsensusOrderedToCommit: max: 0.410, avg: 0.384", "ConsensusProposalToCommit: max: 0.812, avg: 0.784"]
Max round gap was 1 [limit 4] at version 1717. Max no progress secs was 4.8698587 [limit 15] at version 2049597.
Test Ok

@alinush alinush merged commit 750f9ae into main Apr 25, 2024
53 checks passed
@alinush alinush deleted the alin/keyless-vk-cache branch April 25, 2024 19:48