Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm] Resource access control: runtime engine #10544

Merged
merged 6 commits into from
Feb 21, 2024
Merged

[vm] Resource access control: runtime engine #10544

merged 6 commits into from
Feb 21, 2024

Conversation

wrwg
Copy link
Contributor

@wrwg wrwg commented Oct 17, 2023

Implements the runtime engine for resource access control:

  • a representation of access control specifiers in loaded_data::runtime_access_specifiers.
  • a loader for access specifiers in runtime::loader::access_specifier_loader.
  • a new stateful object representing the access control logic in runtime::access_control.
  • finally the use of the AccessControlState in runtime::interpreter
  • unit test and combinatorial proptests

crates/aptos-admin-service/src/server/mod.rs Outdated Show resolved Hide resolved
self.gen_select(target, id, mid.qualified_inst(*sid, inst), *fid, &arg)
// Get the instantiation of the struct. It's not contained in the select
// expression but in the type of it's operand.
if let Some((_, inst)) = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 528-532 are nearly identical (modulo "arg" and "oper") to lines 944-948. Maybe a helper function with a meaningful name would be useful for readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to extract it into a function but it is not entirely possible because of lifetime issues. env().get_node_type() returns a temporary type on the stack from which you can't borrow inside of a helper function. However, I introduced a helper for get_node_type reducing a little of the boilerplate, and applied the new helper over the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I now understand this pain of Rust. (Lifetimes block abstraction.)

if callee.module_id == module.get_id() {
let count = usage.len();
usage.extend(usage_map[&callee.id].iter().cloned());
if usage.len() > count {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move the stuff involving count out of the callee loop here, just test once per fun. It also simplifies the code, not just an opt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

let mut usage = usage_map[&fun.get_id()].clone();
let mut local_changes = false;
for callee in callees {
if callee.module_id == module.get_id() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth a comment to explain why we don't care about calls to other modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

usage_map
}

fn get_function_acquires(&self, fun: &FunctionEnv) -> BTreeSet<StructId> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include "local" in the name so we know this is not including calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

/// Returns true of the concrete access instance is enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"of" -> "if"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

//! There is potential for optimization by simplifying exclusions are effectively negations,
//! such a simplification is not trivial and may require recursive specifiers, which
//! we like to avoid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do access control specifiers exactly specify the set of resources which may be accessed, or are they an approximation? If an approximation, which way can errors go?

If this is correct, can we specify it explicitly? 'join' operation corresponds to intersection of the sets of possibly accessed resources.

What does "specialize" mean? I'm guessing, that any address parameters mentioned in an AccessSpecifier are made specific given the address values available at runtime. Can we have an explicit comment about that somewhere? Perhaps on the "specialize" method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to reformulate and be a bit more precise, take a look again.

rtso pushed a commit that referenced this pull request Nov 22, 2023
* [compiler v2] Pulls out fixes from #10544 to umblock testing

There are fixes how to deal with global resources in #10544, pulling them out to unblock testing.

* Addressing reviewer comments, fixing tests, adding test from original PR
return Err(PartialVMError::new(StatusCode::UNKNOWN_VERSION)
.with_message(format!("bytecode version {} unsupported", version)));
} else if version == VERSION_NEXT
&& !cfg!(test)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can use all() or any() to combine these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

bowenyang007 pushed a commit that referenced this pull request Dec 7, 2023
* [compiler v2] Pulls out fixes from #10544 to umblock testing

There are fixes how to deal with global resources in #10544, pulling them out to unblock testing.

* Addressing reviewer comments, fixing tests, adding test from original PR
Copy link
Contributor

@brmataptos brmataptos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot more to read, but here are a few more comments.

@@ -13,7 +13,7 @@ edition = "2021"
anyhow = "1.0.52"
clap = { version = "4.3.9", features = ["derive"] }
colored = "2.0.0"
move-binary-format = { path = "../../move-binary-format" }
move-binary-format = { path = "../../move-binary-format", features = ["testing"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a "testing" feature used anywhere in move-binary-format. What does it do? Where is it defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its used to double guard v7 file format, only with this feature one can use v7

if let Some(frame) = self.call_stack.pop() {
// Note: the caller will find the callee's return values at the top of the shared operand stack
current_frame = frame;
current_frame.pc += 1; // advance past the Call instruction in the caller
} else {
// end of execution. `self` should no longer be used afterward
// Clean up access control
self.access_control
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you just do this on line 170-172?

Copy link
Contributor

@vgao1996 vgao1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2/n Feature gating

Comment on lines 531 to 532
&& !cfg!(feature = "testing")
&& !cfg!(feature = "fuzzing")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd strongly recommend we move away from gating critical features by the crate feature flags, due to cargo's handling of feature unification.

  • Right now there is a single fragile test during node startup ensuring that the node binary is not built with the "testing" feature flag
  • The test does not check the "fuzzing" feature flag

It would be a lot safer to pass in a bool variable indicating whether the next version shall be accepted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, this is not feature gating. Its a double safeguard that experimental code cannot run in production. The actual feature gating of the maximally supported bytecode version continues to be done via regular feature gating and on-chain config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual feature gating of the maximally supported bytecode version continues to be done via regular feature gating and on-chain config.

I don't see this in the current implementation though. Have I missed anything?

My worry right now is just that if someone sends a move binary with NEXT_VERSION against the current implementation, it will be accepted if the cargo features are misconfigured, which sounds dangerous for production.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay now I see it. So the logic is that even if a binary w/ VERSION_NEXT passes the first check based on dynamic configuration, it will still fail the second check if we're not in any of the test modes. Indeed this is a double safeguard.

@wrwg wrwg force-pushed the wg-access-runtime branch 2 times, most recently from 011237c to 26bd182 Compare February 5, 2024 16:54
Copy link
Contributor

@vgao1996 vgao1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now!

Implements the runtime engine for resource access control:

- a representation of access control specifiers in `loaded_data::runtime_access_specifiers`.
- a loader for access specifiers in `runtime::loader::access_specifier_loader`.
- a new stateful object representing the access control logic in `runtime::access_control`.
- finally the use of the `AccessControlState` in `runtime::interpreter`.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.9.5 ==> faba733a804774ca13112e3f1555e894f8148082

Compatibility test results for aptos-node-v1.9.5 ==> faba733a804774ca13112e3f1555e894f8148082 (PR)
1. Check liveness of validators at old version: aptos-node-v1.9.5
compatibility::simple-validator-upgrade::liveness-check : committed: 6200 txn/s, latency: 5391 ms, (p50: 4800 ms, p90: 9000 ms, p99: 17200 ms), latency samples: 217000
2. Upgrading first Validator to new version: faba733a804774ca13112e3f1555e894f8148082
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1737 txn/s, latency: 16328 ms, (p50: 17800 ms, p90: 22300 ms, p99: 22900 ms), latency samples: 90360
3. Upgrading rest of first batch to new version: faba733a804774ca13112e3f1555e894f8148082
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 137 txn/s, submitted: 446 txn/s, expired: 308 txn/s, latency: 50406 ms, (p50: 48500 ms, p90: 58500 ms, p99: 86900 ms), latency samples: 12417
4. upgrading second batch to new version: faba733a804774ca13112e3f1555e894f8148082
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 2110 txn/s, latency: 13872 ms, (p50: 13600 ms, p90: 19600 ms, p99: 19900 ms), latency samples: 103400
5. check swarm health
Compatibility test for aptos-node-v1.9.5 ==> faba733a804774ca13112e3f1555e894f8148082 passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on faba733a804774ca13112e3f1555e894f8148082

two traffics test: inner traffic : committed: 7636 txn/s, latency: 5147 ms, (p50: 5000 ms, p90: 6000 ms, p99: 14700 ms), latency samples: 3291400
two traffics test : committed: 100 txn/s, latency: 2247 ms, (p50: 2100 ms, p90: 2300 ms, p99: 11000 ms), latency samples: 1780
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.238, avg: 0.208", "QsPosToProposal: max: 0.514, avg: 0.365", "ConsensusProposalToOrdered: max: 0.605, avg: 0.546", "ConsensusOrderedToCommit: max: 0.496, avg: 0.468", "ConsensusProposalToCommit: max: 1.054, avg: 1.014"]
Max round gap was 1 [limit 4] at version 1431321. Max no progress secs was 9.555537 [limit 15] at version 1431321.
Test Ok

@wrwg wrwg merged commit f325f52 into main Feb 21, 2024
87 of 100 checks passed
@wrwg wrwg deleted the wg-access-runtime branch February 21, 2024 21:40
zjma added a commit that referenced this pull request Feb 22, 2024
* Fix `iss`-related bug in Groth16 path & refactor (#12017)

Co-authored-by: Oliver <heliuchuan@gmail.com>

* [aptosvm] Simplify VM flows (#11888)

* Duplicated logic for creating the gas meter for view functions has been removed.
* Duplicated logic for calculating gas used for view functions has been removed.
* There was unreachable code in failure transaction cleanup, where the discarded
status has been returned immediately, but then re-checked again. The first check
is shifted inside.
* No more default transaction metadata.
* Scripts are now validated consistently.
* Simplifies transaction execution function signature to avoid `Option<String>`.
* Removes duplicated features from `AptosVM` and keeps them in `MoveVMExt`.
* Fixes a bug when script hash was not computed for `RunOnAbort`.

Related tests are moved  to `move-e2e-tests`.

* [Compiler V2] Critical edge elimination (#11894)

Implement a pass to eliminate critical edges by splitting them with empty blocks

* [consensus configs] reduce sending block size from 2500 to 1900 (#12091)

### Description

The block output limit is no longer hit with p2p txns.

### Test Plan

Forge `realistic_env_max_load` TPS improves.

* [Indexer-grpc] Add profiling support. (#12034)

* Minor aggregator cleanup (#12013)

* Minor aggregator cleanup

* Addressing PR comments

* [move] rotate_authentication_key_call should not modify OriginatingAddress (#12108)

Co-authored-by: Alin Tomescu <tomescu.alin@gmail.com>

* [Data Streaming Service] Add dynamic prefetching support

* [Data Streaming Service] Add dynamic prefetching unit tests.

* [Data Streaming Service] Update existing integration tests.

* [State Sync] Add backpressure to fast sync receiver.

* Update perf baseline for gas charging coverage improvements (reducing throughput) (#12124)

* Reduce latency of cloning network sender using Arc pointers (#12103)

* Avoid cloning network sender using Arc pointers

* Removing a clone

* 100 node sweep test

* Removing a few clone operations

* reset forge test

* Removing some clones

* Removing clones

* adopt AIP-61 terminology for consistency (#12123)

adopt AIP-61 terminology for consistency

* [Consensus] Remove non-decoupled execution and refactor for cleaner interfaces (#12104)

* fix jwk key logging (#12090)

* remove spurious error lines (#12137)

* randomness #1: types update from randomnet (#12106)

* types update from randomnet

* update

* lint

* lint

* All validators broadcast commit vote messages (#12059)

* All validators broadcast commit messages

* Forge testing

* Increase timeout for forge

* test forge realistic_env_workload_sweep_test

* run realistic_env_workload_sweep_test

* run realistic_env_workload_sweep_test

* run sweep test

* increase forge runner duration

* forge testing

* Letting the proposer also broadcast commit decision for backward compatibility

* removing forge changes

* Added a TODO

* [vm] Resource access control: runtime engine (#10544)

* [vm] Resource access control: runtime engine

Implements the runtime engine for resource access control:

- a representation of access control specifiers in `loaded_data::runtime_access_specifiers`.
- a loader for access specifiers in `runtime::loader::access_specifier_loader`.
- a new stateful object representing the access control logic in `runtime::access_control`.
- finally the use of the `AccessControlState` in `runtime::interpreter`.

* Addressing reviewer comments.

* Addressing reviewer comments.

* typo: PTLA -> PTAL

* Rebasing: adjusting to upstream changes

* Rebasing

* ObjectCodeDeployment API cleanup update (#12133)

* ObjectCodeDeployment API cleanup update (#12141)

* [Compiler-v2] porting more V1 unit tests to V2 (#12085)

* update tests

* fix bug

* fix-12116

* fix missing space

* add expected got

* remove live-var tests

* fix had_erros

* fix

* Enable the max object nesting check (#12129)

* Resolved the warning for unused variable (#12157)

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Squashed commit of the following:

commit a50ffec
Author: Zhoujun Ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 21:10:12 2024 +0000

    lint

commit 388350f
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 13:04:28 2024 -0800

    update

commit 76f7eca
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:56:04 2024 -0800

    update

commit a663542
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:54:18 2024 -0800

    update

commit b439449
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:34:14 2024 -0800

    update

commit 3378ceb
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:17:06 2024 -0800

    update

commit 6cd6685
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:15:05 2024 -0800

    update

commit 6d89f37
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:13:51 2024 -0800

    update

commit 980f257
Author: zhoujun.ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 12:12:04 2024 -0800

    update

commit 16e9349
Author: Zhoujun Ma <zhoujun@aptoslabs.com>
Date:   Thu Feb 22 18:25:08 2024 +0000

    lint

---------

Co-authored-by: Alin Tomescu <tomescu.alin@gmail.com>
Co-authored-by: Oliver <heliuchuan@gmail.com>
Co-authored-by: George Mitenkov <georgemitenk0v@gmail.com>
Co-authored-by: Zekun Wang <41706692+fEst1ck@users.noreply.github.com>
Co-authored-by: Brian (Sunghoon) Cho <brian@aptoslabs.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
Co-authored-by: Satya Vusirikala <satyasatya123456@gmail.com>
Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: Josh Lind <josh.lind@hotmail.com>
Co-authored-by: igor-aptos <110557261+igor-aptos@users.noreply.github.com>
Co-authored-by: Sital Kedia <sitalkedia@users.noreply.github.com>
Co-authored-by: Wolfgang Grieskamp <wg@aptoslabs.com>
Co-authored-by: Teng Zhang <rahxephon89@163.com>
Co-authored-by: Junkil Park <jpark@aptoslabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants