Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler v2] Resource access control (read-write sets) #10480

Merged
merged 6 commits into from
Oct 29, 2023
Merged

[compiler v2] Resource access control (read-write sets) #10480

merged 6 commits into from
Oct 29, 2023

Conversation

wrwg
Copy link
Contributor

@wrwg wrwg commented Oct 12, 2023

Description

This is an e2e implementation of resource access control for Move, with most parts in place:

  • Replaces the acquires syntax in a downwards-compatible way
  • The extended syntax is only available in compiler v2
  • One can now specify acquires, reads, and writes
  • One can specify the address of a resource in dependency of parameters
  • Multiple levels of wildcards are allowed, e.g. acquires *(object::address_of(param)) specifies that all resources at the given address are read or written.
  • Implements parsing->expansion->move model->file format generator
  • Extends file_format::FunctionHandle to carry the new information, introducing bytecode version v7. v7 became the new experimental version only available in test code for now.
  • TODO: dynamic runtime checking of resource access. Static analysis is also on the horizon, but not needed for an MVP of this feature.
  • TODO: bytecode verification of access specifiers

An AIP for this new feature will be filed soon.

As an example, here is some extract from the tests:

module 0x42::m {

    struct S has store {}
    struct R has store {}
    struct T has store {}
    struct G<T> has store {}

    fun f1() acquires S {
    }

    fun f2() reads S {
    }

    fun f3() writes S {
    }

    fun f4() acquires S(*) {
    }

    fun f_multiple() acquires R reads R writes T, S reads G<u64> {
    }

    fun f5() acquires 0x42::*::* {
    }

    fun f6() acquires 0x42::m::R {
    }

    fun f7() acquires *(*) {
    }

    fun f8() acquires *(0x42) {
    }

    fun f9(a: address) acquires *(a) {
    }

    fun f10(x: u64) acquires *(make_up_address(x)) {
    }

    fun make_up_address(x: u64): address {
        @0x42
    }
}

Test Plan

Unit tests for v2 and v1 compiler.

Copy link
Contributor

@brmataptos brmataptos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have more later, but send what I have.

@@ -100,7 +100,7 @@ impl<'e, E: ExecutorView> StorageAdapter<'e, E> {
self.accurate_byte_count = true;
}
self.max_binary_format_version =
get_max_binary_format_version(features, gas_feature_version);
get_max_binary_format_version(features, Some(gas_feature_version));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this have to do with Read/Write sets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We constantly improve the code while we are visiting it. Here I was improving the way how the the current default binary format version was determined, since I'm adding a new version. There were multiple places how the default was retrieved ad hoc, and it took me some to figure this out, so I unified them into this function here. However, for the unification I needed to make the 2nd parameter optional.

// For historical reasons, we support still < gas version 5, but if a new caller don't specify
// the gas version, we default to 5, which was introduced in late '22.
let gas_feature_version = gas_feature_version_opt.unwrap_or(5);
if gas_feature_version < 5 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems like unrelated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same thing as above, dealing with a new file format version in a better way.

Copy link
Contributor

@brmataptos brmataptos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atomically generated lattices are useful, since they are closed under negation, union, intersection. If we can represent subsets of A, B, C as well as * which includes "unknown others", it's best to also include a rep for "unknown others" (to give flexibility in representing unions without jumping to *, and more precision in intersection and negation).

/// // All resources in the module
/// reads 0xcafe::my_module::*;
/// // The given resource in the module, at arbitray address
/// reads 0xcafe::my_module::R(*);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a way to represent "Any address except an address of an object or other cryptographically unique namespace" as opposed to "Any address at all". Not sure how to do that in a simple way. We can already write "any object field of type R" R(object::address_of(*)) but we need "any non-object field of type R", e.g., R(literal(*)). But we should probably make the most common one * and less common one wordier.

Copy link
Contributor Author

@wrwg wrwg Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds interesting. With the recently added negation, we can at least say !acquires R(object::owner(o)). But being able to specify cryptographic namespaces is challenging. At least currently, object addresses are encrypted, and is not possible to test whether an object address is derived from a seed, address, or txn hash. We would need to actually divide addresses into components.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Though if the the cryptographically unique addresses of object::owner(o) can't be leaked out of runtime, then any address coming in shouldn't alias with that. But in any case, try to make sure the lattice is complete at each level so you can have closure under negation, join, and intersection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a chat with Aaron last week I now understand that object::owner(o) can be leaked, so could alias something else unless it's just been created.

@wrwg wrwg force-pushed the wg-rw branch 5 times, most recently from a65bbd7 to 94eccdd Compare October 17, 2023 17:40
@wrwg wrwg force-pushed the wg-rw branch 5 times, most recently from a1a82e4 to e6efadf Compare October 19, 2023 23:47
@wrwg wrwg added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Oct 20, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

This is an e2e implementation of resource access control for Move, with most parts in place:

- Replaces the acquires syntax in a downwards-compatible way
- The extended syntax is only available in compiler v2
- One can now specify `acquires`, `reads`, and `writes`
- One can specify the address of a resource in dependency of parameters
- Multiple levels of wildcards are allowed, e.g. `acquires *(object::address_of(param))` specifies that all resources at the given address are read or written.
- Implements parsing->expansion->move model->file format generator
- Extends `file_format::FunctionHandle` to carry the new information, introducing bytecode version v7. v7 became the new experimental version only available in test code for now.
- TODO: dynamic runtime checking of resource access. Static analysis is also on the horizon, but not needed for an MVP of this feature.
- TODO: bytecode verification of access specifiers

An AIP for this new feature will be filed soon.

As an example, here is some extract from the tests:

```move
module 0x42::m {

    struct S has store {}
    struct R has store {}
    struct T has store {}
    struct G<T> has store {}

    fun f1() acquires S {
    }

    fun f2() reads S {
    }

    fun f3() writes S {
    }

    fun f4() acquires S(*) {
    }

    fun f_multiple() acquires R reads R writes T, S reads G<u64> {
    }

    fun f5() acquires 0x42::*::* {
    }

    fun f6() acquires 0x42::m::R {
    }

    fun f7() acquires *(*) {
    }

    fun f8() acquires *(0x42) {
    }

    fun f9(a: address) acquires *(a) {
    }

    fun f10(x: u64) acquires *(make_up_address(x)) {
    }

    fun make_up_address(x: u64): address {
        @0x42
    }
}
```
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

}

impl SerializedOption {
fn from_u8(value: u8) -> BinaryLoaderResult<bool> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't great. What do true and false mean as Option enum discriminants? Better if you can have an enum somewhere for the values.

I found this related crate: https://docs.rs/enum-kinds/latest/enum_kinds/ but you can't use it for Option, anyway, so it wouldn't be useful even if you wanted to use it. std::mem::Discriminant<Option<X>> might make sense here, but without X I don't think it's meaningful. But using 'bool' is too confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a boolean indicating the presence or the absence of the option. I think is fine.

@@ -849,6 +858,116 @@ impl Arbitrary for AbilitySet {
}
}

/// An `AccessSpecifier` describes an approximation of resources accessed by a function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "approximation" mean here? Overestimate? Underestimate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented, Done

/// chain can be wildcards (`*`).
#[derive(Debug, Clone, PartialEq)]
pub enum AccessSpecifier_ {
Acquires(bool, NameAccessChain, Option<Vec<Type>>, AddressSpecifier),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Vineeth that this is nonoptimal, as we don't get to abstract away the kind of access from the access specification. Using Rust's pattern matching is convenient, but wordy. Right now we have 3 clauses to match these three cases, and each clause has to mention all 4 data fields. If the 4 data fields were grouped as a type, then processing the fields would be abstracted away. If we decide to add more entries to Acquires, Reads, and Writes in the future (I can imagine something related to Aggregations/Reductions and PrivatizableResources) then we duplicate the processing.

BUT: This can be refactored later, this is good enough for now.

@@ -267,7 +291,7 @@ pub struct Function {
pub visibility: Visibility,
pub entry: Option<Loc>,
pub signature: FunctionSignature,
pub acquires: Vec<NameAccessChain>,
pub access_specifiers: Option<Vec<AccessSpecifier>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the access_specifiers field, what does None mean as opposed to Some(empty vector) ?

@@ -77,6 +78,8 @@ pub(crate) struct ModuleBuilder<'env, 'translator> {
pub inline_spec_builder: Spec,
/// Translated function definitions, if we are compiling Move code
pub fun_defs: BTreeMap<Symbol, Exp>,
/// Translated access specifiers, if we are compiling Move code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how fun everything is in Move.

@brmataptos
Copy link
Contributor

brmataptos commented Oct 29, 2023 via email

@wrwg
Copy link
Contributor Author

wrwg commented Oct 29, 2023

Maybe if you say that in a comment. It's obvious to you, maybe...

Done

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.7.2 ==> 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92

Compatibility test results for aptos-node-v1.7.2 ==> 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92 (PR)
1. Check liveness of validators at old version: aptos-node-v1.7.2
compatibility::simple-validator-upgrade::liveness-check : committed: 4913 txn/s, latency: 6711 ms, (p50: 6900 ms, p90: 10500 ms, p99: 11700 ms), latency samples: 186700
2. Upgrading first Validator to new version: 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1285 txn/s, latency: 16068 ms, (p50: 19000 ms, p90: 22000 ms, p99: 22600 ms), latency samples: 92520
3. Upgrading rest of first batch to new version: 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1441 txn/s, latency: 19160 ms, (p50: 18900 ms, p90: 24200 ms, p99: 26800 ms), latency samples: 74980
4. upgrading second batch to new version: 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3707 txn/s, latency: 8760 ms, (p50: 9900 ms, p90: 12300 ms, p99: 12600 ms), latency samples: 144580
5. check swarm health
Compatibility test for aptos-node-v1.7.2 ==> 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92 passed
Test Ok

@github-actions
Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92

two traffics test: inner traffic : committed: 8110 txn/s, latency: 4836 ms, (p50: 4500 ms, p90: 5700 ms, p99: 9900 ms), latency samples: 3503600
two traffics test : committed: 100 txn/s, latency: 2153 ms, (p50: 2100 ms, p90: 2500 ms, p99: 5200 ms), latency samples: 1800
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.200, avg: 0.196", "QsPosToProposal: max: 0.163, avg: 0.151", "ConsensusProposalToOrdered: max: 0.603, avg: 0.577", "ConsensusOrderedToCommit: max: 0.520, avg: 0.501", "ConsensusProposalToCommit: max: 1.116, avg: 1.079"]
Max round gap was 1 [limit 4] at version 1745846. Max no progress secs was 4.123938 [limit 10] at version 1745846.
Test Ok

@wrwg wrwg merged commit 58271fd into main Oct 29, 2023
47 checks passed
@wrwg wrwg deleted the wg-rw branch October 29, 2023 21:52
@github-actions
Copy link
Contributor

✅ Forge suite framework_upgrade success on aptos-node-v1.7.2 ==> 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92

Compatibility test results for aptos-node-v1.7.2 ==> 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92 (PR)
Upgrade the nodes to version: 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 7065 txn/s, latency: 4807 ms, (p50: 4800 ms, p90: 7800 ms, p99: 8300 ms), latency samples: 247280
5. check swarm health
Compatibility test for aptos-node-v1.7.2 ==> 88c057ab9bc82a6cb33c994b8ec4c3f91a1d6d92 passed
Test Ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-coverage run tests with test coverage instrumentation CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants