New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bytecode-verifier] Implement phantom params in bytecode #8599
Conversation
Thank you for your submission. We require all contributors to Have you signed the CLA already, but your status is still pending? Recheck CLA |
ec5b275
to
02f367f
Compare
@@ -14,6 +14,7 @@ pub struct StructType { | |||
pub fields: Vec<Type>, | |||
pub abilities: AbilitySet, | |||
pub type_parameters: Vec<AbilitySet>, | |||
pub type_param_kinds: Vec<TypeParamKind>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a discussion about whether to prefer AoS (Array of Structs) or SoA (Struct of Arrays) here. SoA is what the PR currently implements. Following is an example of AoS.
struct TypeParameter {
is_phantom: bool,
abilities: AbilitySet,
}
struct StructType {
..
pub type_parameters: Vec<TypeParameter>,
..
}
AoS seems more intuitive and unlike SoA, we don't have to check that the two arrays have the same length. However one advantage SoA has is that we will continue to be able to pass the old list of ability sets to functions that expect &[AbilitySet]
. Although with AoS, the issue can be fixed by defining a getter trait.
trait TypeParamGetAbilitySet {
fn ability_set() -> &AbilitySet;
}
fn consumer(ty_params: &[AbilitySet]);
// changes to
fn consumer(ty_params: &[impl TypeParamGetAbilitySet]);
@tnowacki thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We briefly discussed SoA with @tnowacki that's why I ended up implementing it. But after implementing it I think I lean towards AoS. Maintaining the invariant that both array are the same length can get very nasty and it's not very clear where to add the checks (I didn't find any other similar well-formed checks). The getter trait sounds like a good middle ground, didn't think of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do what you want here. It might make sense here to do Vec<TypeParameterType>
or something.
But in the file format I think it is important to keep them separate. See that discussion in the serializer.
serialize_ability_sets(binary, &struct_handle.type_parameters)?; | ||
serialize_type_param_kinds(binary, &struct_handle.type_param_kinds) | ||
} | ||
|
||
fn serialize_type_param_kinds(binary: &mut BinaryData, kinds: &[TypeParamKind]) -> Result<()> { | ||
for kind in kinds { | ||
serialize_type_param_kind(binary, *kind)?; | ||
} | ||
Ok(()) | ||
} | ||
|
||
fn serialize_type_param_kind(binary: &mut BinaryData, kind: TypeParamKind) -> Result<()> { | ||
match kind { | ||
TypeParamKind::Phantom => binary.push(SerializedTypeParamKindFlag::PHANTOM as u8), | ||
TypeParamKind::NonPhantom => binary.push(SerializedTypeParamKindFlag::NON_PHANTOM as u8), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I think a discussion about AoS (Array of Structs) vs SoA (Struct of Arrays) should happen here.
Do we want
[length] [abi 1] [abi 2] [abi 3] [phantom 1] [phantom 2] [phantom 3]
or
[length] [abi 1] [phantom 1] [abi 2] [phantom 2] [abi 3] [phantom 3]
?
Additionally
- In the former case, do we want to compress the phantom bits into a bit vector?
- In the latter case, do we want to compress the phantom bit and the ability set into the same bit vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with "[length] [abi 1] [phantom 1] [abi 2] [phantom 2] [abi 3] [phantom 3]" is we don't want to make this a breaking change.
So that gives you:
- If it is not present, it is assumed that all the type parameters are non-phantom
- if it is present, you specify the status/usage for each one
This way you don't need to make a breaking change... but then again you are going to have to gate this based on the version of the file format, so maybe the breaking change isn't a big deal and doing the AoS solution would work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the mroe I think about it since you have to gate based on the file format version, we might as well just do the cleaner thing and do [length] [abi 1] [phantom 1] [abi 2] [phantom 2] [abi 3] [phantom 3]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I originally also wanted compare the schemas in terms of binary compatibility, but then came into the same conclusion that any addition to the file format requires a new version number, making it meaningless to talk about backward compatibility, unless we come up with some notation that would allow us to say something like "version 5 is compatible with (can be read as) version 6" or equivalently introduce some sort of ".1" semantics, which I don't think we'll do.
Thinking about binary compatibility might help make it easier for us to implement ser/de though, as minimizing the difference between the old and new format is one of the goals. Still, I don't think this is a major concern and I'd prefer that we simply take the new version as a chance to evolve the file format.
❗ Invalid command Bors help and documentationBors is a Github App used to manage merging of PRs in order to ensure that CI is always green. It does so by maintaining a Merge Queue. Once a PR reaches the head of the Merge Queue it is rebased on top of the latest version of the PR's General
CommandsBors actions can be triggered by posting a comment which includes a line of the form
OptionsOptions for Pull Requests are configured through the application of labels. |
serialize_ability_sets(binary, &struct_handle.type_parameters)?; | ||
serialize_type_param_kinds(binary, &struct_handle.type_param_kinds) | ||
} | ||
|
||
fn serialize_type_param_kinds(binary: &mut BinaryData, kinds: &[TypeParamKind]) -> Result<()> { | ||
for kind in kinds { | ||
serialize_type_param_kind(binary, *kind)?; | ||
} | ||
Ok(()) | ||
} | ||
|
||
fn serialize_type_param_kind(binary: &mut BinaryData, kind: TypeParamKind) -> Result<()> { | ||
match kind { | ||
TypeParamKind::Phantom => binary.push(SerializedTypeParamKindFlag::PHANTOM as u8), | ||
TypeParamKind::NonPhantom => binary.push(SerializedTypeParamKindFlag::NON_PHANTOM as u8), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with "[length] [abi 1] [phantom 1] [abi 2] [phantom 2] [abi 3] [phantom 3]" is we don't want to make this a breaking change.
So that gives you:
- If it is not present, it is assumed that all the type parameters are non-phantom
- if it is present, you specify the status/usage for each one
This way you don't need to make a breaking change... but then again you are going to have to gate this based on the version of the file format, so maybe the breaking change isn't a big deal and doing the AoS solution would work
@@ -14,6 +14,7 @@ pub struct StructType { | |||
pub fields: Vec<Type>, | |||
pub abilities: AbilitySet, | |||
pub type_parameters: Vec<AbilitySet>, | |||
pub type_param_kinds: Vec<TypeParamKind>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do what you want here. It might make sense here to do Vec<TypeParameterType>
or something.
But in the file format I think it is important to keep them separate. See that discussion in the serializer.
@tnowacki If I read #8599 (comment) correctly then you agree on going AoS all the way and in file_format have struct TypeParameter {
constraint: AbilitySet,
is_phantom: bool,
}
struct StructHandle {
...
type_parameters: Vec<TypeParameter>
...
} and serialize it like:
? If so, the only thing remaining is whether to serialize |
@nilehmann This The fileformat is not hyper-optimized already so I would just serialize each field of phantom and abilities separately. We can always piggyback on the phantom u8 for other flags in the future if needed And just looking ahead, definitely don't use |
912fd99
to
24cb27d
Compare
24cb27d
to
4d68793
Compare
4d68793
to
597f41c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really great! This AoS solution is looking a lot mroe slick. I have mostly focused on the vm bits and will take a deeper look at the other parts later. But some more minor things to change up at least at the moment :)
/// Struct type parameters can be declared as phantom while | ||
/// function type parameters cannot. This getter trait | ||
/// is used to work over both of them uniformily. | ||
pub trait TypeParamGetConstraints { | ||
fn constraints(&self) -> &AbilitySet; | ||
} | ||
|
||
impl TypeParamGetConstraints for TypeParameter { | ||
fn constraints(&self) -> &AbilitySet { | ||
&self.constraints | ||
} | ||
} | ||
|
||
impl TypeParamGetConstraints for AbilitySet { | ||
fn constraints(&self) -> &AbilitySet { | ||
self | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is nice in some spots? but to me this feels really unnecessary, especially since the field is public.
I think using impl IntoIterator
in a few spots should clean this up for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main motivation for this is to use it in BinaryIndexedView::abilities
where we need the indexing. So yeah, we can get away with impl IntoIterator
in many places, but I don't know of a better solution for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntoIterator should work for that case too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't see how IntoIterator
could work in this case as we have to access arbitrary indices.
TypeParameter(idx) => *constraints[*idx as usize].constraints(),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would say the trait is for cases where you need random access, e.g. checking type arguments against type parameters. In such cases, collecting into a vector could result in a regression in terms time complexity.
@nilehmann could you confirm if such cases actually exist in our codebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vgao1996 BinaryIndexedView::abilities
is the only place where random access is used. If we keep that as a &[AbilitySet]
I think the only place that needs an extra collect
is here
That random access is also duplicated for test generation here.
There are also other places that access the length, but I think those are not too problematic.
fn get_signature_from_type_params(&mut self, abilities: &[AbilitySet]) -> Signature { | ||
fn get_signature_from_type_params( | ||
&mut self, | ||
abilities: &[impl TypeParamGetConstraints], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing you might want to do instead is impl IntoIterator<Item = &AbilitySet>
that way you won't need this other trait
.map(|(idx, ty_param)| { | ||
format!( | ||
"{}{}{}", | ||
if ty_param.is_phantom { "phantom " } else { "" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably fine. but would also be fine with doing an assert!(ty_param.is_phantom())
or something for now. Up to you though since you will likely be doing both parts :)
fn verify_ty_args(&self, constraints: &[AbilitySet], ty_args: &[Type]) -> PartialVMResult<()> { | ||
fn verify_ty_args( | ||
&self, | ||
constraints: &[impl TypeParamGetConstraints], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as the other spot impl IntoIterator<Item = &AbilitySet>
will fix this. Also maybe just impl IntoIterator<Item = AbilitySet>
since Ability set is just a u8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function in particular accesses the length so the implementation would also need to change. I think the trait is cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we can duplicate the function :) Or just copy them out into another vec. Both of these seem better to me than the trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one thing we could do is leave everything as it was with &[AbilitySet]
everywhere and add an extra collect
at the call site when calling from a "struct context". I guess you'd be trading extra allocations for the complexity of adding the trait. I particularly like more the trait as I think it's the right abstraction, it captures what's common about function type parameters and struct type parameters, but I'm fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also have two different entry points. and then have a shared _impl
that takes a size and the iterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I also don't think the extra allocations will be a big deal, like at all. In practice this vector will have like 5 elements at most.
)), | ||
Type::Struct(idx) => Ok(self.module_cache.read().struct_at(*idx).abilities), | ||
Type::StructInstantiation(idx, type_args) => { | ||
let declared_abilities = self.module_cache.read().struct_at(*idx).abilities; | ||
let struct_type = self.module_cache.read().struct_at(*idx); | ||
let type_argument_abilities = type_args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wild, didn't notice before. Or I did and looked it up then. But do we check some where that the correct number of type arguments were passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the code I touched seems to check it, but there may be a check somewhere else? I just went on and zip the args and params.
c2c67cc
to
de64021
Compare
Definitely love the Iterator traits! I do find it funny how much Rust makes you ("you" being any Rust programmer) feel bad for extra allocations, when most of the time you probably won't see a big performance difference :P A very amusing fallout from some choices around clone and the like |
Basically revert all changes under diem-releases/artifacts folder and check if the failed test would pass or not. And create a new PR that includes those changes so that we will have recompiled DiemFramework in the main branch |
I don't think you can do that. There is another test that stops you from landing anything unless artifacts is up to date. At least that is what @tzakian told me. |
@tnowacki yeah there's a test that checks the working directory doesn't have unstaged changes. That's how I realized artifact even existed in the first place. |
I think Runtian's suggestion is the right course of action if we want to get this in soon. You can turn off the test to make sure the files are up-to-date by putting a |
60f2cd1
to
add6060
Compare
Double checked the hypothesis why LBT failed using following command:
This command start a cluster test runner from main branch and update nodes to main+PR image. The test was exxecuted successfully with following output:
This proved that the PR has no compatibility issue with the older file format. With this we can proceed landing this code with cluster test disabled @sausagee . |
This PR will change the file format so the next Diem Framework release will need to be dependent on the Diem Core release. |
add6060
to
1660f39
Compare
/land priority=high |
@sausagee ❗ This PR is still missing approvals, unable to queue for landing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this!
/land priority=high |
1660f39
to
5c45fd2
Compare
This PR implements phantom type parameters in the bytecode. A type argument instantiating a phantom type parameter is not considered when deriving the abilities for the struct instantiation. For this to be sound, in a struct declaration a phantom type parameter can only appear in phantom position (or not appear at all), i.e., as an argument to a phantom type parameter.
Motivation
Avoid spurious ability annotations for marker types like XUS.
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
Tests will be coming in the next PR.