Skip to content
This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

Move UtxoId from fuel-tx to fuel-types #51

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

xgreenx
Copy link
Contributor

@xgreenx xgreenx commented Sep 22, 2022

In the fuel-core we have custom AsRef<[u8]> for UtxoId. We represent UtxoId as 33 bytes and use it as a key for data.

fuel-tx serialize it as 40 bytes where 32 bytes it TxId and 8 bytes are output_index: u8. It is because each separate field in the struct should be WORD_SIZE during serialization in the fuel-tx.

We can represent UtxoId as 33 bytes and know that the last byte is an output_index. We don't need to make it a separate field. UtxoId can provide two getter methods for tx_id and output_index.

It is breaking change but:

  • It reduces the size of all transactions (instead of 40 bytes, we need 33 bytes).
  • Makes UtxoId the same as other ids(ContractId, MessageId).
  • We have one serialization model in fuel-tx and fuel-core.

@xgreenx xgreenx added the breaking A breaking api change label Sep 22, 2022
@xgreenx xgreenx self-assigned this Sep 22, 2022
@Voxelot
Copy link
Member

Voxelot commented Sep 22, 2022

This is 40 bytes when doing canonical serialization in the vm so that all types are 64bit aligned. Word alignment isn't required when storing to disk though.

@xgreenx
Copy link
Contributor Author

xgreenx commented Sep 22, 2022

Where exactly do we need 64-alignment? (Maybe you can put the link on the code)
We also have Base4 and Base20 that are not 64-bit aligned.

@Voxelot
Copy link
Member

Voxelot commented Sep 22, 2022

The word alignment is required here when initializing the VM memory: https://github.com/FuelLabs/fuel-vm/blob/v0.18.2/src/interpreter/initialization.rs#L36-L47

*i = rng.gen();
}

default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to generate bytes individually instead of delegating the implementation to R so he is aware we are in fact generating n bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Distribution is not implemented for unusual arrays as [u8; 33].

/// FuelVM atomic type.
pub struct $i([u8; $s]);

impl Default for $i {
fn default() -> $i {
$i([0u8; $s])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the proc macro default implementation to add more code that will do the same - but manually instead of relying in the language defaults?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Default is not implemented for unusual arrays as [u8; 33].

@@ -24,16 +24,29 @@ const fn hex_val(c: u8) -> Option<u8> {

macro_rules! key {
($i:ident, $s:expr) => {
#[derive(Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why adding transparent here is required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because UtxoId actually has two parts, 32 bytes and 1 byte, it can be helpful if someone wants to mem::transmute part of the 33 bytes array into TxId or output_index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its a good idea to mix the output index with the bytes representation. If its a matter of encoding / decoding, we can surely optimize this there but then not in the type representation and memory layout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a matter of encoding/decoding. UtxoId is an identifier as MessageId and ContractId, so it should have all methods available for other identifiers.

If we use representation:

pub struct UtxoId([u8; 32], u8);

We need to implement all traits manually. But UtxoId(Bytes32, u8) equevalent to UtxoId([u8; 33]). AsRef<[u8]> implementation for UtxoId(Bytes32, u8) requires usage of unsafe { mem::transmute } when it is natively available for UtxoId([u8; 33]).

Also, information about the output index matters during the creation of the UtxoId, but after it is used not often.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked our implementation of "unsafe" methods, and it is required for memory layout correctness in our unsafe code

@@ -271,6 +284,9 @@ key!(Bytes20, 20);
key!(Bytes32, 32);
key!(MessageId, 32);
key!(Salt, 32);
key!(UtxoId, 33);

pub type TxId = Bytes32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not follow the pattern of the other types instead of aliasing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think so, but in the fuel-tx it is an alias. I moved it from fuel-tx here. But I also for the creation of a separate type. If you are okay with it(it is a breaking change), I will do it=)

r[8], r[9], r[10], r[11], r[12], r[13], r[14], r[15],
r[16], r[17], r[18], r[19], r[20], r[21], r[22], r[23],
r[24], r[25], r[26], r[27], r[28], r[29], r[30], r[31],
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why to do it like this. Its harder to read/understand without a clear gain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because slice_copy is not const operation=)
So I can do copy only manually byte by byte=(

}

pub const fn output_index(&self) -> u8 {
self.0[TxId::LEN]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole behavior looks like some magic. Instead of having hardcoded positions in the array, maybe we want to create a field with the output index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this change is to represent the UtxoId as an array of 33 bytes=) Because on the fuel-core level, we work with arrays and slices.

It can be a struct with two fields, but then in the AsRef<[u8]> implementation, we need to use unsafe{ mem::transmute } to convert &UtxoId into &[u8; 33] or &[u8].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good approach to let serialization concerns affect how our core domain types are structured. It's a bit like the tail wagging the dog tbh.

On the other hand, copying these utxoIds every time we go to storage isn't great either.

Maybe we should consider using a vetted zerocopy lib instead of implementing transmute ourselves? e.g. https://docs.rs/zerocopy/latest/zerocopy/trait.AsBytes.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an alter implementation #52

For me, the approach with [u8; 33] looks cleaner=)

@@ -24,16 +24,29 @@ const fn hex_val(c: u8) -> Option<u8> {

macro_rules! key {
($i:ident, $s:expr) => {
#[derive(Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its a good idea to mix the output index with the bytes representation. If its a matter of encoding / decoding, we can surely optimize this there but then not in the type representation and memory layout

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
breaking A breaking api change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants