Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended packing and extracting library for value types #5056

Merged
merged 25 commits into from
Jun 11, 2024

Conversation

Amxx
Copy link
Collaborator

@Amxx Amxx commented May 27, 2024

Fixes #5053
Alternative to #5051

Missing tests for the replace functions.

PR Checklist

  • Tests
  • Documentation
  • Changeset entry (run npx changeset add)

Copy link

changeset-bot bot commented May 27, 2024

⚠️ No Changeset found

Latest commit: 038e2dc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes changesets to release 1 package
Name Type
openzeppelin-solidity Minor

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@Amxx Amxx changed the title Extended packing and extracting library for bytesXx Extended packing and extracting library for bytesXX May 27, 2024
@Amxx Amxx changed the title Extended packing and extracting library for bytesXX Extended packing and extracting library for value types May 27, 2024
@Amxx Amxx requested review from ernestognw May 27, 2024 18:41
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach and how it looks. However, the Packing.sol file itself is already 437 kb (around 25% the current size of the library). Let alone the file, I guess the compilation artifacts we distribute will grow as well so probably more.

In a past design, I was suggesting the following size combinations I thought were enough:

const BYTES_PACK_SIZES = {
  32: [
    ['uint128', 'uint128'],
    ['uint192', 'uint64'],
    ['uint64', 'uint64', 'uint64', 'uint64'],
  ],
  24: [
    ['uint96', 'uint96'],
    ['uint144', 'uint48'],
    ['uint48', 'uint48', 'uint48', 'uint48'],
  ],
  16: [
    ['uint64', 'uint64'],
    ['uint96', 'uint32'],
    ['uint32', 'uint32', 'uint32', 'uint32'],
  ],
  12: [
    ['uint48', 'uint48'],
    ['uint72', 'uint24'],
    ['uint24', 'uint24', 'uint24', 'uint24'],
  ],
  8: [
    ['uint32', 'uint32'],
    ['uint48', 'uint16'],
    ['uint16', 'uint16', 'uint16', 'uint16'],
  ],
  4: [
    ['uint16', 'uint16'],
    ['uint24', 'uint8'],
    ['uint8', 'uint8', 'uint8', 'uint8'],
  ],
  2: [['uint8', 'uint8']],
};

The idea is to split the most common byte types (2,4,8,12,16,24,32) in 4 (equal) and 2 (unequal) chunks. I realized pairs like [uint48,uint16] and [uint16,uint48] were redundant.

The current design feels the right approach but we can reasonably shave off a lot of redundancy from duplicated pack functions (eg. bytes3 | bytes4 vs bytes4 | bytes3), types such as PackedBytes1 that aren't needed in my opinion, or even extremely specific uses like bytes 2 | bytes27 that aren't worth putting in the library.

What do you think of starting with selecting the combination of types we think are useful and keep the product of them? My list is based on the following?:

  • Bytes (32, 16, 8, 4, 2, 1) as starting point. Users may be able to compose them
  • Add extra common use cases like ERC4337 nonce packing (bytes24 | bytes8), in that case 24 is missing
  • 12 and 6 to split 24 (we can get rid of these)

Additionally, we can add 20 and 12 for addresses.

With this setup, it's possible that we can reduce this file size substantially while keeping enough usage freedom.

@Amxx
Copy link
Collaborator Author

Amxx commented May 29, 2024

I guess the compilation artifacts we distribute will grow as well so probably more.
Compilation artifacts for libraries should be small. All the functions are internals.

I checked, and the Packing.json artifact is only 720bytes. (Math.json is similar, Strings.json is 1.1k)

About removing some variants to reduce size:

  • IMO symetric construction such as bytes8 | bytes24 vs bytes24 | bytes4 are not a duplication. If you want to build a "meta nonce" (as in 4337) from a byte8 key, and a bytes24 value ... the bytes24 | bytes8 helper is not going to help.
  • If we remove some variants, we should be ready when people come and ask us "why is this variant that I want not available". Not having some is an opinionated choice. Weither we want to make it or not, I'm not 100% sure.
  • About the PackedBytes1, I'm not sure what the impact of removing it would be. Having it makes things more uniform:
    Just like you would do "I want to extract 20 bytes from this thing, at this location, and interpret it as an address", you would also do "I want to extract 1 byte from this thing, at this location, and interpret it as a uint8". IMO there is value to keeping it.

Something that we never considered, but that may be an alternative is to:

  • not ship the procedurally generated solidity files in the npm package
  • ship the generation code in the npm package
  • automatically run the generation code when the packages is pulled.
    That would reduce the size of the package, but also add some moving pieces. We would have to be super carefull about dependencies.

IMO next steps are

  • determine which version we want to keep (if not all)
  • decide on the delivery (if we feel what want to distribute it too big)

@Amxx
Copy link
Collaborator Author

Amxx commented May 29, 2024

Proposal:

  • include types:
    • bytes1
    • bytes2
    • bytesXX with X a multiple of 4, from bytes4 to bytes32
  • packing:
    • 1 + 1 <> 2
    • 2 + 2 <> 4
    • X + Y <> Z, for X and Y multiples of 4
      • this includes 16+16<>32, but also 12+20<>32, and 4+8<>12

(that is equivalent to saying pack X+Y<>Z if and only if all X, Y and Z are included types)

  • extract / replace
    • extract / replace any of the included type from any of the included types that are bigger

This result in a file of 1165 lines (41k). SafeCast.sol is 35k, Math.sol is 28k, ...

@Amxx Amxx requested a review from frangio May 29, 2024 08:29
@frangio
Copy link
Contributor

frangio commented May 29, 2024

I'm not a big fan of the reliance on user defined types in this library. I understand that overloaded function names doesn't work, but it seems very uncomfortable for users to have to wrap and unwrap values to use the library.

Have you considered disambiguating by naming convention? For example, extract_8_2, replace_8_2, pack_8_12.

@Amxx
Copy link
Collaborator Author

Amxx commented May 29, 2024

Have you considered disambiguating by naming convention? For example, extract_8_2, replace_8_2, pack_8_12

I think we liked the idea of having the same name when possible, and deducing the right version from the input parameters. We also think that this will possibly be used with both bytesXX and uintXXX (and address) as input and ouput, so the UDVT would be a good intermediary representation.

I guess that is not set in stone, we could replace the UDVT with bytesXX, add bytesXX<>uintXXX conversion/casting function, and specialize all the names ...

I prefer the current way, but its not a very strong opinion.

@frangio
Copy link
Contributor

frangio commented May 29, 2024

I think the benefit of having the same name may be lost if it requires a high level of verbosity like this:

Packing.pack(left.asPackedBytes2(), right.asPackedBytes2()).extract2(0).asUint16()

IMO simpler usage should be prioritized:

uint16(Packing.pack_2_2(left, right).extract_4_2(0))

The cast is not ideal. Perhaps extractBytes2_4(uint8) -> bytes2 and extractUint16_4(uint8) -> uint16 should be separate functions.

@Amxx
Copy link
Collaborator Author

Amxx commented May 29, 2024

The cast is not ideal. Perhaps extractBytes2_4(uint8) -> bytes2 and extractUint16_4(uint8) -> uint16 should be separate functions.

But then if you want to extract a uint from a bytes, or vice versa, that is 4 functions you need ...

Usecase: accountGasLimits and gasFee in PackedUserOperation.sol are actually two uint128 packed in a bytes32.

(cf this utils)


Packing.pack(left.asPackedBytes2(), right.asPackedBytes2()).extract2(0).asUint16()

This is testing code, I don't expect that to ever be done in production. IMO productions cases are:

// somewhere in contract X
somevariable = Packing.pack(x.asPackedBytes16(), y.asPackedBytes16()).asBytes32();

// somewhere in another contract (or at least another function)
parsed = somevariable.asPackedBytes32().extract16(0).asUint128();

would we prefer ?

// somewhere in contract X
somevariable = Packing.pack_16_16(bytes16(x), bytes16(y));

// somewhere in another contract (or at least another function)
parsed = uint128(somevariable.extract_32_16(0));

I'm honestly not sure.

@frangio
Copy link
Contributor

frangio commented May 29, 2024

somevariable = Packing.pack_16_16(bytes16(x), bytes16(y));

Why do you need the cast there?

@Amxx
Copy link
Collaborator Author

Amxx commented May 29, 2024

X and y may be uint128

@frangio
Copy link
Contributor

frangio commented May 30, 2024

@Amxx @ernestognw How do you feel about the new API?

@ernestognw
Copy link
Member

Something that we never considered, but that may be an alternative is to:

  • not ship the procedurally generated solidity files in the npm package
  • ship the generation code in the npm package
  • automatically run the generation code when the packages is pulled.
    That would reduce the size of the package, but also add some moving pieces. We would have to be super carefull about dependencies.

Yeah this is a great idea but I share the concerns with dependencies. I've seen people modifying our procedurally generation scripts for custom types. I don't remember the contest but I think it was Checkpoints with a different key size (eg. uint64). So I guess we don't need to distribute them formally.

IMO next steps are

  • determine which version we want to keep (if not all)
  • decide on the delivery (if we feel what want to distribute it too big)

I do prefer to distribute a single and "small enough" single library in a Solidity file. I like the current API since I felt initially that we were creating UDVTs for types already defined in the language. I couldn't formulate a concern from that initially becuase I tend to prefer verbosity.

Reading the current API is less clear in my opinion, especially considering @Amxx example:

parsed = uint128(somevariable.extract_32_16(0));
// vs
parsed = somevariable.asPackedBytes32().extract16(0).asUint128();

I think the second is clearer but honestly it's a matter of taste. I'd be ok shipping any of both

The cast is not ideal. Perhaps extractBytes2_4(uint8) -> bytes2 and extractUint16_4(uint8) -> uint16 should be separate functions.

But then if you want to extract a uint from a bytes, or vice versa, that is 4 functions you need ...

I also strongly agree with this. Providing the casting variants would be too much in my opinion

@Amxx
Copy link
Collaborator Author

Amxx commented May 30, 2024

We could replicate the second syntax using the current library if we add (in another PR, let's not do that now) a Cast library that provide all the asXXX primitive you would need

library Cast {
    function asUint256(bytes32 self) internal pure returns (uint256) { return uint256(self); }
    function asBytes32(uint256 self) internal pure returns (bytes32) { return bytes32(self); }
    function asAddress(bytes20 self) internal pure returns (address) { return address(self); }
    // ...
}

@Amxx Amxx added this to the 5.1 milestone Jun 3, 2024
scripts/generate/templates/Packing.js Outdated Show resolved Hide resolved
) internal pure returns (bytes${outer} result) {
bytes${inner} oldValue = extract_${outer}_${inner}(self, offset);
assembly ("memory-safe") {
result := xor(self, shr(mul(8, offset), xor(oldValue, value)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double xor trick might require an explanation somewhere, but I get that we're avoiding comments on purpose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record:

  • the "inner" xor (xor(oldValue, value)) computes the "difference" between the old and the new values.
  • then we shift this difference to place it a the right location
  • then the "outer" xor applies this difference to self

scripts/generate/templates/Packing.js Show resolved Hide resolved
@ernestognw ernestognw requested a review from cairoeth June 4, 2024 00:22
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I added a small guide in the utilities docs and an usage example.

@frangio
Copy link
Contributor

frangio commented Jun 11, 2024

Looks nice 👍

Copy link
Contributor

@cairoeth cairoeth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! are there any packings use cases in 5.1 that we should update to use this lib?

@ernestognw
Copy link
Member

Alright, thanks everyone for your input. I'm happy with the result and glad we took the time to iterate it before making a release 🙌🏻

@ernestognw ernestognw merged commit dc62599 into OpenZeppelin:master Jun 11, 2024
21 checks passed
@Amxx Amxx deleted the procedural/packing branch June 11, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Procedurally generate packing libraries
4 participants