CIP-0041? | UPLC Serialization Optimizations #314

HarmonicPool · 2022-08-12T16:39:16Z

This CIP proposes some changes to the UPLC AST serialization in order to reduce the size of the serialized output.

KtorZ · 2022-08-13T08:08:01Z

CIP-optimized-uplc-serialization/README.md

+
+This proposal suggest ways to reduce serialized scripts size.
+
+the changes where designed to keep the bit oriented style of flat minimizing the number of required bits for values where possible.


The rationale section is a bit thin, though some rationale is sprinkled throughout the specification. It'd be nice to have a summary of what motivates each design choice. This can most likely be done by reworking how the specification is written (that is, write the specification without justifications, and move the justifications to the rationale section).

For example, most of the "data serialization" section explains what is currently wrong with the CBOR serialization of Data. Instead, keep the specification focused on what the proposal is about, and move the rationale to this section.

It'd also be nice to include some benchmarks: take some reference UPLC programs and serialize them side by side with the two methods to see how much gain one can expect.

Thank you @KtorZ; I will refactor as suggested (and correct the various typos 😄).

L-as · 2022-08-16T00:34:03Z

CIP-optimized-uplc-serialization/README.md

+
+## Abstract
+
+this document describes the parts of the current serialization algorithm that can be improved and provides the specification and documentation needed in order to implement an optimized version of this one.


You probably want to clean up the formatting, e.g. use capital case at the beginning of sentences.

L-as · 2022-08-16T00:34:53Z

CIP-optimized-uplc-serialization/README.md

+        case 6: return "000001";
+        case 7: return "0000001";
+    }
+}


What language is this? Why not describe this in Haskell?

TypeScript, just because I feel the description is more intuitive in it.

It translates to Haskell as:

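```haskell
-- The argument is missingBits: the bits needed to reach the next byte boundary.
pad :: Int -> String
pad 0 = "00000001" -- already byte-aligned: a whole extra byte is emitted
pad 1 = "1"
pad 2 = "01"
pad 3 = "001"
pad 4 = "0001"
pad 5 = "00001"
pad 6 = "000001"
pad 7 = "0000001"
pad _ = undefined  -- missingBits is always in 0..7
```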

KtorZ · 2022-09-06T08:07:41Z

CIP-optimized-uplc-serialization/README.md

+---
+CIP: "???"
+Title: optimized UPLC serialization  
+Authors: Michele Nuzzi <michele.nuzzi.2014@gamil.com> 


Suggested change

Authors: Michele Nuzzi <michele.nuzzi.2014@gamil.com>

Authors: Michele Nuzzi <michele.nuzzi.2014@gmail.com>

KtorZ · 2022-09-06T08:09:01Z

CIP-optimized-uplc-serialization/README.md

+
+the proposed changes to the algorithm will cause the same UPLC Abstract Syntaxt Tree to serialize in a different way based on the algorithm used;
+
+In order to allow the deserializaton process to handle the old serialization algorithm the version of the program sholud be chcked first.


Suggested change

In order to allow the deserializaton process to handle the old serialization algorithm the version of the program sholud be chcked first.

In order to allow the deserializaton process to handle the old serialization algorithm the version of the program should be checked first.

KtorZ · 2022-09-06T08:09:14Z

cc @michaelpj

michaelpj · 2022-09-06T10:43:33Z

This could probably have started life as a Plutus issue, but this is also fine!

@kwxm can you take a look?

KtorZ · 2022-10-25T07:20:34Z

Note

We'll be reviewing this proposal briefly in today's editor meeting, but since this change concerns Plutus internals, the ultimate decision and approval is up to the current Plutus core team in accordance with CIP-0035.

cc @michaelpj @kwxm

michaelpj · 2022-10-25T10:41:04Z

Apart from anything else, this could do with an impact assessment that assesses how much of an improvement this is: actual numbers are very relevant.

kwxm · 2022-10-25T10:49:03Z

CIP-optimized-uplc-serialization/README.md

+```
+the case in which the ```missingBits``` is 0 implies that the current serialized program is already byte-alligned
+
+since this padding carries no usefull informations, the current ```pad( 0 )``` adds a useless byte each time a padding is needed and the number of used bits is a multiple of 8.


I think that the reason for this is so that the decoder can easily discard padding at the end without having to check whether it's at a byte boundary, and this makes it faster. The proposed change would on average save one bit per bytestring, so I'm not sure if it's worth it.

If the proposed changes are accepted, the only place where padding will be used is at the end of the script, since bytestrings will no longer need to be byte-aligned.

The check for `pad(0)` then becomes just ``lengthInBits `mod` 8 == 0``.
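A minimal sketch of the two padding rules being compared, assuming `missingBits = (8 - lengthInBits mod 8) mod 8` as in the quoted `pad` function. The helper names (`missingBits`, `currentPad`, `proposedPad`, `averagePadBits`) are hypothetical and not taken from flat or plu-ts. Assuming every residue of `lengthInBits mod 8` is equally likely, it also reproduces the estimate above of roughly one bit saved per padding site.

```typescript
// Bits needed to reach the next byte boundary after `lengthInBits` bits (0..7).
function missingBits(lengthInBits: number): number {
  return (8 - (lengthInBits % 8)) % 8;
}

// Current flat-style padding: zeros terminated by a 1; an aligned stream
// still receives a whole extra byte ("00000001").
function currentPad(lengthInBits: number): string {
  const missing = missingBits(lengthInBits);
  const width = missing === 0 ? 8 : missing;
  return "0".repeat(width - 1) + "1";
}

// Proposed padding (assumption based on the discussion): nothing when the
// stream is already aligned, otherwise the same 0...01 filler.
function proposedPad(lengthInBits: number): string {
  const missing = missingBits(lengthInBits);
  return missing === 0 ? "" : "0".repeat(missing - 1) + "1";
}

// Average padding cost, assuming each residue lengthInBits % 8 is equally likely.
function averagePadBits(pad: (len: number) => string): number {
  let total = 0;
  for (let len = 0; len < 8; len++) total += pad(len).length;
  return total / 8;
}

console.log(currentPad(16), JSON.stringify(proposedPad(16))); // 00000001 ""
console.log(averagePadBits(currentPad), averagePadBits(proposedPad)); // 4.5 3.5
```

On the decoder side, the proposed rule means the end-of-script check is just whether the bit position is a multiple of 8 before looking for a `0...01` filler.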

kwxm · 2022-10-25T10:54:43Z

CIP-optimized-uplc-serialization/README.md

+
+tags from ```integer``` to ```bool``` and the ```data``` one are directly followed by the respective value encoding;
+
+tags ```list``` and ```pair``` are the only tags that do require some other tag in order to be a defined type; since the twos always require some other type in order to be valid, the type application is implcit and it should be removed.


Yes, that's a bit annoying: it's like that because it reflects how the internal representations of types work. The extra generality might permit us to do some more complex things in future, but it's not clear if we'll ever actually need that. Simplifying this might be a good thing to do, since it complicates the encoding/decoding process as well as taking up extra room.
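To quantify the implicit-application idea, here is a sketch that encodes a constant's type as a list of tags with and without the explicit application tag. It assumes the tag numbering from the Plutus Core specification (0 integer, 1 bytestring, 2 string, 3 unit, 4 bool, 5 list, 6 pair, 7 type application, 8 data) and that the tags are written as a flat list of 4-bit values (1 continuation bit per element plus 1 terminating bit). The `ConstType` model and function names are hypothetical.

```typescript
// Hypothetical model of default-universe constant types.
type ConstType =
  | { kind: "integer" } | { kind: "bytestring" } | { kind: "string" }
  | { kind: "unit" } | { kind: "bool" } | { kind: "data" }
  | { kind: "list"; elem: ConstType }
  | { kind: "pair"; fst: ConstType; snd: ConstType };

// Tag numbers assumed from the Plutus Core specification (7 = type application).
const TAG = {
  integer: 0, bytestring: 1, string: 2, unit: 3, bool: 4,
  list: 5, pair: 6, apply: 7, data: 8,
} as const;

// Current layout: list and pair are applied to their arguments explicitly.
function currentTags(t: ConstType): number[] {
  switch (t.kind) {
    case "list": return [TAG.apply, TAG.list, ...currentTags(t.elem)];
    case "pair": return [TAG.apply, TAG.apply, TAG.pair, ...currentTags(t.fst), ...currentTags(t.snd)];
    default: return [TAG[t.kind]];
  }
}

// Proposed layout: list and pair always take arguments, so application is implicit.
function proposedTags(t: ConstType): number[] {
  switch (t.kind) {
    case "list": return [TAG.list, ...proposedTags(t.elem)];
    case "pair": return [TAG.pair, ...proposedTags(t.fst), ...proposedTags(t.snd)];
    default: return [TAG[t.kind]];
  }
}

// Flat list cost: 1 continuation bit + 4 tag bits per element, plus 1 end bit.
const tagListBits = (tags: number[]): number => tags.length * 5 + 1;

// Example: list (pair integer bytestring)
const t: ConstType = {
  kind: "list",
  elem: { kind: "pair", fst: { kind: "integer" }, snd: { kind: "bytestring" } },
};
console.log(JSON.stringify(currentTags(t)), tagListBits(currentTags(t)));   // [7,5,7,7,6,0,1] 36
console.log(JSON.stringify(proposedTags(t)), tagListBits(proposedTags(t))); // [5,6,0,1] 21
```

Besides saving space, the decoder no longer has to handle an application tag whose only valid heads are `list` and `pair`.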

kwxm · 2022-10-25T10:58:12Z

CIP-optimized-uplc-serialization/README.md

+
+### ```data``` serialization
+
+All the effort of minimizing the size of on-chain scripts by prefering ```flat``` over ```CBOR``` serialization are ignored when it comes to ```data``` serialization.


Yes, it would be good to have a more efficient encoding of Data rather than just wrapping the CBOR representation. However, I think the reason for this is that Data is used elsewhere on the chain (including things that are passed to validator scripts as parameters), and CBOR is the preferred format there. It might be difficult to switch because of this.

Conversion from CBOR to a dedicated data encoding can be done by the node before the value is passed as an argument.

CBOR encoding is definitely too expensive to be included in a script.

HarmonicPool · 2022-10-25T11:01:50Z

> Apart from anything else, this could do with an impact assessment that assesses how much of an improvement this is: actual numbers are very relevant.

@michaelpj I will work on a CIP implementation over the next few days to get a size comparison.

kwxm · 2022-10-25T11:03:46Z

CIP-optimized-uplc-serialization/README.md

+this implies ```(~1) + #chunks + 1``` meaningless bytes are added per ```ByteString```
+
+in the descripton above
+- step ```1``` allows for an easy serialization and deserialization but doesn't carries any meaningful informations; given the importance of ByteStrings it should be removed at the cost of an added shift while serializing/deserializing the value


I think that might slow things down significantly, since I think we'd need to shift every byte in a bytestring if it didn't start at a byte boundary (or have I got that wrong?). It's important that on-chain deserialisation be as fast as possible, so the extra expense might be problematic. I'd like to see some benchmark results for this.

For efficiency, it might be possible to implement deserialization without shifting bytes.

I'll leave here the plu-ts implementation of the `readNBits` function, on which the deserialization process is based.

I believe a very similar result can be achieved in Haskell using the `Data.Binary.Get` monad.
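To make the shifting concern concrete, here is a minimal, hypothetical bit reader; it is not the plu-ts `readNBits` implementation referenced above, and MSB-first bit order within each byte is an assumption. Reading individual fields at arbitrary bit offsets needs no pre-shifting, but a bytestring that starts off a byte boundary can no longer be returned as a zero-copy slice: each output byte is stitched together from two input bytes.

```typescript
// Hypothetical MSB-first bit reader over a byte buffer.
class BitReader {
  private readonly buf: Uint8Array;
  private pos = 0; // absolute position in bits

  constructor(buf: Uint8Array) {
    this.buf = buf;
  }

  // Read up to 32 bits, MSB first, starting at an arbitrary bit offset.
  readNBits(n: number): number {
    if (n < 0 || n > 32) throw new RangeError("n must be in 0..32");
    let result = 0;
    for (let i = 0; i < n; i++) {
      const index = this.pos >> 3;
      if (index >= this.buf.length) throw new RangeError("read past end of buffer");
      const bit = (this.buf[index] >> (7 - (this.pos & 7))) & 1;
      result = result * 2 + bit; // multiply instead of << to stay unsigned up to 32 bits
      this.pos++;
    }
    return result;
  }

  // Read `count` whole bytes. Aligned: zero-copy view into the buffer.
  // Unaligned: every output byte needs two input bytes and two shifts.
  readBytes(count: number): Uint8Array {
    const offset = this.pos & 7;
    const start = this.pos >> 3;
    const needed = start + count + (offset === 0 ? 0 : 1);
    if (needed > this.buf.length) throw new RangeError("read past end of buffer");
    this.pos += count * 8;
    if (offset === 0) return this.buf.subarray(start, start + count);
    const out = new Uint8Array(count);
    for (let k = 0; k < count; k++) {
      out[k] = ((this.buf[start + k] << offset) | (this.buf[start + k + 1] >> (8 - offset))) & 0xff;
    }
    return out;
  }
}

const r = new BitReader(Uint8Array.from([0b10110011, 0b01011100]));
console.log(r.readNBits(3), r.readBytes(1)[0], r.readNBits(5)); // 5 154 28
```

The unaligned path is linear in the bytestring length, as is the copy it replaces. Whether the extra shifts matter in practice is exactly what the requested deserialization benchmarks would show.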

kwxm · 2022-10-25T11:05:04Z

CIP-optimized-uplc-serialization/README.md

+2) as many bytes as specified in the Unsigned Integer at step 1
+```
+
+using the new algorithm the ```ByteString``` serialization space complexity goes form ```O(n)``` to ```O(log n)``` where ```n``` is the number of bytes in the ```ByteString``` 


Hmmm. I need to think more carefully about this. It's a while since I've thought about bytestring encodings and I'll need to remind myself of the details of how we currently do it.
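A rough size-only comparison of the per-bytestring overhead, under two stated assumptions: the current layout is flat's chunked one described in the quoted text (padding, chunks of at most 255 bytes each preceded by a length byte, and a terminating zero byte), and the proposed length prefix uses flat's unsigned integer encoding (7-bit groups, each with a continuation bit). The function names are hypothetical, and this says nothing about decoding time.

```typescript
// Bits taken by flat-style padding: 1..8 bits; a whole byte when already aligned.
function padBits(missingBits: number): number {
  return missingBits === 0 ? 8 : missingBits;
}

// Current layout (assumed): padding + one length byte per 255-byte chunk
// + a zero byte terminating the chunk list.
function currentOverheadBits(n: number, missingBits: number): number {
  const chunks = Math.ceil(n / 255);
  return padBits(missingBits) + 8 * chunks + 8;
}

// Proposed layout (assumed): length as an unsigned integer in 7-bit groups,
// 8 bits per group including the continuation bit; no padding, no terminator.
function proposedOverheadBits(n: number): number {
  let groups = 1;
  for (let rest = Math.floor(n / 128); rest > 0; rest = Math.floor(rest / 128)) groups++;
  return 8 * groups;
}

for (const n of [0, 100, 1000, 10000]) {
  // Prints: "0 16 8", "100 24 8", "1000 48 16", "10000 336 16"
  console.log(n, currentOverheadBits(n, 0), proposedOverheadBits(n));
}
```

The current overhead grows with `n / 255` while the proposed one grows with `log128(n)`, which is the O(n) to O(log n) claim in the quoted text; in absolute terms the saving is a few bytes per bytestring.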

kwxm

The ideas here aren't nonsensical, but I'd like to see some benchmark results to see how they affect (a) the space taken up by serialised scripts, and (b) deserialisation times. I think that the benefits would have to be quite big in order to accept this.

I'll add that we're aware that there's room for improving things, but the reason that things are the way that they currently are is because we're using a pre-existing library and that's the way that it does things. We're definitely not averse to changing things, but then we'd need to maintain our own version of the flat library.

Apologies for a slightly hurried review: this had slipped off the bottom of my list. I'll think a bit more about this, in particular the bytestring encoding.

Crypto2099 · 2024-07-14T14:34:31Z

Wondering if this is still an issue and needs to remain open, given the advancements in Plutus v2/v3 and Plu.TS, Aiken, et al. in the years since this CIP was opened. @michele-nuzzi what do you think?

michele-nuzzi · 2024-07-14T14:41:52Z

Theoretically, it could still be something worth exploring.

Even with Aiken and plu-ts reducing size as much as possible, devs still tend to max out script size in an effort to reduce recursion and handle many inputs.

However, neither I nor my team can focus on this in the short term.

If someone else wants to take over the CIP, I'm OK with that.

rphair · 2024-07-23T22:11:25Z

Thanks for the update @michele-nuzzi ... @Ryun1 @Crypto2099 I'll remove this from the Waiting for Authors table in the README in the upcoming biweekly update.

added uplc serializatonù

b5a1054

KtorZ reviewed Aug 13, 2022

L-as reviewed Aug 16, 2022

KtorZ changed the title ~~CIP-optimized-uplc-serialization~~ CIP-0041? | UPLC Serialization Optimizations Aug 17, 2022

michele-nuzzi added 2 commits August 24, 2022 11:30

refactoring of specifications

1c87951

rationale

56560d0

KtorZ reviewed Sep 6, 2022

KtorZ added Candidate CIP labels Sep 6, 2022

kwxm reviewed Oct 25, 2022

KtorZ added the State: Waiting for Author label Nov 30, 2022

KtorZ added the Category: Plutus label and removed the Candidate CIP label Mar 18, 2023

bezirg mentioned this pull request May 17, 2024

Compact / shrink down the flat size of uplc IntersectMBO/plutus#6051

Open

michele-nuzzi closed this by deleting the head repository Jul 23, 2024

rphair mentioned this pull request Jul 27, 2024

Update top-level README: post meeting 93 #865

Merged
