Problematic cost model serialization #2902

JaredCorduan · 2022-07-08T18:37:51Z

The current cost model serialization scheme is problematic:
https://github.com/input-output-hk/cardano-ledger/blob/2505b7f103c78ee5a230b3e302e31d9607fffbd5/eras/babbage/test-suite/cddl-files/babbage.cddl#L324-L327

The are two problems:

We cannot add a V3 cost model in the same protocol parameter update that initiates a hard fork. We can only count on nodes to be updated after a hard fork (chainChecks provides this fantastic guarantee), and so nodes that have not updated would not be able to deserialize a V3 cost model. This is operationally annoying for us, and also means we have to wait an extra epoch to use new versions of Plutus.
We cannot add new fields to existing cost models.

We need a more flexible serialization scheme that address these two problems. Note that the ledger deserializes the CBOR list of integers into a map by using keys provided to us by the Plutus library (where the list is assumed to be ordered corresponding to the alphabetical sorting of the key).

The text was updated successfully, but these errors were encountered:

JaredCorduan · 2022-07-13T20:26:35Z

I talked with @Soupstraw , @bezirg, and @michaelpj, we made this plan:

The Plan

The Plutus function mkEvaluationContext will be changed to take [Integer] instead of Map Text Integer (matching the wire spec)
- If mkEvaluationContext receives fewer integers than it needs (for a given version of Plutus), it will return an error message
- If mkEvaluationContext receives at least as many integers as it needs, it will return an evaluation context, possibly with a warning that too many integers were supplied and ignored
The CostModel serialization will be made more permissive.
- It will accept any [int]
- It will store whatever [int] given to it, and also the results of mkEvaluationContext
  - If it is for a version of Plutus that the ledger is unaware of, it will not call mkEvaluationContext but instead store an appropriate error
- Note that we need to figure out a way to become more permissive in the cost models serialization scheme only at a hard fork boundary, in order to not risk splitting the network. Cost models can currently only be changed by way of the governance mechanism, however, so if we trust that all V1 cost models updates prior to a hard fork have exactly 166 integers (and 175 integers for V2), we could just relax the scheme to [int] (at the risk of an operational mistake by the governance holders). In the past, we have handled such a maneuver by replacing at type with a type family, but that tends to be a fair amount of work and complicates the code. Perhaps there are other ideas?

How this works, new language built-ins

Suppose version x of the node supports Plutus V2 cost models with 10 fields. Suppose version x+1 of the node supports Plutus V2 cost models with 11 fields. Let f be the new field, let m be the current major protocol version, and suppose that Plutus V2 will not support f until m+1.

During major protocol m:

Node x sees a cost model with 11 fields, it makes an EvaluationContext with the 10 fields it knows about, but also stores a warning and the original 11 fields.
- It happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees f (it will fail to deserialize).
Node x+1 sees a cost model with 11 fields, it makes an EvaluationContext with all of them.
- It also happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees f (based on m).
Node x shuts down, the operator updates the software, and comes back up with a node x+1. Upon re-serialization of the ledger state, the operator is now in the case above (a normal x+1 node).

During major protocol m+1:

Node x can no longer participate in the chain, due to the ObsoleteNode error.
Node x+1 is ready to process scripts with f in V2.

How this works, new Plutus versions

Suppose version x of the node does not know anything about Plutus V3. Suppose version x+1 of the node adds support for Plutus V3. Let m be the current major protocol version, and suppose that Plutus V3 is introduced at m+1.

During major protocol m:

Node x sees a cost model for V3. It stores the cost model integers, but in place of an EvaluationContext it stores an error.
- Any V3 script will be rejected since x does not know how to deserialize V3.
Node x+1 sees a cost model for V3, it makes an EvaluationContext for it.
- Any V3 script will be rejected by the Plutus evaluator based on m.
Node x shuts down, the operator updates the software, and comes back up with a node x+1. Upon re-serialization of the ledger state, the operator is now in the case above (a normal x+1 node).

michaelpj · 2022-07-14T09:28:24Z

It will store whatever [int] given to it, and also the results of mkEvaluationContext

... which might be an error. Do we need to store the underlying [Int]? I guess it's needed for re-serializing.

the ledger gives the Plutus evaluator m, and Plutus knows f isn't okay during m

This only applies for node x+1. Node x has an old Plutus which doesn't know about f at all, but that's also fine, it'll be a deserialization failure that happens in the same place.

michaelpj · 2022-07-14T09:32:41Z

Any V3 script will be rejected for not having a cost model.

Won't it be rejected before that for simply not being recognized as a known language by node x?

Any V3 script will be rejected by the Plutus evaluator based on m

This is currently not happening: we should change this. We're currently relying on the absence of the cost model to disable a version.

Although in the scenario you mention, node x+1 will see a cost model for V3 in an update proposal, but it won't actually be applied until the HF that introduces V3, right? So it is still fine... but I'd be more comfortable with an explicit guard on our side also.

bezirg · 2022-07-14T09:50:19Z

Any V3 script will be rejected for not having a cost model.

Won't it be rejected before that for simply not being recognized as a known language by node x?

I agree with @michaelpj on this. The ledger calls into the plutus-ledger-api in a static way ; a specific node version can call into a fixed number of statically linked plutus versions, e.g. V1 and V2. If a newer currently uknown language version comes in e.g. V3, the node running the old software will not be statically linked with any V3.

Any V3 script will be rejected by the Plutus evaluator based on m

This is currently not happening: we should change this. We're currently relying on the absence of the cost model to disable a version.

I think we can enforce this on plutus-master by having a simple extra check inside evaluateScriptRestricting/Counting:

when passedProtocolVersion < expectedNextMajorHardForkIntroducingThisVersion
   fail("in phase2")

Although in the scenario you mention, node x+1 will see a cost model for V3 in an update proposal, but it won't actually be applied until the HF that introduces V3, right? So it is still fine... but I'd be more comfortable with an explicit guard on our side also.

Yes, an extra guard on our side can be helpful in case of mismatch of plutus code version with ledger code version

bezirg · 2022-07-14T09:55:59Z

@JaredCorduan your logic makes sense.

Note that we need to figure out a way to become more permissive in the cost models serialization scheme only at a hard fork boundary, in order to not risk splitting the network. Cost models can currently only be changed by way of the governance mechanism, however, so if we trust that all V1 cost models updates prior to a hard fork have exactly 166 integers (and 175 integers for V2), we could just relax the scheme to [int] (at the risk of an operational mistake by the governance holders). In the past, we have handled such a maneuver by replacing at type with a type family, but that tends to be a fair amount of work and complicates the code. Perhaps there are other ideas?

The problem and this thing with the type families I don't get at all, so no ideas from my side.

michaelpj · 2022-07-14T10:02:45Z

@JaredCorduan If we add a check on our side to reject language versions that shouldn't be enabled in particular protocol versions, we have a choice about whether we include it in isScriptWellFormed or not. If we do, this will be a phase 1 failure, if we don't it will be a phase 2 failure. I think we should try and make as many things as possible phase 1 failures, so I'm inclined to do it unless you disagree?

bezirg · 2022-07-14T10:13:24Z

@michaelpj

a) if we make it phase1 error, we preclude submitting+executing any new-language scrips until HF.
b) if we make it phase2 error, we allow submitting new-language scripts but disallow executing them until HF.

I don't know if there is a use-case to do (b).

michaelpj · 2022-07-14T10:20:10Z

I guess it maybe prevents you from submitting a reference script with a new language version before the HF? Does the ledger deserialization refuse to deserialize transactions containing newer scripts from newer languages before the HF?

JaredCorduan · 2022-07-14T13:13:39Z

Do we need to store the underlying [Int]? I guess it's needed for re-serializing.

exactly

the ledger gives the Plutus evaluator m, and Plutus knows f isn't okay during m

This only applies for node x+1. Node x has an old Plutus which doesn't know about f at all, but that's also fine, it'll be a deserialization failure that happens in the same place.

good catch, thank you! I will edit The Plan.

Won't it be rejected before that for simply not being recognized as a known language by node x?

yes, another good catch! I'll edit that as well.

This is currently not happening: we should change this. We're currently relying on the absence of the cost model to disable a version.
Although in the scenario you mention, node x+1 will see a cost model for V3 in an update proposal, but it won't actually be applied until the HF that introduces V3, right? So it is still fine... but I'd be more comfortable with an explicit guard on our side also.

I also much prefer if Plutus could gaurd this for us as well, otherwise it depends on the timing of the updates (ie the governance has to remember that they must never update the new cost model an epoch before the HF vote, etc).

The problem and this thing with the type families I don't get at all, so no ideas from my side.

yea, no worries, this is a problem on the ledger side

Does the ledger deserialization refuse to deserialize transactions containing newer scripts from newer languages before the HF?

That would be great. If we want to allow folks to create reference scripts in advance for unreleased languages, we would need a similar scheme for transaction outputs as we've just made for the cost models. It's the same problem: the current serialization scheme for transaction outputs does not allow for new languages in reference scripts, and we can only guarantee that folks have the new software after a hard fork. Having the isScriptWellFormed check the plutus version against the protocol version provides us a clean transition.

Let me leave future Jared some proof that the current serialization scheme is not flexible:

Babbage uses the same script type as alonzo:
https://github.com/input-output-hk/cardano-ledger/blob/c32acb6e90ed89c58ef411baead4af554518ff0b/eras/babbage/impl/src/Cardano/Ledger/Babbage.hs#L204

Reference scripts are just deserialized as cbor-in-cbor scripts:
https://github.com/input-output-hk/cardano-ledger/blob/c32acb6e90ed89c58ef411baead4af554518ff0b/eras/babbage/impl/src/Cardano/Ledger/Babbage/TxBody.hs#L693

Alonzo script type deserialization is not flexible wrt languages:
https://github.com/input-output-hk/cardano-ledger/blob/c32acb6e90ed89c58ef411baead4af554518ff0b/eras/alonzo/impl/src/Cardano/Ledger/Alonzo/Scripts.hs#L391-L401

JaredCorduan · 2022-07-14T13:15:57Z

I think we should try and make as many things as possible phase 1 failures

Absolutely. We can add this to our not-yet-existent list of guiding principles. (it's not dogma, everything is still open for discussion, though).

michaelpj · 2022-07-14T13:38:40Z

If we want to allow folks to create reference scripts in advance for unreleased languages

I think I would actively prefer it if they can't :D

JaredCorduan · 2022-07-14T14:19:03Z

If we want to allow folks to create reference scripts in advance for unreleased languages

I think I would actively prefer it if they can't :D

That's definitely my preference as well, I was just trying to tease out the implications (a weak form of a proof by contradiction :) )

JaredCorduan · 2022-09-17T22:35:07Z

the last piece of the puzzle here will be solved by #3014 🙌 (namely how to gracefully change the serialization at a hardfork boundary)

JaredCorduan · 2023-01-31T21:24:43Z

🤣 this was accidentally and automatically resolved by me saying:

In order to resolve #2902, we still need to

bezirg · 2023-02-01T12:48:03Z

@JaredCorduan AI at its finest. Joking, it was a regex "AI"

Starting in version 9, 'CostModels' can now be deserialized from any map from Word8 values to lists of integers. Only valid cost models are actual converted to cost models. resolves #2902

michaelpj · 2023-02-16T10:11:42Z

🎉

zliu41 · 2024-04-16T21:54:55Z

@lehins and I had a discussion on this, and the conclusion is that there are a few problems with the "How this works, new language built-ins" part, and it doesn't currently work:

A transaction proposing to update the number of fields to 11 does not deserialize due to the use of decodeCostModelFailHard.
"Node x sees a cost model with 11 fields, it makes an EvaluationContext with the 10 fields it knows about, but also stores a warning and the original 11 fields" does not appear to be implemented correctly.
Most importantly, mkEvaluationContext fails upon receiving fewer parameters than expected. At the beginning of protocol version m, there are only 10 parameters, and it only becomes 11 later. But node x+1 always expects at least 11. How can node x+1 validate the chain then?

cc @michaelpj @bezirg

michaelpj · 2024-04-17T09:40:12Z

Most importantly, mkEvaluationContext fails upon receiving fewer parameters than expected. At the beginning of protocol version m, there are only 10 parameters, and it only becomes 11 later.

So, what we have discussed in this context is I think:

We end up in the situation where the major protocol version says that e.g. PlutusV3 is allowed, but we do not yet have a cost model for it. We need this to be fine.
The only sensible behaviour in this situation is for PlutusV3 scripts to fail at evaluation time. So in the interval between the protocol version change and the cost model being installed, you just can't run any PlutusV3 scripts, which seems okay.
mkEvaluationContext therefore needs to do the following when given too few parameters:
1. Not fail at construction time
2. Fail when evaluating any script

Does that sound about right?

michaelpj · 2024-04-17T09:40:36Z

Also it would be super if we could somehow set up a test for this. I don't know if the ledger has cross-hard-fork tests?

zliu41 · 2024-04-17T11:43:54Z

So, what we have discussed in this context is I think:

I think your first two points are related to the case where we add a new Plutus version, but the problem here is with adding new builtins to an existing Plutus version.

mkEvaluationContext therefore needs to do the following when given too few parameters: Not fail at construction time

The solution outlined above says: "If mkEvaluationContext receives fewer integers than it needs (for a given version of Plutus), it will return an error message". And this has always been how it works.

If we change mkEvaluationContext to not fail when given too few parameters, then a problem (that @lehins brought up in the discussion) is that this would allow someone to submit a proposal reducing the number of cost model parameters, which cannot be allowed. But perhaps we can rely on the committee/DReps to reject such proposals? Either way, there seems to be a lot of work to be done to make this actually work.

zliu41 · 2024-04-17T11:45:23Z

Fail when evaluating any script

We'll also need to make sure there's no significant performance overhead if we need to check the condition at script evaluation time.

michaelpj · 2024-04-17T14:32:41Z

I think your first two points are related to the case where we add a new Plutus version, but the problem here is with adding new builtins to an existing Plutus version.

Right, so it is more subtle. Perhaps the tightest solution is to fail iff we are missing parameters that are needed for a builtin in the script at hand. But that's tricky. One way we could do it would be to set any missing parameters to MAX_VALUE, so that any attempt to use them will just blow out the budget?

lehins · 2024-04-17T22:08:53Z

Ok, we might have a solution for this, which would require minimal amount of work.

The most important part of the solution is for mkEvaluationContext to never fail if it receives 233 or more parameters, regardless of the protocol version that the node is running in.

This would mean that plutus script execution could receive ExecutionContext that was build from 233 parameters at any point, regardless of presence of new primitive or the protocol version that the node is running in. That is because anyone can propose and potentially enact a CostModel update with 233 parameters for PlutusV3, even after the intra-era hardfork that will introduce new primitives.

I also suggested we implement an optional primitive like that right now, which would work if we had 234 parameters in PV9. This would actually allow us to write some tests for this behavior before we hard fork into Conway.

zliu41 · 2024-04-18T21:49:04Z

By the way, The Plan above says:

It happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees f (it will fail to deserialize).

It also happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees f (based on m).

I believe both are phase-1 failures, not phase-2.

lehins · 2024-04-18T22:04:13Z

I believe both are phase-1 failures, not phase-2.

Yes, that will be either MalformedScriptWitnesses or MalformedReferenceScripts phase 1 validation failure, depending on where in the transaction the script is.

In both cases that will have to be plutus script deserialization failure

JaredCorduan added the 💳 technical-debt Issues related to technical debt we introduced label Jul 8, 2022

Soupstraw assigned Soupstraw and unassigned Soupstraw Jul 12, 2022

JaredCorduan mentioned this issue Sep 17, 2022

Remove dependency on cardano-prelude #2992

Merged

This was referenced Sep 17, 2022

Simplify CostModel internals #3042

Closed

CIP-0059 | Add dates and slot numbers to the table cardano-foundation/CIPs#333

Merged

JaredCorduan added the anachronistic adr with hindsight on our side, this is a good ADR label Dec 2, 2022

JaredCorduan mentioned this issue Jan 30, 2023

version the new cost model deserializers #3274

Merged

6 tasks

JaredCorduan closed this as completed in #3274 Jan 31, 2023

JaredCorduan reopened this Jan 31, 2023

JaredCorduan mentioned this issue Feb 3, 2023

introduce flexible cost model decoding #3283

Merged

6 tasks

JaredCorduan closed this as completed in #3283 Feb 15, 2023

zliu41 mentioned this issue Mar 8, 2024

Add a test for relaxed ledger cost model serialization IntersectMBO/cardano-node-tests#2285

Open

zliu41 reopened this Apr 16, 2024

lehins self-assigned this Apr 17, 2024

lehins added the conway label Apr 17, 2024

lehins added this to To do in Conway via automation Apr 17, 2024

zliu41 mentioned this issue Apr 18, 2024

Change CMTooFewParamsError to a warning IntersectMBO/plutus#5912

Merged

lehins moved this from To do to In progress in Conway Apr 19, 2024

lehins mentioned this issue Apr 19, 2024

Flexible costmodel params #4284

Merged

9 tasks

lehins closed this as completed in #4284 May 8, 2024

Conway automation moved this from In progress to Done May 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problematic cost model serialization #2902

Problematic cost model serialization #2902

JaredCorduan commented Jul 8, 2022

JaredCorduan commented Jul 13, 2022 •

edited

michaelpj commented Jul 14, 2022

michaelpj commented Jul 14, 2022

bezirg commented Jul 14, 2022 •

edited

bezirg commented Jul 14, 2022

michaelpj commented Jul 14, 2022

bezirg commented Jul 14, 2022

michaelpj commented Jul 14, 2022

JaredCorduan commented Jul 14, 2022

JaredCorduan commented Jul 14, 2022

michaelpj commented Jul 14, 2022

JaredCorduan commented Jul 14, 2022

JaredCorduan commented Sep 17, 2022 •

edited

JaredCorduan commented Jan 31, 2023

bezirg commented Feb 1, 2023 •

edited

michaelpj commented Feb 16, 2023

zliu41 commented Apr 16, 2024

michaelpj commented Apr 17, 2024

michaelpj commented Apr 17, 2024

zliu41 commented Apr 17, 2024

zliu41 commented Apr 17, 2024

michaelpj commented Apr 17, 2024

lehins commented Apr 17, 2024

zliu41 commented Apr 18, 2024

lehins commented Apr 18, 2024

Problematic cost model serialization #2902

Problematic cost model serialization #2902

Comments

JaredCorduan commented Jul 8, 2022

JaredCorduan commented Jul 13, 2022 • edited

The Plan

How this works, new language built-ins

How this works, new Plutus versions

michaelpj commented Jul 14, 2022

michaelpj commented Jul 14, 2022

bezirg commented Jul 14, 2022 • edited

bezirg commented Jul 14, 2022

michaelpj commented Jul 14, 2022

bezirg commented Jul 14, 2022

michaelpj commented Jul 14, 2022

JaredCorduan commented Jul 14, 2022

JaredCorduan commented Jul 14, 2022

michaelpj commented Jul 14, 2022

JaredCorduan commented Jul 14, 2022

JaredCorduan commented Sep 17, 2022 • edited

JaredCorduan commented Jan 31, 2023

bezirg commented Feb 1, 2023 • edited

michaelpj commented Feb 16, 2023

zliu41 commented Apr 16, 2024

michaelpj commented Apr 17, 2024

michaelpj commented Apr 17, 2024

zliu41 commented Apr 17, 2024

zliu41 commented Apr 17, 2024

michaelpj commented Apr 17, 2024

lehins commented Apr 17, 2024

zliu41 commented Apr 18, 2024

lehins commented Apr 18, 2024

JaredCorduan commented Jul 13, 2022 •

edited

bezirg commented Jul 14, 2022 •

edited

JaredCorduan commented Sep 17, 2022 •

edited

bezirg commented Feb 1, 2023 •

edited