Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kwxm/bls12-381/prototype (PLT-192, PLT-1557, PLT-1554, etc). #5231

Merged
merged 230 commits into from
Jun 1, 2023

Conversation

kwxm
Copy link
Contributor

@kwxm kwxm commented Mar 24, 2023

Introduction

This PR adds new built-in types and functions to Plutus Core and PlutusTx for pairings over BLS12-381, as specified in CIP-0381. These depend on Haskell bindings to the blst C library which are currently the subject of an open PR in cardano-base. In order not to depend on a branch, the code for these has been coped into plutus-core/cbits and plutus-core/src/plutus-core/Crypto/External. We should be able to merge this PR as-is, but ideally we'd wait until the cardano-base PR is merged and remove the copied files from plutus-core.

Details

The PR introduces three new PLC types:

  • bls12_381_G1_element
  • bls12_381_G2_element
  • bls12_381_MlResult

and seventeen new built-in functions:

bls12_381_G1_add                : [ bls12_381_G1_element, bls12_381_G1_element ] -> bls12_381_G1_element
bls12_381_G1_neg                : [ bls12_381_G1_element ] -> bls12_381_G1_element
bls12_381_G1_scalarMul          : [ integer, bls12_381_G1_element ] -> bls12_381_G1_element
bls12_381_G1_equal              : [ bls12_381_G1_element, bls12_381_G1_element ] -> bool
bls12_381_G1_hashToGroup        : [ bytestring, bytestring ] -> bls12_381_G1_element
bls12_381_G1_compress           : [ bls12_381_G1_element ] -> bytestring
bls12_381_G1_uncompress         : [ bytestring ] -> bls12_381_G1_element

bls12_381_G2_add                : [ bls12_381_G2_element, bls12_381_G2_element ] -> bls12_381_G2_element
bls12_381_G2_neg                : [ bls12_381_G2_element ] -> bls12_381_G2_element
bls12_381_G2_scalarMul          : [ integer, bls12_381_G2_element ] -> bls12_381_G2_element
bls12_381_G2_equal              : [ bls12_381_G2_element, bls12_381_G2_element ] -> bool
bls12_381_G2_hashToGroup        : [ bytestring, bytestring ] -> bls12_381_G2_element
bls12_381_G2_compress           : [ bls12_381_G2_element ] -> bytestring
bls12_381_G2_uncompress         : [ bytestring ] -> bls12_381_G2_element

bls12_381_millerLoop            : [ bls12_381_G1_element, bls12_381_G2_element ] -> bls12_381_mlresult
bls12_381_mulMlResult           : [ bls12_381_mlresult, bls12_381_mlresult ] -> bls12_381_mlresult
bls12_381_finalVerify           : [ bls12_381_mlresult, bls12_381_mlresult ] -> bool

The basic definitions of all of these things are in plutus-core/plutus-core/src/PlutusCore/Crypto/BLS12_381/, in the files G1.hs, G2.hs, and Pairing.hs. The following sections will attempt to give an idea of what all of these do, but some mathematical background will be required first.

Background

Abelian groups

An abelian group is a commutative monoid in which every element has an inverse. These are usually written in an additive style where the operation is +, the identity element is 0, and the inverse of an element a is -a, but they are sometimes written in a multiplicative style where the operation is *, the identity element is 1, and the inverse of an element a is a⁻¹. The types G1 and G2 above represent finite additive abelian groups and MlResult represents a multiplicative one.

For any abelian group one can define a notion of scalar multiplication by integers. Given an element a of an abelian group and an integer n, n*a is defined to be 0 if n=0, a+...+a n times if n>0, and (-a) + ... + (-a) -n times if n<0. In multiplicative notation this scalar "multiple" is written as a power aⁿ.

The add, neg, and scalarMul operations above represent these operations in the groups G1 and G2; in addition there is an equality operation, serialisation (compress) and deserialisation (uncompress) operations, and a hashToGroup operation which takes an arbitrary bytestring and a domain separation tag and produces a hash in the form of an element of G1 or G2.

This leaves three functions: mulMlResult, which performs group multiplication in the group MlResult; millerLoop, which takes an element of G1 and an element of G2 and produces an element of MlResult (this is the actual pairing operation), and finalVerify, which essentially compares two elements of MlResult for equality. The last two operations (millerLoop and finalVerify) are very expensive and are typically only expected to be called a few times in on-chain code.

To explain more of the details of the new functions we need some extra background.

Finite fields

A field F is a set equipped with two operations + and *and elements 0 and 1 such that

  • (F, 0, +) is an abelian group
  • (F, 1, *) is a commutative monoid such that every nonzero element of F has a multiplicative inverse (so F\{0} is an abelian group under multiplication).
  • Multiplication is distributive over addition: a*(b+c) = a*b + a*cfor all a,b,c in F.

Familiar examples of fields are , , and (but not since not every element has a multiplicative inverse in ).

There are also finite fields, and these can be completely classified. It can be shown that for every prime number p and every positive integer n there is a field of order pⁿ (the order of a field or group is its cardinality, the number of elements it contains), and these account for all of the finite fields up to isomorphism. These fields are usually denoted by 𝔽_{pⁿ} or GF(pⁿ) or GF(p,n) or something similar; I'll use GF(pⁿ) because it's easier to type.

For n=1, GF(p) is just ℤₚ with the usual operations, the important point being that every nonzero element there has a multiplicative inverse modulo p, unlike ℤₙ if n is not prime (what would be the inverse of 6 in ℤ₂₄, for example?).

For n>1, we can regard GF(p,n) as consisting of n-tuples of elements of ℤₚ; addition is usual coordinatewise addition, but multiplication will be something much more convoluted (and expensive), where the i-th entry in a product will be a complicated combination of all of the elements of the two tuples being multiplied.

Elliptic curves

For the purposes of this PR, an elliptic curve E over a field F can be regarded as an equation of the form y² = x³+ax+b, where a and b are constants from F. The set of all pairs (x,y) ∈ F×F which satisfy the equation is also referred to as "the curve". When the field is an elliptic curve is what you'd usually think of as a curve (see Google), but when F is a finite field the so-called curve is an apparently random set of pairs of field elements. However, although the geometric structure has vanished, there is an extremely rich algebraic structure that still makes sense.

A fundamental property of elliptic curves is that they are also abelian groups: given two points P and Q on an elliptic curve E one can define another point P+Q which also lies on the curve. There's a geometric picture for this over (see Google again), but also an algebraic form which applies over any field. The coordinates of P+Q are complicated algebraic functions of the coordinates of P and Q. In particular, if you're given a point P and the point nP (= P+...+P) for some integer n, it's not easy to work out what n is: this is the discrete logarithm problem on E and is believed to be computationally intractable in general, which is one reason why elliptic curves are useful in cryptography.

Pairings

A pairing is a function e: G₁ × G₂ → G_T where G₁, G₂ and G_T are abelian groups (the T's supposed to be a subscript, but I can't find a Unicode character for that). In our case, G₁ and G₂ are written additively, but to complicate matters, G_T is written multiplicatively. A pairing is require to be bilinear: for p, p' ∈ G₁, q, q' ∈ G₂, and n ∈ ℤ we should have

  • e(p+p',q) = e(p,q) * e(p',q) (remember that the operation in G_T is multiplication)
  • e(p,q+q') = e(p,q) * e(p,q')
  • e(n*p,q) = e(p,q)ⁿ = e(p,n*q)

The final property here is one reason that the concept of pairing useful in cryptography: it allows one person to perform a computation in G₁ and another to perform a computation in G₂ and then the results can be compared in G_T without having to share information (or at least that's my simplistic interpretation: the details are complicated).

Back to Plutus

The situation in BLS12-381 is as follows:

  • Everything happens over a field of order q where is a 381-bit prime (approximately 10^115)
  • G₁ is a subgroup of an elliptic curve E₁ defined over GF(q)
  • G₂ is a subgroup of an elliptic curve E₂ defined over GF(q²)
  • G_T is a subgroup of the multiplicative group of nonzero elements of GF(q¹²) (the elements of this field are huge: you need 12x48 = 576 bytes to hold a single element)
  • G₁, G₂, and G_T are all isomorphic to each other and are of order p where p is a 255-bit prime (about 10^77). In fact, all of these groups are isomorphic to (ℤₚ, +), but in very obfuscated ways.
  • It is also the case that p*a = 0 for all a ∈ G₁, and similarly for G₂ , so scalar multiplication is periodic modulo p. This means that the efficiency of computing a scalar multiple n*a can be improved by reducing n modulo p before performing that multiplication (although scalar multiplication is still quite expensive since we've got up to 255 bits to multiply by). The reduction is performed automatically by the scalarMul functions.
  • There is a pairing e: G₁ × G₂ → G_T, and combining millerLoop and finalVerify allows one to check whether e(a,b) = e(a',b') for elements a,a' ∈ G₁ and b,b' ∈ G₂. The fact that such a pairing exists is not obvious and depends on some rather complicated mathematics, which itself has undergone a lot of refinement to enable efficient computation.

Serialisation and compression

We can now explain how elements of G₁ and G₂can be serialised as bytestrings.

It follows from the facts above that every element of G₁ can be represented as a pair (x,y) with x,y ∈ GF(q), and each of x and y take up 381 bits, so each fits into 48 bytes with 3 bits left over. Thus (x,y) pairs can be serialised in this form, a pair taking up a total of 96 bytes. However, this is inefficient in terms of space: remember that the point satisfy an equation y² = x³ + ax +b, so if you know the x-coordinate of a point and can compute square roots in GF(q) then you can calculate the y-coordinate of the point and save yourself 48 bytes (with some complications that we'll see later). A similar situation holds for elements of G₂, except that each coordinate requires 96 bytes. There are thus two possibilities for how group elements can be serialised into bytestrings:

  • You can write both coordinates and then when you deserialise you read them in and check that the x and y values satisfy the equation of the curve.
  • You can just save the x coordinate and calculate the y coordinate from that using the equation of the curve. However there are two more complications here
    • Not every point in GF(q) has a square root, so your serialised value may not correspond to a point on the curve.
    • The nonzero points which have square roots have two distinct square roots, so you have a choice of two points on the curve. One of the three spare bits in the 48-byte representation is used to decide which point you want.
  • A further complication is that the zero point on the curve is actually an extra point called "the point at infinity" that effectively has an infinite y coordinate. Another of the spare bits is used to indicate the encoding of the point at infinity.

The blst library supports both of these formats, using the final spare bit to indicate which serialisation method is intended (see the specification here) (the "spare" bits are the most significant (leftmost) bits in the bytestring encoding) . The first method is called "serialisation" and the second "compression". There's a trade-off here: the serialised format is space-inefficient but deserialisation is fast, whereas for the compressed format only half the space is required but uncompression is slower because of the need to extract a square root. In fact, experiments show that uncompression takes about 20 times longer than deserialisation for G₁ and about 30 times longer for G₂. Despite this we have chosen to use the compressed format in Plutus Core, providing uncompression functions bls12_381_G1_uncompress and bls12_381_G1_uncompress, along with corresponding compression functions. We may wish to rethink this at some point, possibly adding serialisation and deserialisation if the time penalty for uncompression is too great.

Our representation is more restrictive than that of blst: blst allows serialisation of any point on the curve E₁ (or E₂), but during uncompression we reject any points which do not lie in the subgroup G₁ (or G₂). The subgroups are much smaller than the entire curve group (|E₁|/|G₁| ≈ 7×10^37 and |E₂|/|G₂| ≈ 3×10^152) so the chance of being able to successfully decompress a random bytestring to get a point in a subgroup is negligible.

Concrete syntax for G₁ and G₂ elements.

We have also used the compressed representation for the concrete syntax for G₁ and G₂ elements. A constant of type G₁ is represented by (con bls12_381_G1_element 0x...) where ... consists of 96 hex digits, and a G₂ element is similar but with 192 hex digits.

Hashing

We also provide two functions bls12_381_G1_hashToGroup and bls12_381_G2_hashToGroup which, as the names suggest, take an arbitrary bytestring and produce an element in the relevant group (not just on the curve). Hashing and uncompression are the only means of obtaining group elements directly. The other group operations preserve the property of being in the subgroup, so none of our functions can can give us arbitrary curve elements.

Pairing operations and bls12_381_mlresult

In the BLS12-381 situation, what one usually wishes to do is to calculate two pairings to get two elements of G_T and then compare these results for equality. This is an expensive computation and it is considerably more efficient (though still expensive) to calculate values of a so-called non-reduced pairing in an intermediate group closely related to G_T and then at the end perform a calculation (final verification) that checks that two elements of the intermediate group become equal when lifted into G_T.

In Plutus Core, the intermediate group is called bls12_381_mlresult, and the operations we can perform on this type are (deliberately) very limited. The only way to obtain a value of type bls12_381_mlresult is as a result of calling bls12_381_millerLoop (hence the name), or by multiplying together two existing elements of that type. We do not provide any concrete syntax, textual printing method, or serialisation format (including flat) for values of type bls12_381_mlresult.

Note also that the operations on bls12_381_mlresult are quite expensive: the (constant) CPU costs are (currently) as follows:

 bls12_381_millerLoop: 402099373
 bls12_381_mulMlResult: 2544991
 bls12_381_finalVerify: 388656972

Fortunately this is in line with the expected on-chain usage patterns: in realistic applications one would tend to perform a few calls to millerLoop, some multiplications, and one call to bls12_381_finalVerify, and costing benchmarks (cabal run bls-costs) suggest that this can be done in a reasonable time. For example, a realistic zk-snark verification takes about 23% of the on-chain script execution limit (memory costs are negligible).

flat serialisation

The canonical storage format for Plutus Core scripts is the flat format (see Appendix E of the Plutus Core specification (pdf here)). We use the compression format described earlier for flat serialisation of elements of type bls12_381_G1_element and bls12_381_G2_element: these are compressed to bytestrings which are then encoded using the normal flat encoding (but with tags 9 and 10 respectively). Deccoding can fail for a number of reasons: the bytestring can be the wrong length (it must be exactly 48 bytes long for bls12_381_G1_element and 96 for bls12_381_G2_element); the encoding can be invalid because the three format bits are used improperly; the encoding may specify a valid field element but one which is not the x-coordinate of a point on the curve ("off-curve"); and the encoding may specify a genuine point on the curve but one which is not in the relevant subgroup ("out-of-group"). All of these are described in the BLSTError type defined in the blst Haskell bindings.

There is no means of serialising bls21_381_mlresult values. The tag 11 has been allocated to this type, but if it is encountered while decoding a flat file then a failure occurs.

Costing

Fortunately costing is quite simple for the BLS12-381 operations: costing benchmarks indicate that everything is constant cost apart from the hashing functions and the scalar multiplication functions, which are linear in the size of their input.

@kwxm
Copy link
Contributor Author

kwxm commented May 26, 2023

Thanks for the comments @effectfully, especially the one about has collisions. I think I've addressed all of them except for the one about explicit dictionaries: I might come back to that later.

@kwxm
Copy link
Contributor Author

kwxm commented May 30, 2023

@angerman I think this PR is blocked because you requested changes because of the Windows build problem. I believe we decided that you'd look into that separately, so is it OK to merge this now? I had some trouble merging the nix files with those in master (but eventually managed to get things into a state where nix develop worked), so you might want to check that they look reasonable.

@angerman
Copy link
Contributor

@kwxm here's the haskell.nix bump PR: #5366

I've also tried to just bump haskell.nix on this PR, but that resulted in

       > untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs:284:29: error:
       >     • Variable not in scope:
       >         asum
       >           :: [Maybe (Term name1 uni1 fun1 ann)]
       >              -> Maybe (Term name1 uni1 fun1 ann)
       >     • Perhaps you meant ‘sum’ (imported from Prelude)
       >     |
       > 284 |         Constr _ _ ts    -> asum $ goTerm <$> ts
       >     |              

after I pulled the latest changes 😕

@angerman
Copy link
Contributor

So this fails on windows with

[168 of 171] Compiling UntypedPlutusCore.Transform.Inline ( untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs, dist/build/UntypedPlutusCore/Transform/Inline.o )

untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs:284:29: error:
    • Variable not in scope:
        asum
          :: [Maybe (Term name1 uni1 fun1 ann)]
             -> Maybe (Term name1 uni1 fun1 ann)
    • Perhaps you meant ‘sum’ (imported from Prelude)
    |
284 |         Constr _ _ ts    -> asum $ goTerm <$> ts
    |                  

too.

I don't think the Darwin -> Windows pipeline works, and I'm not sure why this is in CI.

@angerman
Copy link
Contributor

Notable I don't see that failure on master (with bumped haskell.nix): #5366

Copy link
Contributor

@angerman angerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should revert this back to the original for now.

nix/cells/automation/ciJobs.nix Outdated Show resolved Hide resolved
nix/cells/automation/ciJobs.nix Outdated Show resolved Hide resolved
@kwxm
Copy link
Contributor Author

kwxm commented May 31, 2023

@kwxm here's the haskell.nix bump PR: #5366

I've also tried to just bump haskell.nix on this PR, but that resulted in

       > untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs:284:29: error:
       >     • Variable not in scope:
       >         asum
       >           :: [Maybe (Term name1 uni1 fun1 ann)]
       >              -> Maybe (Term name1 uni1 fun1 ann)
       >     • Perhaps you meant ‘sum’ (imported from Prelude)
       >     |
       > 284 |         Constr _ _ ts    -> asum $ goTerm <$> ts
       >     |              

after I pulled the latest changes confused

That's weird: asum should be imported from Control.Applicative here. I have no idea why that would have happened (and the file that's causing the error is completely untouched by this PR). Also, I don't get an error on Linux after pulling your changes and restarting nix develop.

@angerman
Copy link
Contributor

angerman commented Jun 1, 2023

@kwxm yes, I'm also a bit surprised by this. Even more so that I don't see this in master 😅

@michaelpj
Copy link
Contributor

Looks like it succeeded but the status isn't getting reported for some reason.

@angerman
Copy link
Contributor

angerman commented Jun 1, 2023

I think this is good to merge. IDK why hydra complains. We could forcepush. I'll force push once.

@michaelpj michaelpj merged commit 996c7fc into master Jun 1, 2023
5 checks passed
@michaelpj michaelpj deleted the kwxm/BLS12_381/prototype branch June 1, 2023 15:31
@kwxm
Copy link
Contributor Author

kwxm commented Jun 1, 2023

Slightly premature, since I was just about to push some more changes! I'll open a new branch for that.

@kwxm kwxm mentioned this pull request Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants