Candidates for in-circuit zkSNARK hash functions #36

Closed
HarryR opened this issue Aug 30, 2018 · 5 comments

HarryR commented Aug 30, 2018

The criteria are:

  • Low multiplicative complexity
  • Implementable in Solidity / EVM with low gas cost
  • Work as a hash function
  • Can be used to construct a Merkle tree

Candidates from authenticated MACs:

Candidates from universal hashing:

For the Merkle tree, where all information is public, all we're concerned about is collision resistance and malleability. In this case the hash127 algorithm can be used with a fixed key r per tree level, where s = (r^{n+1} + (r^n * m_0) + (r^{n-1} * m_1) + ... + (r * m_{n-1})) mod p, the message m is split into 32-bit chunks, and m != 0. The values of r^n etc. can be pre-computed.
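A minimal sketch of this per-level evaluation, assuming pre-computed powers of a fixed per-level key r; the function name and chunk handling are illustrative rather than taken from the hash127 specification:

def poly_hash_level(chunks, r, p):
    # s = (r^{n+1} + r^n * m_0 + r^{n-1} * m_1 + ... + r * m_{n-1}) mod p
    assert any(chunks)                      # m != 0
    n = len(chunks)
    powers = [pow(r, n + 1 - i, p) for i in range(n + 1)]  # r^{n+1}, r^n, ..., r
    s = powers[0]                           # leading r^{n+1} term
    for m_i, r_pow in zip(chunks, powers[1:]):
        assert 0 <= m_i < 2**32             # 32-bit message chunks
        s = (s + r_pow * m_i) % p
    return s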

Concerning malleability: where each message chunk is the size of a field element, we can construct an input which resets the intermediate sum to a known state; the smaller the chunk size, the less malleable it becomes, at the cost of greater multiplicative complexity. For example, with full field elements as message input we could simply find a chunk which, when multiplied by r, equals the difference between the current sum and our target value, producing a collision; the more message blocks you control, the easier this gets.

However, splitting into individual bits increases the complexity of the circuit, as we need to verify the bitness of every message input to prevent malleability. If the messages are 256 bits each and the Merkle tree is 29 levels deep, that requires 7424 constraints to verify all input bits. If we compress two messages into one field element using 32-bit blocks, it requires an additional 16 constraints per level (464 for 29 levels), then another 464 constraints to compute the polynomial for the 29 levels, which puts the total at 8352 constraints.
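As a back-of-envelope check of those counts (a sketch; the per-level figures are the ones quoted above):

depth, msg_bits = 29, 256
bitness = depth * msg_bits       # 7424 constraints: one per input bit
packing = depth * 16             # 464: packing two messages per field element
evaluation = depth * 16          # 464: evaluating the per-level polynomial
print(bitness, packing + evaluation, bitness + packing + evaluation)
# 7424 928 8352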

There is another approach for LongsightF: when used as a cipher it may still be usable as the Merkle path authenticator, where the number of rounds is less significant. In this case we know that for any given result we can walk backwards to find any number of colliding inputs with control over both input parameters; essentially, when used as a Merkle path authenticator its construction is similar to a sponge construction in duplex mode.

However, the security of the whole authentication path comes down to finding a single colliding input, essentially the first item in the Merkle tree path, which means the complexity isn't improved by the Merkle–Damgård construct: finding appropriate inputs for level 28 is just as easy after running level 29 in reverse. A good reference for different constructions is https://www.emsec.rub.de/media/crypto/attachments/files/2011/03/bartkewitz.pdf

The clearest example I can think of which demonstrates this is the collision resistance of LongsightL when the key can be chosen: if you can choose the keys for each round it's very malleable. However, if the round keys are derived via squaring of a single input then you've essentially eliminated the malleability.

e.g.

def DoublevisionH_a(k, m, r, p):
    x = k
    for _ in range(0, r):
        y = (m + x) % p
        m = (y * y) % p      # square the keyed sum
        x = (x * k) % p      # round key becomes the next power of k
    return m

Even with only 2 values of m this intuitively becomes an intractable problem: for N rounds the message is multiplied by the square of the key, which makes it very similar to the polynomial used by hash127 and Poly1305-AES, with the exception that the message blocks and round constants are derived from two initial values.

The problem we have is the parity of quadratic residues: both m and k are squares after the first round, so there is a linear route backwards, and we know the round function isn't bijective. Interesting references:

When used with the curve order of altBN, i.e. the field of the zkSNARK circuit, we can ensure that for any fixed k and any 0 < m < p the result is a bijection, as exponentiating by 5 is also a bijection in that field:

# powmod(b, e, p) is modular exponentiation, e.g. gmpy2.powmod or Python's built-in pow
def DoublevisionH_b(m, p, r, k):
    x = k
    for _ in range(0, r):
        y = (m + x) % p
        m = powmod(y, 5, p)
        x = (x * k) % p      # reduce mod p to keep x bounded
    return m
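As a sanity check on the bijection claim: x -> x^5 permutes Z_p exactly when gcd(5, p - 1) == 1, which holds for the altBN curve order used as the snark field:

from math import gcd

p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
assert gcd(5, p - 1) == 1   # so m -> m^5 mod p is invertible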

But we can't generalise this for any k and any m, as there will be overlaps while still satisfying the pigeonhole principle. The best we can do is to use two bijective sequences, which in turn are also bijective where either k or m is fixed.

def DoublevisionH_c(m, p, r, k):
    x = k
    for _ in range(0, r):
        y = (m + x) % p
        m = powmod(y, 5, p)
        x = powmod(x, 5, p)
    return m

The problem with this, though, is that by controlling either k or m you can force y to be zero, which reduces the problem to finding an initial k that, when squared/exponentiated r times, results in the desired m.

Another problem is that m and k can be swapped and the result is the same; when used in a Merkle tree this is bad, as we need to be strictly order-enforcing. When x is squared at each iteration the order is enforced; likewise, if k is exponentiated in a bijective fashion, m can be squared and the result is still a bijection.

The y step can be either additive or multiplicative.

However, after further testing, the problem with either of these is that when m is fixed and k is variable it isn't a bijection. The function can be modified to satisfy this requirement, but then only the last value of y is used, which removes a layer of complexity.

def DoublevisionH_d(m, p, r, k):
    y = 0
    for _ in range(0, r):
        y = (m + k) % p      # only the final y is returned
        m = powmod(m, 5, p)
        k = powmod(k, 5, p)
    return y

Adding y to either k or m retains the quality of being a bijection, e.g. for all k in Z_p, with m held constant, k = (k + y)^5 remains a bijection, and vice versa.

def DoublevisionH_e(m, k, p):
    y = 0
    r = 4
    for _ in range(0, r):
        y = (m + k) % p
        m = powmod(m, 5, p)
        k = powmod((k + y) % p, 5, p)
    return y

In my pigeonhole surjection test, which measures the distribution of pigeons in holes across all N and M for H(N, M), e.g. for N in range(0, p): for M in range(0, p): H(N, M), the standard deviation for LongsightF hovers around 2 and the variance around 4, whereas with DoublevisionH_e the standard deviation hovers around 1 and the variance around 1. If I add y to both k and m before exponentiating them, we get a similar standard deviation and variance to LongsightF.
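For reference, a minimal version of that test, runnable over a small prime where the whole (m, k) space can be enumerated (103 is chosen so that gcd(5, p - 1) == 1 and x^5 stays a bijection; DoublevisionH_e is as defined above, with powmod = pow):

from collections import Counter
from statistics import pstdev, pvariance

def pigeonhole_stats(h, p):
    counts = Counter(h(m, k, p) for m in range(p) for k in range(p))
    buckets = [counts.get(v, 0) for v in range(p)]   # pigeons per hole
    return pstdev(buckets), pvariance(buckets)

print(pigeonhole_stats(DoublevisionH_e, 103))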

So we have a measurable quality which indicates a bijection: the variance is ~1 and the stddev is ~1, whereas without a bijection the stddev is ~2 and the variance ~4. The bijective flavour is more consistently distributed, i.e. less random.

But is a more random distribution a good quality? Is there a relation between the bijective nature when one parameter is constant, versus something which isn't a bijection? My gut feeling is that, with a large enough |p|, the additional complexity of having both m and k updated as x = (x + y)^5 computationally binds both parameters to the values from the previous round. It doesn't solve the "first round y is zero" problem, but I think that's unsolvable, in that you can always algebraically negate the first round with control over one parameter; but that doesn't give you control over the other one, which is the crux of the matter.

Which leaves us with one interesting candidate:

def DoublevisionH_f(m, k, p, r):
    y = 0
    e = 5
    for _ in range(0, r):
        y = (m + k) % p
        m = powmod((m + y) % p, e, p)
        k = powmod((k + y) % p, e, p)
    return y

Here the exponent 5 is replaced with whatever is a bijection for that specific field. However, it still suffers from the argument order being reversible, due to the commutative nature of the y step, and for a Merkle tree it's absolutely necessary to have ordering.

For the ordering property we can use two sequences of round constants, one for m and one for k, which becomes:

def DoublevisionH_g(m, k, p, r, C):
    y = 0
    e = 5
    for i in range(0, r):
        y = (m * k) % p
        m = powmod((m + y + C[(2*i)]) % p, e, p)
        k = powmod((k + y + C[(2*i)+1]) % p, e, p)
    return y

This ensures that in round 0, when k and m are equal, y will also be equal, but by round 1 the sequences m and k will have diverged and will depend on their initial values.

Furthermore, by using multiplication in the y step, the degree of the polynomial increases at each round. Using deg(f(x)g(x)) = deg(f(x)) + deg(g(x)), we can show that:

  • Round 1
    • y = m * k
    • deg(y) = 0
    • m = (m + y + C_?)^5
    • deg(m) = 5
    • k = (k + y + C_?)^5
    • deg(k) = 5
  • Round 2
    • y = m * k
    • deg(y) = deg(m) + deg(k) = 10
    • m = (m + y + C_?)^5
    • deg(m) = 10 (as the degree of y is 10)
    • k = (k + y + C_?)^5
    • deg(k) = 10
  • Round 3
    • y = m * k
    • deg(y) = deg(m) + deg(k) = 20
    • ...
  • ...

As such, the degree doubles at every round: after 127 rounds the degree exceeds 2^128, after 30 rounds it exceeds 2^31, etc. However, does that mean a collision could be found with complexity on the order of 2^r using higher-order differential cryptanalysis? And furthermore, is my statement about the degree of the polynomial correct, or is the degree limited to 5, as that's the outer exponent of the k and m equations?

An alternative which pushes the degree further up into the statement is:

def DoublevisionH_h(m, k, p, r, C):
    y = 0
    e = 5
    for i in range(0, r):
        y = (m * k) % p
        m = (powmod((m + C[(2*i)]) % p, e, p) + y) % p
        k = (powmod((k + C[(2*i)+1]) % p, e, p) + y) % p
    return y

Which makes it:

  • Round 0
    • y = (m * k)
    • m = (m + C_?)^5 + (m * k)
    • k = (k + C_?)^5 + (m * k)
  • Round 1
    • y = ((m+C_?)^5 * (k+C_?)^5)
    • deg(y) = 10
    • m = (m + C_?)^5 + y
    • deg(m) == 10 ?
    • etc.
  • etc.

In that case, if you were to multiply m and k by y at each stage, surely that would increase the rate of degree growth at every iteration, at the cost of multiplicative complexity (a small recurrence sketch follows the degree list below), for example:

def DoublevisionH_i(m, k, p, r, C):
    y = 0
    e = 5
    for i in range(0, r):
        y = (m * k) % p
        m = (powmod((m + C[(2*i)]) % p, e, p) * y) % p
        k = (powmod((k + C[(2*i)+1]) % p, e, p) * y) % p
    return y

Where:

  • Round 0 = degree 0
  • Round 1 = degree 10
  • Round 2 = degree 30
  • Round 3 = degree 70
  • etc.
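A quick numeric sketch of this degree bookkeeping, under the same informal accounting model as the lists above (each (x + C)^5 term treated as contributing degree 5); the helper names are mine:

def degrees_H_g(rounds):
    deg_y, deg_m, deg_k = 0, 5, 5          # round-1 values from the list above
    out = [deg_y]
    for _ in range(rounds - 1):
        deg_y = deg_m + deg_k              # y = m * k
        deg_m = deg_k = deg_y              # (m + y + C)^5 taken as deg(y)
        out.append(deg_y)
    return out

def degrees_H_i(rounds):
    deg_y = 0                              # round 0
    out = [deg_y]
    for _ in range(rounds - 1):
        deg_y = 2 * (deg_y + 5)            # m and k each gain ^5, then * y
        out.append(deg_y)
    return out

print(degrees_H_g(5))   # [0, 10, 20, 40, 80]   - doubles each round
print(degrees_H_i(5))   # [0, 10, 30, 70, 150]  - matches the list above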

However, we've diverged far from my original proof of the bijective property, and gotten into very speculative musings on the degree of polynomials and its relation to security under higher-order differential analysis.

The following paper by Xuejia Lai investigates the security of multivariate hash functions: [1] https://eprint.iacr.org/2008/350.pdf. It references MQ-HASH, described in [2] https://pdfs.semanticscholar.org/a258/904c3e021df8c2de621b7dfa4dc504c8f3b2.pdf ('On Building Hash Functions From Multivariate Quadratic Equations', Billet, Robshaw & Peyrin), which is further supplemented by 'Interpreting Hash Function Security Proofs', [3] https://infoscience.epfl.ch/record/172008/files/Juraj-Provsec2010.pdf

In [1] Lai states the degree of MQ-HASH is four, where it's necessary to compute the dth derivative over 2^d inputs, with d the degree of the multivariate function; if the derivative is a constant we can distinguish the function from one sampled from a uniform distribution. It's sometimes difficult to translate from mathlish, but it implies that a higher-order differential attack requires computation on the order of 2^d, where d is the degree of the polynomial.

The reason I wanted to rely strongly on the bijection property for either k or m is to reduce the sparseness of the polynomial surjection for both k and m. In § 4.1 of [1] Lai describes the probability of trivial collisions; however, this is represented over GF(2) rather than GF(p).

.... *brain slowly melts* I think a better way of analysing this would be via calculus and its relation to quadratic residues over a field.

Either way, we can break this down into the statement:

x[i] = (x[i-1] + C[i])^5 * x[i-1]

Compare this to the MiMC round function:

x[i] = (x[i-1] + C[i])^5 + x[i-2]

Where the MiMC function uses two variables, denoted by x[i-1] and x[i-2], our round function when truncated to a single variable doubles the polynomial degree. When expanded to account for two variables it becomes:

z[i] = x[i-1] * y[i-1]
x[i] = (x[i-1] + Cx[i])^5 * z[i]
y[i] = (y[i-1] + Cy[i])^5 * z[i]

So, to summarise, we now have a compression function which:

  • takes two arbitrary field elements as inputs
  • the input parameters are ordered after at least 1 round
  • the following are intractable problems:
    • finding the inverse from the result
    • finding a target collision using either k or m where the other element is a constant

I think this is good enough to be used as a compression function to form a Merkle tree.

HarryR commented Aug 31, 2018

The next step of analysis for these polynomial hash functions is to extend my basic tests, where we iterate across all k and m inputs in Z_p for a range of small-ish primes, to determine, for each variant:

  • Are there collisions in state
  • Are there any fixed points
  • Are there duplicate sequences
  • Is the output distribution random
  • Parity of each variable

For each candidate we need to output the probabilities, compared to the field size, of each occurrence.
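A sketch of such a scan over a small prime, exercising the first two checks (fixed points and collisions); DoublevisionH_e is the three-argument variant defined earlier, and the prime 103 keeps x^5 a bijection:

def scan_properties(h, p):
    fixed_points, per_m_collisions = 0, 0
    hit = set()
    for m in range(p):
        seen = set()
        for k in range(p):
            v = h(m, k, p)
            if v in seen:
                per_m_collisions += 1    # collision with m fixed
            seen.add(v)
            if v == m or v == k:
                fixed_points += 1
            hit.add(v)
    unused = p - len(hit)                # holes never hit at all
    return fixed_points, per_m_collisions, unused

print(scan_properties(DoublevisionH_e, 103))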

HarryR commented Aug 31, 2018

In retrospect, and after analysis of several of these, I think it's a bad idea for me to be creating new crypto, and I should limit myself to using other people's primitives in ways which rely on their known properties.

For example, knowing the encryption key for the LongsightF cipher you can create any collision, but it's still target-collision resistant: when the input must satisfy some other hard problem, aside from just finding any collision, you end up with the hardness of the birthday problem where the match must be your own birthday.

Say, for example, the leaf is H(derive-pubkey(sk) || pubkey2): you cannot find an arbitrary collision, as in-circuit you must prove you know the sk from which the first public key was derived, and on-chain you prove pubkey2 via ecrecover.

My only problem with LongsightF is that you can truncate all N levels of the Merkle tree authentication path in linear time, which is why we should use a different hash or cipher for the Merkle tree. Secondly, the MiMC function has a low polynomial degree, which makes it a candidate for higher-order differential analysis.

Either way, we have a candidate for a keyed one-way permutation function with round constants:

def DoublevisionH_e(m, k, p, C, e=5):
    Ck = (m + k) % p
    for C_i in C:
        m = powmod((m + C_i) % p, e, p)
        k = powmod((k + Ck) % p, e, p)
        Ck = (m + k) % p
    return Ck

For every k with a constant m it is a bijection and emits a unique sequence specific to that m; for every C it is also a unique sequence and a bijection. This is an interesting function. When run through the pigeonhole principle tests, i.e. for every k and every m we increment found[f(k,m)] += 1, the variance of the values of found is very close to 2 (and the standard deviation is sqrt(variance)), whereas LongsightF has a variance closer to 4, which indicates DoublevisionH_e is less sparse, e.g. it's less likely for any two inputs to produce a collision.

Is there a relation between k and m, and the bijective nature of a fixed m with every k in sequence? E.g. is one more usable as a key rather than a message, or vice versa? You could rephrase this such that for every different key the message maps to exactly one element, but that doesn't guarantee every message will be mapped to a different output for every key: there will be exactly one k where f(m,k) == m.

I don't know what to call this property, but let's call it leftness and rightness: if an element is Left you add the round constant to it; if the element is Right you add the Ck-derived constant to it.

When looking at the higher-order algebraic complexity of the m component, e.g. fullratsimp(m[i] = ((m[i-1] + C[i-1])^5 + C[i])^5), the highest degree is 25 (see https://stackoverflow.com/questions/35855762/coefficients-of-polynomials-maxima; degree(expand(m[i] = ((m[i-1] + C[i-1])^5 + C[i])^5)) = 25). However, for two rounds, when fully expanded, the degree of LongsightF is 25 too, e.g. fullratsimp(((((x[1] + C[0])^5 + x[0]) + C[1])^5) + x[1]). Or have I misunderstood Lai's insights into higher-order differential cryptanalysis? (Most likely.)

The circuit diagram for DoublevisionH_e is:

         m         k
         |         |
         |         |
          >--(+)--<     y[0] = m + k
         |    |    |
C_[0] --(+)   |    |    j[0] = m + C[0]
         |    |    |
         |    `---(+)   l[0] = k + y[0]
         |         |
       (n^5)       |    n[0] = j[0]^5
         |         |
         |       (n^5)  o[0] = l[0]^5
         |         |
          >--(+)--<     y[1] = n[0] + o[0]
         |    |    |
         |    |    |    result = y[1]
         |    |    |
-----------------------------------------------
         |    |    |
C_[1] --(+)   |    |    j[1] = m + C[1]
         |    |    |
         |    `---(+)   l[1] = k + y[1]
         |         |
       (n^5)       |    n[1] = j[1]^5
         |         |
         |       (n^5)  o[1] = l[1]^5
         |         |
          >--(+)--<     y[2] = n[1] + o[1]
         |    |    |
         |    |    |    result = y[2]
         |    |    |
-----------------------------------------------
         .    .    .
         .    .    . 

HarryR commented Sep 1, 2018

So, to recap: the DoublevisionH_e function above serves as a bijection f_k(x) -> y for every k, as long as the exponent 5 is a bijection in the field. With 3 rounds and known k and round constants C_0 ... C_i, the degree of the polynomial is 125, or 100 after factoring; more generally 5^r, where r is the number of rounds. Truncating the output by operating over GF(p) turns this into a one-way function with high algebraic complexity. I'm implementing this as a proof of concept with 3 rounds, which will result in similar on-chain gas costs to the LongsightF12p5 function.

To use it as part of a compression function we can use it in a Davies-Meyer, Matyas-Meyer-Oseas or Miyaguchi-Preneel construction, respectively: H_i = f_{x_i}(H_{i-1}) + H_{i-1}, H_i = f_{H_{i-1}}(x_i) + x_i, and H_i = f_{H_{i-1}}(x_i) + x_i + H_{i-1}. The first (Davies-Meyer) is the most straightforward, but I don't like the idea of using direct user input as the key and would like to avoid the possibility of fixed points, so I would go with the third.
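For concreteness, the three constructions written against an arbitrary keyed permutation f(key, x) over Z_p (a sketch; f stands in for any of the candidates in this thread):

def davies_meyer(f, h_prev, x, p):
    return (f(x, h_prev) + h_prev) % p       # H_i = f_{x_i}(H_{i-1}) + H_{i-1}

def matyas_meyer_oseas(f, h_prev, x, p):
    return (f(h_prev, x) + x) % p            # H_i = f_{H_{i-1}}(x_i) + x_i

def miyaguchi_preneel(f, h_prev, x, p):
    return (f(h_prev, x) + x + h_prev) % p   # H_i = f_{H_{i-1}}(x_i) + x_i + H_{i-1}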

Note that, when providing the key k and round constants C_0 ... C_i, the polynomial can be partially factored, e.g. (from Maxima), for round 1:

expt(m_0 + 37856463657687075255235207121647356893531523192911521850899097682097273997986, 5)
 + expt(m_0 + 9294622305742195711954698608399076750764640297311407608875685892893643741512, 5)

Which factors to:

2 (m_0 + 23575542981714635483594952865023216822148081745111464729887391787495458869749)
  (m_0^4
   + 94302171926858541934379811460092867288592326980445858919549567149981835478996 m_0^3
   + 5374284314830216493939598707823051481372314656635241550088761517473397463032729317765788229281972144134893270057451023746808025352600250230542906831219696 m_0^2
   + 148575872989452140063451533140112642553769017169793621071949673612926398162848784985276957490488699035321507383413501455105575368071139926975829587685265023535388968419868586334919457522699845698938843536974802667129631317808378616 m_0
   + 1650425071826737630165560952368736744473827848049257558874304606285107493910204609118552198836717759254500574441532215623606524535045630785886359688653437164329310769239619959570856400202721735935763196127126014480097647721379193528476871282315209115248194021271499696854419366393432696790157323128797260496)

So, the degrees of the factors for each round, assuming a known k and round constants:

Round | Degree | Factored Degree | Factored/Degree Ratio | Notes
------|--------|-----------------|-----------------------|--------------------------
0     | 1      | 1               | 1                     |
1     | 5      | 4               | 0.8                   |
2     | 25     | 20              | 0.8                   |
3     | 125    | 100             | 0.8                   |
4     | 625?   | 500?            | 0.8?                  | Very expensive to compute

This shows that while the polynomial can be reduced, the degree can only be reduced by 20%, after which it appears irreducible. A possible rule could be: factored degree = 5^r - 5^(r-1), for r >= 1?
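The reason one linear factor always splits off is the sum-of-fifth-powers identity u^5 + v^5 = (u + v)(u^4 - u^3 v + u^2 v^2 - u v^3 + v^4), with u = m_0 + C_a and v = m_0 + C_b. This can be checked symbolically with sympy over the rationals, using small stand-in constants:

from sympy import symbols, factor

m = symbols('m_0')
print(factor((m + 3)**5 + (m + 7)**5))
# 2*(m_0 + 5)*(m_0**4 + 20*m_0**3 + 190*m_0**2 + 900*m_0 + 1705)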

Note that because the reduced polynomial, for one round, is in the form of:

f_a(x) = (a_0 * x^5) + (a_1 * x^4) + (a_2 * x^3) + (a_3 * x^2) + (a_4 * x) + a_5

This resembles the polynomial from Shamir's secret-sharing scheme; however, for every subsequent evaluation the values of the sequence a will change, meaning no two messages with the same key are evaluated over the same set of a_0 ... a_i, which would otherwise reveal the secret. If you swap the key with the message, x becomes the key and a is derived.

If you look at the polynomial used in Hash127 by DJB:

f_{r,x}(m) = (r^{n+1} + (r^n * m_0) + (r^{n-1} * m_1) + ... + (r * m_{n-1}) + x) mod p

Where (r, x) are pre-agreed shared secrets, the key r is raised to successive powers and then x is added. So we know that in both cases multiple evaluations of the polynomial will not reveal the secrets..? However, it's noted that in Bernstein's Poly1305 he uses the same polynomial, then truncates it with another modulus.

Further notes on interpolation probabilities:

These come from DJB's paper on stronger security bounds for WCS authenticators, and relate to the bijection tests for the Doublevision function; see the last page, under 'Interpolation probabilities, collision probabilities, differential probabilities'.

We calculate the collision probability, where for each k it's guaranteed to be 0, but we don't specify a probability for f_{k_1}(x_1) = f_{k_2}(x_2); m for every k is bijective, but over every m and k it is surjective. Would this probability be p/(p^2) = 1/p?

We calculate a variant of the interpolation probability which shows that (f(x_0), f(x_1), ..., f(x_{p-1})) = (y_0, y_1, ..., y_{p-1}) is unique for every (k, m) for sequences of size p; that is, the interpolation probability is 0 with p points over the whole space of (k, m). But for point sequences of length n, what is the probability? You would expect one sequence to occur with the probability of a collision f_{k_1}(x_1) = f_{k_2}(x_2), but what about two or more?

Thirdly, the differential probability: for any fixed g, what is the probability of f(x_1) - f(x_2) = g? I made an initial measurement of this; it occurs with probability 1/p over all (k, m) (a space of p*p), and 1 for each k over every m (a space of p).
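An exhaustive version of that differential measurement over a small prime (a sketch, using the revised DoublevisionH_e above with arbitrary round constants C; expect a rate of roughly 1/p):

def differential_rate(h, p, C, g):
    hits, total = 0, 0
    for k in range(p):
        vals = [h(m, k, p, C) for m in range(p)]
        hits += sum((vals[x1] - vals[x2]) % p == g
                    for x1 in range(p) for x2 in range(p) if x1 != x2)
        total += p * (p - 1)
    return hits / total

print(differential_rate(DoublevisionH_e, 103, C=[11, 23, 42], g=1))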

HarryR commented Sep 2, 2018

I think it's worth digging further into higher-order differential cryptanalysis for MiMC-p/p, MiMC-2p/p, DoublevisionH_e, LongsightL and LongsightF, where specifically we're looking at the complexity of the polynomials f_k(x_1) - f_k(x_2), f_{k_1}(x) - f_{k_2}(x) and f_{k_1}(x_1) - f_{k_2}(x_2).

HarryR commented Sep 4, 2018

So, we can possibly say that the one-wayness of the problem for Doublevision relates to the discrete logarithm and quadratic residue problems, essentially factoring the sum of cubes.

Interesting references:

The problem is that many of the security assumptions in the papers above rely on using a prime factor to generate a subgroup; furthermore, the exponents are fixed.

The security assumption for the number of rounds of MiMC-p/p as a cipher comes from the computational hardness of determining the Lagrange form of a polynomial of a specific degree; at 160 rounds this is about 2**254.
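As a quick sanity check of that figure (for the cubing variant, where the interpolation polynomial degree after r rounds is 3^r):

import math
print(math.log2(3 ** 160))   # ~253.6, i.e. about 2**254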

For a more specific construction, e.g. the Merkle tree authentication path, can this be weakened? Let's say we use the Miyaguchi–Preneel construction:

# IV is a fixed, public initialisation constant; LongsightL(key, msg, C, e, p)
# is the keyed permutation, with round constants C and exponent e.
def compress_MiyaguchiPreneel(m_0, m_1, C, e, p):
    H_0 = LongsightL(IV, m_0, C, e, p)
    H_1 = LongsightL(H_0, m_1, C, e, p)
    return (H_0 + H_1 + m_0) % p

The Lagrange form becomes less of a concern at each step, as the key changes for every m_0. When chained in sequence, e.g. in a Merkle tree path with all the address bits fixed to either 1 or 0, would you be able to use the Lagrange form to identify the root for any given input and a specific set of path elements? *shrugs* And would it be possible to construct the Lagrange form of the inverse?

Anyway, if we use a different IV and different round constants at every level, and make sure that the full path is of sufficiently high degree, the only remaining problem is that the LongsightL (and F variant) functions are invertible, in that you can truncate the whole Merkle authentication path with control over both arguments. For this reason the Miyaguchi–Preneel construct above is used, avoiding fixed points etc., and ensuring only one argument can be controlled for each function call.

Using this method, and using native field elements for the inputs instead of bits, we can keep the number of constraints for the whole Merkle tree authentication path to, say... under 1500. I will need to run some quick tests to verify that the LongsightL function is bijective etc.
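A sketch of those quick tests, checking per-key bijectivity exhaustively over a small prime. LongsightL_round here is an assumed stand-in round (x -> (x + k + C)^5), not the actual LongsightL implementation:

def LongsightL_round(x, k, C, p):
    return pow((x + k + C) % p, 5, p)

def bijective_per_key(round_fn, p, C=7):
    return all(len({round_fn(x, k, C, p) for x in range(p)}) == p
               for k in range(p))

print(bijective_per_key(LongsightL_round, 103))   # True when gcd(5, p - 1) == 1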

@HarryR HarryR closed this as completed Sep 4, 2018