Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase number of rounds for MiMC (or LongsightL) #87

Closed
HarryR opened this issue Nov 25, 2018 · 4 comments · Fixed by #105

Comments

@HarryR
Copy link
Owner

commented Nov 25, 2018

So, I should've originally listened to Daira, as she's a beacon of wisdom and insight, and further research has shown that MiMC-p/p with 12 rounds is very insecure.

This ticket aims to address the shortcomings of the current implementation and making it 'secure' with some small tweaks.

Define LongsightMP-R/p/e as Miyaguchi–Preneel applied to MiMC-R/p/e. We need R = log7(p) ~= 91 rounds for the non-Feistel variant of MiMC-R/p/7. Raising to the power 7 requires 4 constraints. So, for the BLS12-381 p,

  • LongsightMP-91/p/7 requires 728 constraints.
  • LongsightL-91/p/7 requires 1092 constraints.
    (For reference, the insecure LongsightF-322/p/3 required 644 constraints.)

Exponent of 5 is a bijection over F_p for the altbn scalar field used by Ethereum, need to verify if exponent of 7 is a bijection also.

With exponent=7, each round requires 4 constraints for exponentiation:

def round_e7(x, p):
    a = (x*x) % p # x^2
    b = (a*a) % p # x^4
    c = (a*b) % p # x^6
    return (c*x) % p# x^7

Where 91 rounds requires 364 constraints when using a round-constants.

R = 91 == ceil(log(p) / log(7))
R = 161 == ceil(log(p) / log(3))

The round constants for the keys can be generated as a hash stream using keccak256.

def round_constants(seed, p, R):
    for _ in range(R):
        seed = H(seed)
        yield seed

Any personalisation must change the initial seed, to make it derive a unique sequence of round constants. The constants are used to personalise the use of the key.

Note that for the first round the constant is zero, and for the last round the key is appended to the result, this matches the following implementations and is required to ensure the result is bijective:

Pseudocode:

def mimc(k, x, seed, p, e, R=91):
    for c_i in [0] + round_constants(seed, p, R - 2):
        a = (x + k + c_i) % p
        x = (a ** e) % p
   return (x + k) % p

The Miyaguchi–Preneel one-way construct can be implemented as a linear combination between rounds, returning a linear combination with the previous result, requiring no additional constraints.

def mimc_pe7_mp(k, x, seed, p, e, R=91):
    for x_i in x:
        k = (k + x_i + mimc(k, x_i, seed, p, e, R)) % p
        yield k
  • Rename to MiMC_MP? when used with Miyaguchi–Preneel compression function
  • Verify exponent of 7 is a bijection over F_p (for the altbn scalar field)
  • Use exponent of 7 for Longsight MP, with 91 or 46 rounds
  • Make computation of round constants automatic rather than pre-computed (applies to both C++ and Solidity versions)
  • Gas benchmark for Solidity implementation.

@HarryR HarryR created this issue from a note in November 2018 (ToDo) Nov 25, 2018

@HarryR

This comment has been minimized.

Copy link
Owner Author

commented Nov 29, 2018

I have tried to improve the performance of MiMC-p/p on EVM, and have provided reference implementations in Python and Solidity which I will then verify against the other implementations and the paper.

https://gist.github.com/HarryR/80b5ff2ce13da12edafda6d21c780730

In this implementation the H function is keccak256 and the key is used to derive the round constants sequence rather than using a separate seed specifically for the constants.

I recommend using exponent 7 with 46 rounds, for an acceptable security/performance trade-off.

Gas usage:

exponent rounds gas avg per round poly degree log2(d)
7 100 20580 206 7**99 278
7 91 18735 206 7**90 252
7 46 9510 207 7**45 126
7 4 900 225 7**3 8
7 3 695 232 7**2 5
5 110 21126 192 5**109 253
5 100 19216 193 5**99 230
5 55 10621 193 5**54 125
5 4 880 220 5**3 6
5 3 689 229 5**2 4

Exponent of 7 is slightly more expensive, but requires fewer rounds.

@jbaylina

This comment has been minimized.

Copy link

commented Nov 30, 2018

For reference:

Here is a 100% EVM assembly version of MiMC Smart contract with exponent 7 that takes less than 100gas/round: https://github.com/iden3/mimc

@HarryR HarryR removed this from ToDo in November 2018 Dec 3, 2018

@HarryR HarryR added this to To do in December 2018 Dec 3, 2018

@HarryR HarryR moved this from To do to In progress in December 2018 Dec 5, 2018

@HarryR

This comment has been minimized.

Copy link
Owner Author

commented Dec 8, 2018

@jbaylina suggested that the hash for the base points can be computed separately in order to allow parallel computation, this is relevant to Pedersen Hashes too as we want to re-use as much code as possible for 'constant generation'. He had reservations about using a hash-chain to generate a sequence of constants, in that it wouldn't allow for parallel computation.

In the example code above, round constants are generated using a hash-chain:

def round_constants(seed, p, R):
    for _ in range(R):
        seed = H(seed)
        yield seed

This creates a sequence where 0=H(seed), 1=H(H(seed)) etc.

@jbaylina was suggesting a scheme (and implemented it, for Pedersen hashes) where the seeds for the Y coordinates are generated by making a string of the following components then hashing them (using blake2):

  • Name / Personalisation string
  • Sequence offset
  • Try offset (e.g. when Y coordinate doesn't result in a square X)

See: https://github.com/iden3/circomlib/blob/98a33d5700bee1fe026badd2baf67f7a58429c0e/src/pedersenHash.js#L62

    while (p==null) {
        const S = GENPOINT_PREFIX + "_" + padLeftZeros(pointIdx, 32) + "_" + padLeftZeros(tryIdx, 32);
        const h = createBlakeHash("blake256").update(S).digest();
        h[31] = h[31] & 0xBF;  // Set 255th bit to 0 (256th is the signal and 254th is the last possible bit to 1)
        p = babyJub.unpackPoint(h);
        tryIdx++;
    }

In this implementation a new string is created for every try for every sequence and try.

Whereas ethsnarks uses https://github.com/HarryR/ethsnarks/blob/master/ethsnarks/jubjub.py#L180:

# in base point generation:
data = b"%-28s%04X" % (name, i)
return Point.from_hash(data)

def from_hash(cls, entropy):
	entropy = sha256(entropy).digest()
	entropy_as_int = int.from_bytes(entropy, 'big')
	y = FQ(entropy_as_int)
	while True:
		try:
			p = cls.from_y(y)
		except SquareRootError:
			y += 1
			continue
                        # ...
		return p

This creates a hash for each base point sequence, then increments its big-endian interpretation until an X coordinate can be recovered from the Y. I think my implementation of from_hash is more likely to recover an X coordinate with less average tries than if a new hash is generated at each try - but only marginally. The two implementations are similar, apart from the 'increment Y until matching X is found' part - which for Ethsnarks is designed to be Ethereum compatible (even though modulo square root probably isn't affordable on EVM).

The part which is necessary for Ethereum, IMO, is re-using the same components for MiMC round-constant generation and Pedersen hash base point generation. It's very cheap to implement the round_constants mechanism described above, generating a hash chain using keccak256 from an initial seed is faster and cheaper than any alternative.

Regarding parallelism, I think it's fair to create a sequence of hashes which are used as starting points, while this isn't parallel on its own - it is fast to generate the sequence which can then be made parallel for every individual implementation.

IMO the way forward is to use the same round-constant generation mechanism for MiMC and Pedersen hash, for pedersen hash each constant in the sequence is the starting point for recovering the X from the Y, and for MiMC it's used verbatim.

As far as 'nothing up my sleeve' numbers go, for generating the base points the mechanism implemented by ethsnarks (where the Y coordinate is incremented until a resulting point is found) matches the literature and existing implementations, and the hash sequence relies more heavily on the random-oracle-model of non-determinism.

To use the same sequence generator across EVM and Pedersen hash, the constraints are:

  • initial seed is an arbitrary string
  • use keccak256 hash to turn the initial seed into the first constant of the sequence
  • interpret hash outputs as big-endian 256bit integers
  • use the previous sequence output as the input to generate the next round constant
  • use modulo p, rather than bit masking or truncation, to move the hash output onto the scalar field

For pedersen hash base point generation:

  • start with constant generated from sequence as Y
  • try to recover X coordinate
  • if X is not square, increment Y
  • repeat until X is square
@HarryR

This comment has been minimized.

Copy link
Owner Author

commented Dec 8, 2018

TODO:

  • Implement round-constant sequence generation for keccak256 in C++
  • Use new round-constant sequence for pedersen hash base point generation
  • Re-generate test cases using new base points
  • Use new round-constant sequence for MiMC round constants
  • Use improved MiMC Solidity implementation
  • Re-generate test cases using new MiMC round constants
  • Verify it works etc.

@HarryR HarryR added this to In progress in January 2019 Jan 10, 2019

@HarryR HarryR removed this from In progress in December 2018 Jan 10, 2019

@HarryR HarryR closed this in #105 Jan 17, 2019

January 2019 automation moved this from In progress to Done Jan 17, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
2 participants
You can’t perform that action at this time.