Try to optimize inner-product proof #89

Open · hdevalence opened this issue May 8, 2018 · 11 comments
Labels: E-hard, P-low, T-optimization (Making things faster/smaller), T-research (Open, unsolved problems)

Comments

@hdevalence
Contributor

The inner-product proof spends lots of time in this computation https://github.com/chain/ristretto-bulletproofs/blob/main/src/inner_product_proof.rs#L102-L103 performing double-base scalar multiplications.

The verifier's side inlines all of the computation into a single check. This isn't possible for the prover, which has to produce the L, R points, but (looking quickly) it seems like it should be possible for the prover to only do one multiscalar mul per L and R, and drop all of the double-base scalar muls.

Possibly supersedes #80
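
For concreteness, here's a rough sketch (not the crate's actual code) of what "one multiscalar mul per L" could look like using the curve25519-dalek MultiscalarMul trait. The function compute_l and its argument names are made up for illustration; R would be handled symmetrically.

```rust
use curve25519_dalek::ristretto::RistrettoPoint;
use curve25519_dalek::scalar::Scalar;
use curve25519_dalek::traits::MultiscalarMul;

// Sketch only: build L = <a_L, G_R> + <b_R, H_L> + <a_L, b_R> * Q as a
// single multiscalar multiplication instead of a sequence of double-base
// scalar multiplications.
fn compute_l(
    a_l: &[Scalar],
    b_r: &[Scalar],
    g_r: &[RistrettoPoint],
    h_l: &[RistrettoPoint],
    q: &RistrettoPoint,
) -> RistrettoPoint {
    // c_L = <a_L, b_R>
    let c_l: Scalar = a_l.iter().zip(b_r).map(|(a, b)| a * b).sum();

    // Constant-time multiscalar mul, since a_L and b_R are secret.
    RistrettoPoint::multiscalar_mul(
        a_l.iter().chain(b_r).chain(std::iter::once(&c_l)),
        g_r.iter().chain(h_l).chain(std::iter::once(q)),
    )
}
```

Whether this actually wins depends on how the multiscalar backend scales with input size, which is essentially the question discussed below.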

hdevalence added the T-optimization (Making things faster/smaller), P-low, and E-medium labels on May 8, 2018
@oleganza
Collaborator

oleganza commented May 8, 2018

Seems like it's not the usual multiscalar mul (which produces one point), but a multi-point multiscalar mul. The fully general Pippenger algorithm covers that case (that is, a set of independent exponentiations can be computed faster with Pippenger's algorithm than by doing them one by one), but we have never needed or implemented that yet.

@oleganza
Collaborator

oleganza commented May 8, 2018

Disclaimer: I don't yet have any intuition for how that sequence of independent multiplications could be computed faster than doing one multiplication at a time. In this case the points are all different; only the scalars are the same.

@hdevalence
Contributor Author

What I was thinking of was to try collapsing the L, R computations into a single multiscalar multiplication each, in the same way that the verifier collapses them when performing verification: the verifier doesn't compute the final G, H points via the iterative process, but collapses everything into a single equation involving the original \vec G, \vec H.

However, this would mean that the prover computes 2k multiscalar multiplications of full size 2n+1 each, whereas currently the prover's work shrinks in the lower stages. For instance, here are the multiscalar multiplication sizes used by each strategy, as a {size: count} histogram:

current
{2: 252, 5: 2, 9: 2, 17: 2, 33: 2, 65: 2, 129: 2}
collapsed
{129: 12}

so this would actually end up doing more work.

The goal is to avoid having to compute all of the double-base scalar multiplications, but I think this idea doesn't quite work out.
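
As a back-of-the-envelope check of the "more work" conclusion (my arithmetic, not part of the original discussion), summing size × count over the two histograms above:

```rust
// Back-of-the-envelope check: total scalar-point terms implied by each
// size histogram above. This ignores Pippenger's sublinear savings on the
// larger multiscalar muls, so it is only a rough indicator.
fn main() {
    let current = [(2usize, 252usize), (5, 2), (9, 2), (17, 2), (33, 2), (65, 2), (129, 2)];
    let collapsed = [(129usize, 12usize)];

    let terms = |hist: &[(usize, usize)]| hist.iter().map(|&(size, count)| size * count).sum::<usize>();

    // Prints "current: 1020, collapsed: 1548": the collapsed strategy
    // touches roughly 1.5x more terms, consistent with "more work" above.
    println!("current: {}, collapsed: {}", terms(&current), terms(&collapsed));
}
```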

@oleganza
Collaborator

@hdevalence do you think this is still worth experimenting with, or should we close the issue?

@hdevalence
Contributor Author

There's no milestone attached, so we can leave this open and have it be a point of accumulation for ideas about the inner-product proof.

@hdevalence
Contributor Author

Benedikt mentioned that the libsecp implementation uses a hybrid strategy: it jumps a few steps at once (without shrinking), then cuts the sizes of the vectors. This could be worth exploring at some point in the future, although at the moment we're still faster on the IPP proof for other reasons.

hdevalence added the T-research (Open, unsolved problems) label on Dec 3, 2018
hdevalence changed the title from "Try to optimize inner-product proof by collapsing scalar multiplications" to "Try to optimize inner-product proof" on Dec 11, 2018
@hdevalence
Contributor Author

Another strategy would be to try to speed up the double-base scalar multiplications by vectorizing across points. This would require a change in the upstream curve library, and I'm not sure what kind of API would be generally useful (rather than useful only in this one weird protocol).
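
For what it's worth, one possible shape for such an API, purely hypothetical: nothing below exists in curve25519-dalek, and a real vectorized backend would replace the naive per-element loop.

```rust
use curve25519_dalek::ristretto::RistrettoPoint;
use curve25519_dalek::scalar::Scalar;
use curve25519_dalek::traits::MultiscalarMul;

/// Hypothetical batched double-base scalar multiplication:
/// out[i] = a[i] * P[i] + b[i] * Q[i] for each pair in the batch.
/// A SIMD backend could process the whole batch in shared vector lanes;
/// this reference version just loops and only pins down the signature.
pub fn batch_double_scalar_mul(
    scalars: &[(Scalar, Scalar)],
    points: &[(RistrettoPoint, RistrettoPoint)],
) -> Vec<RistrettoPoint> {
    scalars
        .iter()
        .zip(points)
        .map(|((a, b), (p, q))| RistrettoPoint::multiscalar_mul([a, b], [p, q]))
        .collect()
}
```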

@psivesely

I observed that you can halve the number of multiplications required to scale the witness and generator vectors in each round by computing a' = a_L + u * a_R and b' = b_L + u_inv * b_R (and similarly for G' and H'). There's a proof that soundness is preserved in our recent paper https://eprint.iacr.org/2019/1177.pdf. Specifically, we prove computational witness-extended emulatability of our generalized inner product argument (GIPA), which uses this optimization, and since Bulletproofs is an instantiation of GIPA, security follows.

Another thing to look at is choosing the challenges u to be sparse and/or complex (https://eprint.iacr.org/2005/276.pdf), which can significantly reduce the number of additions it takes to compute each multiplication. Easier to implement is choosing small exponents, i.e., truncating the RO output to 128 bits, though you don't get as much of a speedup as with sparse/complex exponents. The advantage of sparse exponents over small exponents grows with the gap between the security parameter and the scalar field size, so the advantage is greater for pairing-friendly curves. I can't say anything about complex exponents.

Note that if the verifier wishes to do an MSM, the final scalars will no longer be sparse/small, except perhaps for very small instance sizes. I suspect there exists a threshold instance size beyond which the O(n/log n) speedup of Pippenger's gives the verifier more than the constant speedup it would gain by forgoing the MSM and instead rescaling the generators using the sparse/small/complex exponents, as the prover must.
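
For reference, a minimal sketch of the halved fold using curve25519-dalek types; fold_round, the half-splitting convention, and the particular generator fold shown are my own illustration of the idea rather than code from this crate or the paper.

```rust
use curve25519_dalek::ristretto::RistrettoPoint;
use curve25519_dalek::scalar::Scalar;

// Sketch of the halved folding step: one scalar multiplication per output
// element instead of two. The [left half | right half] layout is assumed,
// and the generator fold below is one consistent choice.
fn fold_round(
    a: &[Scalar],
    b: &[Scalar],
    g: &[RistrettoPoint],
    h: &[RistrettoPoint],
    u: Scalar,
) -> (Vec<Scalar>, Vec<Scalar>, Vec<RistrettoPoint>, Vec<RistrettoPoint>) {
    let n = a.len() / 2;
    let u_inv = u.invert();

    // a' = a_L + u * a_R,     b' = b_L + u_inv * b_R
    let a_next: Vec<Scalar> = (0..n).map(|i| a[i] + u * a[n + i]).collect();
    let b_next: Vec<Scalar> = (0..n).map(|i| b[i] + u_inv * b[n + i]).collect();

    // G' = G_L + u_inv * G_R, H' = H_L + u * H_R, so that <a', G'> and
    // <b', H'> retain the <a_L, G_L> + <a_R, G_R> terms; the cross terms
    // are absorbed into the L, R points sent for this round.
    let g_next: Vec<RistrettoPoint> = (0..n).map(|i| g[i] + g[n + i] * u_inv).collect();
    let h_next: Vec<RistrettoPoint> = (0..n).map(|i| h[i] + h[n + i] * u).collect();

    (a_next, b_next, g_next, h_next)
}
```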

@kevaundray
Contributor

> [quotes @psivesely's comment above in full]

Does this differ from the improved inner product in https://eprint.iacr.org/2019/944.pdf?

@psivesely

It seems like they make the same observation w.r.t. halving the multiplications, but provide some different ideas on how to sample the challenges.

@daira

daira commented Mar 19, 2020

Another way to sample the challenges, for curves that support a cubic endomorphism, is given by Algorithm 1 in the Halo paper. See also the proof of correctness in Appendix C. It's optimized for use inside a circuit but might be useful outside one too (probably with projective coordinates instead of affine).
