Skip to content
This repository has been archived by the owner on Dec 9, 2021. It is now read-only.

Compare/incorporate existing Montgomery modmul implementations #69

Open
5 tasks
unzvfu opened this issue Aug 9, 2020 · 2 comments
Open
5 tasks

Compare/incorporate existing Montgomery modmul implementations #69

unzvfu opened this issue Aug 9, 2020 · 2 comments
Assignees

Comments

@unzvfu
Copy link
Collaborator

unzvfu commented Aug 9, 2020

There are a few existing open Montgomeral modmul implementations which might (i) form a good/better basis for our improvements in Plonky, or (ii) in any case, they should at least give some ideas for improvements that we might not have thought of. We are initially interested in the Rust implementation; assembly implementation will come later.

Some implementations to consider:

@unzvfu unzvfu added this to the Faster base field arithmetic milestone Aug 9, 2020
@unzvfu unzvfu self-assigned this Aug 9, 2020
@dlubarov
Copy link
Contributor

👍. I originally did interleaved Montgomery multiplication because it seemed popular in the literature (e.g. here), but that might have been a mistake, since separate mul and reduce methods would make it easier to optimize squaring and skip certain reductions.

Another option (not that we need to study them all) is curve25519-dalek's [u64; 5] backend. It seems relatively fast, and they also have a SIMD backend.

@unzvfu
Copy link
Collaborator Author

unzvfu commented Aug 11, 2020

Oh, I had forgotten about curve25519-dalek, thanks for reminding me! Added to the list. They presumably use the fast reduction for the modulus 2^255-19, which will be relevant wrt #71.

Generally the interleaved Monty modmul is a bit faster than mult+REDC when you're in the range where 'schoolbook multiplication' is fastest. Definitely not a mistake! If you have bigger numbers, ones where you'd want to use Karatsuba or FFT multiplication (probably not relevant for us), or if you want to take advantage of fast squaring (as we do), then mult/sqr + REDC is the way to go. The question for us will be whether the 'fast squaring' trick is actually faster with the fairly small numbers we're dealing with. [Fast squaring can work with CIOS; see updated comments on #70.]

Also worth noting that fast reduction mod 2^b + c doesn't (necessary) need Monty representation at all.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants