
Preliminary commit for min const generics #182

Closed
wants to merge 12 commits

Conversation


@jon-chuang (Contributor) commented Jan 18, 2021

Description

To do:

  • match to select a literal loop-unrolled impl for square/mul (= very verbose macro-expanded source)...
  • Get rid of weird artifacts due to `[T; N * 2]` being impossible when `N` is generic

Tests should pass on nightly 1.51.
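The `[T; N * 2]` artifact in the second to-do item comes from min const generics only allowing a bare parameter `N` or a literal as an array length. A minimal sketch of one possible workaround, carrying the doubled width as a second const parameter and checking the relationship at runtime (hypothetical code, not the PR's actual approach):

```rust
// Min const generics (Rust 1.51) reject `[u64; N * 2]` when N is generic.
// One workaround: take the widened width M as its own const parameter and
// assert M == 2 * N dynamically, since the signature cannot express it.
fn widen<const N: usize, const M: usize>(limbs: [u64; N]) -> [u64; M] {
    assert_eq!(M, N * 2, "M must be exactly 2 * N");
    let mut out = [0u64; M]; // a bare parameter is a legal array length
    out[..N].copy_from_slice(&limbs);
    out
}

fn main() {
    // The caller must spell out both widths; the compiler cannot infer M = 2N.
    let wide: [u64; 8] = widen::<4, 8>([1, 2, 3, 4]);
    assert_eq!(wide, [1, 2, 3, 4, 0, 0, 0, 0]);
}
```

This extra parameter is exactly the kind of artifact the to-do item wants to remove; a future extension allowing const expressions in array lengths would make it unnecessary.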

Thoughts:

  • Limitations still exist in min const generics, especially for our literal-dependent loop-unrolling use case and for const evaluation with generic parameters. These are currently worked around.
  • Not sure how much real gain there is: yes, fewer macros, but many breaking changes to imports and a lot of copy-pasta like `BigInt::<4>([0, 0, ..])`

Update:
TODO:

  • Check for performance regressions

Thoughts:

  • Currently, `mul_without_reduce` is instantiated with loops that are not forced to be unrolled, so it is unclear how the compiler will behave. However, it is only used in runtime code when `NO_CARRY` is false, so I don't think it's a big issue. Still, it is worth keeping an eye on.

Currently, this is how mul_assign looks macro expanded:

    impl<'a, P: FpParams<N>, const N: usize> MulAssign<&'a Self> for Fp<P, N> {
        #[inline]
        fn mul_assign(&mut self, other: &Self) {
            if P::NO_CARRY {
                let input = &mut (self.0).0;
                let other_ = (other.0).0;
                match N {
                    1 => mul_assign_id1::<P, N>(input, other_),
                    4 => mul_assign_id4::<P, N>(input, other_),
                    5 => mul_assign_id5::<P, N>(input, other_),
                    6 => mul_assign_id6::<P, N>(input, other_),
                    12 => mul_assign_id12::<P, N>(input, other_),
                    13 => mul_assign_id13::<P, N>(input, other_),
                    _ => ::core::panicking::panic("internal error: entered unreachable code"),
                };
                self.reduce();
            } else {
                *self = self.mul_without_reduce(other, P::MODULUS, P::INV);
                self.reduce();
            }
        }
    }

So, rather neat. But notice that we define a separate function for each number of limbs. Since we can control exactly which limb counts are instantiated via the `limb_instantiation!(1, 4, 5, 6, 12, 13);` macro, I think the unsavoury code bloat is substantially contained.
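The dispatch shape in the expanded code above can be sketched with a small declarative macro that stamps out one match arm per supported limb count; inside each arm the count is a literal, so the callee is monomorphized on a concrete `N` and can be fully unrolled by the optimizer. All identifiers here (`dispatch_on_limbs!`, `double_limbs`) are illustrative stand-ins, not the PR's actual names:

```rust
use std::convert::TryInto;

// A toy per-limb operation, monomorphized on the limb count N.
fn double_limbs<const N: usize>(limbs: &mut [u64; N]) {
    for i in 0..N {
        limbs[i] = limbs[i].wrapping_mul(2);
    }
}

// Stamp out one match arm per supported limb count. In each arm the count is
// a literal, so the slice converts to a fixed-size array reference and the
// callee is instantiated with a concrete N.
macro_rules! dispatch_on_limbs {
    ($n:expr, $input:expr, [$($lit:literal),*]) => {
        match $n {
            $($lit => {
                let arr: &mut [u64; $lit] = $input.try_into().unwrap();
                double_limbs::<$lit>(arr);
            })*
            _ => unreachable!("unsupported limb count"),
        }
    };
}

fn main() {
    let mut limbs = [1u64, 2, 3, 4];
    dispatch_on_limbs!(limbs.len(), &mut limbs[..], [1, 4, 5, 6, 12, 13]);
    assert_eq!(limbs, [2, 4, 6, 8]);
}
```

The design trade-off matches the discussion above: the call site stays compact, but every listed limb count contributes a monomorphic copy of the callee to the binary.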


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

    #[inline]
    #[ark_ff_asm::unroll_for_loops]
    fn into_repr(&self) -> $BigIntegerType {
    // #[ark_ff_asm::unroll_for_loops]
Member

Do you feel like the compiler can still optimize this well? Or should we try to do some manual unrolling inside the macro?

Contributor Author

Well, for now, since we can no longer take advantage of literals in macro instantiations, I've decided we should have match arms and do the loop unrolling for each possible number of limbs.

This is extremely ugly, and I'm not sure there is a workaround; the only plausible route for unrolling loops parametrised by a const is the proposed `#[optimize(..)]` compiler-hint attribute, which is not moving at the moment.

Contributor Author

So, basically, this is not the final form.

Contributor Author

One can make the actual impl very compact, but under the hood, after macro expansion, it will be very ugly, just like our asm code that selects among the num_limbs match arms.

Contributor Author

I think I will simply modify the proc macro, exposing a new one that creates the match arms up to some limit; the limit can be supplied as a meta attribute parsed as a usize.

Contributor Author

I think, for one specific problem, copying the data back over in each of the match arms, there would be a type mismatch, so it might make sense to do an unsafe transmute of a pointer; however, that would violate our desire to keep the code safe.
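The type mismatch described here can be illustrated with two hypothetical helpers: inside a `match n` arm the code "knows" the limb count is, say, 4, but the value still has type `[u64; N]`. The pointer-reinterpretation route is the transmute concern above; a checked slice conversion avoids the unsafety at the cost of a length check the optimizer can usually eliminate. Both function names are illustrative:

```rust
use std::convert::TryInto;

// 1. Unsafe: reinterpret the pointer. Sound only if N == 4 actually holds
// at this point (which the surrounding `match` arm is meant to guarantee).
unsafe fn as_four_unchecked<const N: usize>(limbs: &mut [u64; N]) -> &mut [u64; 4] {
    debug_assert_eq!(N, 4);
    &mut *(limbs.as_mut_ptr() as *mut [u64; 4])
}

// 2. Safe: go through a slice plus a checked conversion; the length check
// is a single branch that is trivially predictable.
fn as_four_checked<const N: usize>(limbs: &mut [u64; N]) -> &mut [u64; 4] {
    (&mut limbs[..]).try_into().expect("limb count must be 4")
}

fn main() {
    let mut limbs = [10u64, 20, 30, 40];
    as_four_checked(&mut limbs)[0] += 1;
    unsafe { as_four_unchecked(&mut limbs)[1] += 1 };
    assert_eq!(limbs, [11, 21, 30, 40]);
}
```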

ff/src/biginteger/mod.rs (outdated review threads, resolved)
@Pratyush (Member)

Amazing, thanks for this!

* match to select literal loop-unrolled impl for square/mul (= very verbose macro-expanded source)...

Maybe we can use techniques from the C++ world for this? Eg: https://www.informit.com/articles/article.aspx?p=30667&seqNum=7

It could get complicated though.

* Get rid of weird artifacts due to `[T; N * 2]` where N is generic being impossible

Hmm that's unfortunate; it's due to this issue: rust-lang/rust#43408

* Not sure how much real gains there are, yes fewer macros, but many breaking changes for imports and for many copy-pasta like `BigInt::<4>([0, 0, ..])`

Can't the breaking changes be worked around by still exporting {BigInteger,Fp}{256/320/384...}? Also, there would be nice gains in compile time, as the compiler is simply doing much less work.
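The suggestion of still exporting the fixed-width names could be sketched as plain type aliases over the generic type, so downstream `use` statements keep compiling while the representation becomes generic. The struct layout and alias names here are assumptions for illustration, not the crate's actual definitions:

```rust
// The generic big integer, parameterised by limb count.
pub struct BigInt<const N: usize>(pub [u64; N]);

// Backwards-compatible fixed-width names as zero-cost aliases.
pub type BigInteger256 = BigInt<4>;
pub type BigInteger320 = BigInt<5>;
pub type BigInteger384 = BigInt<6>;

fn main() {
    // Old-style name, new generic representation underneath.
    let x: BigInteger256 = BigInt([1, 2, 3, 4]);
    assert_eq!(x.0.len(), 4);
}
```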

    };
    use ark_serialize::*;

    pub trait FpParams<const N: usize>: FpParameters<BigInt = BigInt<N>> {}
Member

I think we can get rid of this trait now, no?

Contributor Author

Well, for now, I use it to instantiate NO_CARRY as a const param; otherwise, it was troublesome to ensure that it would be evaluated at compile time.
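A minimal sketch of why hoisting the flag into the params trait helps: as an associated const, `P::NO_CARRY` is a compile-time constant after monomorphization, so `if P::NO_CARRY` selects one branch statically and the other is dead code. The trait shape mirrors the `FpParams` above, but `DemoParams` and its values are placeholders:

```rust
// The flag lives on the parameter trait, so each instantiation fixes it
// at compile time.
trait FpParams<const N: usize> {
    const NO_CARRY: bool;
}

// Illustrative parameter set; name and value are placeholders.
struct DemoParams;

impl FpParams<4> for DemoParams {
    const NO_CARRY: bool = true;
}

fn mul_strategy<P: FpParams<N>, const N: usize>() -> &'static str {
    // After monomorphization this `if` is a constant branch; the untaken
    // arm is eliminated entirely.
    if P::NO_CARRY {
        "no-carry multiplication"
    } else {
        "generic multiplication with carry"
    }
}

fn main() {
    assert_eq!(mul_strategy::<DemoParams, 4>(), "no-carry multiplication");
}
```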

@jon-chuang (Contributor Author)

jon-chuang commented Jan 21, 2021

Maybe we can use techniques from the C++ world for this? Eg: https://www.informit.com/articles/article.aspx?p=30667&seqNum=7

It could get complicated though.

Well, for this I'm not sure it will work; notice the metaprogramming examples use literals as the parameter. I think something a lot of people would like is the ability to do metaprogramming based on a const. Unfortunately, consts are evaluated at the MIR level, where there is no longer any syntax to do metaprogramming with. Still, it would be really cool to have the ability to do metaprogramming at the MIR level.

Can't the breaking changes be worked around by still exporting {BigInteger,Fp}{256/320/384...}? Also, there would be nice gains in compile time, as the compiler is simply doing much less work.

We could. But one wonders if it is better to make the breaking change; personally, I'm torn about this.

@jon-chuang (Contributor Author)

So I would say at this point that I'm quite happy with the state of the PR, apart from checking for performance regressions, a careful review, and the number of unresolved questions about the API and so on. But the core of the code is looking pretty good, if you ask me.

@jon-chuang (Contributor Author)

jon-chuang commented Jan 21, 2021

Here is some preliminary data for the performance regression tests.

There appear to be some problems with the serialization and, more pertinently, with the into_repr implementation. Based on old benchmarks, however (#80 and arkworks-rs/snark#176), it does seem that into_repr is doing just fine and that the old impl is too fast, which implies something nefarious might be going on.

Apart from that, things seem alright.
[Screenshot: benchmark results, 2021-01-22]


I dug into serialize/deserialize and discovered that using vec! instead of an array was causing the slowdown; it is now fixed.

So perf wise, I would say there are no issues.
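The `vec!`-versus-array pitfall mentioned above can be illustrated with a toy serializer; both functions are hypothetical, not the crate's API. The `vec!` version heap-allocates a fresh buffer on every call, while the array version stays on the stack, and in a hot serialization loop that allocation shows up directly in benchmarks:

```rust
// Heap-allocating version: one allocator round trip per call.
fn serialize_limbs_vec(limbs: &[u64; 4]) -> Vec<u8> {
    let mut bytes = vec![0u8; 32]; // heap allocation per call
    for (i, limb) in limbs.iter().enumerate() {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
    }
    bytes
}

// Stack-only version: no allocator involved.
fn serialize_limbs_array(limbs: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
    }
    bytes
}

fn main() {
    let limbs = [1u64, 2, 3, 4];
    // Identical output; only the allocation behaviour differs.
    assert_eq!(serialize_limbs_vec(&limbs), serialize_limbs_array(&limbs).to_vec());
}
```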

@jon-chuang jon-chuang closed this Jan 22, 2021
@jon-chuang jon-chuang mentioned this pull request Jan 22, 2021