Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: replace rug integer and upgrade curve operations #29

Closed
drcapybara opened this issue Oct 27, 2023 · 14 comments
Closed

fix: replace rug integer and upgrade curve operations #29

drcapybara opened this issue Oct 27, 2023 · 14 comments
Assignees
Labels
improvement Improve the design, usability, safety, and/or efficiency of the library Larger Project Larger project sizing affecting multiple areas of the library. Not difficult, but more to consider. literature review Interpret journal paper into code! Way more fun than it sounds :D side-channel Security risk that we should patch up ASAP

Comments

@drcapybara
Copy link
Owner

drcapybara commented Oct 27, 2023

This crate uses used the Rug integer FFI to the GMP library. This gives us blazing fast speeds on integer calculations, but it comes at the cost of not being able to effectively control ownership of the integers when they move out of scope during normal arithmetic operations. Here's an example of some of our problematic code from the edwards curveadd formula:

// (x₁y₂ + y₁x₂)
// all values are assumed to be rug::Integer
let x1y2 = (x1.clone() * y2.clone()) % p.clone();
let y1x2 = (y1.clone() * x2.clone()) % p.clone();
let x1y2y1x2_sum = (x1y2 + y1x2) % p.clone();

Whats happening?
Because every operation *, %, etc is technically a call to a function, the value we send it is going out of scope and thus ownership has changed. as an FFI, rug doesnt have ownership rules in the same way that native rust types do. So the values are dropped when we operate on them. heres an example:

let x1y2 = (x1.clone() * y2.clone()) % p;
let y1x2 = (y1.clone() * x2.clone()) % p; //this fails because p went out of scope above

The problem is that when you are multiplying a curve point by a scalar value that can be 64 bits long in the secure case, these deep copies add up quickly, and frankly seeing so many clones is kind of a bummer and means maybe you arent getting along with the borrow checker so well.

We need an arbitrary precision library that:

  1. Doesnt deep copy on arithmetic operations i.e. gets rid of this excessive cloning
  2. Ideally provides side-channel resistance by using constant time operations if possible
  3. Provides a way to easily test bits at various indices to facilitate the montgomery ladder.
  4. Still performs blazing fast. The whole reason we dropped rust bigint to begin with was because it was found to be in our case orders of magnitude slower than Big in golang for the same construction.
  5. One of the more important items would be to maintain overloaded operators between scalars and curve points. i.e., as much as possible, we want to retain this syntax:
let s = U256::from(b'FFFFFF');
let p = EdCurvePoint::generator();
let product = s * p

Why point 5? because i have a burning hole in my heart that can only be filled overloaded operators

Enter the crypto_bigint and fiat_crypto crates. Points 1 and 2 are addressed trivially. 3 and 4 and 5 will require investigation which is the whole point of this ticket. Replacing the rug crate might will require significant overhaul of the EC point operations and probably the entire library. But hey what else is new. This will enhance the library by:

  1. Eliminating reliance on possibly unsafe FFIs. Doubtful anyone worth their salt in this realm is using rug for anything serious (sorry rug)
  2. Ensuring things are still really fast and awesome
  3. Constant time side-channel resistance will be conferred to large parts of the library as a result. This is a gigantic win because the library has a special emphasis on side-channel resistance and best implementation practices. Here's our shot to take a crack at doing this the "right" way or as close as we can approximate it.

It might be a doozy but its well-motivated and worth the effort in my opinion.

@drcapybara drcapybara added side-channel Security risk that we should patch up ASAP Medium Effort Nothing too crazy. Maybe some new concepts and design patterns. 1 - 2 weeks of casual effort. improvement Improve the design, usability, safety, and/or efficiency of the library labels Oct 27, 2023
@drcapybara
Copy link
Owner Author

drcapybara commented Oct 28, 2023

And it looks like we're able to overload operators between a U576 scalar and a hypothetical curve point without any issues:

use crypto_bigint::{U576, Wrapping, Encoding};
use std::ops::Mul;

#[derive(Debug, Clone, Copy)]
struct CurvePoint {
    x: U576,
    y: U576,
}

impl Mul<U576> for CurvePoint {
    type Output = Self;

    fn mul(self, scalar: U576) -> Self {
        Self {
            x: (Wrapping(self.x) * Wrapping(scalar)).0,
            y: (Wrapping(self.y) * Wrapping(scalar)).0,
        }
    }
}

fn main() {
    let hex_string = "1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF";
    
    let buffer = push_bytes_into_buffer(hex_string);
    
    // Convert the buffer to a U576 number
    let number = U576::from_be_bytes(buffer);

    let point = CurvePoint { x: number, y: number };
    let scalar = U576::from_be_bytes(buffer);

    let result = point * scalar;

    println!("Result: {:?}", result);
}

@drcapybara
Copy link
Owner Author

drcapybara commented Oct 28, 2023

I made a special point of bit testing in our asks because it can be surprisingly non-trivial in some cases and crypto_bigint does not disappoint: This code explicitly calls the Into conversion for CtChoice, ensuring that the conversion is performed in a way that maintains constant-time guarantees:

// simple test of true/false
if scalar.bit(0).into() {
    println!("test bit = 0")
}
// To negate the same test, the compiler needs the type, so
// this is explicitly using the Into<bool> trait implementation for CtChoice 
// to convert the result of scalar.bit(0) into a bool. 
// This is a more explicit way of doing what .into() does above
// but the compiler demands it to force the explicit choice of a constant time operation
if !(<CtChoice as Into<bool>>::into(scalar.bit(0))) {
    println!("test bit != 0")
}

I just wanted to test some bits man Im not lookin for any trouble

@drcapybara
Copy link
Owner Author

drcapybara commented Oct 31, 2023

Moving to crypto bigint would also likely resolve this issue since we would no longer be relying on rug to determine the number of steps for the montgomery ladder. :

rug:

 for i in (0..=s.bits()).rev()

crypo_bigint offers the same functionality but will need to be tested:

let num = x1.bits().reverse_bits();

with reverse bits being a core trait so need to figure out if it also has the leading zeros issue (confirmed it does not)

continuing to rely on rug remains a security liability for this library.

@drcapybara
Copy link
Owner Author

drcapybara commented Nov 3, 2023

Update: it took awhile to parse crypto_bigint, the documentation leaves a lot to be desired in terms of usage examples, but lets see if we can start to fix up some of our problematic code. We have this block from the edwards curve add formula using rug:

// (x₁y₂ + y₁x₂)
let x1y2 = (x1.clone() * y2.clone()) % p.clone();
let y1x2 = (y1.clone() * x2.clone()) % p.clone();
let x1y2y1x2_sum = (x1y2 + y1x2) % p.clone();

Can we fix this up with crypto_bigint?

Step 1. Become familiar with wide multiplication. Lets define a modulus (this is actually the modulus from E448 in our library) using crypto_bigint impl_modulus macro:

impl_modulus!(
Modulus,
U448, "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"
);

Step 2: Lets define some numbers a and b to multiply by each other. In this case, lets just reuse the modulus, so we are in effect just squaring it:

let a = U448::from_be_hex("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");
let b = U448::from_be_hex("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");

Step 3: square the modulus by using mul_wide on a and b:

let c: (Uint<7>, Uint<7>) = a.mul_wide(&b);

This gives us:

c lower: 0000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000001
c upper: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

for squaring, we could also have used:

let d = a.square_wide();

Step 4: reduce. This step is equivalent to ab mod p, but using the highly efficient mongtomery reduction:

let reduction: Uint<7> = montgomery_reduction::<{ Modulus::LIMBS }>(
    &c,
    &Modulus::MODULUS,
    Modulus::MOD_NEG_INV,
);

resulting in:

reduction: 0

which holds because p ^ 2 mod p is indeed zero

Step 5: put it all together:

// (x₁y₂ + y₁x₂)
let x1y2 = montgomery_reduction(&x1.mul_wide(&y2), &Modulus::MODULUS, Modulus::MOD_NEG_INV);
let y1x2 = montgomery_reduction(&y1.mul_wide(&x2), &Modulus::MODULUS, Modulus::MOD_NEG_INV);
let x1y2y1x2_sum = x1y2.add_mod(&y1x2, &Modulus::MODULUS);

@drcapybara
Copy link
Owner Author

drcapybara commented Nov 3, 2023

Observations:

  1. excessive cloning eliminated, and not soon enough.
  2. readability not necessarily improved. this is less of a concern because this function does not directly face the caller. we want curve/scalar multiplication to remain neat and readable, which seems likely given the sample a few comments back. We can come back and add macros or type wrappers and implement overloaded operators later if we really want to, but for now, this more than suffices for our purposes. (to the uninitiated, readability and cryptography are not really known to be bedfellows anyways)
  3. we might need to drop support for the E521 curve. The reason for this is there is no U521 type, and squeezing E521 into U576 is cumbersome. There are types for all of our other curves, (222, 382, and 448). Reasons why dropping E521 is probably fine:
  • E521 aka ~260 bit security is already much higher than required for most purposes. E448 (goldilocks) offers a great balance between security and performance.
  • Discarding an extreme security curve that few are likely to need is well worth the tradeoff of removing rug, so I propose we should drop it for now.

@drcapybara drcapybara added the literature review Interpret journal paper into code! Way more fun than it sounds :D label Nov 4, 2023
@drcapybara
Copy link
Owner Author

to no one's surprise this has now become a literature review. as we tumble deeper down the rabbit hole, Ive started to convert our add forumla from affine form into twisted edwards form. This projects coordinate pairs from (x, y) into (x, y, z). The advantage of this is that it greatly simplifies the add formula and reduces the number of operations we need to carry out. We may consider just leaving all EC operations in projected form. We arent interacting with other libraries and so it might be advantageous to leave it this way. Anyways, Im going to proceed implementing this formula (page 12) :

Addition in Projective Twisted Coordinates:
A = Z₁ ⋅ Z₂
B = A²
C = X₁ ⋅ X₂
D = Y₁ ⋅ Y₂
E = dC ⋅ D
F = B − E
G = B + E
X₃ = A ⋅ F ⋅ ((X₁ + Y₁) ⋅ (X₂ + Y₂) − C − D)
Y₃ = A ⋅ G ⋅ (D− aC)
Z₃ = F ⋅ G

@drcapybara
Copy link
Owner Author

drcapybara commented Nov 7, 2023

This repo more clearly defines the strategy we need in order to take advantage fast addition and scalar multiplications:

Strategy

The main strategy for group arithmetic on Ed448-Goldilocks is to perform the 2-isogeny to map the point to the Twisted-Goldilocks curve, then use the faster Twisted Edwards formulas to perform scalar multiplication. Computing the 2-isogeny then the dual isogeny will pick up a factor of 4 once we map the point back to the Ed448-Goldilocks curve, so the scalar must be adjusted by a factor of 4. Adjusting the scalar is dependent on the point and the scalar. More details can be found in the 2-isogenous paper.

Steps

  1. perform the 2-isogeny to map the point to the Twisted-Goldilocks curve
  2. use the faster Twisted Edwards formulas to perform scalar multiplication
  3. Compute the 2-isogeny then the dual isogeny back to Ed448-goldilocks, adjust scalar as needed

image

@drcapybara drcapybara changed the title fix: remove rug integer and move to crypto_bigint fix: replace rug integer and upgrade curve operations Nov 8, 2023
@drcapybara drcapybara added Larger Project Larger project sizing affecting multiple areas of the library. Not difficult, but more to consider. and removed Medium Effort Nothing too crazy. Maybe some new concepts and design patterns. 1 - 2 weeks of casual effort. labels Nov 8, 2023
@drcapybara
Copy link
Owner Author

drcapybara commented Nov 8, 2023

Update on required items to implement the above steps and isogenous transformations, following the patterns of dalek:

define a scalar type:

It should support all of the usual operations, plus:

  • montgomery multiplication with a reduction step
  • division by four to be used in the 2-isogeny mapping from standard edwards form into twisted form - todo
  • scalar mod 4 * P - also used for isogeny

Montgomery multiplication + reduction:

this should do just fine, tried it with a few test cases and appears to produce correct results:

montgomery_reduction(&a.mul_wide(&b), &Modulus::MODULUS, Modulus::MOD_NEG_INV);

scalar mod 4 * P

picked up this neat little trick:

// Compute (scalar mod 4)
let s_mod_four = scalar[0] & 3;

This works because & 3 zeroes out everything except the two leading bits which are the residues of a mod 4 operation. super cheap way to compute a remainder on a known power of 2

Field Element

For this I have discovered the fiat_crypto crate, providing formally verified field arithmetic in constant time. We will use this for field elements instead of crypto-bigint for a few reasons:

  1. its formally verified. it doesnt get any better than that for safety and security.
  2. the operations are constant time where we need them to be
  3. the implementation is highly optimized and should perform at the peak of what we want
  4. everybody else doing serious things with curves and crypto are using it

Field Elements should have all of the usual operations which appear to be readily available.

We're in too deep to give up now

good problems tend to explode into larger ones. this is way farther than I got last time I tried this and the solution seems within reach

last note, in my blind stupor towards building this out Ive abandoned the other curves and am focusing solely on E448. If when we land this thing we can revisit to make it generic and support the other curves which should be possible with the upgraded FieldElement type.

@drcapybara drcapybara self-assigned this Nov 8, 2023
@drcapybara
Copy link
Owner Author

drcapybara commented Nov 10, 2023

Due a windows build problem with gmp-mpfr-sys, Ive gone ahead and removed rug and replaced it with num-bigint. As I feared, this has caused a substantial regression in the performance of our asymmetric operations:

e222 + SHA3-224 Asymmetric enc + dec                                                                            
time:   [380.56 ms 381.19 ms 381.93 ms]
change: [+53.635% +53.975% +54.327%] (p = 0.00 < 0.05)
Performance has regressed.
e521 + SHA3-512 Asymmetric enc + dec                                                                            
time:   [1.0599 s 1.0606 s 1.0624 s]
change: [+73.890% +74.364% +74.810%] (p = 0.00 < 0.05)
Performance has regressed.

which hopefully illustrates our motivation to grab rug in the first place.

@drcapybara
Copy link
Owner Author

drcapybara commented Nov 10, 2023

Update:

Brain mush viscosity is somewhere between banana pudding and that milk in the fridge thats definitely expired but I feel too guilty to throw away.

Good news:

Basically just co-opting large parts of this excellent crate, we've managed to get E448 largely up and running with add and scalar_mul initially confirmed to be working as expected. We've also dropped in a few nice optimizations that the original authors of the aforementioned crate left open.

To illustrate how dramatic the difference is between our original correct but naive approach,

This:

pub fn add(mut self, p2: &EdCurvePoint) -> EdCurvePoint {
    let x1 = &self.x;
    let y1 = &self.y;
    let x2 = p2.x.clone();
    let y2 = &p2.y;

    let p = self.p.clone();
    let d = self.d.clone();

    // (x₁y₂ + y₁x₂)
    let x1y2 = x1.clone() * y2.clone();
    let y1x2 = y1.clone() * x2.clone();
    let x1y2y1x2_sum = x1y2 + y1x2;
    // 1 / (1 + dx₁x₂y₁y₂)
    let one_plus_dx1x2y1y2 = (Integer::from(1)
        + (d.clone() * x1.clone() * x2.clone() * y1.clone() * y2.clone()))
        % p.clone();
    let one_plus_dx1x2y1y2inv = mod_inv(&one_plus_dx1x2y1y2, &p);
    // (y₁y₂ − x₁x₂)
    let y1y2x1x2_difference = (y1.clone() * y2.clone()) - (x1.clone() * x2.clone());
    // 1 / (1 − dx₁x₂y₁y₂)
    let one_minus_dx1x2y1y2 = (Integer::from(1) - (d * x1 * x2 * y1 * y2)) % p.clone();
    let one_minus_dx1x2y1y2inv = mod_inv(&one_minus_dx1x2y1y2, &p);
    // (x₁y₂ + y₁x₂) / (1 + dx₁x₂y₁y₂)
    let new_x = ((x1y2y1x2_sum * one_plus_dx1x2y1y2inv) % p.clone() + p.clone()) % p.clone();
    // (y₁y₂ − x₁x₂) / (1 − dx₁x₂y₁y₂)
    let new_y = ((y1y2x1x2_difference * one_minus_dx1x2y1y2inv) % p.clone() + p.clone()) % p;
    self.x = new_x;
    self.y = new_y;
    self
}

Has become this:

pub fn add(&self, other: &ExtendedCurvePoint) -> ExtendedCurvePoint {
    let aXX = self.X * other.X;
    let dTT = EDWARDS_D * self.T * other.T;
    let ZZ = self.Z * other.Z;
    let YY = self.Y * other.Y;
    let X1Y2_plus_Y1X2 = (self.X * other.Y) + (self.Y * other.X);

    let X = X1Y2_plus_Y1X2 * (ZZ - dTT);
    let Y = (YY - aXX) * (ZZ + dTT);
    let T = (YY - aXX) * X1Y2_plus_Y1X2;
    let Z = (ZZ - dTT) * (ZZ + dTT);

    ExtendedCurvePoint { X, Y, Z, T }
}

This is a pretty kickass development considering that we've reached basically all of our original design goals:

  1. Excessive clones completely eliminated
  2. Rug and num-bigint completely eliminated
  3. Readability of add is substantially improved
  4. Reduced number of squarings and inversions from original crate
  5. Upgraded scalar type to take advantage of crypto-bigint
  6. Formally verified fiat-crypto delivers safe and fixed-time field arithmetic
  7. we learned a whole lot and made lots of friends along the way

The following test cases succeed:
running 7 tests

  • test curve::extended_edwards::test_g_plus_neg_g ... ok
  • test curve::extended_edwards::r_times_g_id ... ok
  • test curve::extended_edwards::test_four_g_not_id ... ok
  • test curve::extended_edwards::test_g_times_one_g ... ok
  • test curve::extended_edwards::test_g_times_zero_id ... ok
  • test curve::extended_edwards::test_g_times_two_g_plus_g ... ok
  • test curve::extended_edwards::k_g_equals_k_mod_r_times_g ... ok

Bad News:

There is none. This work is super cool and rewarding.

Whats left:

  • To get rid of that milk before it starts to stink
  • Montgomery operations
  • Remaining test cases:
    Any test case involving scalar multiplication or addition seems to be failing, and I suspect this may be due to the montgomery space issue we described above. I think a good place to go next is to get the ladder and the montgomery curve form spun up and then proceed for there. I've implemented montgomery multiplication from scratch but I cant speak to whether it is constant time or not. I'd like to use crypto-bigint for this but the mul_mod functionality is cumbersome to figure out at this time.

@drcapybara
Copy link
Owner Author

drcapybara commented Nov 11, 2023

Update:

We are 3 test cases away from complete success

These test cases are passing:

0 * G = 𝒪
G * 1 = G
G + (-G) = 𝒪
2 * G = G + G
4 * G != 𝒪
r * G = 𝒪
(k + 1) * G = (k * G) + G
k*G = (k % r) * G

leaving the following left to fix:

(k + t) * G = (k * G) + (t * G)
k * (t * G) = t * (k * G) = (k * t mod r) * G
4 * G = 2 * (2 * G)

Observations:

  1. There seems to be an issue I have yet to track down where something is happening with chained multiplication operations producing incorrect results. I have a few ideas about what this might be.
  2. Page 4 of this paper specifies the scalar is always known to be a multiple of 4. We can use our neat little trick from earlier to quickly ensure all of our randomly generated scalars are always a multiple of 4 by simply popping off the last 2 bits:
s &= !0b11;

Preparing for merge:

  1. We need to start migrating this version of the curve into the crate and remove the original version. Our goal should be to match the original API as much as possible to avoid having to rip out whole sections of ops and other parts of the library.
  2. There is a doubling formula that can be implemented in the ExtendedCurvePoint implementation. Currently it just adds to itself, but it might be helpful to the original authors if we can upstream that doubling formula.
  3. Scalars should be generated as 512 bits according to Dr. Barreto. We need to make sure we are capable of generating these and that the test cases still pass.
  4. Unknown if we will actually end up needing Montgomery points. It might turn out that this is the case as to why the 3 test cases from above are failing. Its worth an investigation but if we can get all of these to pass without it, we might merge what we have and come back to that later if its worth it.

@drcapybara
Copy link
Owner Author

drcapybara commented Dec 10, 2023

WE DID IT

All tests pass:

running 11 tests
test e448_tests::test_g_plus_neg_g ... ok
test e448_tests::test_zero_times_g ... ok
test e448_tests::test_four_g_not_id ... ok
test e448_tests::test_two_times_g ... ok
test e448_tests::test_g_times_one ... ok
test e448_tests::test_four_g ... ok
test e448_tests::k_g_equals_k_mod_r_times_g ... ok
test e448_tests::k_t ... ok
test e448_tests::r_times_g_id ... ok
test e448_tests::k_plus_one_g ... ok
test e448_tests::test_ktp ... ok

The problem was with the implementations for the scalar type, do remember that modding scalar operations is done with the curve order r, not the field prime p.

We provide a slightly modified version of curve25519-dalek/ variable base, fixed time multiplication found here, and also here, the only difference being that we operate on u64 limbs to be compatible with the U448 type. Changing the number of limbs required carefully ensuring the radix_16 recentering step was correct, as well as the modular reduction step for scalar multiplication.

Rug and BigInt are gone:

Ive gone ahead and memorialized this momentous achievement:
ezgif com-video-to-gif

What this means for the library:

  1. We went from this:
fn add(mut self, p2: &EdCurvePoint) -> EdCurvePoint {
    let x1 = &self.x;
    let y1 = &self.y;
    let x2 = p2.x.clone();
    let y2 = &p2.y;

    let p = self.p.clone();
    let d = self.d.clone();

    // (x₁y₂ + y₁x₂)
    let x1y2 = x1.clone() * y2.clone();
    let y1x2 = y1.clone() * x2.clone();
    let x1y2y1x2_sum = x1y2 + y1x2;
    // 1 / (1 + dx₁x₂y₁y₂)
    let one_plus_dx1x2y1y2 = (Integer::from(1)
        + (d.clone() * x1.clone() * x2.clone() * y1.clone() * y2.clone()))
        % p.clone();
    let one_plus_dx1x2y1y2inv = mod_inv(&one_plus_dx1x2y1y2, &p);
    // (y₁y₂ − x₁x₂)
    let y1y2x1x2_difference = (y1.clone() * y2.clone()) - (x1.clone() * x2.clone());
    // 1 / (1 − dx₁x₂y₁y₂)
    let one_minus_dx1x2y1y2 = (Integer::from(1) - (d * x1 * x2 * y1 * y2)) % p.clone();
    let one_minus_dx1x2y1y2inv = mod_inv(&one_minus_dx1x2y1y2, &p);
    // (x₁y₂ + y₁x₂) / (1 + dx₁x₂y₁y₂)
    let new_x = ((x1y2y1x2_sum * one_plus_dx1x2y1y2inv) % p.clone() + p.clone()) % p.clone();
    // (y₁y₂ − x₁x₂) / (1 − dx₁x₂y₁y₂)
    let new_y = ((y1y2x1x2_difference * one_minus_dx1x2y1y2inv) % p.clone() + p.clone()) % p;
}

To this:

pub fn add_extended(&self, other: &ExtendedPoint) -> ExtensibleCurvePoint {
    let A = self.X * other.X;
    let B = self.Y * other.Y;
    let C = self.T1 * self.T2 * other.T * TWISTED_D;
    let D = self.Z * other.Z;
    let E = (self.X + self.Y) * (other.X + other.Y) - A - B;
    let F = D - C;
    let G = D + C;
    let H = B + A;
}
  1. Scalar multiplication is now fixed time. With the use of a lookup table, we can decompose the scalar and precompute multiples of P, which provides dramatic decreases in runtime. We then traverse the table and select the correct item, all in fixed-time.
  2. Curve point coordinates are now fiat-crypto, formally-verified field element types distinct from the U448 backing store of the points.
  3. Asymmetric operations will likely improve in performance given the complete mitigation of the excessive cloning.
  4. We dont need any Montgomery representation or operations anymore. What we've built here delivers all of our asks from the beginning. If we ever wanted to turn this into something more like what the dalek authors have designed, then there is an argument for more exotic representations. But for now, we can get away with only using Affine, extended, twisted, and projective Niels.
  5. All operations are now happening on the twist, which means we only have to modular reduce a single time during conversion back to affine.
  6. These improvements combined lead to a nearly x193 times speedup over the previous approach.

Whats left:

  • Integrate curve into ops: I doubt this will go smoothly, it never has so far. Open question about how to compute a standalone modular operation. This affects both signatures and encryptions.
  • bench against obsolete affine composition formula
  • docs

This might need to go into it's own crate eventually. We dont want it to be too tightly coupled to the cryptosystem functions.

@drcapybara
Copy link
Owner Author

It all works (victory)

All asymmetric operations have now been reconfigured to use the upgraded curve. All tests pass.

What this change takes away from the library:

Its worth making a quick note about things we were forced to do to get to this point. One of the nice things about rug is that it can store any bitsize, which lead to the wide variety of edwards curves we originally had available to us. In fact, the only reason we were able to replace the coordinate types with fiat-crypto tight field elements was specifically because of the P448 type available. Without this, we would need to use the U448 type in crypto-bigint which wouldnt be a terrible choice at all, but crypto-bigint also has limited support for the funky bitsizes youll commonly see in exotic math objects like elliptic curves.

Thus, we lose support for the following:

  • e222
  • e381
  • e521

Im unsure if we can bring them back at some point. Arbitrary precision types with odd bitsizes have an extra layer of difficulty in a homebrew solution, and unless the fiat-crypto authors or someone who contributes to it brings them to life, we are constrained to the types available to us.

@drcapybara
Copy link
Owner Author

All tests passing, merging to main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improve the design, usability, safety, and/or efficiency of the library Larger Project Larger project sizing affecting multiple areas of the library. Not difficult, but more to consider. literature review Interpret journal paper into code! Way more fun than it sounds :D side-channel Security risk that we should patch up ASAP
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

1 participant