C version uses i64 & u64 -- integers, but file has Float64Array's all over #45

3nsoft · 2014-07-26T21:37:57Z

nacl.js as well as nacl-fast.js have Float64Array's in places where i64 & u64 types are present in C code.

There are no floating point operations in NaCl. And JS's regular number, which happens to be float64, are OK to do 32 bits operation, with an exception of multiplication (as it has been said in #41 ). Yet, when C code has u64, the intention of C developers, I suppose, is to have/use all 64 bits in mod 64-bits arithmetic.

If you prove that it is OK to have float64 instead of u64, then such code will be provably correct. Otherwise, not-failing tests do not constitute such proof (unless it is run on all possible inputs).

Let's say with line 667 in nacl-fast:

function M(o, a, b) {
 var i, j, t = new Float64Array(31);
 for (i = 0; i < 31; i++) t[i] = 0;
 for (i = 0; i < 16; i++) {
   for (j = 0; j < 16; j++) {
     t[i+j] += a[i] * b[j];
   }
 }
 ...

Even if a's and b's are within 32 bits, where is a gauruntee that t's will not be rounded as a float, instead of overflowing as in integers.

The text was updated successfully, but these errors were encountered:

dchest · 2014-07-27T11:42:37Z

Note that we don't want modular arithmetic here, we just need enough integer bits to hold the result until performing carry.

@devi do you think there's a chance of going over [-2^53, 2^53] in M (multiplication)? We know that after multiplication car25519 puts the result in range [0, 2^16-1]. The inputs to multiplication (or squaring) come from A (addition) and Z (subtraction) which in their turn, receive 16-bit inputs.

devi · 2014-07-27T13:54:00Z

To be honest, I'm not sure.
Simple test from input 0x00 to 0xff never touch [-2^53, 2^53].

dchest · 2014-07-27T15:13:33Z

OK, so let's take a look. The "meat" of crypto_scalarmult is this:

  for (i=254;i>=0;--i) {
    r=(z[i>>>3]>>>(i&7))&1;
    sel25519(a,b,r);
    sel25519(c,d,r);
    A(e,a,c); 
    Z(a,a,c);
    A(c,b,d);
    Z(b,b,d);
    S(d,e);
    S(f,a);
    M(a,c,a);
    M(c,b,e);
    A(e,a,c);
    Z(a,a,c);
    S(b,a);
    Z(c,d,f);
    M(a,c,_121665);
    A(a,a,d);
    M(c,c,a);
    M(a,d,f);
    M(d,b,x);
    S(b,e);
    sel25519(a,b,r);
    sel25519(c,d,r);
  }

Inputs start with limbs that are radix 2^16, each maximum 65535. Notice that there is at most one addition (A) and one subtraction (Z) before each multiplication (M) or squaring (S, which is just multiplication of the input by itself) touching the input. For example, consider e;

   A(e,a,c);
   ...
   S(d,e);

The maximum value each of the numbers in e can be is 65535+65535 = 131070, because here's what A does:

  for (i = 0; i < 16; i++) o[i] = (a[i] + b[i])|0;

(in fact, we don't need | 0 here, but it doesn't hurt).

Let's set all values in e to the maximum value and call part of S without carrying:

function M_nocar(o, a, b) {
  var i, j, t = new Float64Array(31);
  for (i = 0; i < 31; i++) t[i] = 0;
  for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
      t[i+j] += a[i] * b[j];
    }
  }
  for (i = 0; i < 15; i++) {
    t[i] += 38 * t[i+16];
  }
  for (i = 0; i < 16; i++) o[i] = t[i];
}

var e = new Float64Array(16);
for (var i = 0; i < 16; i++) e[i] = 131070;
var o = new Float64Array(16);
M_nocar(o, e, e);
console.log(o);

We get:

[9809405937900, 9173770176600, 8538134415300, 7902498654000, 7266862892700, 6631227131400, 5995591370100, 5359955608800, 4724319847500, 4088684086200, 3453048324900, 2817412563600, 2181776802300, 1546141041000, 910505279700, 274869518400]

Let's find the maximum value here:

var max = 0; for (i = 0; i < o.length; i++) if (o[i] > max) max = o[i];
console.log(max);

Output:

9809405937900

How many bits is this?

console.log(Math.log(max) / Math.LN2)
43.15730290746831

So, for the maximum value of multiplication we need only 44 bits, which fits into JavaScript's range of integer 53 bits without rounding.

After that, in M the result is being pushed back into 16-bit range by:

  car25519(o);
  car25519(o);

so that the following operations again start with 16-bit inputs.

(Also notice that car25519 is the only function which is not implemented with bit-arithmetic like in tweenacl.c, because it's dealing with numbers larger than 2^32).

Any objections or something that I missed?

3nsoft · 2014-07-27T17:35:16Z

Now, this is a proper explanation. Bravo.
Can it be placed into code as comments, so that this does not scare anyone else. Statement """We know that after multiplication car25519 puts the result in range [0, 2^16-1].""" is not particularly obvious for someone who starts to read the code. Comments about assumption on input bit limits and guarantees for the outgoing things, shall remove any confusion.

devi · 2014-07-27T17:46:45Z

To extend @dchest explanation:
| 0 in A is to make sure that we're dealing with integer addition when not using typed-array as inputs.

3nsoft · 2014-07-27T17:53:14Z

That |0 is an implicit 32bit cut, is sort of part of wider public understanding.
But that & 0xffff for every o[i] will changes nothing at the end of car25519 is not obvious. Neither when I look at js, nor C sources.

dchest · 2014-07-27T18:08:04Z

@3nsoft agree on comments. Although we should probably just create an extended commented version of nacl.js to keep the original short like tweenacl.c :-) Also TweetNaCl paper is useful for figuring out what's going on: http://tweetnacl.cr.yp.to/tweetnacl-20131229.pdf.

@devi in this case we don't need 32-bit modular addition, so | 0 is useless (we add two float64 values containing 16-bit numbers, which in the result can give maximum 2^17).

3nsoft · 2014-07-27T18:17:06Z

I think, that we should put links, or pdf into docs folder with the following http://cr.yp.to/ecdh/curve25519-20060209.pdf
Page 9, section 4. """Representing coefficients inside CPUs.""" This is a reasoning that went into the code. (A good lesson for today, thanks everybody :) )
Still, when coming from code-reading end, without formulas for every function, showing how exactly it splits and transforms a 255bit integer.

I totally understand a desire to have clean, short code like original tweet. But, already nacl-fast does not look tweety-sweety. A simple, stupid salsa's line-by-line makes it run faster, in real world, but makes as long as original 100 tweets. Real world is what ultimately matters. And lib's users, other programmers need docs, they need human-parseable code, etc. (damn!) :) .

3nsoft closed this as completed Jul 27, 2014

dlongley mentioned this issue Oct 14, 2019

Replace floating point operations with integer ops digitalbazaar/forge#726

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C version uses i64 & u64 -- integers, but file has Float64Array's all over #45

C version uses i64 & u64 -- integers, but file has Float64Array's all over #45

3nsoft commented Jul 26, 2014

dchest commented Jul 27, 2014

devi commented Jul 27, 2014

dchest commented Jul 27, 2014

3nsoft commented Jul 27, 2014

devi commented Jul 27, 2014

3nsoft commented Jul 27, 2014

dchest commented Jul 27, 2014

3nsoft commented Jul 27, 2014