Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark on Math.sqrt and the same implementation with "Fast Inverse Square Root" #1120

Closed
Becavalier opened this issue Aug 8, 2017 · 9 comments

Comments

@Becavalier
Copy link
Contributor

Becavalier commented Aug 8, 2017

image

All those benchmark tests were implemented with Emscripten tool chain, and the generated webassembly binary was optimized with flag "-O3".

Generally speaking, the same "sqrt" function which implemented with "Fast Inverse Square Root" should be faster than "Math.sqrt" which is a native javascript function.

But the benchmark result is really different and weird.

The question is "I am not sure if this is a problem of emscripten or it should be like this?"

BTW, the test result is from "Version 62.0.3178.0 (Official Build) canary (64-bit)"

@Becavalier
Copy link
Contributor Author

Here I will show you the source code:

extern "C" {
	double EMSCRIPTEN_KEEPALIVE f64_sqrt (double number) {
	    long i;
	    float x, y;
	    const float f = 1.5F;
	    x = number * 0.5F;
	    y = number;
	    i = *(long *) &y;
	    i = 0x5f3759df - (i >> 1);
	    y = *(float *) &i;
	    y = y * (f - (x * y * y));

	    return number * y;
	}
}

@binji
Copy link
Member

binji commented Aug 8, 2017

It's hard to tell without your benchmark, but it looks like you may actually be measuring the cost of the function call into wasm (which is currently relatively expensive in v8).

Here's an alternate benchmark that sums square roots from 1 to N. It tests using three different sqrt functions:

  • Your f64_sqrt function above
  • The WebAssembly instruction f64.sqrt
  • JavaScript Math.sqrt
function module(bytes) {
  let buffer = new ArrayBuffer(bytes.length);
  let view = new Uint8Array(buffer);
  for (let i = 0; i < bytes.length; ++i) {
    view[i] = bytes.charCodeAt(i);
  }
  return new WebAssembly.Module(buffer);
}

function instance(bytes) {
  return new WebAssembly.Instance(module(bytes), {});
}

let inst = instance("\x00\x61\x73\x6d\x01\x00\x00\x00\x01\x0b\x02\x60\x01\x7c\x01\x7c\x60\x01\x7f\x01\x7c\x03\x05\x04\x00\x00\x01\x01\x07\x29\x04\x05\x73\x71\x72\x74\x31\x00\x00\x05\x73\x71\x72\x74\x32\x00\x01\x09\x73\x75\x6d\x5f\x73\x71\x72\x74\x31\x00\x02\x09\x73\x75\x6d\x5f\x73\x71\x72\x74\x32\x00\x03\x0a\x7a\x04\x33\x01\x01\x7d\x41\xdf\xb3\xdd\xf9\x05\x20\x00\xb6\xbc\x41\x01\x75\x6b\xbe\x22\x01\x43\x00\x00\xc0\x3f\x20\x01\x20\x00\x44\x00\x00\x00\x00\x00\x00\xe0\x3f\xa2\xb6\x20\x01\x94\x94\x93\x94\xbb\x20\x00\xa2\x0b\x05\x00\x20\x00\x9f\x0b\x1f\x01\x01\x7c\x03\x40\x20\x00\xb8\x10\x00\x20\x01\xa0\x21\x01\x20\x00\x41\x01\x6b\x22\x00\x41\x00\x4e\x0d\x00\x0b\x20\x01\x0b\x1e\x01\x01\x7c\x03\x40\x20\x00\xb8\x9f\x20\x01\xa0\x21\x01\x20\x00\x41\x01\x6b\x22\x00\x41\x00\x4e\x0d\x00\x0b\x20\x01\x0b");

let value = 100 * 1000 * 1000;

function measure(name, f) {
  let start = performance.now();
  let x = f(value);
  let end = performance.now();
  let elapsed = end - start;

  print(name, elapsed.toFixed(3) + 'ms', x);
}

function sum_sqrt(n) {
  let sum = 0;
  for (; n >= 0; n--) {
    sum += Math.sqrt(n);
  }
  return sum;
}

measure('fast inv sqrt', inst.exports.sum_sqrt1);
measure('wasm sqrt', inst.exports.sum_sqrt2);
measure('Math.sqrt', sum_sqrt);

And here are the results on my machine:

fast inv sqrt 351.359ms 666130229317.0295
wasm sqrt 181.194ms 666666671666.9794
Math.sqrt 490.354ms 666666671666.9794

@Becavalier
Copy link
Contributor Author

@binji Really make sense!! Thanks!

@Becavalier
Copy link
Contributor Author

So, here is a conclusion:
If one wanna optimize the binary which compiled from c/c++ source code, it's better to replace those methods which from c/c++ standard library with native webassembly operations (like f32.sqrt) in corresponding wat file manually?

@binji
Copy link
Member

binji commented Aug 8, 2017

I don't think that's necessary. If I compile the following C file:

#include <math.h>

double my_sqrt(double x) {
  return sqrt(x);
}

float my_sqrtf(float x) {
  return sqrtf(x);
}

with emscripten using this command:

emcc sqrt.c -s WASM=1 -s SIDE_MODULE=1 -O3 -o sqrt.wasm

I get a module that uses the wasm instructions directly:

  ...
  (func $f0 (export "_my_sqrt") (param $p0 f64) (result f64)
    (f64.sqrt
      (get_local $p0)))
  (func $f1 (param $p0 f32) (result f32)
    (f32.sqrt
      (get_local $p0)))
  ...
  (func $f4 (export "_my_sqrtf") (param $p0 f64) (result f64)
    (f64.promote/f32
      (call $f1
        (f32.demote/f64
          (get_local $p0)))))
  ...

@Becavalier
Copy link
Contributor Author

Please try this one:

     function measure (name, f) {
        let value = 100 * 1000 * 1000
        let start = performance.now()
        let sum = 0
        for (; value >= 0; value--) {
          sum += f(value)
        }
        let end = performance.now()
        let elapsed = end - start
        console.log(name, elapsed.toFixed(3) + 'ms')
      }
      measure('i64_sqrt', eufa.Math.i64_sqrt)
      measure('Math.sqrt', Math.sqrt)

Result(Average):

"i64_sqrt" "1047.590ms"
"Math.sqrt" "903.970ms"

i64_sqrt(generated by emscripten, and translated into .wat by binaryen):

(func $17 (type $1) (param $var$0 i32) (result i32)
  (i32.trunc_u/f64
   (f64.sqrt
    (f64.convert_u/i32
     (get_local $var$0)
    )
   )
  )
 )

@binji
Copy link
Member

binji commented Aug 9, 2017

It looks like that will have the same issue as above, you're likely seeing the (slight) overhead when calling from JavaScript into WebAssembly and back.

Ultimately this is the sort of thing that a JavaScript engine will be able to optimize pretty effectively, so I wouldn't expect the results to differ that much. For example, when I ran this same benchmark on my desktop, I get these results:

fast inv sqrt 526.419ms 666130229317.0295
wasm sqrt 621.159ms 666666671666.9794
Math.sqrt 637.227ms 666666671666.9794

@Becavalier
Copy link
Contributor Author

Becavalier commented Aug 9, 2017

Yes, webassembly seems does not have much advantages at those single basic-level functions (basically like: plus, minus, sqrt, abs and etc), I will try digging into more details on how to implement wasm in a high efficient condition, thanks!!!

@juj
Copy link

juj commented Aug 9, 2017

This was raised and discussed at Emscripten bug tracker also at emscripten-core/emscripten#5439.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants