Benchmark on Math.sqrt and the same implementation with "Fast Inverse Square Root" #1120

Becavalier · 2017-08-08T02:03:52Z

All those benchmark tests were implemented with Emscripten tool chain, and the generated webassembly binary was optimized with flag "-O3".

Generally speaking, the same "sqrt" function which implemented with "Fast Inverse Square Root" should be faster than "Math.sqrt" which is a native javascript function.

But the benchmark result is really different and weird.

The question is "I am not sure if this is a problem of emscripten or it should be like this?"

BTW, the test result is from "Version 62.0.3178.0 (Official Build) canary (64-bit)"

Becavalier · 2017-08-08T02:07:48Z

Here I will show you the source code:

extern "C" {
	double EMSCRIPTEN_KEEPALIVE f64_sqrt (double number) {
	    long i;
	    float x, y;
	    const float f = 1.5F;
	    x = number * 0.5F;
	    y = number;
	    i = *(long *) &y;
	    i = 0x5f3759df - (i >> 1);
	    y = *(float *) &i;
	    y = y * (f - (x * y * y));

	    return number * y;
	}
}

binji · 2017-08-08T07:19:05Z

It's hard to tell without your benchmark, but it looks like you may actually be measuring the cost of the function call into wasm (which is currently relatively expensive in v8).

Here's an alternate benchmark that sums square roots from 1 to N. It tests using three different sqrt functions:

Your f64_sqrt function above
The WebAssembly instruction f64.sqrt
JavaScript Math.sqrt

function module(bytes) {
  let buffer = new ArrayBuffer(bytes.length);
  let view = new Uint8Array(buffer);
  for (let i = 0; i < bytes.length; ++i) {
    view[i] = bytes.charCodeAt(i);
  }
  return new WebAssembly.Module(buffer);
}

function instance(bytes) {
  return new WebAssembly.Instance(module(bytes), {});
}

let inst = instance("\x00\x61\x73\x6d\x01\x00\x00\x00\x01\x0b\x02\x60\x01\x7c\x01\x7c\x60\x01\x7f\x01\x7c\x03\x05\x04\x00\x00\x01\x01\x07\x29\x04\x05\x73\x71\x72\x74\x31\x00\x00\x05\x73\x71\x72\x74\x32\x00\x01\x09\x73\x75\x6d\x5f\x73\x71\x72\x74\x31\x00\x02\x09\x73\x75\x6d\x5f\x73\x71\x72\x74\x32\x00\x03\x0a\x7a\x04\x33\x01\x01\x7d\x41\xdf\xb3\xdd\xf9\x05\x20\x00\xb6\xbc\x41\x01\x75\x6b\xbe\x22\x01\x43\x00\x00\xc0\x3f\x20\x01\x20\x00\x44\x00\x00\x00\x00\x00\x00\xe0\x3f\xa2\xb6\x20\x01\x94\x94\x93\x94\xbb\x20\x00\xa2\x0b\x05\x00\x20\x00\x9f\x0b\x1f\x01\x01\x7c\x03\x40\x20\x00\xb8\x10\x00\x20\x01\xa0\x21\x01\x20\x00\x41\x01\x6b\x22\x00\x41\x00\x4e\x0d\x00\x0b\x20\x01\x0b\x1e\x01\x01\x7c\x03\x40\x20\x00\xb8\x9f\x20\x01\xa0\x21\x01\x20\x00\x41\x01\x6b\x22\x00\x41\x00\x4e\x0d\x00\x0b\x20\x01\x0b");

let value = 100 * 1000 * 1000;

function measure(name, f) {
  let start = performance.now();
  let x = f(value);
  let end = performance.now();
  let elapsed = end - start;

  print(name, elapsed.toFixed(3) + 'ms', x);
}

function sum_sqrt(n) {
  let sum = 0;
  for (; n >= 0; n--) {
    sum += Math.sqrt(n);
  }
  return sum;
}

measure('fast inv sqrt', inst.exports.sum_sqrt1);
measure('wasm sqrt', inst.exports.sum_sqrt2);
measure('Math.sqrt', sum_sqrt);

And here are the results on my machine:

fast inv sqrt 351.359ms 666130229317.0295
wasm sqrt 181.194ms 666666671666.9794
Math.sqrt 490.354ms 666666671666.9794

Becavalier · 2017-08-08T07:38:42Z

@binji Really make sense!! Thanks!

Becavalier · 2017-08-08T07:50:33Z

So, here is a conclusion:
If one wanna optimize the binary which compiled from c/c++ source code, it's better to replace those methods which from c/c++ standard library with native webassembly operations (like f32.sqrt) in corresponding wat file manually?

binji · 2017-08-08T08:22:07Z

I don't think that's necessary. If I compile the following C file:

#include <math.h>

double my_sqrt(double x) {
  return sqrt(x);
}

float my_sqrtf(float x) {
  return sqrtf(x);
}

with emscripten using this command:

emcc sqrt.c -s WASM=1 -s SIDE_MODULE=1 -O3 -o sqrt.wasm

I get a module that uses the wasm instructions directly:

  ...
  (func $f0 (export "_my_sqrt") (param $p0 f64) (result f64)
    (f64.sqrt
      (get_local $p0)))
  (func $f1 (param $p0 f32) (result f32)
    (f32.sqrt
      (get_local $p0)))
  ...
  (func $f4 (export "_my_sqrtf") (param $p0 f64) (result f64)
    (f64.promote/f32
      (call $f1
        (f32.demote/f64
          (get_local $p0)))))
  ...

Becavalier · 2017-08-08T10:49:13Z

Please try this one:

     function measure (name, f) {
        let value = 100 * 1000 * 1000
        let start = performance.now()
        let sum = 0
        for (; value >= 0; value--) {
          sum += f(value)
        }
        let end = performance.now()
        let elapsed = end - start
        console.log(name, elapsed.toFixed(3) + 'ms')
      }
      measure('i64_sqrt', eufa.Math.i64_sqrt)
      measure('Math.sqrt', Math.sqrt)

Result(Average):

"i64_sqrt" "1047.590ms"
"Math.sqrt" "903.970ms"

i64_sqrt(generated by emscripten, and translated into .wat by binaryen):

(func $17 (type $1) (param $var$0 i32) (result i32)
  (i32.trunc_u/f64
   (f64.sqrt
    (f64.convert_u/i32
     (get_local $var$0)
    )
   )
  )
 )

binji · 2017-08-09T01:14:19Z

It looks like that will have the same issue as above, you're likely seeing the (slight) overhead when calling from JavaScript into WebAssembly and back.

Ultimately this is the sort of thing that a JavaScript engine will be able to optimize pretty effectively, so I wouldn't expect the results to differ that much. For example, when I ran this same benchmark on my desktop, I get these results:

fast inv sqrt 526.419ms 666130229317.0295
wasm sqrt 621.159ms 666666671666.9794
Math.sqrt 637.227ms 666666671666.9794

Becavalier · 2017-08-09T02:12:53Z

Yes, webassembly seems does not have much advantages at those single basic-level functions (basically like: plus, minus, sqrt, abs and etc), I will try digging into more details on how to implement wasm in a high efficient condition, thanks!!!

juj · 2017-08-09T06:40:28Z

This was raised and discussed at Emscripten bug tracker also at emscripten-core/emscripten#5439.

Becavalier mentioned this issue Aug 17, 2017

Webasm can replace Javascript in browser? #1126

Closed

Becavalier closed this as completed May 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark on Math.sqrt and the same implementation with "Fast Inverse Square Root" #1120

Benchmark on Math.sqrt and the same implementation with "Fast Inverse Square Root" #1120

Becavalier commented Aug 8, 2017 •

edited

Becavalier commented Aug 8, 2017

binji commented Aug 8, 2017

Becavalier commented Aug 8, 2017

Becavalier commented Aug 8, 2017

binji commented Aug 8, 2017

Becavalier commented Aug 8, 2017

binji commented Aug 9, 2017

Becavalier commented Aug 9, 2017 •

edited

juj commented Aug 9, 2017

Benchmark on Math.sqrt and the same implementation with "Fast Inverse Square Root" #1120

Benchmark on Math.sqrt and the same implementation with "Fast Inverse Square Root" #1120

Comments

Becavalier commented Aug 8, 2017 • edited

Becavalier commented Aug 8, 2017

binji commented Aug 8, 2017

Becavalier commented Aug 8, 2017

Becavalier commented Aug 8, 2017

binji commented Aug 8, 2017

Becavalier commented Aug 8, 2017

binji commented Aug 9, 2017

Becavalier commented Aug 9, 2017 • edited

juj commented Aug 9, 2017

Becavalier commented Aug 8, 2017 •

edited

Becavalier commented Aug 9, 2017 •

edited