@MaxGraey
Member

@MaxGraey MaxGraey commented Oct 5, 2018

  • improve Math.round
  • reimplement Mathf.random based on xoroshiro64**
  • switch to copysign where possible
  • avoid select<f64/f32> because it produces complex branchless machine code, unlike the integer version (which uses cmov instructions)
  • better instruction parallelism
  • fix randomSeed setup
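For reference, here is a minimal TypeScript sketch of the xoroshiro64** step as published by Blackman and Vigna, with one common way to turn the 32-bit output into a float in [0, 1). The class name, the two-word constructor seed, and the top-24-bits float conversion are illustrative assumptions, not the AssemblyScript code in this PR. Vigna's notes suggest deriving the initial state from a single seed with a separate generator such as SplitMix.

```typescript
// Minimal sketch of xoroshiro64** (two 32-bit state words, 32-bit output).
// Hypothetical class for illustration; not the implementation from this PR.
class Xoroshiro64StarStar {
  private s0: number;
  private s1: number;

  constructor(seed0: number, seed1: number) {
    this.s0 = seed0 >>> 0;
    this.s1 = seed1 >>> 0;
    // The all-zero state is a fixed point of the generator, so reject it.
    if ((this.s0 | this.s1) === 0) throw new Error("seed must not be all zero");
  }

  private static rotl(x: number, k: number): number {
    return ((x << k) | (x >>> (32 - k))) >>> 0;
  }

  // Next 32-bit output as an unsigned integer.
  nextU32(): number {
    const s0 = this.s0;
    let s1 = this.s1;
    // "**" scrambler: multiply, rotate, multiply (all modulo 2^32).
    const result = Math.imul(Xoroshiro64StarStar.rotl(Math.imul(s0, 0x9e3779bb), 5), 5) >>> 0;
    // xoroshiro state update with parameters a = 26, b = 9, c = 13.
    s1 ^= s0;
    this.s0 = (Xoroshiro64StarStar.rotl(s0, 26) ^ s1 ^ (s1 << 9)) >>> 0;
    this.s1 = Xoroshiro64StarStar.rotl(s1, 13);
    return result;
  }

  // Float in [0, 1) from the top 24 bits, so every result is exact in f32.
  nextFloat(): number {
    return (this.nextU32() >>> 8) / 16777216; // divide by 2^24
  }
}
```

Usage: `new Xoroshiro64StarStar(0x9e3779b9, 42).nextFloat()` returns a value in [0, 1). Keeping 24 bits means every possible result is exactly representable as an f32, which suits a Mathf.random.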

Closes #59

@dcodeIO
Member

dcodeIO commented Oct 5, 2018

As far as I understand, select has advantages where the condition is random and branch prediction doesn't perform well. Isn't that the case here?

@MaxGraey
Member Author

MaxGraey commented Oct 5, 2018

select makes sense for simple branch expressions on the ALU side (integer arithmetic), where it compiles to a single cmov instruction or one of its variants. SSE/FPU has no similar instruction, so a float select produces a much more complicated instruction sequence. It does let us stay in the SSE context (switching between the ALU and SSE/FPU contexts isn't cheap either), but I try to use tricks that neither force ALU conditional routines nor leave the SSE/FPU context.

EDIT: SSE/AVX can simulate cmov by masking and blending both arguments, but as I mentioned before, that isn't cheap.
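To illustrate the masking-and-blending approach, here is a scalar TypeScript sketch that emulates a float select with integer bit operations on an f32's bit pattern: build an all-ones or all-zeros mask from the condition, then combine `(a & mask) | (b & ~mask)`. SIMD blend instructions do the same per lane; the function name `blendSelectF32` is hypothetical.

```typescript
// Hypothetical scalar emulation of a branchless float select via bit masking.
// The Float32Array and Uint32Array share one buffer, so f32 bits can be read as u32.
const f32 = new Float32Array(2);
const u32 = new Uint32Array(f32.buffer);

function blendSelectF32(a: number, b: number, cond: boolean): number {
  // 0xFFFFFFFF when cond is true, 0 otherwise.
  const mask = -Number(cond) >>> 0;
  f32[0] = a;
  f32[1] = b;
  // Keep a's bits where the mask is set and b's bits everywhere else.
  u32[0] = (u32[0] & mask) | (u32[1] & ~mask);
  return f32[0];
}
```

Note that the inputs are rounded to f32 on the way in, and the result is built from bits, so signed zeros pass through unchanged.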

let z_h = cp_h * p_h;
let dp_l = select<f64>(dp_l1, 0.0, k);
// let dp_l = select<f64>(dp_l1, 0.0, k);
let dp_l: f64 = dp_l1 * <f64><bool>k;
Member Author

@dcodeIO Hmm, I'm not sure dp_l1 * <f64><bool>k is the right way. Maybe

dp_l1 * <f64>(k != 0)

is better?

Member

@dcodeIO dcodeIO Oct 5, 2018

I think the : f64 annotation isn't necessary because 1.0 and dp_l1 are already f64. <bool>k will compile to i32.and(k, 1) if I'm not mistaken, while (k != 0) is an i32.ne(k, 0), hmm. Maybe (k != 0) is easier to understand.
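The two spellings only agree when k is 0 or 1. Assuming `<bool>k` lowers to `i32.and(k, 1)` as suggested above, this TypeScript sketch models both lowerings and the multiply-by-condition trick. It also shows where multiplying differs from a true select: `x * 0` gives -0 for negative x and NaN for infinite or NaN x, whereas selecting `0.0` always gives +0. In musl's pow, k only takes the values 0 and 1, and dp_l1 is presumably the small positive constant dp_l[1], so those cases should not arise there. All helper names are hypothetical.

```typescript
// Models of two possible lowerings of "k used as a 0/1 condition".
const viaAnd = (k: number): number => k & 1;            // like i32.and(k, 1)
const viaNe = (k: number): number => (k !== 0 ? 1 : 0); // like i32.ne(k, 0)

// A true select: returns ifFalse (here +0.0) whenever cond is zero.
const selectF64 = (ifTrue: number, ifFalse: number, cond: number): number =>
  cond !== 0 ? ifTrue : ifFalse;

// The branch-free variant discussed above: scale by the 0/1 condition.
const mulByCond = (x: number, k: number): number => x * viaNe(k);
```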

return y;
export function round(x: f64): f64 {
if (!isFinite(x) || x == 0) return x;
if (-0.5 <= x && x < 0) return -0.0;
Member

@dcodeIO dcodeIO Oct 5, 2018

Maybe copysign(0, x), and then remove the x == 0 check? Overall this looks like it has four branches.
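A TypeScript sketch of this suggestion, assuming JavaScript `Math.round` semantics (halfway cases round toward +Infinity, and inputs in [-0.5, 0) return -0): a single `copysign(0, x)` covers ±0 as well as the small-magnitude inputs, so the separate `x == 0` check goes away. The bit-level `copysign` helper stands in for the Wasm `f64.copysign` builtin, and the `floor(x + 0.5)` core is one possible choice, not necessarily what the PR uses.

```typescript
// Stand-in for f64.copysign: magnitude bits of `mag`, sign bit of `sgn`.
const scratch = new DataView(new ArrayBuffer(16));
function copysign(mag: number, sgn: number): number {
  scratch.setFloat64(0, mag);
  scratch.setFloat64(8, sgn);
  // DataView is big-endian by default, so offset 0 holds the high (sign) word.
  const hi = (scratch.getUint32(0) & 0x7fffffff) | (scratch.getUint32(8) & 0x80000000);
  scratch.setUint32(0, hi >>> 0);
  return scratch.getFloat64(0);
}

// Math.round semantics with fewer special-case branches.
function round(x: number): number {
  if (!isFinite(x)) return x;                      // NaN, +/-Infinity
  if (-0.5 <= x && x < 0.5) return copysign(0, x); // +/-0, |x| < 0.5, and -0.5
  if (Math.abs(x) >= 4503599627370496) return x;   // |x| >= 2^52 is already an integer
  return Math.floor(x + 0.5);                      // matches Math.round for the rest
}
```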

let z = builtin_sqrt<f64>(yy + 1);
if (e >= 0x3FF + 1) y = log(2 * y + 1 / (z + y));
else if (e >= 0x3FF - 26) y = log1p(y + yy / (z + 1));
}
Member

This refactor doesn't seem worth it, because it now calculates yy and z even when none of the if conditions is true.

Member Author

Yeah, you are right
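For reference, this TypeScript sketch follows the range split used by musl's asinh. Here `y*y` and the square root are computed only inside the branches that need them, so the |x| < 2^-26 and |x| >= 2^26 paths skip that work entirely. The thresholds mirror musl's `e >= 0x3FF + 26`, `e >= 0x3FF + 1` and `e >= 0x3FF - 26` checks on the biased exponent, and the function name is hypothetical.

```typescript
// Range-split asinh in the style of musl; expensive terms stay inside their branch.
function asinhSketch(x: number): number {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const hi = view.getUint32(0);      // high word (DataView is big-endian by default)
  const e = (hi >>> 20) & 0x7ff;     // biased exponent
  const s = hi >>> 31;               // sign bit
  const a = Math.abs(x);
  let y: number;
  if (e >= 0x3ff + 26) {
    y = Math.log(a) + Math.LN2;      // |x| >= 2^26, Infinity or NaN
  } else if (e >= 0x3ff + 1) {
    y = Math.log(2 * a + 1 / (Math.sqrt(a * a + 1) + a)); // 2 <= |x| < 2^26
  } else if (e >= 0x3ff - 26) {
    const aa = a * a;                // computed only on this path
    y = Math.log1p(a + aa / (Math.sqrt(aa + 1) + 1));     // 2^-26 <= |x| < 2
  } else {
    y = a;                           // |x| < 2^-26: asinh(x) rounds to x
  }
  return s ? -y : y;
}
```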

var twopk = reinterpret<f64>(u);
var y: f64;
if (k < 0 || k > 56) {
if (<i32>(k < 0) | <i32>(k > 56)) {
Member

I think we should make the compiler smarter about logical ors instead of convoluting the source like this. Wdyt?

Member Author

Definitely yes! I've even opened a proposal for that: #277

Member

So, can we keep the || here and at the other places, if any, for now?

Member Author

sure
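One rewrite a smarter compiler could apply to such logical ORs, sketched in TypeScript: for a signed 32-bit k, `k < 0 || k > 56` is equivalent to a single unsigned comparison `(k >>> 0) > 56`, because reinterpreting a negative i32 as u32 gives a value of at least 2^31. Whether Binaryen or the AssemblyScript compiler performs this rewrite is not established here, and the helper names are hypothetical.

```typescript
// Original form: two comparisons joined by a short-circuiting OR.
const outOfRangeOr = (k: number): boolean => k < 0 || k > 56;

// Form used in the PR: both comparisons evaluated, combined with a bitwise OR.
const outOfRangeBitOr = (k: number): boolean => (Number(k < 0) | Number(k > 56)) !== 0;

// Single unsigned comparison: negative i32 values become >= 2^31 as u32.
const outOfRangeUnsigned = (k: number): boolean => (k >>> 0) > 56;
```

The bitwise form is only a safe replacement when both operands are cheap and side-effect free, which holds for plain comparisons like these.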

if (ey == 0x7FF) return y;
x = reinterpret<f64>(ux);
if (ex == 0x7FF || uy == 0) return x;
if (<i32>(ex == 0x7FF) | <i32>(uy == 0)) return x;
Member

Like here

var hx = <u32>(u >> 32);
var k = 0;
if (hx < 0x00100000 || <bool>(hx >> 31)) {
if (<u32>(hx < 0x00100000) | (hx >> 31)) {
Member

And here

var hx = <u32>(u >> 32);
var k = 0;
if (hx < 0x00100000 || <bool>(hx >> 31)) {
if (<u32>(hx < 0x00100000) | (hx >> 31)) {
Member

And here

var k = 1;
var c = 0.0, f = 0.0;
if (hx < 0x3FDA827A || <bool>(hx >> 31)) {
if (<u32>(hx < 0x3FDA827A) | (hx >> 31)) {
Member

And here

var hx = <u32>(u >> 32);
var k = 0;
if (hx < 0x00100000 || <bool>(hx >> 31)) {
if (<u32>(hx < 0x00100000) | hx >> 31) {
Member

And here
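These checks all test the high word of the input's bit pattern. In the f64 hunks above, `hx < 0x00100000` catches +0 and positive subnormals (biased exponent 0), and `hx >> 31` catches anything with the sign bit set, including -0. Together they send every input below 2^-1022 and every negative value to the special-case path. This TypeScript sketch uses hypothetical helper names and also shows an equivalent single unsigned comparison, in the spirit of the compiler improvement discussed above.

```typescript
// High 32 bits of an f64's IEEE-754 pattern (DataView is big-endian by default).
const dv = new DataView(new ArrayBuffer(8));
function highWord(x: number): number {
  dv.setFloat64(0, x);
  return dv.getUint32(0);
}

// True for +0, positive subnormals, and anything with the sign bit set (including -0).
function needsSpecialPath(x: number): boolean {
  const hx = highWord(x);
  return hx < 0x00100000 || (hx >>> 31) !== 0;
}

// Same test as one unsigned comparison: only hx in [0x00100000, 0x80000000)
// maps below 0x7FF00000 after subtracting 0x00100000 modulo 2^32.
function needsSpecialPathUnsigned(x: number): boolean {
  return ((highWord(x) - 0x00100000) >>> 0) >= 0x7ff00000;
}
```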

k = <i32>(invln2 * x + builtin_copysign<f32>(0.5, x));
} else {
k = 1 - sign_ - sign_;
k = 1 - (sign_ << 1);
Member

Does this improve something?

Member Author

I think yes, because 1 - sign_ - sign_ needs two reads of the local, while 1 - (sign_ << 1) needs only one.
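For context, both assignments compute the range-reduction integer k. In the first branch, `invln2 * x + copysign(0.5, x)` followed by truncation rounds x / ln 2 to the nearest integer, with halfway cases going away from zero. In the else branch, `sign_` is the sign bit (0 or 1), so `1 - (sign_ << 1)` maps it to +1 or -1, exactly like `1 - sign_ - sign_`. A TypeScript sketch with hypothetical helper names:

```typescript
// Round x / ln2 to nearest (halfway away from zero) via truncation,
// as in k = <i32>(invln2 * x + copysign(0.5, x)).
function nearestK(x: number): number {
  const invln2 = 1 / Math.LN2;
  const half = x < 0 ? -0.5 : 0.5; // copysign(0.5, x) for nonzero x
  return Math.trunc(invln2 * x + half);
}

// Map a sign bit (0 or 1) to +1 or -1; the two spellings are equivalent.
const signToUnitSub = (sign: number): number => 1 - sign - sign;
const signToUnitShift = (sign: number): number => 1 - (sign << 1);
```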

var twopk = reinterpret<f32>(u);
var y: f32;
if (k < 0 || k > 56) {
if (<i32>(k < 0) | <i32>(k > 56)) {
Member

And here

var u = reinterpret<u32>(x);
var k = 0;
if (u < 0x00800000 || <bool>(u >> 31)) {
if (<u32>(u < 0x00800000) | (u >> 31)) {
Member

And here

var ix = reinterpret<u32>(x);
var k = 0;
if (ix < 0x00800000 || <bool>(ix >> 31)) {
if (<u32>(ix < 0x00800000) | (ix >> 31)) {
Member

And here

var c: f32 = 0, f: f32 = 0;
var k: i32 = 1;
if (ix < 0x3ED413D0 || <bool>(ix >> 31)) {
if (<u32>(ix < 0x3ED413D0) | (ix >> 31)) {
Member

And here

var ix = reinterpret<u32>(x);
var k: i32 = 0;
if (ix < 0x00800000 || <bool>(ix >> 31)) {
if (<u32>(ix < 0x00800000) | (ix >> 31)) {
Member

And here

if (iy == 0) return 1.0; // x**0 = 1, even if x is NaN
// if (hx == 0x3F800000) return 1.0; // C: 1**y = 1, even if y is NaN, JS: NaN
if (ix > 0x7F800000 || iy > 0x7F800000) return x + y; // NaN if either arg is NaN
if (<i32>(ix > 0x7F800000) | <i32>(iy > 0x7F800000)) return x + y; // NaN if either arg is NaN
Member

And here
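In the f32 pow hunk, ix and iy are presumably the bit patterns with the sign bit cleared (as in musl's powf). A value is NaN exactly when that magnitude is greater than 0x7F800000, the bit pattern of +Infinity. A TypeScript sketch of the check, with hypothetical helper names:

```typescript
const fbuf = new Float32Array(1);
const fbits = new Uint32Array(fbuf.buffer);

// Magnitude bits of an f32 (sign bit cleared), like ix = hx & 0x7fffffff.
function absBitsF32(x: number): number {
  fbuf[0] = x;
  return fbits[0] & 0x7fffffff;
}

// NaN iff the exponent is all ones and the mantissa is nonzero,
// i.e. the magnitude pattern is above +Infinity's 0x7F800000.
const isNaNBits = (x: number): boolean => absBitsF32(x) > 0x7f800000;

// Either-NaN test as in the hunk: ix > 0x7F800000 || iy > 0x7F800000.
const eitherNaN = (x: number, y: number): boolean => isNaNBits(x) || isNaNBits(y);
```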

y *= Ox1p_126f;
n += 126;
y *= Ox1p_126f * Ox1p24f;
n += 126 - 24;
Member

What's this doing?

Member Author

It seems this fix appeared later in the original source code; the ported musl version doesn't reflect these changes.

Member

Oh, updates, nice :)
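The hunk appears to correspond to musl's later scalbnf, which scales by 2^-126 · 2^24 = 2^-102 per step (adjusting n by 126 - 24) instead of by 2^-126. musl's comment attributes this to avoiding double rounding in the subnormal range. A TypeScript sketch of that structure, using `Math.fround` after each operation to emulate f32 arithmetic (an f64 product of two f32 values is exact, so each `fround` is a single correct f32 rounding). The helper names are hypothetical.

```typescript
const sbuf = new Float32Array(1);
const sbits = new Uint32Array(sbuf.buffer);

// Exact f32 power of two for -126 <= n <= 127, built from the exponent field.
function pow2f(n: number): number {
  sbits[0] = (0x7f + n) << 23;
  return sbuf[0];
}

// x * 2^n with f32 semantics, following the updated musl step size.
function scalbnf(x: number, n: number): number {
  const step = Math.fround(pow2f(-126) * pow2f(24)); // 2^-102
  let y = Math.fround(x);
  if (n > 127) {
    y = Math.fround(y * pow2f(127));
    n -= 127;
    if (n > 127) {
      y = Math.fround(y * pow2f(127));
      n -= 127;
      if (n > 127) n = 127;
    }
  } else if (n < -126) {
    y = Math.fround(y * step);
    n += 126 - 24;
    if (n < -126) {
      y = Math.fround(y * step);
      n += 126 - 24;
      if (n < -126) n = -126;
    }
  }
  return Math.fround(y * pow2f(n));
}
```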

if (ux << 1 == 0) return x;
if (!ex) {
for (i = uxi << 9; i >> 31 == 0; ex--, i <<= 1) {}
ex -= builtin_clz<u32>(uxi << 9);
Member

Nice find :)
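The removed loop shifts the mantissa left until its top bit is set and decrements ex once per shift, which is exactly a count of leading zeros. As long as `uxi << 9` is nonzero (assuming the earlier early returns rule out a zero x on this path), the loop and `ex -= clz(uxi << 9)` agree. This TypeScript check uses `Math.clz32` as a stand-in for `builtin_clz<u32>`; the function names are hypothetical.

```typescript
// Old form: count shifts until the top bit is set (never terminates for 0).
function exAfterLoop(ex: number, uxi: number): number {
  for (let i = (uxi << 9) >>> 0; i >>> 31 === 0; ex--, i = (i << 1) >>> 0) {}
  return ex;
}

// New form: a single count-leading-zeros.
function exAfterClz(ex: number, uxi: number): number {
  return ex - Math.clz32(uxi << 9);
}
```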

if (u < 0x3F800000 - (12 << 23)) return 1;
let t = expm1(x);
return 1 + t * t / (2 * (1 + t));
return 1 + t * t / (2 + 2 * t);
Member

Maybe also add a comment here

Member Author

done!
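The rewritten expression rests on the identity below, with t = e^x - 1 as returned by expm1(x). Using expm1 avoids the cancellation that computing e^x and then subtracting 1 would cause for small x. In binary floating point, `2 * (1 + t)` and `2 + 2 * t` are also bit-for-bit equal (barring overflow), because multiplying by 2 is exact.

```math
\cosh x - 1 = \frac{e^{x} + e^{-x} - 2}{2} = \frac{(e^{x} - 1)^{2}}{2e^{x}} = \frac{t^{2}}{2(1 + t)} = \frac{t^{2}}{2 + 2t}, \qquad t = e^{x} - 1
```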

if (u < 0x42B17217) {
let t = exp(x);
return 0.5 * (t + 1 / t);
return 0.5 * t + 0.5 / t;
Member

Maybe also add a comment here

Member Author

yeah

Member Author

done!
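Putting both cosh branches together, here is a TypeScript sketch of an f64 analogue of this structure. The hunks above look like the f32 version; the cutoffs below (2^-26 and ln 2) are the double-precision ones. The sketch deliberately stops short of the overflow region near |x| = 710, which musl handles with a separate helper. The `0.5 * t + 0.5 / t` form lets the multiplication and the division proceed independently, which is presumably the instruction-parallelism gain listed in the PR description. The function name is hypothetical.

```typescript
// cosh(x) for |x| < 709, mirroring the two rewritten branches.
function coshSketch(x: number): number {
  if (Number.isNaN(x)) return x;
  const a = Math.abs(x);
  if (a < Math.LN2) {
    if (a < 2 ** -26) return 1;        // 1 + x^2/2 rounds to 1 here
    const t = Math.expm1(a);
    return 1 + (t * t) / (2 + 2 * t);  // cosh(x) - 1 = t^2 / (2(1 + t))
  }
  if (a < 709) {
    const t = Math.exp(a);
    return 0.5 * t + 0.5 / t;          // (e^x + e^-x) / 2, two independent terms
  }
  throw new RangeError("|x| >= 709 is not covered by this sketch");
}
```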

@dcodeIO dcodeIO merged commit 376afd4 into AssemblyScript:master Oct 25, 2018
@dcodeIO
Member

dcodeIO commented Oct 25, 2018

Great, thanks! :)

@MaxGraey MaxGraey deleted the improve-math branch October 26, 2018 02:41

Development

Successfully merging this pull request may close these issues.

Optimize NativeMath/NativeMathf to use WASM builtins where beneficial

2 participants