Investigating differences between asm2wasm and wasm backend output, one cause of the latter's larger code sizes is that it emits loops over an array differently. For example, in a loop writing array[x] = x over an array of ints, asm2wasm emits
(set_local $1
  (i32.const 0)
)
(loop $label$2
  (i32.store
    (i32.add
      (get_local $4)
      (i32.shl
        (get_local $1)
        (i32.const 2)
      )
    )
    (get_local $1)
  )
  (br_if $label$2
    (i32.ne
      (get_local $3)
      (tee_local $1
        (i32.add
          (get_local $1)
          (i32.const 1)
        )
      )
    )
  )
)
while the wasm backend emits
(set_local $6
  (i32.const 0)
)
(set_local $5
  (get_local $3)
)
(loop $label$5
  (i32.store
    (get_local $5)
    (get_local $6)
  )
  (set_local $5
    (i32.add
      (get_local $5)
      (i32.const 4)
    )
  )
  (br_if $label$5
    (i32.ne
      (get_local $0)
      (tee_local $6
        (i32.add
          (get_local $6)
          (i32.const 1)
        )
      )
    )
  )
)
The difference is that asm2wasm has one loop variable, the counter, and calculates the address in the array in each iteration, using an add and a shift. The wasm backend has two loop variables, the counter plus a second for the array offset, so instead of computing the array address it increments the second variable by 4 each iteration.
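To make the two shapes concrete, here is a rough C sketch of what each loop is doing. The function names are made up for illustration, and this is hand-written to mirror the wasm above, not the actual source that produced it. An optimizing C compiler may well turn either form into the other, so this only shows the structure.

```c
#include <stdint.h>

/* Hypothetical name. One loop variable: the address is recomputed from the
   counter in each iteration (base + (i << 2)), as in the asm2wasm output. */
void fill_indexed(int32_t *array, int32_t n) {
    int32_t i = 0;
    if (n <= 0) return;
    do {
        array[i] = i;   /* address = array + i * 4 */
        i++;
    } while (i != n);
}

/* Hypothetical name. Two loop variables: the counter and a running pointer
   that advances by 4 bytes per iteration, as in the wasm backend output. */
void fill_two_vars(int32_t *array, int32_t n) {
    int32_t i = 0;
    int32_t *p = array;
    if (n <= 0) return;
    do {
        *p = i;
        p++;            /* pointer moves forward by sizeof(int32_t) */
        i++;
    } while (i != n);
}
```

Both functions write the same values; the second one just keeps an extra value live across the loop in exchange for not doing the shift and add.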
Both seem to run at around the same speed in a simple loop in Firefox and Chrome. But
- the wasm backend's approach is larger, 74 vs 69 bytes for that loop example, and
- it uses two locals instead of one, which I suspect might explain part of why fannkuch is slower there: it has several levels of such nested loops, so more variables are kept alive across large areas, possibly causing spilling.
Thoughts?