Consistently optimize small added constants into load/store offsets #1924

kripken · 2019-02-27T00:47:13Z

See #1919 - we did not do this consistently before.

This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.

Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:

y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)

That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.

The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.

This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023.

This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)

dcodeIO · 2019-02-28T19:28:14Z

I wonder if this pass would make sense in scenarios where the low unused memory region is significantly smaller? In AssemblyScript it's just 8 bytes for example, essentially only leaving some space for null / asm.js reinterpretations.

kripken · 2019-02-28T21:12:31Z

Yeah, we may want to make the size of the region customizable. 1024 is just one possibility.

For AssemblyScript in particular, perhaps you don't need the pass, though - do you already use load/store offests in all possible places? Or are there places binaryen could optimize for you?

MaxGraey · 2019-02-28T21:16:48Z

@kripken We waiting this for a long time=)
AssemblyScript/assemblyscript#32

dcodeIO · 2019-02-28T21:31:22Z

It's using immediate offsets where straight-forward, for example when accessing class instance fields, but there are some occasions where it isn't that easy, for example when doing indexed array accesses (due to the fact that these are actual operator overloaded functions where constant offset components become mutable locals, in turn losing "precomputability"). Can also imagine that, after other optimizations, even our "proper" loads might be condensable even more.

kripken · 2019-02-28T21:41:54Z

Interesting. Is there ever a case where it is not valid to use a load/store offset in AssemblyScript? In C, the problem is that a program may assume a pointer may overflow, and that that is valid (!). But another language may just say that's undefined behavior, and that it is always safe to do

load(x + constant)  =>  load(x, offset=constant)

dcodeIO · 2019-02-28T22:03:46Z

In AssemblyScript's (standard library) case that's always undefined behaviour, yeah. A user who's using load<T> directly might assume something else, of course, but to me it appears that we should rather document that than miss out on the optimizations.

binji · 2019-02-28T22:12:17Z

In C, the problem is that a program may assume a pointer may overflow, and that that is valid (!).

Are you sure? That seems like it would be undefined behavior to me.

tlively · 2019-02-28T23:03:33Z

Section 6.5.6, paragraph 8 of the C17 spec says

If both the pointer operand and the result point
to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined

So I think this is indeed undefined behavior. However, we might still want to support code that depends on it?

kripken · 2019-03-01T00:15:34Z

Yeah, this may well be undefined behavior. But we've seen real-world code that depends on it, weirdly...

I'm not sure it's undefined by that paragraph, though. What I think happens is this:

x = some pointer, say 1000
y = ptrtoint x
z = y - 1500 = -500 or rather 0xfffffe0c
store it to memory or whatever so the optimizer can't see past this line here
a = z + 1500
b = inttoptr a (note, equal to x)
load b

If you do the add as an unsigned integer, and overflow it, then you get 1000 as expected. If instead the optimizer did load(b) = load(z + 1500) = load(z, offset=1500) then it would trap.

Is that undefined behavior in C - is it not valid to do arbitrary intermediate math like that after converting a pointer to an integer, and before converting it back?

kripken · 2019-03-01T18:25:25Z

Still curious about that UB C question, but merging this PR.

tlively · 2019-03-01T19:18:31Z

Section 6.2.6.3, paragraphs 5 and 6 say

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.

Any pointer type may be converted to an integer type. Except as previously specified, the result
is implementation-defined. If the result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any integer type.

Unfortunately I can't find anything that might be referenced by the "except as previously specified" phrase, but it is promising that it can explicitly be a trap representation.

After WebAssembly/binaryen#1924 Binaryen has a --low-memory-unused flag that we should use (and the --post-emscripten pass no longer has any part relevant to GLOBAL_BASE, that's all in that new flag). Better optimization of offsets thanks to that PR improves code size and enables a little more inlining etc., improving the metadce stats.

kripken and others added 22 commits February 23, 2019 13:31

wip [ci skip]

75129b8

wip [ci skip]

c22a3ef

draft [ci skip]

698bb0c

builds [ci skip]

dd4572c

test prep + a problem [ci skip]

022865d

comments [ci skip]

fd31f07

test prep [ci skip]

5b55a6a

test [ci skip]

b259621

comment [ci skip]

ca13d03

Merge remote-tracking branch 'origin/master' into offsets-low

78e2219

cleanup

17c133e

cleanup

e340cf4

cleanup

50f3bb2

wip OAC [ci skip]

1784b1f

more

4ca940b

wip [ci skip]

816896c

wip [ci skip]

b461722

use it in asm2wasm [ci skip]

b05bf09

fix

61563f3

Merge remote-tracking branch 'origin/master' into offsets-low

89f5815

build fix

a88169d

fixes

4eb9ae9

Merge remote-tracking branch 'origin/master' into offsets-low

39bcb03

kripken merged commit 689fe40 into master Mar 1, 2019

kripken deleted the offsets-low branch March 1, 2019 18:28

kripken mentioned this pull request Mar 1, 2019

Update Binaryen emscripten-core/emscripten#8225

Merged

kripken restored the offsets-low branch March 5, 2019 23:57

kripken deleted the offsets-low branch March 5, 2019 23:59

kripken restored the offsets-low branch March 5, 2019 23:59

kripken deleted the offsets-low branch March 5, 2019 23:59

TerrorJack mentioned this pull request Apr 1, 2020

Implement support for --low-memory-unused in binaryen optimizer tweag/asterius#532

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Consistently optimize small added constants into load/store offsets #1924

Consistently optimize small added constants into load/store offsets #1924

Uh oh!

kripken commented Feb 27, 2019

Uh oh!

dcodeIO commented Feb 28, 2019

Uh oh!

kripken commented Feb 28, 2019

Uh oh!

MaxGraey commented Feb 28, 2019 •

edited

Loading

Uh oh!

dcodeIO commented Feb 28, 2019 •

edited

Loading

Uh oh!

kripken commented Feb 28, 2019

Uh oh!

dcodeIO commented Feb 28, 2019

Uh oh!

binji commented Feb 28, 2019

Uh oh!

tlively commented Feb 28, 2019

Uh oh!

kripken commented Mar 1, 2019 •

edited

Loading

Uh oh!

kripken commented Mar 1, 2019

Uh oh!

tlively commented Mar 1, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Consistently optimize small added constants into load/store offsets #1924

Consistently optimize small added constants into load/store offsets #1924

Uh oh!

Conversation

kripken commented Feb 27, 2019

Uh oh!

dcodeIO commented Feb 28, 2019

Uh oh!

kripken commented Feb 28, 2019

Uh oh!

MaxGraey commented Feb 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dcodeIO commented Feb 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Feb 28, 2019

Uh oh!

dcodeIO commented Feb 28, 2019

Uh oh!

binji commented Feb 28, 2019

Uh oh!

tlively commented Feb 28, 2019

Uh oh!

kripken commented Mar 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Mar 1, 2019

Uh oh!

tlively commented Mar 1, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

MaxGraey commented Feb 28, 2019 •

edited

Loading

dcodeIO commented Feb 28, 2019 •

edited

Loading

kripken commented Mar 1, 2019 •

edited

Loading