Skip to content

Conversation

@kripken
Copy link
Member

@kripken kripken commented Feb 27, 2019

See #1919 - we did not do this consistently before.

This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.

Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:

y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)

That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.

The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.

This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023.

This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)

@dcodeIO
Copy link
Contributor

dcodeIO commented Feb 28, 2019

I wonder if this pass would make sense in scenarios where the low unused memory region is significantly smaller? In AssemblyScript it's just 8 bytes for example, essentially only leaving some space for null / asm.js reinterpretations.

@kripken
Copy link
Member Author

kripken commented Feb 28, 2019

Yeah, we may want to make the size of the region customizable. 1024 is just one possibility.

For AssemblyScript in particular, perhaps you don't need the pass, though - do you already use load/store offests in all possible places? Or are there places binaryen could optimize for you?

@MaxGraey
Copy link
Contributor

MaxGraey commented Feb 28, 2019

@kripken We waiting this for a long time=)
AssemblyScript/assemblyscript#32

@dcodeIO
Copy link
Contributor

dcodeIO commented Feb 28, 2019

It's using immediate offsets where straight-forward, for example when accessing class instance fields, but there are some occasions where it isn't that easy, for example when doing indexed array accesses (due to the fact that these are actual operator overloaded functions where constant offset components become mutable locals, in turn losing "precomputability"). Can also imagine that, after other optimizations, even our "proper" loads might be condensable even more.

@kripken
Copy link
Member Author

kripken commented Feb 28, 2019

Interesting. Is there ever a case where it is not valid to use a load/store offset in AssemblyScript? In C, the problem is that a program may assume a pointer may overflow, and that that is valid (!). But another language may just say that's undefined behavior, and that it is always safe to do

load(x + constant)  =>  load(x, offset=constant)

@dcodeIO
Copy link
Contributor

dcodeIO commented Feb 28, 2019

In AssemblyScript's (standard library) case that's always undefined behaviour, yeah. A user who's using load<T> directly might assume something else, of course, but to me it appears that we should rather document that than miss out on the optimizations.

@binji
Copy link
Member

binji commented Feb 28, 2019

In C, the problem is that a program may assume a pointer may overflow, and that that is valid (!).

Are you sure? That seems like it would be undefined behavior to me.

@tlively
Copy link
Member

tlively commented Feb 28, 2019

Section 6.5.6, paragraph 8 of the C17 spec says

If both the pointer operand and the result point
to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined

So I think this is indeed undefined behavior. However, we might still want to support code that depends on it?

@kripken
Copy link
Member Author

kripken commented Mar 1, 2019

Yeah, this may well be undefined behavior. But we've seen real-world code that depends on it, weirdly...

I'm not sure it's undefined by that paragraph, though. What I think happens is this:

x = some pointer, say 1000
y = ptrtoint x
z = y - 1500 = -500 or rather 0xfffffe0c
store it to memory or whatever so the optimizer can't see past this line here
a = z + 1500
b = inttoptr a (note, equal to x)
load b

If you do the add as an unsigned integer, and overflow it, then you get 1000 as expected. If instead the optimizer did load(b) = load(z + 1500) = load(z, offset=1500) then it would trap.

Is that undefined behavior in C - is it not valid to do arbitrary intermediate math like that after converting a pointer to an integer, and before converting it back?

@kripken
Copy link
Member Author

kripken commented Mar 1, 2019

Still curious about that UB C question, but merging this PR.

@kripken kripken merged commit 689fe40 into master Mar 1, 2019
@kripken kripken deleted the offsets-low branch March 1, 2019 18:28
@tlively
Copy link
Member

tlively commented Mar 1, 2019

Section 6.2.6.3, paragraphs 5 and 6 say

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.

Any pointer type may be converted to an integer type. Except as previously specified, the result
is implementation-defined. If the result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any integer type.

Unfortunately I can't find anything that might be referenced by the "except as previously specified" phrase, but it is promising that it can explicitly be a trap representation.

kripken added a commit to emscripten-core/emscripten that referenced this pull request Mar 1, 2019
After WebAssembly/binaryen#1924 Binaryen has a --low-memory-unused flag that we should use (and the --post-emscripten pass no longer has any part relevant to GLOBAL_BASE, that's all in that new flag).

Better optimization of offsets thanks to that PR improves code size and enables a little more inlining etc., improving the metadce stats.
@kripken kripken restored the offsets-low branch March 5, 2019 23:57
@kripken kripken deleted the offsets-low branch March 5, 2019 23:59
@kripken kripken restored the offsets-low branch March 5, 2019 23:59
@kripken kripken deleted the offsets-low branch March 5, 2019 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants