Skip to content

Avoid new blocks in binary reading/writing#1165

Merged
kripken merged 4 commits intomasterfrom
slim
Sep 13, 2017
Merged

Avoid new blocks in binary reading/writing#1165
kripken merged 4 commits intomasterfrom
slim

Conversation

@kripken
Copy link
Copy Markdown
Member

@kripken kripken commented Sep 6, 2017

Background: wasm allows lists at the top of functions, in loop bodies, and in if arms. Binaryen IR only allows a list in a block (so passes only need to deal with lists in one place), and as a result we may need more blocks than wasm does in some cases.

Turns out we generated more than we needed, which is kind of silly since just reading and writing a binary repeatedly could lead to an unstoppable increase in size. This PR fixes that. One commit is for writing, one is for reading.

Comment thread src/wasm/wasm-binary.cpp Outdated
processExpressions();
size_t end = expressionStack.size();
if (start - end == 1) {
if (end - start == 1) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amusingly this was wrong all along, so the optimization never kicked in (if it did, it would have led to breakage, which is fixed in this PR - we need to call getBlock in places where a branch might occur, like the top of a function)

@kripken
Copy link
Copy Markdown
Member Author

kripken commented Sep 8, 2017

This has been fuzzed heavily and looks correct. Any concerns?

Comment thread src/wasm-binary.h Outdated

// Gets a potential list of instructions. This is not a block and cannot be
// branched to.
Expression* getList(WasmType type);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is confusing. There is no Expression that's a list, other than a block. GetMaybeBlock makes sense if only because it could return either a block or a single instruction.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what this means is that it could return a single instruction or an implicit block (which cannot be branched to, and will not appear in the wasm output)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly - it's reading a "list context" in a wasm binary, so it could be a list of instructions, but we do know it can't be branched to.

Ideas for a better name? :) Maybe getExpressionList or getExpressionOrList?

Comment thread src/wasm-binary.h Outdated
// Gets a potential list of instructions. This is not a block and cannot be
// branched to.
Expression* getList(WasmType type);
// Gets a potential block. This may be branched to.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what you mean by "potential" here. This always returns an actual block, no? Is the difference just that it may or may not be an implicit block (e.g. a function body or if arm?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see that's wrong. So the only difference from getList is that this could return a single instruction or an explicit block (that can be branched to).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Both can return a block or a single instruction if a block isn't needed. A block might be needed if we branch to it (in getBlock, but not getList where we assume no branches), or if we have more than one instruction.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented Sep 11, 2017

How about getUnnamedBlockOrSingleton() for getting either a singleton or a block, where the block can't be branched to so we create it without a name, and getBlockOrSingleton() for the case where we can branch to it?

@dschuff
Copy link
Copy Markdown
Member

dschuff commented Sep 12, 2017

OK, I think I understand better. In both cases you want an expression or list of expressions. In one case (function top level, explicit block, if arm), you need it to be targetable by a branch. In the other case, it must not be targetable because its container is the one that's targetable (loop). In both cases you want to just use a bare expression if there are in fact no branches that target it.

Given that, I have 2 questions:

  1. Why does the logic have to be duplicated rather than shared (and have a parameter or something)
  2. Why are if-arms in the first category instead of the second? i.e. instead of containing blocks with names, why aren't the arms targetable the way loops are? Maybe because the jump is forward instead of back?

@kripken
Copy link
Copy Markdown
Member Author

kripken commented Sep 12, 2017

Which logic do you mean is duplicated? if you mean the identical lines in getList and getBlock, it's just a few, and it's simpler that way I think. Otherwise getList could get a parameter, but it would be kind of messy to check it multiple times.

About loops, yeah, the issue is that their branch is backwards. wasm no longer lets them be broken out of - they only have a top label, in other words. getBlock optionally creates a block, which is only a forwards branch. So in loops we getList, and handle the branch to the loop top directly.

@dschuff
Copy link
Copy Markdown
Member

dschuff commented Sep 12, 2017

ok. the code seems ok; really getList is really only for loops (everything else can be branched to) so maybe just say in the comment directly that it's for loops because its branch target is on the loop instead of the block.

Or actually, even better: let's just put that logic directly in visitLoop, and then maybe getNextExpressionOrBlock or getBlockOrSingleton or something like that for getBlock

@kripken
Copy link
Copy Markdown
Member Author

kripken commented Sep 12, 2017

I like that, nice. Updated to getBlockOrSingleton(), and the loop-specific getList logic is now inside the loop code, and that name is gone.

@kripken kripken merged commit 048bcad into master Sep 13, 2017
@kripken kripken deleted the slim branch September 13, 2017 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants