Skip to content
This repository has been archived by the owner on Nov 3, 2021. It is now read-only.

Global function references require two-pass analysis #76

Closed
ecmziegler opened this issue Jan 29, 2020 · 21 comments · Fixed by #90
Closed

Global function references require two-pass analysis #76

ecmziegler opened this issue Jan 29, 2020 · 21 comments · Fixed by #90

Comments

@ecmziegler
Copy link

Following issue #31, the spec was recently changed to disallow function references in global definitions or function bodies that are not explicitly listed in the element section.

Because the element section occurs in the binary right before the code section and therefore after the all other header sections in Wasm modules, this means that the validation of function references needs to be deferred until the elements section is parsed.

Is this an intended complication or would it help allowing arbitrary function references in the header and extend the allowed references in the code section to any function references in any of the header sections?

@ecmziegler
Copy link
Author

ecmziegler commented Jan 29, 2020

After implementing the spec change in V8, the deferred checks of globals caused multiple issues:

  • Errors are not necessarily reported when they occur and references need to be traced in case they are not declared later.
  • It's not clear when to report these errors as the element and following sections might not even be present.
  • If other errors occurred in the meantime, we might not even get to report the reference error even though it de-facto happened earlier.
  • If there is an error in the element section, it might not even be possible to validate the previous global section at all.

Besides the previously suggested solution of implicitly making any function reference in the header referenceable, one could also think of making this an instantiation error rather than a validation/compilation error or introducing an optional dedicated section that needs to occur before any possible function reference (i.e. before the global section).

@binji
Copy link
Member

binji commented Jan 29, 2020

the spec was recently changed to disallow function references in global definitions

I don't believe this is the intended behavior. The reason for declaring function references in the element segment is solely to prevent having to parse the code section to determine which function references might be used (i.e. avoiding undeclared circular references in the code section). The global section has a similar issue (globals referencing globals) which we punted on by only allowing globals to reference imported globals (which occur in a previous section). But there's no concern about globals referencing functions, since the section occurs later and we already assume that the section must be fully parsed to validate the code section.

Though it does look like the spec is incorrect here: globals can use the ref.func instruction, which is then validated using the context C', which contains just globals from C.

As defined, it can't ever validate a ref.func instruction, since that requires C.funcs[x] to exist and x ∈ C.refs. I think the fix here is to include all functions in C'.funcs and C'.refs.

@rossberg
Copy link
Member

rossberg commented Feb 3, 2020

Actually, the requirement to declare functions used in global initialisers was intentional. My rationale is that at some point we might also have ref.table, ref.memory, ref.global, ref.exception etc, and it's desirable to not get totally ad-hoc about which have to be declared when (some sections are before, some after (exns), some part of the global section). So the general rule extrapolated here is the minimal one: every use of a ref.* has to be declared, except those in an element segment itself.

I see that this appears somewhat annoying with the current structure, but eventually we will have to bite some of this annoyance anyway for some of the cases. (In retrospect, we should have split the global initialisers into a separate section the same way we did for functions and their bodies. That would avoid this issue. Too late, unfortunately.)

But you are totally right about the context, this needs to change for the proposal to work.

@rossberg
Copy link
Member

rossberg commented Feb 3, 2020

@ecmziegler, the natural position to report the error for would be the ref.func instruction -- the way I thought this could be implemented is a simple work list of delayed-checked ref.func instructions, given by their func index and stream offset. Wouldn't that work?

I wouldn't worry about the multiple error case. Wasm makes no guarantee whatsoever about which error is reported when a binary contains multiple ones. I remember from some past discussion that implementations already differ on that in other cases. For the test suite, we tried to make sure that no negative test contains multiple errors.

@ecmziegler
Copy link
Author

@rossberg That's basically how we implemented it in V8 for the time being in order to be in line with the spec. It's just not our preferred way of dealing with errors and I was wondering if that's intended design as I have the feeling that there might be cleaner ways of dealing with it (see suggestions above). At least now I have the confirmation that this choice was made intentionally.

@binji
Copy link
Member

binji commented Feb 3, 2020

Actually, the requirement to declare functions used in global initialisers was intentional.

I see what you mean, good point.

(In retrospect, we should have split the global initialisers into a separate section the same way we did for functions and their bodies. That would avoid this issue. Too late, unfortunately.)

Maybe not too late. We could have a special init expression for globals (just the end instruction, say), which indicates that the expression is defined later, in a new section that occurs after the element section. Then we could require that any init expression that uses ref.func, rec.table, etc. be defined this way. I'm not sure whether it's worth doing at this point, however.

@Horcrux7
Copy link

Horcrux7 commented Feb 3, 2020

We could have a special init expression for globals

Alternative there can also be a new wasm version number.

@rossberg
Copy link
Member

rossberg commented Feb 4, 2020

Maybe not too late.

True. I can bring it up at the meeting, to see what others think.

@gahaas
Copy link
Contributor

gahaas commented Feb 4, 2020

We could also introduce a declaration section. It is not obvious why function declarations are in the element section.

@rossberg
Copy link
Member

rossberg commented Feb 4, 2020

@gahaas, yes, that was one of the other options we discussed (and which I actually prototyped), see my comment and following on issue #31.

I agree this would be cleaner in principle, but there is a catch: element sections already can form function refs without declaration. So there would need to be an exception for them anyway. And once you have that, you can just as well "reuse" this exception as a declaration mechanism, avoiding the introduction of new machinery.

@lars-t-hansen
Copy link
Contributor

If we want reftypes to go to phase four at the next meeting we'll need to resolve this.

@lukewagner
Copy link
Member

An alternative regular scheme would be to say that all ref.* preceding the code section declare that referenced definition as being aliased. Thus, the only exception are the ref.* in the code section. This seems to match the underlying motivation which is that we want to know all the aliased definitions before starting parallel/streaming compilation of the code section.

@rossberg
Copy link
Member

From my perspective, there are two sufficiently (semantically) clean solutions:

  1. The status quo (swallowing the deferred checking).
  2. Changing the binary format to put (new) global initialisers in their own section after the elem section (avoiding the deferred checking).

I'd be fine doing (2), as suggested above by @binji: we'd ban any non-MVP initialiser from the global section and introduce a matching initialiser section that can contain richer ones. Thoughts?

@lukewagner, I'd rather not make the semantics arbitrarily depend on the low-level ordering details of the binary format. And I don't think that solution will scale, see my earlier comment. Imagine for example that we introduce ref.global.

@lukewagner
Copy link
Member

@rossberg Why would it not scale? The regular rule would be: the only place any ref.* has to be declared is within the code section; that applies just as well to ref.global as ref.func, unless I'm missing something. If you think about why we have this already-declared rule in the first place, it is very much a binary encoding detail "bubbling up" into the semantics: the code section is special. Thus, I don't think confining the restriction to only the code section bubbles up anything that hasn't already been bubbled up.

@rossberg
Copy link
Member

rossberg commented Mar 25, 2020

@lukewagner, imagine, say, a hypothetical future with references to globals, and a module like

(type $gt (global (mut i32)))
(global $ig (mut i32) (...))
...
(global $rg (ref $gt) (ref.global $ig))

Here, the global $ig is referenced from within the same section, but the use of that reference isn't yet known at the point that $ig is processed, created and initialised. Should we expect that to be any less of a potential problem than with a function reference?

@eqrion
Copy link
Contributor

eqrion commented Mar 25, 2020

@lukewagner, imagine, say, a hypothetical future with references to globals, and a module like

(type $gt (global (mut i32)))
(global $ig (mut i32) (...))
...
(global $rg (ref $gt) (ref.global $ig))

Here, the global $ig is referenced from within the same section, but that reference isn't yet known at the point that $ig is processed, created and initialised. Should we expect that to be any less of a potential problem than with a function object?

IIRC, the motivating example with ref.func is that it can determine whether a function is indirectly callable which affects the kind of function prologue you may wish to generate. And so we would like to know all the targets of ref.func before we start streaming compile functions.

I don't think that general problem would be an issue for globals. The reason is that we create the actual global objects during instantiation, not when we're streaming compiling. And as far as I can tell, the only thing that 'ref.global' would influence is whether we would need to generate a proper object for the target. And this decision is already influenced by the export section, which comes after the global section.

That being said, I'm not opposed to the solution of splitting off global initializers from global declarations. I fear though that as that involves a new section, it would be significant work that could delay reference-types from progressing to phase 4 again.

@lukewagner
Copy link
Member

@rossberg What I proposed is that, in your example, since the ref.global is outside the code section, it doesn't need to have been declared. Implementation-wise, everything before the code section is effectively one big declaration for the code section, so there's no practical benefit in your example requiring the ref.global to have already been declared.

@rossberg
Copy link
Member

Globals (or tables, memories) that may escape to the host generally require (statically) allocating them differently. For example, an immutable global not referenced may easily be optimised away entirely, and when streaming-compiling, it's useful to know that at the point of it's value's definition. I suppose you are saying that we don't care about streaming up to the code section, but that's a rather selective assumption. If possible, it would be nice to have a more coherent approach across different kinds of entities.

But of course, for some of the other kinds, that wouldn't easily work as we approach it now, because the element section comes after most of the others.

Sigh, so I think what we have at the moment is a bit incoherent design. Not that what we're discussing is gonna make it better or less warty...

Surely unpopular question :) : can we drop forward declarations? Now that references are implemented in a few places, how many engines actually rely on those declarations? I believe V8 doesn't, because it implemented ref.func before the declaration semantics was even in the proposal. I assume SpiderMonkey does? What about others?

@lukewagner
Copy link
Member

lukewagner commented Mar 26, 2020

@rossberg If one takes the purpose of forward declarations (func or other) solely as providing an immutable environment [edit to add: in which to compile the code section] defining the aliasability of each definition, then I think what we have is coherent. The global variable example you gave fits this definition just as much as functions.

Setting aside functions, your last comment actually gives an important example of how we do currently (today in SpiderMonkey) depend on having a priori aliasing info for globals and thus why we need forward declarations for any ref.global in the code section. A similar "in-line" vs "pointer" representation choice theoretically applies to tables' pointer+length values -- the only reason we don't depend on aliasing info currently is that we always inline and, assuming grows are rare, update all instances on grow (however this has a O(n2) pathological case that may cause us to take the conservative route in the future that depends on forward declarations of ref.table).

Thus, I think there is a strong streaming compilation argument for maintaining our a priori aliasing info for all definitions before the code section.

@rossberg
Copy link
Member

If one takes the purpose of forward declarations (func or other) solely as providing an immutable environment defining the aliasability of each definition, then I think what we have is coherent.

Well, no, not if we drop the requirement to declare it for uses outside functions. That's why I would want to keep that requirement. And to at least minimise dependency on assumptions about implementations.

@eqrion
Copy link
Contributor

eqrion commented Mar 27, 2020

It seems like this may be a good candidate to discuss in the next CG meeting. I'll open a PR for an agenda item today.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue May 13, 2020
…c targets. r=lth

The CG agreed to resolve reference-types#76 by treating all ocurrences of ref.func
in the 'module environment' as declaring a valid ref.func target for the code
section.

This means we need to allow references to be forward declared via globals,
exports, and the start section.

[1] WebAssembly/reference-types#76

Differential Revision: https://phabricator.services.mozilla.com/D74969
xeonchen pushed a commit to xeonchen/gecko that referenced this issue May 13, 2020
…c targets. r=lth

The CG agreed to resolve reference-types#76 by treating all ocurrences of ref.func
in the 'module environment' as declaring a valid ref.func target for the code
section.

This means we need to allow references to be forward declared via globals,
exports, and the start section.

[1] WebAssembly/reference-types#76

Differential Revision: https://phabricator.services.mozilla.com/D74969
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants