Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an explicit symbol table #38

Closed
sbc100 opened this issue Jan 17, 2018 · 16 comments
Closed

Create an explicit symbol table #38

sbc100 opened this issue Jan 17, 2018 · 16 comments

Comments

@sbc100
Copy link
Member

sbc100 commented Jan 17, 2018

In the current spec and llvm implementation we are modeling the symbol table based on imports and exports of functions and globals.

While this has worked fairly well it has caused us a few headaches. Specifically around weak symbols which we currently model as both and import and an export.

Also, the modeling of data symbols as wasm globals doesn't make much sense. We are using wasm globals here only to link names to addresses (not for use in set_global/get_global instructions which is their purpose).

We then add extra metadata via the "syminfo" subsection of the "linking" section.

Rather than trying to continue to build an implicit symbol table from imports and exports I propose that we bite the bullet and just be explicit about it. This would be things clearer and allow us to completely remove the imports and exports of globals which we currently use to model data addresses.

My proposal would be to add a new "symtab" section (or subsection?) that would replace the current "syminfo" subsection. Every symbol in the object would require an entry in this table. For data symbols the name would be included directly in the table. For function symbols the name could refer to function import or export so avoid duplicating the name. The symbol index would then be explicit and all relocations would then refer to symbols in this table. The exception would be type relocations for which there is no name or symbol yet.. but we could consider making symbols for types too.

@NWilson
Copy link
Contributor

NWilson commented Jan 17, 2018

Changing the scheme where Symbols are assembled from imports/exports would be quite a major refactor.

After my changes, every Symbol is modelled with exactly one import or export, which makes it fairly neat.

Overall though I am fairly positive about the idea of a "symtab" section.

Pros:

  • You're right it would save on globals, that's not beautiful modelling
  • Avoids separating Symbol metadata from Symbol creation
  • Would allow creating symbols for things that can't be modelled with a global. (Currently, the __stack_pointer is a mutable global and so it can't be exported. This is a pretty minor factor though - and we need the Threading fixes anyway so that mutable globals can be exported, or JavaScript functions can't push things to the stack!)

Cons:

  • If we want the object file to closely reflect the final output wasm, then we'll still need imports/exports for the function Symbols, so now we'd be modelling the function Symbols in two places. Ditto for defined globals in fact.
  • The final Wasm file does have an import/export for all (non-hidden/local) Symbols. We're still needing to keep that one-to-one correspondence going, for the sake of the JavaScript calling code. So why should LLD communicate its symbols to JavaScript in a different way to how Clang communicates its symbols to LLD?

I could be convinced either way. Having just written a series of patches it would nice to see them merged, since I feel they are useful incremental improvements to the current situation. I can see some advantages to the new idea though.

Are there any direct benefits to the "symtab" idea? I can't see that it actually enables anything new/better. It wouldn't even add to robustness, it's just a cleaner way of representing the same data in the Wasm object, by moving it one-to-one from one place to another in the file. (This isn't a disagreement, just to check I haven't missed a benefit.)

@NWilson
Copy link
Contributor

NWilson commented Jan 18, 2018

Another thought: we may not have much time remaining to make this change easily! I've noticed that binaryen and the Rust toolchain have experimental support for using the current linking metadata. It's not simply an internal contract between Clang and LLD, so doing a big change to the format will cause problems down the line for dependent projects (eg Emscripten), and possibly require coordination with them?

(The intention with my "wasm index space vs symbol index space" changes was to preserve indexes; the indexes will only change for function aliases, and other indexes will all stay the same. So disruption to downstream tools should be rather minimal.)

@sbc100
Copy link
Member Author

sbc100 commented Jan 18, 2018

Its actually the fact that the proposed symbol index space is almost by not quite the same as the existing index space that I find a little disconcerting. If we are going to use different index space I'd rather be explicit about it and not have it match or be mistaken in any way for the existing index space. This is one reason for proposing that clean break and explicit symbol table.

You are right about the format though. We do want to minimize churn and breaking changes. I would be food to reach out to our major consumers so we can coordinate with them or at least let them know when we place to make a breaking change. My hope is that we will soon reach a point were we will commit to more stability. Do you have contact with any of the non-emscripten external users? We should try to make sure they join the conversation, or are at least monitoring this repo.

@pepyakin
Copy link

@NWilson Hm, not sure what do you mean by rust having experimental support for using the current linking metadata
AFAIK, Rust doesn't do any linking for wasm directly.

@sbc100
Copy link
Member Author

sbc100 commented Jan 18, 2018

Hi @pepyakin, do you work on the rust wasm stuff? I've been meaning to dig into it and see how it currently works. Even if rust is currently using the linking metadata I imagine we will see use cases where the intermediate object files are being read by tools other than lld. This is why it have the Linking.md "spec" so that others can build on it.

I did hear that the rust people (you?) wrote some kind of compressor (GC) for wam files, and I'm hoping that functionality will be subsumed soon by lld's gc-sections (which will be the default I hope).

@kripken
Copy link
Member

kripken commented Jan 18, 2018

@sbc100 I think you might mean https://github.com/alexcrichton/wasm-gc ? cc @alexcrichton

@pepyakin
Copy link

do you work on the rust wasm stuff?

I'd say I'm just hanging around rust and wasm stuff :)

I did hear that the rust people (you?) wrote some kind of compressor (GC) for wam files, and I'm hoping that functionality will be subsumed soon by lld's gc-sections (which will be the default I hope).

You probably mean https://github.com/alexcrichton/wasm-gc

@alexcrichton
Copy link
Collaborator

@sbc100

I've been meaning to dig into it and see how it currently works.

Ah if you're curious feel free to ask any questions! At a high level though it currently looks like this:

  • Using LLVM 4.0 and the wasm32-unknown-unknown target, we create a giant LLVM module of all Rust code
  • This giant module is them emitted as assembly
  • We then use wasm-as from binaryen to translate this to a *.wasm

and that's it! Lots of hacks, lots of room for improvement. Very much want to use lld instead and avoid the need for wasm-gc at all, that's just a stopgap solution until we get lld!

@NWilson
Copy link
Contributor

NWilson commented Jan 18, 2018

Ok, I'm just spreading FUD then! Sounds like there aren't any current users of the Linking Data except LLD.

I'm not too worried about the compat implications of my changes - after all, you merged the alt index stuff quite rapidly, and that's just as damaging to any people who might be implementing the currently documenting model, since any downstream consumers again need to match the change you made in LLD.

@NWilson
Copy link
Contributor

NWilson commented Jan 18, 2018

I've had another fresh thought on this...

What about shared objects (".so" style Wasm)? I haven't followed all the latest planning, but it seems to me like relocatable Wasm files currently aren't far off from working like that: we have a Wasm import/export for each function symbol, which makes the modules quite usable from another one. And, the globals (that you don't like above) represent the relocatable memory addresses - you said "it's poor modelling to use globals for constants which are never read using a get_global instruction" - but perhaps for code compiled with -fPIC the globals would actually be read from? Maybe that's very different to the actual Wasm shared object plans (I'll read up on it sometime!), but the more is modelling using native Wasm objects, the more chance there is that the browser can leverage it. So, maybe the use of globals isn't so bad after all.

It does bother me vaguely to take a useful idea like "Symbol", which runtime code (dynamic linker) might want to interact with, and move that to be modelled by a custom section that's not so easily available at runtime.

For functions at least, using imports+exports is elegant, and that's how the symbols are exposed to JS, so it is nice to use the matching idea for sharing the same symbols between Clang and LLD. After all, LLD's symbols (that it exports) are precisely the same symbols that Clang is trying to export; the frontend and backend don't have an particularly different notion "callable symbol".

(Finally - if you want a clean break for compatibility reasons, changing the section name to "linking-v2" would take five mins rather than five days for a big refactor. I'm slightly joking, but if that is important, it could be done in an easier way.)

I really hope I'm not becoming attached to the way things just because of the tunnel vision of working on it! If a decision is made to create an explicit symtab section, I'd be happy to help implement it or test it.

@dschuff
Copy link
Member

dschuff commented Jan 19, 2018

Dynamic libraries are definitely something we want to support. We haven't done much design work on that at all yet. And as you noted, it would really benefit from a more stable ABI.

I'm not really worried about breaking ABIs just yet, since we are still getting everything working (and also I think the static linking ABI is a much lower bar for breaks than e.g. the C ABI or dyamic linking ABI), but it will be more important as we get more users and more things work, which we are very close to. Hopefully we've put enough public caveats and our current users are following us closely enough that they won't be too blindsided by changes.

@NWilson
Copy link
Contributor

NWilson commented Jan 20, 2018

After some chat in IRC, I'm convinced!

I'm working on a spec of what the symtab section might look like; see https://github.com/NWilson/tool-conventions/pull/1/files for the ideas so far.

Then my various incremental patches might be merged, and Sam or I will come up with a prototype of the new symtab section.

@tbfleming
Copy link

Sounds like there aren't any current users of the Linking Data except LLD.

I have code in cib which deals heavily in this. Right now it's also tied to D41472. I knew major changes were coming when I wrote it, so I plan to update it once all this settles down.

@sbc100
Copy link
Member Author

sbc100 commented Jan 20, 2018

What is "cib"? If its open source do you have a link to the code. I'm thinking it might be good idea to list all the current users/implementations in Linking.md so we can be responsible about breaking changes.

@tbfleming
Copy link

Clang In Browser. I haven't picked a license yet; I'm leaning toward the liberal ones. https://github.com/tbfleming/cib

It uses the linking info in 2 places right now:

  • A linker to generate the runtime library. (mutiple .wasm files in, single .wasm file out)
  • A loader which hooks user code into the runtime

@NWilson
Copy link
Contributor

NWilson commented Feb 28, 2018

Can close this, I think.

@sbc100 sbc100 closed this as completed Feb 28, 2018
jyknight pushed a commit to jyknight/llvm-monorepo that referenced this issue Mar 1, 2018
This change modified lld to in response the llvm change which
moved to a more explicit symbol table in the object format.

Based on patches by Nicholas Wilson:
 1. https://reviews.llvm.org/D41955
 2. https://reviews.llvm.org/D42585

The primary difference that we see in the test output is that
for relocatable (-r) output we now have symbol table which
replaces exports/imports and globals.

See: WebAssembly/tool-conventions#38
Differential Revision: https://reviews.llvm.org/D43264

llvm-svn=325861
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants