Add text format specification for Linking.md #258
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -290,19 +290,20 @@ where a `syminfo` is encoded as: | |
| | | | `5 / SYMTAB_TABLE` | | ||
| | flags | `varuint32` | a bitfield containing flags for this symbol | | ||
|
|
||
| For functions, globals, events and tables, we reference an existing Wasm object, which | ||
| is either an import or a defined function/global/event/table (recall that the operand of a | ||
| Wasm `call` instruction uses an index space consisting of the function imports | ||
| followed by the defined functions, and similarly `get_global` for global imports | ||
| and definitions and `throw` for event imports and definitions). | ||
| For functions, globals, events and tables, we reference an existing WebAssembly | ||
| entity, which is either an import or a defined function/global/event/table | ||
| (recall that the operand of a Wasm `call` instruction uses an index space | ||
| consisting of the function imports followed by the defined functions, and | ||
| similarly `get_global` for global imports and definitions and `throw` for event | ||
| imports and definitions). | ||
|
|
||
| If a symbol refers to an import, and the | ||
| `WASM_SYM_EXPLICIT_NAME` flag is not set, then the name is taken from the | ||
| import; otherwise the `syminfo` specifies the symbol's name. | ||
|
|
||
| | Field | Type | Description | | ||
| | ------------ | -------------- | ------------------------------------------- | | ||
| | index | `varuint32` | the index of the Wasm object corresponding to the symbol, which references an import if and only if the `WASM_SYM_UNDEFINED` flag is set | | ||
| | index | `varuint32` | the index of the WebAssembly entity corresponding to the symbol, which references an import if and only if the `WASM_SYM_UNDEFINED` flag is set | | ||
| | name_len | `varuint32` ? | the optional length of `name_data` in bytes, omitted if `index` references an import | | ||
| | name_data | `bytes` ? | UTF-8 encoding of the symbol name, omitted if `index` references an import | | ||
|
|
||
|
|
@@ -734,3 +735,288 @@ necessary for referencing such segments (e.g. in `data.drop` or `memory.init` | |
| instruction) do not yet exist. | ||
| - There is currently no support for table element segments, either active or | ||
| passive. | ||
|
|
||
| # Text format | ||
|
|
||
|
||
| The text format for linking metadata is intended for WAT consumers that wish to | ||
| emit relocatable object files, and for WAT producers that wish to emit | ||
| human-readable relocation metadata for later creation of a relocatable object | ||
| file. | ||
|
|
||
| ## Relocations | ||
|
|
||
| Relocations are represented as WebAssembly annotations of the form | ||
|
Same here? Should we just use
||
| ```wat | ||
| (@reloc <format> <method> <modifier> <symbol-reference> <addend>) | ||
|
Personally for syntax like this I like to try to avoid "extra layers of indirection" of a sort. Here one layer of indirection is the set of relocations themselves (e.g.

Yeah, I specifically decided against that since then it wouldn't be possible to abbreviate the relocation via that predefinition mechanism (so,

There is also an issue of Rectifying that error at the source would require me to patch LLVM in sync with this change, so like the other issue with

Do you have a better idea for what to call

(By the way the history here is that prior to multi-table there was only one table, so

Maybe R_WASM_TABLE_OFFSET_?

I'd lean towards something like

(coincidentally this would align well with

The idea of just

Also, in terms of renaming things, in theory that can be done at any time right? The binary format of preexisting relocations isn't going to change. While work is "created" in the sense that LLVM should eventually update that's no breaking change in the sense that it's impossible to rename things, right? Given that, I personally like @feedab1e's idea of repurposing
||
| ``` | ||
|
|
||
| - `format` determines the resulting format of a relocation | ||
|
|
||
| |`<format>`| corresponding relocation constants | interpretation | | ||
| |----------|------------------------------------|---------------------| | ||
| |`i32` | `R_WASM_*_I32` | 4-byte [uint32] | | ||
| |`i64` | `R_WASM_*_I64` | 8-byte [uint64] | | ||
| |`leb` | `R_WASM_*_LEB` | 5-byte [varuint32] | | ||
| |`sleb` | `R_WASM_*_SLEB` | 5-byte [varint32] | | ||
| |`leb64` | `R_WASM_*_LEB64` | 10-byte [varuint64] | | ||
| |`sleb64` | `R_WASM_*_SLEB64` | 10-byte [varint64] | | ||
|
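To make the fixed-width LEB rows of the table concrete, here is a minimal Python sketch of a padded unsigned LEB128 encoder. The names `FORMAT_WIDTH` and `encode_padded_uleb` are hypothetical and only illustrate the idea that a relocatable `leb` field always occupies its full width so a linker can patch it in place.

```python
# Field widths in bytes for each <format>, copied from the table above.
FORMAT_WIDTH = {"i32": 4, "i64": 8, "leb": 5, "sleb": 5, "leb64": 10, "sleb64": 10}


def encode_padded_uleb(value: int, width: int) -> bytes:
    """Encode an unsigned integer as LEB128 padded to exactly `width` bytes."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the padded field")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)
```

The sketch only checks that the value fits in the available payload bits; a real tool would presumably also enforce the 32-bit or 64-bit range of the underlying type.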
|
||
| - `method` describes the type of relocation, that is, what kind of symbol we are relocating against and how to interpret that symbol. | ||
|
|
||
| | `<method>` | symbol kind | corresponding relocation constants | interpretation | | ||
| |--------------|-------------|------------------------------------|-----------------------------------| | ||
| | `tag` | event* | `R_WASM_EVENT_INDEX_*` | Final WebAssembly event index | | ||
| | `table` | table* | `R_WASM_TABLE_NUMBER_*` | Final WebAssembly table index (index of a table, not into one) | | ||
| | `global` | global* | `R_WASM_GLOBAL_INDEX_*` | Final WebAssembly global index | | ||
| | `func` | function* | `R_WASM_FUNCTION_INDEX_*` | Final WebAssembly function index | | ||
| | `functable` | function | `R_WASM_TABLE_INDEX_*` | Index into the dynamic function table, used for taking the address of functions | | ||
| | `functext` | function | `R_WASM_FUNCTION_OFFSET` | Offset into the function body from the start of the function | | ||
| | `customtext` | section | `R_WASM_SECTION_OFFSET` | Offset into a custom section | | ||
| | `data` | data | `R_WASM_MEMORY_ADDR_*` | WebAssembly linear memory address | | ||
|
|
||
| Symbol kinds marked with `*` are considered *primary*. | ||
|
|
||
| - `modifier` describes the additional attributes that a relocation might have. | ||
|
|
||
| | `<modifier>` | corresponding relocation constants | interpretation | | ||
| |--------------|---------------------------------------|-------------------| | ||
| | nothing | nothing | Normal relocation | | ||
| | `pic` | `R_WASM_*_LOCREL_*`, `R_WASM_*_REL_*` | Address relative to `env.__memory_base` or `env.__table_base`, used for dynamic linking | | ||
| | `tls` | `R_WASM_*_TLS*` | Address relative to `env.__tls_base`, used for thread-local storage | | ||
|
Is there any reason not to reflect the entire list of relocation types like they are listed in the binary format and/or in llvm: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/WasmRelocs.def i.e. why create this new concept of a base type + a modifier that doesn't exist elsewhere yet? Why not just use Maybe this new method/format/modifier concept could be added more globally later once the initial version of the text format is added? But for v1 it seems like it would make sense to simply mirror the existing binary format enum.

This was covered extensively in #258 (comment), and @alexcrichton expressed support for it here, but in short, that way there wouldn't be an option to elide parts of the relocation annotation (i.e. defaulting and predefining wouldn't work), so all relocations would be incredibly verbose (for example,

I think I don't see how specifying the full relocation type (e.g using This seems like two orthogonal decisions, but I get that I must be missing something:

I'm also not sure that reducing verbosity needs to be the highest priority since the plan is for this format to be mostly machine read and machine written, right?

Apart from fully elidable relocations, other types of relocations exist, like in memory (

Well, it needs to be human-readable, too, since it's a text format and humans are expected to read that too, like they usually read assembly, and likewise human-writable.

The relocation type names are not intended to be LLVM specific. The list of 20 relocation types, along with their full names, are listed above in this very document. This is designed to mirror the ELF relocation types that are defined in the ELF header and not specific to either LLVM or GCC but are used in both places. I think it might be a good idea to reflect this precisely in text form, so we can avoid having two different ways to specify things.

Personally I don't think

Sure, but those names currently still don't matter much and can be changed, if we require those names there they suddenly start to matter and can no longer be changed.

These aren't really two ways yet, since the names in relocation types don't really matter currently, so the only "stable" way for naming relocations would be in the text format. However, stabilizing those names as fused now would be harmful, since then for v2 when I am to reintroduce the composite names, this "two ways to specify the same thing" argument would become very real, dooming the format to verbose relocations forever.

I'm OK with eliding relocations, if and only if we have a top level annotation. I'd also be OK without a top level annotation if we can make all relocs explicit. @alexcrichton, can I ask, what is your objection to the top level annotation? Don't you think it would be nice to be able to, at a glance, distinguish relocation wat files without having to visually scan the whole wat file for a `@sym` or `@reloc`?

To me it's more of a preference of per-relocation/symbol annotations over a top-level annotation. I agree that either system would work alright, and the main concern that I can think of is printing a wasm object file (binary-to-text) where it feels more natural to print I'd naively expect that with I should also be clear that I'm happy to be overruled here. IMO text-format design is something that's worth bikeshedding but not endlessly, so I wouldn't want to hold up anything on my own behalf too much
||
|
|
||
| - `addend` describes the additional components of a relocation. | ||
|
|
||
| | `<addend>` | interpretation | condition | | ||
| |--------------|----------------------|-----------------------------------------------| | ||
| | nothing | Zero addend | always | | ||
| | `+<integer>` | Positive byte offset | `method` allows addend | | ||
| | `-<integer>` | Negative byte offset | `method` allows addend and `format` is signed | | ||
| | `<labeluse>` | Byte offset to label | `method` is `*text` | | ||
|
|
||
| - `symbol` describes the symbol against which to perform relocation. | ||
| - For `functext` relocation method, this is the function id, so that if the | ||
| addend is zero, the relocation points to the first instruction of that | ||
| function. | ||
| - For `customtext` relocation method, this is the name of the custom section, | ||
| so that if the addend is zero, the relocation points to the first byte of | ||
| data in that section. | ||
| - For other relocation methods, this denotes the symbol in the scope of that | ||
| symbol kind. | ||
|
|
||
| The relocation type is looked up from the combination of `format`, `method`, | ||
| and `modifier`. If no relocation type exists, an error is raised. | ||
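The lookup described above can be sketched as a plain table keyed by the three components. This is a minimal Python illustration, not the proposed implementation: `RELOC_TYPES` and `lookup_reloc_type` are hypothetical names, and the table is deliberately partial, listing only a few relocation constants; the full list is the one given earlier in Linking.md.

```python
# Partial, illustrative table: (method, modifier, format) -> relocation constant.
# An empty modifier string stands for "nothing" in the <modifier> table.
RELOC_TYPES = {
    ("func", "", "leb"): "R_WASM_FUNCTION_INDEX_LEB",
    ("global", "", "leb"): "R_WASM_GLOBAL_INDEX_LEB",
    ("functable", "", "sleb"): "R_WASM_TABLE_INDEX_SLEB",
    ("functable", "", "i32"): "R_WASM_TABLE_INDEX_I32",
    ("data", "", "leb"): "R_WASM_MEMORY_ADDR_LEB",
    ("data", "", "sleb"): "R_WASM_MEMORY_ADDR_SLEB",
    ("data", "", "i32"): "R_WASM_MEMORY_ADDR_I32",
    ("data", "pic", "sleb"): "R_WASM_MEMORY_ADDR_REL_SLEB",
    ("data", "tls", "sleb"): "R_WASM_MEMORY_ADDR_TLS_SLEB",
}


def lookup_reloc_type(fmt: str, method: str, modifier: str = "") -> str:
    """Return the relocation constant for a combination, or raise an error."""
    key = (method, modifier, fmt)
    if key not in RELOC_TYPES:
        raise ValueError(f"no relocation type for format={fmt!r}, "
                         f"method={method!r}, modifier={modifier!r}")
    return RELOC_TYPES[key]
```

For instance, `lookup_reloc_type("sleb", "data", "tls")` resolves to `R_WASM_MEMORY_ADDR_TLS_SLEB`, while a combination missing from the table, such as `i64` with `func`, raises an error.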
|
|
||
| If a component of a relocation is predetermined, it must be skipped in the | ||
| annotation text. | ||
|
|
||
| If a component of a relocation is defaulted, it may be skipped in the | ||
| annotation text. | ||
|
|
||
| For example, a relocation into the function table by the index of `$foo` with a | ||
| predetermined `format` would look like the following: | ||
| ```wat | ||
| (@reloc functable $foo) | ||
| ``` | ||
| If all components of a relocation annotation are skipped, the annotation may be | ||
| omitted. | ||
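The skipping rules above can be illustrated from the printer's side. The following Python sketch uses hypothetical names (`ORDER`, `render_reloc`) and makes one simplifying assumption: a defaulted component is always skipped when it equals its default, although the text only says it *may* be skipped.

```python
# Component order as it appears in the annotation grammar above.
ORDER = ("format", "method", "modifier", "symbol", "addend")


def render_reloc(components, predetermined=(), defaults=None):
    """Render a relocation annotation, skipping what the rules allow.

    components:    dict mapping component name to its textual value
    predetermined: names fixed by context; these must be skipped
    defaults:      dict of default values; a matching value is skipped here
    """
    defaults = defaults or {}
    parts = []
    for key in ORDER:
        value = components.get(key, "")
        if key in predetermined:
            continue  # predetermined components must not be written
        if value == "" or defaults.get(key) == value:
            continue  # absent or defaulted components are left out
        parts.append(value)
    # If every component was skipped, the whole annotation is omitted.
    return "(@reloc " + " ".join(parts) + ")" if parts else ""
```

With `format` predetermined, `render_reloc({"format": "sleb", "method": "functable", "symbol": "$foo"}, predetermined={"format"})` yields `(@reloc functable $foo)`, matching the example in the text.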
|
|
||
| ### Instruction relocations | ||
|
|
||
| For every usage of `typeidx`, `funcidx`, `globalidx`, `tagidx`, a relocation | ||
| annotation is added afterwards, with `format` predetermined as `leb`, `method` | ||
| predetermined as the *primary* method for that type, and `symbol` defaulted to | ||
| the *primary* symbol of that `idx`. | ||
|
|
||
| - For the `i32.const` instruction, a relocation annotation is added after the | ||
| integer literal operand, with `format` predetermined as `sleb`, and `method` is | ||
| allowed to be either `data` or `functable`. | ||
| - For the `i64.const` instruction, a relocation annotation is added after the | ||
| integer literal operand, with `format` predetermined as `sleb64`, and `method` | ||
| is allowed to be either `data` or `functable`. | ||
| - For the `i{32,64}.{load,store}*` instructions, a relocation annotation is | ||
| added after the offset operand, with `format` predetermined as `leb` if the | ||
| *memory* being referenced is 32-bit, and `leb64` otherwise, and `method` | ||
| predetermined as `data`. | ||
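The per-instruction rules above amount to a small decision function. The Python sketch below is illustrative only: `predetermined_format` and `INDEX_INSTRS` are hypothetical names, and the set of index-taking instructions is a small sample rather than a complete list.

```python
import re

# A few instructions whose operand is a funcidx/globalidx/tagidx (sample only).
INDEX_INSTRS = {"call", "global.get", "global.set", "throw"}


def predetermined_format(instr: str, memory64: bool = False) -> str:
    """Return the relocation format fixed by the instruction being annotated."""
    if instr in INDEX_INSTRS:
        return "leb"
    if instr == "i32.const":
        return "sleb"
    if instr == "i64.const":
        return "sleb64"
    # i32/i64 loads and stores, including sized variants such as i32.load8_u.
    if re.fullmatch(r"i(32|64)\.(load|store)\w*", instr):
        return "leb64" if memory64 else "leb"
    raise ValueError(f"no relocation annotation is defined for {instr!r}")
```

Here `memory64` stands for "the memory being referenced is 64-bit"; how a tool determines that from the module is outside the scope of the sketch.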
|
||
|
|
||
| ### Data relocations | ||
|
|
||
| In data segments, relocation annotations can be interleaved into the data | ||
| string sequence. When that happens, relocations are situated after the last | ||
| byte of the value being relocated. | ||
|
|
||
| For example, relocation of a 32-bit function pointer `$foo` and a 32-bit | ||
| reference to a data symbol `$bar` into the data segment of size 8 would look | ||
| like the following: | ||
| ```wat | ||
| (data (i32.const 0) "\00\00\00\00" (@reloc i32 functable $foo) "\00\00\00\00" (@reloc i32 data $bar)) | ||
| ``` | ||
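Because a data relocation annotation follows the last byte of the value it relocates, its offset within the segment can be recovered by subtracting the field width from the current position. The Python sketch below assumes a pre-parsed item list and only the fixed-width `i32`/`i64` formats; `reloc_offsets` and `DATA_FORMAT_WIDTH` are hypothetical names, and the method name `functable` is the one listed in the `<method>` table.

```python
# Widths in bytes of the fixed-width formats assumed to appear in data segments.
DATA_FORMAT_WIDTH = {"i32": 4, "i64": 8}


def reloc_offsets(items):
    """Compute the segment offset of each relocated value.

    items is a sequence of either bytes (a data string) or a
    (format, method, symbol) tuple standing for a @reloc annotation.
    """
    pos = 0
    result = []
    for item in items:
        if isinstance(item, bytes):
            pos += len(item)  # data strings advance the position
            continue
        fmt, method, symbol = item
        width = DATA_FORMAT_WIDTH[fmt]
        if width > pos:
            raise ValueError("relocation precedes the value it relocates")
        # The annotation sits after the value, so the value starts width bytes back.
        result.append((pos - width, fmt, method, symbol))
    return result


segment = [b"\x00" * 4, ("i32", "functable", "$foo"),
           b"\x00" * 4, ("i32", "data", "$bar")]
```

Applied to `segment`, which mirrors the example above, the function places the `$foo` relocation at offset 0 and the `$bar` relocation at offset 4.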
|
|
||
| ## Symbols | ||
|
|
||
| For each relocatable WebAssembly entity type, there exists a corresponding | ||
| symbol identifier namespace for symbols of that type. | ||
|
|
||
| Additionally, a symbol identifier namespace exists for data symbols. | ||
|
|
||
| Symbol identifier namespaces differ from common index spaces in that they also | ||
| allow purely textual names in addition to numeric + optional textual names | ||
| allowed by index spaces. | ||
|
|
||
| Symbols are represented as WebAssembly annotations of the form | ||
| ```wat | ||
| (@sym <name> <qualifier>*) | ||
| ``` | ||
| Data imports are represented as WebAssembly annotations of the form | ||
| ```wat | ||
| (@sym.import.data <name> <qualifier>*) | ||
| ``` | ||
|
|
||
| - `name` is the symbol name written as a WebAssembly `id`; it is the name by | ||
| which relocation annotations reference the symbol. If it is not present, the | ||
| symbol is considered the *primary* symbol for that WebAssembly entity, and its | ||
| name is taken from the related entity. | ||
| - There may only be one primary symbol for each WebAssembly entity. | ||
| - If a symbol is not associated with a WebAssembly entity, it may not be the | ||
| primary symbol. | ||
|
|
||
| After a name for the symbol is determined, it is placed into the symbol | ||
| identifier namespace corresponding to that symbol type. | ||
|
|
||
| > [!Note] | ||
| > As a consequence, the only symbols that can be referred to by a | ||
| > numeric index are _primary_ symbols, since they inherit their numeric index | ||
| > from the relocatable WebAssembly entity. | ||
|
|
||
| - `qualifier` is one of the allowed qualifiers on a symbol declaration. | ||
| Qualifiers may not repeat. | ||
|
|
||
| | `<qualifier>` | effect | | ||
| |---------------------|-----------------------------------------------| | ||
| | `<binding>` | sets symbol flags according to `<binding>` | | ||
| | `<visibility>` | sets symbol flags according to `<visibility>` | | ||
| | `retain` | sets `WASM_SYM_NO_STRIP` symbol flag | | ||
| | `tls` | sets `WASM_SYM_TLS` symbol flag | | ||
| | `(size <int>)` | sets symbol's `size` appropriately | | ||
| | `(offset <int>)` | sets `WASM_SYM_ABSOLUTE` symbol flag, sets symbol's `offset` appropriately | | ||
| | `(name <string>)` | sets `WASM_SYM_EXPLICIT_NAME` symbol flag, sets symbol's `name_len`, `name_data` appropriately | | ||
| | `(init_prio <int>)` | adds symbol to `WASM_INIT_FUNCS` section with the given priority | | ||
| | `(comdat <id>)` | adds symbol to a `comdat` with the given id | | ||
|
|
||
| | `<binding>` | flag | | ||
| |-------------|--------------------------| | ||
| | `global` | 0 | | ||
| | `local` | `WASM_SYM_BINDING_LOCAL` | | ||
| | `weak` | `WASM_SYM_BINDING_WEAK` | | ||
|
|
||
| | `<visibility>` | flag | | ||
| |----------------|------------------------------| | ||
| | `default` | 0 | | ||
| | `hidden` | `WASM_SYM_VISIBILITY_HIDDEN` | | ||
|
|
||
| - The `init_prio` qualifier may only be applied to function symbols. | ||
| - The `size` and `offset` qualifiers may only be applied to data symbols. | ||
| - The `size` and `name` qualifiers must be applied to data symbols. | ||
| - The `name` qualifier must be applied to data imports. | ||
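The qualifier tables above translate into a bitfield fairly directly. The Python sketch below is an illustration under stated assumptions: the numeric flag values are the ones I understand the binary-format section of Linking.md to assign and should be checked against it, and `symbol_flags` is a hypothetical helper that models a parenthesized qualifier such as `(name "a")` as a tuple.

```python
# Flag values as assumed from the binary symbol-flag definitions in Linking.md.
WASM_SYM_BINDING_WEAK = 0x01
WASM_SYM_BINDING_LOCAL = 0x02
WASM_SYM_VISIBILITY_HIDDEN = 0x04
WASM_SYM_EXPLICIT_NAME = 0x40
WASM_SYM_NO_STRIP = 0x80
WASM_SYM_TLS = 0x100
WASM_SYM_ABSOLUTE = 0x200

BINDING = {"global": 0, "local": WASM_SYM_BINDING_LOCAL, "weak": WASM_SYM_BINDING_WEAK}
VISIBILITY = {"default": 0, "hidden": WASM_SYM_VISIBILITY_HIDDEN}
KEYWORD = {"retain": WASM_SYM_NO_STRIP, "tls": WASM_SYM_TLS}
# Parenthesized qualifiers; size, init_prio and comdat set no flag bits.
PAREN = {"name": WASM_SYM_EXPLICIT_NAME, "offset": WASM_SYM_ABSOLUTE,
         "size": 0, "init_prio": 0, "comdat": 0}


def symbol_flags(qualifiers):
    """Fold a qualifier list into symbol flags, rejecting repeated qualifiers."""
    seen = set()
    flags = 0
    for q in qualifiers:
        if isinstance(q, tuple):
            kind, bit = q[0], PAREN[q[0]]
        elif q in BINDING:
            kind, bit = "binding", BINDING[q]
        elif q in VISIBILITY:
            kind, bit = "visibility", VISIBILITY[q]
        else:
            kind, bit = q, KEYWORD[q]
        if kind in seen:
            raise ValueError(f"qualifier {kind!r} may not repeat")
        seen.add(kind)
        flags |= bit
    return flags
```

The sketch treats two different bindings (for example `weak` and `local`) as a repeat of the same qualifier, which is one reading of "qualifiers may not repeat"; the per-symbol-kind restrictions in the list above are not checked.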
|
|
||
| If all components of a symbol annotation are skipped, the annotation may be | ||
| omitted. | ||
|
|
||
| > [!Note] | ||
| > Since all components of a symbol can be skipped, a _primary_ symbol always | ||
| > exists for all WebAssembly entities, even if the annotation without a `name` | ||
| > is not present in the symbol sequence. | ||
|
|
||
| ### WebAssembly entity symbols | ||
|
|
||
| For symbols related to a WebAssembly entity, the symbol annotation sequence | ||
| occurs after the optional `id` of the declaration. | ||
|
|
||
| For example, the following code: | ||
| ```wat | ||
| (import "env" "foo" (func (@sym $a retain (name "a")) (@sym $b hidden (name "b")) (param) (result))) | ||
| ``` | ||
| declares 3 symbols: one primary symbol with the name of the index of the | ||
| function, one symbol with the name `$a`, and one symbol with the name `$b`. | ||
|
|
||
| ### Data symbols | ||
|
|
||
| Data symbol annotations can be interleaved into the data string sequence. | ||
| When that happens, the symbol annotation is situated before the first byte of | ||
| the value being defined. | ||
|
|
||
| For example, a declaration of a 32-bit global with the name `$foo` and linkage | ||
| name "foo" would look like the following: | ||
| ```wat | ||
| (data (i32.const 0) (@sym $foo (name "foo") (size 4)) "\00\00\00\00") | ||
| ``` | ||
|
|
||
| ### Data imports | ||
|
|
||
| Data imports occur in the same place as module fields. Data imports are always | ||
| situated before data symbols. | ||
|
|
||
| ## COMDATs | ||
|
|
||
| COMDATs are represented as WebAssembly annotations of the form | ||
| ```wat | ||
| (@comdat <id> <string>) | ||
| ``` | ||
| where `id` is the WebAssembly name of the COMDAT, and `<string>` is `name_len` | ||
| and `name_str` of the `comdat`. | ||
|
|
||
| COMDAT declarations occur in the same place as module fields. | ||
|
|
||
| ## Labels | ||
|
|
||
| For some relocation types, an offset into a section/function is necessary. For | ||
| these cases, labels exist. | ||
| Labels are represented as WebAssembly annotations of the form | ||
| ```wat | ||
| (@sym.label <id>) | ||
| ``` | ||
|
|
||
| ### Function labels | ||
| Function labels occur in the same place as instructions. | ||
| A label always denotes the first byte of the next instruction, or the byte | ||
| after the end of the function's instruction stream, if there isn't a next | ||
| instruction. | ||
|
|
||
| Function label names are local to the function in which they occur. | ||
|
|
||
| ### Data labels | ||
| Data labels can be interleaved into the data string sequence. | ||
| When that happens, relocations are situated after the last byte of the value | ||
| being relocated. | ||
|
|
||
| Data label names are local to the data segment in which they occur. | ||
|
|
||
| ### Custom labels | ||
| Custom labels can be interleaved into the data string sequence. | ||
| When that happens, relocations are situated after the last byte of the value | ||
| being relocated. | ||
|
|
||
| Custom label names are local to the custom section in which they occur. | ||
|
|
||
| ## Data segment flags | ||
| Data segment flags are represented as WebAssembly annotations of the form | ||
| ```wat | ||
| (@sym.segment <qualifier>*) | ||
| ``` | ||
|
|
||
| - `qualifier` is one of the allowed qualifiers on a data segment declaration. | ||
| Qualifiers may not repeat. | ||
|
|
||
| | `<qualifier>` | effect | | ||
| |-------------------|------------------------------------------------------| | ||
| | `(align <int>)` | sets segment's `alignment` appropriately | | ||
| | `(name <string>)` | sets segment's `name_len`, `name_data` appropriately | | ||
| | `strings` | sets `WASM_SEGMENT_FLAG_STRINGS` segment flag | | ||
| | `tls` | sets `WASM_SEGMENT_FLAG_TLS` segment flag | | ||
| | `retain` | sets `WASM_SEG_FLAG_RETAIN` segment flag | | ||
|
|
||
| If `align` is not specified, it is given a default value of 1. | ||
| If `name` is not specified, it is given an empty default value. | ||
|
|
||
| If all components of segment flags are skipped, the annotation may be omitted. | ||
|
|
||
| Data segment annotation occurs after the optional `id` of the data segment | ||
| declaration. | ||
Should we use `Wasm` consistently? So `Wasm entity` instead of `WebAssembly entity`?
Yeah, I can do that