From 02a429db84a1045d3e872c5ad14133d8a73efe95 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 00:52:17 +0300
Subject: [PATCH 1/7] Add additional validation rules for object files

---
 Linking.md | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/Linking.md b/Linking.md
index 0bf4723..0619de4 100644
--- a/Linking.md
+++ b/Linking.md
@@ -181,6 +181,47 @@ relocations applied to the CODE section, a relocation cannot straddle two
 functions, and for the DATA section relocations must lie within a data element's
 body.
 
+### Additional validation rules
+
+When perfoming validation on object files, care must be taken to ensure that
+meaningless relocations are not present in the binary.
+
+**Note**: Linker is not required to perform validation on its input object
+files.
+
+When relocations occur in the CODE section, only the following relocations may
+occur:
+
+| relocation type                 | condition the value at relocation offset |
+|---------------------------------|------------------------------------------|
+| `R_WASM_FUNCTION_INDEX_LEB`     | must represent a `funcidx`               |
+| `R_WASM_TYPE_INDEX_LEB`         | must represent a `typeidx`               |
+| `R_WASM_GLOBAL_INDEX_LEB`       | must represent a `globalidx`             |
+| `R_WASM_EVENT_INDEX_LEB`        | must represent a `tagidx`                |
+| `R_WASM_TABLE_NUMBER_LEB`       | must represent a `tableidx`              |
+| `R_WASM_TABLE_INDEX_SLEB`       | must represent an operand of `i32.const` |
+| `R_WASM_TABLE_INDEX_SLEB64`     | must represent an operand of `i64.const` |
+| `R_WASM_MEMORY_ADDR_SLEB`       | must represent an operand of `i32.const` |
+| `R_WASM_MEMORY_ADDR_REL_SLEB`   | must represent an operand of `i32.const` |
+| `R_WASM_MEMORY_ADDR_TLS_SLEB`   | must represent an operand of `i32.const` |
+| `R_WASM_MEMORY_ADDR_SLEB64`     | must represent an operand of `i64.const` |
+| `R_WASM_MEMORY_ADDR_REL_SLEB64` | must represent an operand of `i64.const` |
+| `R_WASM_MEMORY_ADDR_TLS_SLEB64` | must represent an operand of `i64.const` |
+| `R_WASM_MEMORY_ADDR_LEB`        | must represent the `offset` part of `memarg` where `memidx` references a 32-bit memory |
+| `R_WASM_MEMORY_ADDR_LEB64`      | must represent the `offset` part of `memarg` where `memidx` references a 64-bit memory |
+
+For `R_WASM_*_OFFSET_I*` relocations, the following condidions must hold for
+the addend:
+
+- If `index` references the CODE section, the addend must represent the first
+  byte of an instruction, or the byte after the last instruction.
+- If `index` references the DATA section, the addend must represent a valid
+  offset into a data segment's data area.
+- If `index` references the custom section, the addend must represent a valid
+  offset into that custom section's data area.
+
+All other relocations are considered invalid for the purposes of validation
+
 ## Linking Metadata Section
 
 A linking metadata section is a user-defined section with the name
@@ -322,6 +363,8 @@ For section symbols:
 | ------------ | -------------- | ------------------------------------------- |
 | section      | `varuint32`    | the index of the target section             |
 
+Section symbols may only reference the CODE section, the DATA section, or custom sections.
+
 The current set of valid flags for symbols are:
 
 - `1 / WASM_SYM_BINDING_WEAK` - Indicating that this is a weak symbol. When

From d00b96f3223bb12e440c8c7fd12e63fd22f05782 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 00:57:53 +0300
Subject: [PATCH 2/7] Turn the overlong leb note into a validation rule

---
 Linking.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Linking.md b/Linking.md
index 0619de4..4b7c1eb 100644
--- a/Linking.md
+++ b/Linking.md
@@ -63,10 +63,6 @@ The "reloc." custom sections must come after the
 ["linking"](#linking-metadata-section) custom section in order to validate
 relocation indices.
 
-Any LEB128-encoded values should be maximally padded so that they can be
-rewritten without affecting the position of any other bytes. For instance, the
-function index 3 should be encoded as `0x83 0x80 0x80 0x80 0x00`.
-
 Relocations contain the following fields:
 
 | Field | Type | Description |
@@ -189,6 +185,10 @@ meaningless relocations are not present in the binary.
 **Note**: Linker is not required to perform validation on its input object
 files.
 
+All LEB128-encoded values that are to be relocated must be maximally padded so
+that they can be rewritten without affecting the position of any other bytes.
+For instance, the function index 3 must be encoded as `0x83 0x80 0x80 0x80 0x00`.
+
 When relocations occur in the CODE section, only the following relocations may
 occur:
 

From e399b05dcc904b0c688c58b9593216c205731096 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 20:51:38 +0300
Subject: [PATCH 3/7] Split validation rules into format and method rules, to make them more general

---
 Linking.md | 53 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/Linking.md b/Linking.md
index 4b7c1eb..42b5a60 100644
--- a/Linking.md
+++ b/Linking.md
@@ -189,26 +189,37 @@ All LEB128-encoded values that are to be relocated must be maximally padded so
 that they can be rewritten without affecting the position of any other bytes.
 For instance, the function index 3 must be encoded as `0x83 0x80 0x80 0x80 0x00`.
 
-When relocations occur in the CODE section, only the following relocations may
-occur:
-
-| relocation type                 | condition the value at relocation offset |
-|---------------------------------|------------------------------------------|
-| `R_WASM_FUNCTION_INDEX_LEB`     | must represent a `funcidx`               |
-| `R_WASM_TYPE_INDEX_LEB`         | must represent a `typeidx`               |
-| `R_WASM_GLOBAL_INDEX_LEB`       | must represent a `globalidx`             |
-| `R_WASM_EVENT_INDEX_LEB`        | must represent a `tagidx`                |
-| `R_WASM_TABLE_NUMBER_LEB`       | must represent a `tableidx`              |
-| `R_WASM_TABLE_INDEX_SLEB`       | must represent an operand of `i32.const` |
-| `R_WASM_TABLE_INDEX_SLEB64`     | must represent an operand of `i64.const` |
-| `R_WASM_MEMORY_ADDR_SLEB`       | must represent an operand of `i32.const` |
-| `R_WASM_MEMORY_ADDR_REL_SLEB`   | must represent an operand of `i32.const` |
-| `R_WASM_MEMORY_ADDR_TLS_SLEB`   | must represent an operand of `i32.const` |
-| `R_WASM_MEMORY_ADDR_SLEB64`     | must represent an operand of `i64.const` |
-| `R_WASM_MEMORY_ADDR_REL_SLEB64` | must represent an operand of `i64.const` |
-| `R_WASM_MEMORY_ADDR_TLS_SLEB64` | must represent an operand of `i64.const` |
-| `R_WASM_MEMORY_ADDR_LEB`        | must represent the `offset` part of `memarg` where `memidx` references a 32-bit memory |
-| `R_WASM_MEMORY_ADDR_LEB64`      | must represent the `offset` part of `memarg` where `memidx` references a 64-bit memory |
+The `offset` part of a `memarg` where `memidx` represents a 32-bit memory may
+be treated as either [varuint32], or [varuint64].
+
+Constraints are placed on relocations based on the data encoding of the value
+to be relocated:
+
+| Data encoding | Allowed relocation types |
+|---------------|--------------------------|
+| [uint32]      | `R_WASM_*_I32`           |
+| [uint64]      | `R_WASM_*_I64`           |
+| [varint32]    | `R_WASM_*_SLEB`          |
+| [varint64]    | `R_WASM_*_SLEB64`        |
+| [varuint32]   | `R_WASM_*_LEB`           |
+| [varuint64]   | `R_WASM_*_LEB64`         |
+
+If a data encoding for the relocation cannot be determined (i.e. there isn't a
+known grammar construct at the relocation offset), the data encoding constraints
+aren't applied. For example, this is the case for unknown custom sections and
+data segments.
+
+In the CODE section, only certain grammar constructs are allowed to be targeted
+by relocations:
+
+- For the constant operand of `i*.const` instructions, only
+  `R_WASM_TABLE_INDEX_*` and `R_WASM_MEMORY_ADDR_*` relocations are allowed.
+- For the `offset` part of a `memarg`, only `R_WASM_MEMORY_ADDR_*` relocations
+  are allowed.
+- For `funcidx`, only `R_WASM_FUNCTION_INDEX_*` relocations are allowed.
+- For `globalidx`, only `R_WASM_GLOBAL_INDEX_*` relocations are allowed.
+- For `tagidx`, only `R_WASM_EVENT_INDEX_*` relocations are allowed.
+- For `tableidx`, only `R_WASM_TABLE_NUMBER_*` relocations are allowed.
 
 For `R_WASM_*_OFFSET_I*` relocations, the following condidions must hold for
 the addend:
@@ -220,8 +231,6 @@ the addend:
 - If `index` references the custom section, the addend must represent a valid
   offset into that custom section's data area.
 
-All other relocations are considered invalid for the purposes of validation
-
 ## Linking Metadata Section
 
 A linking metadata section is a user-defined section with the name

From 9e0e72c691e89d724dd1bb2869b21521b8706a57 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 21:03:08 +0300
Subject: [PATCH 4/7] Add validation rules based on symtab entry types

---
 Linking.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/Linking.md b/Linking.md
index 42b5a60..6325eea 100644
--- a/Linking.md
+++ b/Linking.md
@@ -192,6 +192,18 @@ For instance, the function index 3 must be encoded as `0x83 0x80 0x80 0x80 0x00`
 The `offset` part of a `memarg` where `memidx` represents a 32-bit memory may
 be treated as either [varuint32], or [varuint64].
 
+If relocation's `index` represents a symbol table entry, constraints are placed
+on the relocation based on the symbol type it references:
+
+| Symbol type       | Allowed relocation types  |
+|-------------------|---------------------------|
+| `SYMTAB_FUNCTION` | `R_WASM_FUNCTION_IDX_*`, `R_WASM_TABLE_IDX_*`, `R_WASM_FUNCTION_OFFSET_*` |
+| `SYMTAB_DATA`     | `R_WASM_MEMORY_ADDR_*`    |
+| `SYMTAB_GLOBAL`   | `R_WASM_GLOBAL_INDEX_*`   |
+| `SYMTAB_SECTION`  | `R_WASM_SECTION_OFFSET_*` |
+| `SYMTAB_EVENT`    | `R_WASM_EVENT_INDEX_*`    |
+| `SYMTAB_TABLE`    | `R_WASM_TABLE_NUMBER_*`   |
+
 Constraints are placed on relocations based on the data encoding of the value
 to be relocated:
 
@@ -204,7 +216,7 @@ to be relocated:
 | [varuint32]   | `R_WASM_*_LEB`           |
 | [varuint64]   | `R_WASM_*_LEB64`         |
 
-If a data encoding for the relocation cannot be determined (i.e. there isn't a
+If an data encoding for the relocation cannot be determined (i.e. there isn't a
 known grammar construct at the relocation offset), the data encoding constraints
 aren't applied. For example, this is the case for unknown custom sections and
 data segments.

From ce5a310a09bf847bd68b1d576c34d80529cb2969 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 22:23:55 +0300
Subject: [PATCH 5/7] Work around WebAssembly validation rules on custom sections

---
 Linking.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Linking.md b/Linking.md
index 6325eea..5e4d9a2 100644
--- a/Linking.md
+++ b/Linking.md
@@ -177,13 +177,14 @@ relocations applied to the CODE section, a relocation cannot straddle two
 functions, and for the DATA section relocations must lie within a data element's
 body.
 
-### Additional validation rules
+### Object file validation rules
 
-When perfoming validation on object files, care must be taken to ensure that
-meaningless relocations are not present in the binary.
+For a module to be considered a valid object file, additional constraints are
+imposed on the data in custom sections related to linking, to ensure that the
+linking process will yield a valid module.
 
-**Note**: Linker is not required to perform validation on its input object
-files.
+Tools that process object files are only required to produce output if source
+object files they process are valid object files.
 
 All LEB128-encoded values that are to be relocated must be maximally padded so
 that they can be rewritten without affecting the position of any other bytes.

From 45d8c48b1e18e966c2302cc518f546eaf894df89 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 22:55:30 +0300
Subject: [PATCH 6/7] Improve wording on addend validation

---
 Linking.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Linking.md b/Linking.md
index 5e4d9a2..3900ede 100644
--- a/Linking.md
+++ b/Linking.md
@@ -237,8 +237,8 @@ by relocations:
 For `R_WASM_*_OFFSET_I*` relocations, the following condidions must hold for
 the addend:
 
-- If `index` references the CODE section, the addend must represent the first
-  byte of an instruction, or the byte after the last instruction.
+- If `index` references the CODE section, the addend must represent an offset
+  of an instruction boundary.
 - If `index` references the DATA section, the addend must represent a valid
   offset into a data segment's data area.
 - If `index` references the custom section, the addend must represent a valid

From 633148d9cfaea579358ea46a54380c7825839c02 Mon Sep 17 00:00:00 2001
From: feedable <141534996+feedab1e@users.noreply.github.com>
Date: Mon, 20 Oct 2025 23:52:11 +0300
Subject: [PATCH 7/7] Fix typo

---
 Linking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Linking.md b/Linking.md
index 3900ede..35bd2aa 100644
--- a/Linking.md
+++ b/Linking.md
@@ -217,7 +217,7 @@ to be relocated:
 | [varuint32]   | `R_WASM_*_LEB`           |
 | [varuint64]   | `R_WASM_*_LEB64`         |
 
-If an data encoding for the relocation cannot be determined (i.e. there isn't a
+If a data encoding for the relocation cannot be determined (i.e. there isn't a
 known grammar construct at the relocation offset), the data encoding constraints
 aren't applied. For example, this is the case for unknown custom sections and
 data segments.
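
Review comment on [PATCH 2/7]: the maximal-padding rule for relocated LEB128 values lends itself to a mechanical check. The following is a minimal Python sketch and not part of the series. The function names and the `bits` parameter are hypothetical, and it only covers the unsigned encodings ([varuint32] and [varuint64]); signed LEB128 would need sign-extension handling that is not shown here.

```python
def padded_width(bits: int) -> int:
    """Number of bytes in a maximally padded LEB128 value of `bits` bits."""
    return (bits + 6) // 7  # 32 -> 5 bytes, 64 -> 10 bytes


def encode_padded_uleb128(value: int, bits: int = 32) -> bytes:
    """Encode `value` as unsigned LEB128 using the full padded width."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the requested bit width")
    width = padded_width(bits)
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def is_maximally_padded_uleb128(data: bytes, offset: int, bits: int = 32) -> bool:
    """Check that the field at `offset` occupies the full padded width."""
    width = padded_width(bits)
    field = data[offset:offset + width]
    if len(field) != width:
        return False  # field is truncated by the end of the buffer
    if any(b & 0x80 == 0 for b in field[:-1]):
        return False  # value terminated early, so it is not maximally padded
    # The last byte must end the value and must not carry bits beyond `bits`.
    used_bits_in_last = bits - 7 * (width - 1)
    return field[-1] >> used_bits_in_last == 0


if __name__ == "__main__":
    encoded = encode_padded_uleb128(3)
    print(" ".join(f"0x{b:02x}" for b in encoded))
    print(is_maximally_padded_uleb128(encoded, 0))
```

With the default 32-bit width, `encode_padded_uleb128(3)` produces the five bytes `0x83 0x80 0x80 0x80 0x00` quoted in the patch, and the single-byte encoding `0x03` is rejected by the checker. Fixing the width up front is what lets a linker overwrite the value in place without moving any other bytes.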