Add Webcil support to R2RDump #127885
R2RDump previously could not read Webcil files (the format used for managed assemblies in WebAssembly environments). This adds a WebcilImageReader that implements IBinaryImageReader for the Webcil format, enabling R2RDump to dump headers, methods, and section contents from Webcil-format R2R images.

Changes:
- New WebcilImageReader.cs implementing IBinaryImageReader
- ReadyToRunReader detects Webcil format (after MachO, before PE)
- DumpModel handles Webcil in reference assembly loading
- Program.cs maps OperatingSystem.Unknown to TargetOS.Linux for Webcil
- ReadyToRunMethod gracefully handles null PEReader (Webcil has no PE)
- ILCompiler.Reflection.ReadyToRun.csproj includes shared Webcil.cs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the PEReader ImageReader property with a GetSectionData(int rva) method that returns a BlobReader. This decouples the interface from PEReader, enabling non-PE formats (Webcil) to provide section data.

Implementations:
- StandaloneAssemblyMetadata: delegates to PEReader.GetSectionData
- ManifestAssemblyMetadata: same with null-guard
- WebcilAssemblyMetadata: resolves RVA via WebcilImageReader sections
- SimpleAssemblyMetadata (tests): delegates to PEReader.GetSectionData

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement a full WASM instruction disassembler that decodes WebAssembly binary format into WAT-style text output. This enables the --disasm flag in R2RDump to work with Webcil/WASM R2R images.

- Add WasmDisassembler.cs with complete opcode tables for all standard WASM instructions (control, parametric, variable, table, memory, numeric, conversion, sign-extension, reference types) plus 0xFC (bulk memory/saturating truncation), 0xFB (GC), and 0xFD (SIMD) prefixed opcodes
- Add WebcilImageReader.GetWasmFunctionBody() to parse the WASM module's type, function, and code sections to extract function info including type signature and local declarations
- Integrate into TextDumper.DumpWasmDisasm() to print parameters and locals with their local indices, result types, and disassembled instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
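Most immediates that a WASM disassembler has to decode (indices, block types, constants) are LEB128-encoded, so a decoder like the one described in this commit needs helpers along these lines. This is a minimal sketch, not code from the PR: the names ReadULeb128 and ReadSLeb128 are hypothetical, and it only handles 32-bit values.

```csharp
using System;

// Hypothetical helper (not taken from the PR): decodes an unsigned LEB128
// value of at most 32 bits starting at data[pos] and advances pos.
static uint ReadULeb128(ReadOnlySpan<byte> data, ref int pos)
{
    uint result = 0;
    int shift = 0;
    while (true)
    {
        byte b = data[pos++];
        result |= (uint)(b & 0x7F) << shift;   // low 7 bits carry payload
        if ((b & 0x80) == 0)                   // high bit clear: last byte
            return result;
        shift += 7;
        if (shift >= 35)
            throw new FormatException("LEB128 value is too long for u32");
    }
}

// Hypothetical helper: signed variant, as used by i32.const immediates.
static int ReadSLeb128(ReadOnlySpan<byte> data, ref int pos)
{
    int result = 0;
    int shift = 0;
    byte b;
    do
    {
        b = data[pos++];
        result |= (b & 0x7F) << shift;
        shift += 7;
    } while ((b & 0x80) != 0 && shift < 35);

    // Sign-extend when the final byte has its sign bit (0x40) set.
    if (shift < 32 && (b & 0x40) != 0)
        result |= -1 << shift;
    return result;
}

int cursor = 0;
uint unsignedValue = ReadULeb128(new byte[] { 0xE5, 0x8E, 0x26 }, ref cursor); // 624485
int signedCursor = 0;
int signedValue = ReadSLeb128(new byte[] { 0x7F }, ref signedCursor);          // -1
Console.WriteLine($"{unsignedValue} {signedValue}");
```

Taking the position by ref keeps the helpers stateless, which makes them easy to test in isolation; a real disassembler might instead track the position in a field.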
WebcilAssemblyMetadata was not retaining a reference to the pinned metadata byte array passed to its constructor. After GetStandaloneAssemblyMetadata returned, the array could be collected by the GC despite being allocated on the Pinned Object Heap, since no live reference existed. This caused an AccessViolationException when MetadataReader accessed the freed memory on larger files like system.private.corelib.wasm.

Fix: store the metadata byte array in a field to keep it rooted for the lifetime of the MetadataReader.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the stub DecodeFDPrefixed() method with a complete implementation of all WebAssembly SIMD instructions (0xFD prefix, sub-opcodes 0-255) per the WebAssembly spec. This includes memory operations, lane load/store, shuffle, splat, extract/replace lane, comparisons, bitwise operations, arithmetic, and conversion instructions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement opcode 0x1F (try_table) per the WebAssembly exception handling spec. Decodes the block type and vector of catch clauses, supporting all four catch clause kinds: catch, catch_ref, catch_all, catch_all_ref.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
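For readers unfamiliar with the try_table encoding, the catch-clause vector that follows the block type can be decoded roughly as below. This is a sketch under the assumption that the clause kinds are encoded as 0x00 (catch), 0x01 (catch_ref), 0x02 (catch_all) and 0x03 (catch_all_ref), as in the exception handling spec; ReadU32 and ReadCatchClauses are hypothetical names and are not the PR's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: unsigned LEB128 (u32), the encoding WASM uses for indices.
static uint ReadU32(ReadOnlySpan<byte> data, ref int pos)
{
    uint result = 0;
    for (int shift = 0; shift < 35; shift += 7)
    {
        byte b = data[pos++];
        result |= (uint)(b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            return result;
    }
    throw new FormatException("LEB128 value is too long for u32");
}

// Hypothetical helper: decodes vec(catch) into WAT-style clause strings.
static List<string> ReadCatchClauses(ReadOnlySpan<byte> data, ref int pos)
{
    uint count = ReadU32(data, ref pos);
    var clauses = new List<string>();
    for (uint i = 0; i < count; i++)
    {
        byte kind = data[pos++];
        switch (kind)
        {
            case 0x00: // catch tagidx labelidx
            case 0x01: // catch_ref tagidx labelidx
            {
                string name = kind == 0x00 ? "catch" : "catch_ref";
                uint tag = ReadU32(data, ref pos);
                uint label = ReadU32(data, ref pos);
                clauses.Add($"({name} {tag} {label})");
                break;
            }
            case 0x02: // catch_all labelidx
            case 0x03: // catch_all_ref labelidx
            {
                string name = kind == 0x02 ? "catch_all" : "catch_all_ref";
                uint label = ReadU32(data, ref pos);
                clauses.Add($"({name} {label})");
                break;
            }
            default:
                throw new FormatException($"unknown catch clause kind 0x{kind:X2}");
        }
    }
    return clauses;
}

// Four clauses, one of each kind.
byte[] encoded = { 0x04, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x02, 0x01, 0x03, 0x02 };
int offset = 0;
List<string> decoded = ReadCatchClauses(encoded, ref offset);
Console.WriteLine(string.Join(" ", decoded));
// (catch 0 0) (catch_ref 1 0) (catch_all 1) (catch_all_ref 2)
```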
Pull request overview
This PR extends the CoreCLR R2RDump toolchain to recognize Webcil inputs (including WASM-wrapped Webcil) and adds a WebAssembly bytecode disassembler for dumping function bodies in a WAT-like textual form. It also evolves the metadata abstraction so method-body bytes can be retrieved without assuming a PE-backed PEReader.
Changes:
- Add WebcilImageReader support to ReadyToRunReader initialization and R2RDump’s metadata-opening path.
- Replace IAssemblyMetadata.ImageReader with IAssemblyMetadata.GetSectionData(int rva) and update method-body local signature decoding accordingly.
- Add WasmDisassembler and integrate WASM disassembly printing into TextDumper for Webcil/WASM scenarios.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/coreclr/tools/r2rdump/WasmDisassembler.cs | New WASM bytecode decoder/disassembler for dumping instructions. |
| src/coreclr/tools/r2rdump/TextDumper.cs | Emits WASM-specific disassembly and metadata (params/locals/results) for Webcil inputs. |
| src/coreclr/tools/r2rdump/Program.cs | Adds fallback handling for OperatingSystem.Unknown when producing TargetDetails. |
| src/coreclr/tools/r2rdump/DumpModel.cs | Detects Webcil inputs when opening reference assemblies for metadata resolution. |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/WebcilImageReader.cs | New reader that parses Webcil (and WASM-wrapped Webcil) and exposes metadata/sections/function bodies. |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/StandaloneAssemblyMetadata.cs | Implements GetSectionData via PEReader section access. |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ReadyToRunReader.cs | Detects Webcil images and uses WebcilImageReader as the CompositeReader. |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ReadyToRunMethod.cs | Switches local-signature decoding to use GetSectionData + MethodBodyBlock.Create. |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ManifestAssemblyMetadata.cs | Implements GetSectionData when backed by a PEReader. |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ILCompiler.Reflection.ReadyToRun.csproj | Links in shared Webcil definitions (Webcil.cs). |
| src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/IAssemblyMetadata.cs | Replaces PEReader exposure with GetSectionData(int rva). |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun.Tests/TestCasesRunner/R2RResultChecker.cs | Updates test metadata wrapper to implement GetSectionData. |

    case 0x02:
    {
        string bt = ReadBlockType();
        indent++;
        return $"block{bt}";

    }

    private string ReadHeapType()
    {

    else if (WebcilImageReader.IsWebcilImage(image))
    {
        CompositeReader = new WebcilImageReader(image);
    }

This is a future problem.
Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara

    }

    public ImmutableArray<byte> GetEntireImage()
        => Unsafe.As<byte[], ImmutableArray<byte>>(ref Unsafe.AsRef(in _image));

Is there any way to avoid the cast from mutable byte[] to ImmutableArray here?
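Two alternatives to the Unsafe.As reinterpretation come to mind, sketched below. ImmutableArray.Create copies the bytes, while ImmutableCollectionsMarshal.AsImmutableArray (available from .NET 8) wraps the existing array without copying. Whether the library's target framework allows the second option is an assumption that this page does not confirm, and the variable names are illustrative only.

```csharp
using System;
using System.Collections.Immutable;
using System.Runtime.InteropServices;

byte[] imageBytes = { 1, 2, 3, 4 };

// Option A: copy the bytes. Safe, but costs an allocation the size of the image.
ImmutableArray<byte> copied = ImmutableArray.Create(imageBytes);

// Option B (.NET 8+): wrap the existing array without copying. The caller must
// promise not to mutate imageBytes afterwards, which is the same contract the
// Unsafe.As version relies on implicitly.
ImmutableArray<byte> wrapped = ImmutableCollectionsMarshal.AsImmutableArray(imageBytes);

// Mutating the source shows the difference between the two options.
imageBytes[0] = 42;
Console.WriteLine($"{copied[0]} {wrapped[0]}"); // 1 42
```

Option B keeps the zero-copy behavior of the current code while making the aliasing explicit at the call site.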

        return result;
    }

    public int GetOffset(int rva)

Suggested change:

    - public int GetOffset(int rva)
    + public int GetSectionRelativeOffset(int rva)


    unsafe
    {
        fixed (byte* p = &image[(int)offset])

Can we use a Span here and read the fields individually, so we don't need the unsafe block wrapping the memcpy?
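A Span-based version of what this comment asks for might look like the sketch below. It assumes the start of the Webcil header is laid out as a 4-byte 'WbIL' magic followed by little-endian 16-bit version_major, version_minor and coff_sections fields; that layout should be checked against the shared Webcil.cs, and ReadHeaderPrefix is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: reads the first header fields with bounds-checked
// span accesses instead of a fixed pointer and a struct copy.
static (ushort Major, ushort Minor, ushort Sections) ReadHeaderPrefix(ReadOnlySpan<byte> image, int offset)
{
    // Slice throws ArgumentOutOfRangeException if the image is too short.
    ReadOnlySpan<byte> header = image.Slice(offset, 12);

    // Assumed magic: 'W' 'b' 'I' 'L'.
    if (header[0] != (byte)'W' || header[1] != (byte)'b' ||
        header[2] != (byte)'I' || header[3] != (byte)'L')
    {
        throw new BadImageFormatException("Webcil magic not found");
    }

    // Each field is read explicitly as little-endian, so the result does not
    // depend on host endianness or struct layout.
    ushort major = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(4, 2));
    ushort minor = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(6, 2));
    ushort sections = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(8, 2));
    return (major, minor, sections);
}

// Synthetic header bytes for illustration: version 0.0, three sections.
byte[] sample =
{
    (byte)'W', (byte)'b', (byte)'I', (byte)'L',
    0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00,
};
var prefix = ReadHeaderPrefix(sample, 0);
Console.WriteLine($"{prefix.Major}.{prefix.Minor} sections={prefix.Sections}"); // 0.0 sections=3
```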

    webcilOffset = 0;

    // Simple scan: look for the Webcil magic in the WASM module
    // The Webcil payload is embedded as a custom section in the WASM module

Are we searching the whole file here? I think that if we are looking only at the data section, then it should be at fixed offset 0. Maybe that's fine for a helper tool?
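One way to avoid scanning the whole file is to walk the module's section headers and look inside a single section only. The sketch below enumerates sections from the standard WASM binary layout (an 8-byte preamble, then an id byte and a LEB128 size per section). ListSections and ReadU32 are hypothetical names, and where exactly the Webcil payload sits within a section is left open here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: unsigned LEB128 (u32).
static uint ReadU32(ReadOnlySpan<byte> data, ref int pos)
{
    uint result = 0;
    for (int shift = 0; shift < 35; shift += 7)
    {
        byte b = data[pos++];
        result |= (uint)(b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            return result;
    }
    throw new FormatException("LEB128 value is too long for u32");
}

// Hypothetical helper: returns (section id, payload offset, payload size)
// for each section, without interpreting the payloads.
static List<(byte Id, int Offset, int Size)> ListSections(ReadOnlySpan<byte> module)
{
    // Preamble: "\0asm" magic followed by a 4-byte version.
    if (module.Length < 8 || module[0] != 0x00 || module[1] != 0x61 ||
        module[2] != 0x73 || module[3] != 0x6D)
    {
        throw new BadImageFormatException("not a WASM module");
    }

    var sections = new List<(byte Id, int Offset, int Size)>();
    int pos = 8;
    while (pos < module.Length)
    {
        byte id = module[pos++];
        int size = (int)ReadU32(module, ref pos);
        if (size < 0 || size > module.Length - pos)
            throw new BadImageFormatException("section size exceeds module length");
        sections.Add((id, pos, size));
        pos += size; // skip the payload
    }
    return sections;
}

// Synthetic module: one custom section (id 0) and one empty data section (id 11).
byte[] wasm =
{
    0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x03, 0x01, 0x78, 0xAA,
    0x0B, 0x01, 0x00,
};
List<(byte Id, int Offset, int Size)> found = ListSections(wasm);
foreach (var s in found)
    Console.WriteLine($"id={s.Id} offset={s.Offset} size={s.Size}");
// id=0 offset=10 size=3
// id=11 offset=15 size=1
```

With the section list in hand, a reader could restrict the magic lookup to the data section (id 11) or to custom sections (id 0) instead of the full image.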

    /// Decodes WASM binary instructions into WAT (WebAssembly Text) format.
    /// Based on the WebAssembly specification: https://webassembly.github.io/spec/core/
    /// </summary>
    internal sealed class WasmDisassembler

It would be good to add an e2e integration test and use wat2wasm to test a round trip.
Does this also parse non-R2R modules?
Note
This PR was created with the assistance of GitHub Copilot.
Summary
Adds support for reading and dumping Webcil files in R2RDump, including a full WebAssembly bytecode disassembler.
Changes
- Add GetSectionData to the metadata interface and implement it across all types.
- WasmDisassembler class that decodes WASM binary instructions into WAT text format, covering:
  - try_table instruction with all catch clause kinds