feat: Add MSVC symbol demangling support for PE binaries#177
Conversation
Extend SymbolDemangler to demangle MSVC-mangled symbols (?-prefixed) found in Windows PE binaries, closing the gap where the pipeline already detected ?-prefixed symbols but the demangler silently passed them through. - Add msvc-demangler 0.11 (MIT/NCSA) and allow NCSA in deny.toml - is_mangled() now returns true for ?-prefixed symbols - try_demangle_internal() routes ? symbols to a new try_msvc_demangle() helper using DemangleFlags::llvm(), mirroring the Rust/C++ helpers - Demangled MSVC symbols preserve the original in original_text and receive Tag::DemangledSymbol via the existing demangle() flow - Add 9 unit tests (plain fn, constructor, operator, invalid fallback, tag preservation) plus doctest coverage; no pipeline changes needed Closes #19 Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Address code-review findings on the MSVC demangling change: - P1: a crafted ?-prefixed symbol with deeply nested type modifiers drives msvc-demangler's recursive parser into a stack overflow that aborts the process (uncaught by catch_unwind, which does not catch stack overflow). Reject symbols longer than MSVC_MAX_SYMBOL_LEN (4096) before parsing -- cheap defense-in-depth for untrusted binary input. Add a regression test with a 60KB nested-pointer symbol. - P2: correct try_demangle_internal doc comment to mention MSVC. - P2: list msvc-demangler in AGENTS.md Key Dependencies. - Clarify the DemangleFlags::llvm() rationale comment (simplify pass). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 📃 Configuration Change RequirementsWonderful, this rule succeeded.Mergify configuration change
🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Full CI must passWonderful, this rule succeeded.All CI checks must pass. This protection prevents manual merges that bypass the merge queue.
🟢 Do not merge outdated PRsWonderful, this rule succeeded.Make sure PRs are within 10 commits of the base branch before merging
|
|
Warning Review limit reached
More reviews will be available in 46 minutes and 51 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (4)
📒 Files selected for processing (36)
📝 WalkthroughWalkthroughAdds MSVC ChangesMSVC Symbol Demangling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 7 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
Adds MSVC (?-prefixed) symbol demangling to Stringy’s existing symbol-demangling classification path so PE import/export names can be converted to readable forms, aligning Windows support with the existing Rust/Itanium flows.
Changes:
- Extended
SymbolDemanglerto detect and demangle MSVC-mangled symbols usingmsvc-demangler, with a max-length guard to reduce stack-overflow DoS risk. - Added
msvc-demanglerdependency and allowed theNCSAlicense indeny.toml. - Added a detailed implementation plan doc and updated
AGENTS.mddependency list.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/classification/symbols.rs |
Adds MSVC demangling dispatch + length guard and unit tests for the new path. |
Cargo.toml |
Introduces msvc-demangler = "0.11.0" dependency. |
deny.toml |
Allows NCSA to satisfy cargo deny licensing checks for the new crate. |
docs/plans/2026-06-19-002-feat-msvc-symbol-demangling-plan.md |
Documents the design/requirements/test plan for MSVC demangling. |
AGENTS.md |
Updates the “Key Dependencies” list to include msvc-demangler. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/plans/2026-06-19-002-feat-msvc-symbol-demangling-plan.md (1)
14-215:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReplace non-ASCII punctuation in this markdown plan.
The new plan includes Unicode punctuation (for example em dashes in several lines). This repo requires ASCII-only punctuation in markdown files.
As per coding guidelines,
**/*.{rs,md,txt,toml,yaml,yml,json}: Verify that no Unicode punctuation is introducedand**/*.{rs,md}: Never use emojis, em-dashes, or non-Latin characters in code or documentation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/plans/2026-06-19-002-feat-msvc-symbol-demangling-plan.md` around lines 14 - 215, Replace all non-ASCII punctuation characters in the markdown plan file with ASCII equivalents. Specifically, replace em dashes (—) with hyphens (-) or double hyphens (--) throughout the document, and check for any other Unicode punctuation marks (smart quotes, ellipsis characters, etc.) and replace them with standard ASCII equivalents. This applies to the entire document, including all sections from "Problem Frame" through "Sources & Research" and within all code blocks, mermaid diagrams, and regular prose text. Ensure all punctuation conforms to ASCII-only requirements as stated in the project's coding guidelines.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/classification/symbols.rs`:
- Around line 548-674: The src/classification/symbols.rs file now exceeds the
500-line limit with the addition of MSVC demangling tests. Create a new
dedicated test module file (e.g., src/classification/symbols/msvc_tests.rs or a
tests subdirectory) and move all the MSVC test functions
(test_is_mangled_msvc_symbols, test_is_mangled_msvc_not_triggered_for_plain,
test_demangle_msvc_plain_function, test_demangle_msvc_constructor,
test_demangle_msvc_operator, test_demangle_msvc_invalid_fallback,
test_try_demangle_msvc_success, test_try_demangle_msvc_failure,
test_demangle_msvc_oversized_symbol_rejected, and
test_msvc_symbol_preserves_existing_tags) to this new module file. Update the
module declaration in symbols.rs to include the new test module, and remove all
moved test code from the original file to keep production logic separate from
tests.
---
Outside diff comments:
In `@docs/plans/2026-06-19-002-feat-msvc-symbol-demangling-plan.md`:
- Around line 14-215: Replace all non-ASCII punctuation characters in the
markdown plan file with ASCII equivalents. Specifically, replace em dashes (—)
with hyphens (-) or double hyphens (--) throughout the document, and check for
any other Unicode punctuation marks (smart quotes, ellipsis characters, etc.)
and replace them with standard ASCII equivalents. This applies to the entire
document, including all sections from "Problem Frame" through "Sources &
Research" and within all code blocks, mermaid diagrams, and regular prose text.
Ensure all punctuation conforms to ASCII-only requirements as stated in the
project's coding guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 52ef8802-9f2e-4ad5-b779-b3d0d19e0430
📒 Files selected for processing (5)
AGENTS.mdCargo.tomldeny.tomldocs/plans/2026-06-19-002-feat-msvc-symbol-demangling-plan.mdsrc/classification/symbols.rs
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Queued — the merge queue status continues in this comment ↓. |
Initialize docs/adr/ with the Michael Nygard ADR format and capture the two architectural decisions from the MSVC symbol demangling work: - ADR-0001: length-cap guard (MSVC_MAX_SYMBOL_LEN) chosen over thread isolation, upstream fix, or source restriction to contain the stack-overflow DoS that catch_unwind cannot catch. - ADR-0002: selecting the msvc-demangler crate and allowing the NCSA license, over hand-rolling, shelling to undname.exe, or broadening deny.toml. Adds README.md index and a blank template.md for future records. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…cy test Addresses two non-blocking review follow-ups from PR #177: - symbols.rs exceeded the 500-line soft limit (674 lines, mostly test growth). Split into symbols/mod.rs (production, 306 lines) and symbols/tests.rs (390 lines), mirroring the existing patterns/ module directory convention in classification/. - Add test_demangle_msvc_idempotent covering the MSVC double-demangle no-op, matching the existing Rust-path test_demangle_idempotent. The no-op was mechanically guaranteed by the is_mangled() early-exit but not directly tested. No behavior change. fmt + clippy clean; all symbol tests pass. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- docs/plans: replace 32 em-dashes (—) with ASCII -- per AGENTS.md ASCII-only rule (Copilot review, 3 threads) - symbols/tests.rs: remove test_is_mangled_msvc_not_triggered_for_plain; its printf/CreateFileW/empty cases are already covered by test_is_mangled_not_mangled (Copilot DRY review) Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…tails Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…t standards - Deleted work_next_task.md as it was no longer relevant to the workflow. - Removed mcp.json configuration file for MCP server setup. - Eliminated cargo-toml.mdc, configuration-management.mdc, error-handling-patterns.mdc, error-handling.mdc, linting-rules.mdc, performance-optimization.mdc, and rust-standards.mdc as they were outdated or redundant. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ge ecosystems Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…s and guidelines Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
dist-workspace.toml's github-action-commits entry for actions/upload-artifact had a trailing space inside the quoted SHA, which cargo-dist templated into release.yml at 5 uses: lines. GitHub Actions treats the whole string as the ref, so it would fail to resolve. Fixed the source config and the generated workflow; actionlint clean. (Copilot review, 6 threads) Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
| entry: mise x -- shellcheck | ||
| language: system | ||
| types: [shell] | ||
| # 🐚 Shell script validation (local via mise, avoids Docker dependency) |
| language: system | ||
| pass_filenames: false | ||
| types: [cargo, cargo-lock] | ||
| # 🔒 Security audit for Rust dependencies (requires mise - run `just setup` first) |
|
@Mergifyio queue |
Merge Queue Status
This pull request spent 15 minutes 30 seconds in the queue, including 15 minutes 10 seconds running CI. Required conditions to merge
|
Summary
Extends
SymbolDemanglerto demangle MSVC-mangled symbols (?-prefixed) found in Windows PE binaries. The pipeline'sclassify_strings()already detected?-prefixed symbols as mangled candidates, butSymbolDemanglerhad no?handling -- so every MSVC symbol from a PE import/export table was detected and then silently passed through with no demangled output. MSVC is the dominant Windows toolchain, so this covers most PE symbols.Closes #19.
Example:
?printf@@YAHPEBDZZ->int __cdecl printf(char const *, ...)What changed
?-prefix support inSymbolDemangler(src/classification/symbols/):is_mangled()now returnstruefor?-prefixed symbols, andtry_demangle_internal()routes them to a newtry_msvc_demangle()helper usingmsvc_demangler::demangle(..., DemangleFlags::llvm()). The helper mirrors the existingtry_rust_demangle()/try_cpp_demangle()exactly, including the "differs from input" guard. The demangled form replacestext, the original is preserved inoriginal_text, andTag::DemangledSymbolis applied via the existingdemangle()flow -- no pipeline changes required.msvc-demangler 0.11(MIT/NCSA dual-licensed).NCSAadded to thedeny.tomllicense allow-list. The crate pulls onlybitflags+itoatransitively, no new duplicate versions.msvc-demanglerparses recursively with no depth limit. A crafted?-symbol with ~18KB+ of nested type modifiers overflows the stack and aborts the process -- and a stack overflow aborts rather than unwinds, so the pipeline'scatch_unwindcannot contain it. Since Stringy analyzes untrusted/malicious binaries,try_msvc_demangle()rejects symbols longer thanMSVC_MAX_SYMBOL_LEN(4096) before parsing. Real MSVC symbols are far shorter even when heavily templated; this is cheap defense-in-depth.src/classification/symbols.rsgrew past the 500-line soft limit, so it is now a module directory --symbols/mod.rs(production) +symbols/tests.rs(tests) -- following the existingclassification/patterns/convention.docs/adr/0001-msvc-demangler-length-cap.mdanddocs/adr/0002-msvc-demangler-dependency.md, with an ADR index (README.md) and template seeded for future use.Test plan
?-detection, plain function, constructor, operator overload, invalid-input graceful fallback, oversized-symbol rejection, idempotent re-demangle,try_demanglesuccess/failure, existing-tag preservationis_mangled,try_demangle) and passingcargo clippy --all-targets -- -D warningsclean;cargo fmt --checkclean?-prefixed subset -- comparable to the existing Rust/C++ pathsjust gen-fixtures) and runs the full suite -- local fixture generation is blocked by a Zig toolchain env issue on this machineRequirements (issue #19, Req 4.1)
R1-R5 and R7 met. R6 (
cargo deny checkpasses) is met for what the new dependency controls: licenses pass and no new duplicate version is introduced. A pre-existingcargo deny check bansfailure (duplicatewindows-sysviafs4 -> mmap-guard, already red onmain) is unrelated to this change and is tracked in #179.Deferred / tracked separately
These were raised in review and intentionally not done here, so they are filed as issues rather than left in this description:
windows-sysversions to fix thecargo deny check bansfailure (pre-existing tech debt; needs its own dependency bump, must not be masked with adeny.tomlskip).