Skip to content

Add SSE2 integer XMM lifting with oracle-backed SIMD tests#71

Merged
NaC-L merged 15 commits intomainfrom
feature/simd-sse2-phase1
Mar 8, 2026
Merged

Add SSE2 integer XMM lifting with oracle-backed SIMD tests#71
NaC-L merged 15 commits intomainfrom
feature/simd-sse2-phase1

Conversation

@NaC-L
Copy link
Copy Markdown
Owner

@NaC-L NaC-L commented Mar 7, 2026

Summary

  • add 128-bit operand typing (Register128/Memory128) across common operand modeling and both disassembler frontends (Zydis + iced Rust bridge)
  • enable and implement SSE2 integer handlers for MOVDQA, PAND, POR, and PXOR using the existing GetIndexValue/SetIndexValue path
  • extend test/oracle plumbing to support 128-bit XMM register values (APInt in tester + JSON parser/serializer + oracle register aliases)
  • add SIMD seed/oracle cases and refresh regression documentation/scope metadata

Validation

  • python test.py baseline
  • python test.py micro --check-flags
  • python test.py quick
  • python test.py report --json (112/115 handlers covered, 97.4%)

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Automated review summary (main...feature/simd-sse2-phase1): changes requested.

Blocking findings

  1. RIP is not initialized from function args (lifter/core/LifterClass_Concolic.hpp, around InitRegisters_impl): eipArg is consumed/named but never written via SetRegisterValue(Register::RIP/EIP, ...), while reads map EIP/RIP to RIP_. This can leave RIP unset.

  2. SIMD width regressions can be masked by test comparison (lifter/test/Tester.hpp, runTestCase): expected is coerced to actual->getBitWidth() (zextOrTrunc), so a wrong-width write (e.g., 64-bit write to XMM) can still pass when low bits match.

  3. Strict cross-oracle validation can be bypassed (scripts/rewrite/generate_oracle_vectors.py): secondary provider failures are always downgraded to warnings; generation proceeds and compare_results can effectively skip strict validation when only one provider result remains.

  4. Signed div/rem emulation uses float conversion (scripts/rewrite/sleigh_oracle.py, _op_int_sdiv/_op_int_srem): int(a / b) loses precision for large 64-bit values, producing incorrect oracle math.

Non-blocking

  • getRegOfSize in lifter/disasm/CommonDisassembler.hpp still only handles 8/16/32/64 while 128-bit XMM support was added elsewhere; not currently hit by callsites but risky for future generic size conversions.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Addressed the previously reported blocking findings in follow-up commits:

  • RIP init fixed in concolic register setup (lifter/core/LifterClass_Concolic.hpp)
  • XMM size handling added to getRegOfSize (lifter/disasm/CommonDisassembler.hpp)
  • Microtest register assertions now enforce expected bit-width (lifter/test/Tester.hpp)
  • Strict oracle mode now fails on secondary-provider emulation failures (scripts/rewrite/generate_oracle_vectors.py)
  • Signed div/rem in Sleigh oracle now use integer truncation semantics without float conversion (scripts/rewrite/sleigh_oracle.py)
  • Determinism gate follow-up completed by refreshing lifter/test/test_vectors/golden_ir_hashes.json after confirming stable repeat-run IR hashes

Validation run locally:

  • python test.py micro --check-flags custom_knownbits
  • python test.py micro --check-flags
  • python test.py quick
  • python -m py_compile scripts/rewrite/generate_oracle_vectors.py scripts/rewrite/sleigh_oracle.py

Current PR head: 0b80205
All required PR checks are green.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Second-pass review applied one additional defensive hardening change:

  • lifter/disasm/CommonDisassembler.hpp
    • getRegOfSize(...) now guards invalid size values before indexing register arrays.
    • Changed index to int and return Register::None for unsupported sizes.

Commit: ef2cbf3

Local validation:

  • python test.py quick

CI re-ran automatically on this commit and is currently in progress.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Fourth-pass review summary (main...feature/simd-sse2-phase1): changes requested.

Blocking

  1. Concolic memory argument contract mismatch

    • lifter/core/LifterClass_Concolic.hpp: function args now place XMM args after memory.
    • lifter/core/MergenPB.hpp: passes still use fnc->getArg(fnc->arg_size()-1) as memory (PromotePseudoStackPass / PromotePseudoMemory).
    • This now points at XMM15, not memory.
  2. Sleigh register backing store too small for XMM offsets

    • scripts/rewrite/sleigh_oracle.py allocates register space as bytearray(0x1000).
    • pypcode x86_64 offsets for XMM are beyond that (XMM0=4608, XMM15=5568).
    • Reads/writes at those offsets do not hit intended register locations, breaking SSE2 oracle correctness.

Medium

  1. APInt literal parser lacks explicit width-fit validation (lifter/test/TestInstructions.cpp)
  2. Microtest pattern checks were loosened (scripts/rewrite/instruction_microtests.json), reducing semantic regression sensitivity.

Confirmed fixed since earlier review

  • Signed div/rem no longer uses float conversion (_trunc_div_signed path in sleigh_oracle.py).
  • Strict mode now fails on secondary-provider emulation errors (generate_oracle_vectors.py).

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Applied the latest (fourth-pass) review fixes in commit 25c2318:

  1. Concolic memory arg contract mismatch
  • lifter/core/MergenPB.hpp
  • run_opts() now resolves and consistently passes memoryArg from this->memoryAlloc (with named-arg fallback) into GEPLoadPass, PromotePseudoStackPass, and PromotePseudoMemory.
  • Removed fragile positional usage (getArg(arg_size()-1) / off-by-one pattern).
  1. Sleigh register backing too small for XMM offsets
  • scripts/rewrite/sleigh_oracle.py
  • Expanded _regs backing store to 0x2000 and added explicit register-space bounds checks on register read/write.
  1. APInt parser width-fit validation
  • lifter/test/TestInstructions.cpp
  • parseAPIntLiteral now rejects out-of-width values explicitly for integer and string literals.
  1. Microtest sensitivity restoration
  • scripts/rewrite/instruction_microtests.json
  • Tightened loosened patterns (branch/diamond/cmov/switch cases) while keeping them stable against SSA renumbering.

Local validation:

  • python -m py_compile scripts/rewrite/sleigh_oracle.py scripts/rewrite/generate_oracle_vectors.py
  • python test.py quick

PR checks were triggered automatically for this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Follow-up fix pushed in 8bbb432 to address the new CI failure mode seen on 25c2318:

  • scripts/rewrite/instruction_microtests.json

    • nested_branch patterns were too SSA-index specific (%1) and failed in CI.
    • Updated to SSA-agnostic but still semantic checks:
      • "icmp slt i32", ", 11", ", 21", plus existing select/phi-shape checks.
  • lifter/test/test_vectors/golden_ir_hashes.json

    • Refreshed via python test.py update-golden for current baseline output.

Validation run locally:

  • python test.py quick
  • python test.py baseline

Both passed (including determinism).

Current PR head checks on 8bbb432 are all green:

  • build (iced)
  • build (zydis)
  • rewrite-quick-gate
  • rewrite-strict-gate

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Additional review note (non-blocking cleanup):

In lifter/core/LifterClass.hpp, floatingPointValue still exists as { Value* v1; Value* v2; } (around lines 152-156), but current active SSE2/XMM lifting paths use unified i128 value handling (GetIndexValue/SetIndexValue with Register128) rather than split-lane pairs.

Recommendation: remove this legacy v1/v2 struct (and related commented split-lane remnants) to avoid implying a two-lane model for scalar 64-bit values and to keep register semantics unambiguous.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Addressed the latest non-blocking cleanup note in commit 09e3f90:

  • lifter/core/LifterClass.hpp
    • Removed unused legacy floatingPointValue { Value* v1; Value* v2; }.
    • Removed remaining commented split-lane remnants (simpleFPV / RegisterManagerFP declarations and references) from this header.

Validation:

  • python test.py quick passed locally.

Additional check requested during review iteration:

  • Ran a warning-focused build with -Wall -Wextra in a dedicated out-of-tree configuration (build_wall_rel, Release).
  • Build completes; warnings observed are mostly existing unused-parameter / unused-but-set diagnostics in:
    • lifter/memory/MemoryPolicy.hpp
    • lifter/analysis/CustomPasses.hpp

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Latest merge-readiness review (main...feature/simd-sse2-phase1 @ 09e3f90): changes requested.

I re-ran full review on current 22-file diff (including new lifter/core/LifterClass.hpp cleanup commit) and validated findings directly.

Blocking

  1. SLEIGH emulator iteration cap rejects valid instructions (scripts/rewrite/sleigh_oracle.py, execute, max_iters = len(ops) * 20).
    • Repro on current branch using existing vector: smoke_pdep_pdep from lifter/test/test_vectors/oracle_vectors.json throws PcodeEmulatorError: P-code execution exceeded max iterations.
    • This can break strict dual-provider oracle runs on legitimate cases.

Medium

  1. pand/por/pxor enabled without MMX/XMM operand-class gate (lifter/semantics/x86_64_opcodes.x + lifter/semantics/Semantics_Misc.ipp).

    • MMX forms decode (MM0..MM7 are mapped in disassemblers), but generic register helpers in CommonDisassembler.hpp do not map MMX in getBiggestEncoding and return fallback defaults (Register::None / size 0).
    • Risk: valid MMX encodings can flow into unsupported register paths and fail lifting.
  2. Microtest matcher sensitivity reduced by tokenized patterns (scripts/rewrite/instruction_microtests.json + scripts/rewrite/verify.ps1).

    • Verifier uses independent Select-String -SimpleMatch checks per token, so split checks like "mul i32" and ", 3" can pass even if immediate binding on the target instruction drifts.

Confirmed fixed

  • Memory argument pass plumbing (MergenPB.hpp) is now resolved by named memory argument / memoryAlloc and no longer depends on terminal arg position.
  • Legacy split-lane FP remnants in LifterClass.hpp cleanup look non-functional/dead and removal is safe.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Applied the latest merge-readiness review findings in commit f2d2b3b.

1) Blocking: SLEIGH iteration cap too low on valid instruction loops

  • File: scripts/rewrite/sleigh_oracle.py
  • Updated PcodeEmulator.execute() loop guard:
    • old: max_iters = len(ops) * 20
    • new: branch-aware cap:
      • branchy ops (BRANCH/CBRANCH/BRANCHIND): max(len(ops) * 128, 2048)
      • straight-line ops: len(ops) + 1
  • Error now includes current/limit/op-count context.
  • Reproduced prior failure case and verified fix locally for smoke_pdep_pdep from oracle_vectors.json.

2) Medium: explicit MMX/XMM operand-class gate for SIMD integer handlers

  • Files:
    • lifter/semantics/Semantics_Misc.ipp
    • lifter/semantics/x86_64_opcodes.x
  • Added hard operand-class guards in lift_movdqa, lift_pand, lift_por, lift_pxor:
    • destination must be Register128
    • source must be Register128 or Memory128
  • Added opcode-table comment clarifying these handlers are XMM-only and MMX forms are intentionally rejected.

3) Medium: microtest matcher sensitivity with split tokens

  • Files:
    • scripts/rewrite/verify.ps1
    • scripts/rewrite/instruction_microtests.json
  • Extended verifier schema support for grouped same-line assertions:
    • pattern object form: { "line_all": ["tokenA", "tokenB"] }
  • Migrated vulnerable checks (branch, nested_branch, diamond, cmov_chain) from independent tokens to same-line grouped token assertions.

Local validation

  • python -c "... SleighOracleProvider().emulate(smoke_pdep_pdep) ..." (repro no longer fails)
  • powershell -ExecutionPolicy Bypass -File scripts/rewrite/verify.ps1
  • python -m py_compile scripts/rewrite/sleigh_oracle.py
  • python test.py quick

CI reran automatically for this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Applied one additional correctness fix in commit 1ae3671 after re-reading review context and running a focused follow-up audit.

Fix: movdqa guard now accepts both legal operand forms

  • File: lifter/semantics/Semantics_Misc.ipp
  • Previous guard only allowed destination Register128, which incorrectly rejected valid store form movdqa m128, xmm.
  • Updated guard now accepts exactly these legal forms:
    • xmm <- xmm/m128
    • m128 <- xmm
  • MMX/non-128-bit forms remain rejected.

Validation

  • Targeted regression scenario (custom vector, movdqa [rsp], xmm0 bytes 66 0F 7F 04 24) now passes in microtest harness.
  • python test.py quick passes.

CI reran automatically for this commit and is in progress.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Latest re-review on current head (1ae3671): changes requested (not merge-ready yet).

What I re-checked

  • Full 23-file diff (main...feature/simd-sse2-phase1)
  • 4 parallel reviewer passes + manual review of remaining files (docs/*, scripts/rewrite/verify.ps1)
  • Direct runtime check of Sleigh provider across current oracle set

Current blocking concern

  1. MMX forms of PAND/POR/PXOR now hard-fail instead of falling back to not_implemented
    • Registration enables mnemonic-level dispatch (lifter/semantics/x86_64_opcodes.x: PAND/POR/PXOR).
    • Handlers enforce XMM-only operand types and call UNREACHABLE otherwise (lifter/semantics/Semantics_Misc.ipp, lift_pand/lift_por/lift_pxor).
    • Disassemblers still decode MMX registers for these mnemonics (IcedDisassembler.hpp / ZydisDisassembler.hpp MM0..MM7 mappings).
    • Effect: valid MMX encodings can now abort lifting via llvm_unreachable (from UNREACHABLE macro) rather than taking the existing unsupported-instruction path (not_implemented in semantics dispatcher default).

Verified resolved from prior reviews

  • Memory-argument pass plumbing in MergenPB.hpp is fixed (named memory resolution).
  • Sleigh register-space sizing/bounds are fixed (0x2000 + bounds checks).
  • Current vector corpus check: Sleigh emulation passes for all current unicorn-mode cases (89/89, no failures), including smoke_pdep_pdep.
  • verify.ps1 now supports line_all checks, improving microtest matching precision.

Recommendation: block merge until MMX behavior is explicitly handled (either gate dispatch before these handlers for MMX forms, or route non-XMM forms to the non-implemented fallback rather than UNREACHABLE).

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 7, 2026

Applied the latest changes-requested finding in commit f2540ab.

Fix: MMX forms of pand/por/pxor no longer hard-fail

  • File: lifter/semantics/Semantics_Misc.ipp
  • Updated unsupported-operand handling in:
    • lift_movdqa
    • lift_pand
    • lift_por
    • lift_pxor
  • Behavior change:
    • previously: unsupported operand forms hit UNREACHABLE(...) (hard abort)
    • now: unsupported forms route to the existing dispatcher fallback behavior by emitting not_implemented return and setting run = 0, finished = 1.
  • This preserves XMM SSE2 handling while preventing valid MMX encodings from crashing the lifter.

Validation

  • python test.py quick
  • targeted MMX fallback probe with custom vector (0F DB C1, pand mm0, mm1) via MERGEN_TEST_VECTORS + python test.py micro mmx_pand_fallback_no_crash
    • passes without abort.

CI reran automatically on this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Review verdict: Request changes

I reviewed the diff for main...feature/simd-sse2-phase1 and found 2 actionable issues.

P2 - Sleigh dependency error handling is bypassed

  • scripts/rewrite/sleigh_oracle.py#L18 imports pypcode at module load time.
  • scripts/rewrite/sleigh_oracle.py#L513-L519 has intended graceful fallback in __init__, but it never runs when pypcode is missing.
  • scripts/rewrite/generate_oracle_vectors.py#L233-L235 imports SleighOracleProvider directly.

Repro:
python -S scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers sleigh
Current behavior: raw ModuleNotFoundError traceback instead of actionable install guidance.

P3 - Baseline doc attributes parity check to wrong script

  • docs/REWRITE_BASELINE.md#L78-L82 says verify.ps1 enforces manifest/source parity.
  • Actual parity check is in scripts/rewrite/run.ps1#L40-L47.
  • scripts/rewrite/verify.ps1 validates IR existence and pattern matches only.

Please fix these before merge.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Applied the latest re-review findings in commit f53308d.

1) Sleigh dependency error handling now degrades cleanly

  • Files:
    • scripts/rewrite/sleigh_oracle.py
    • scripts/rewrite/generate_oracle_vectors.py

Changes:

  • sleigh_oracle.py now treats pypcode as an optional dependency at module import time:
    • wraps from pypcode import Context, OpCode in ModuleNotFoundError handling
    • keeps module importable when pypcode is absent
    • SleighOracleProvider.__init__ emits explicit install guidance when unavailable.
  • Dispatch table creation is now safe when OpCode is unavailable (_OP_DISPATCH = {} if OpCode is None else {...}), avoiding import-time crashes.
  • generate_oracle_vectors.py provider creation now maps missing-Sleigh dependency errors to OracleError with actionable guidance.
  • CLI now catches OracleError in __main__ and prints a concise error line, exiting with code 1 (no raw traceback).

Validated repro from review:

  • python -S scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers sleigh
  • Now returns:
    • ERROR: Sleigh provider requires \pypcode` Python package. Install with `pip install pypcode`.`

Also validated normal Sleigh path:

  • python scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out .git/tmp_sleigh_oracle_vectors.json --providers sleigh

2) Baseline doc parity-check attribution fixed

  • File: docs/REWRITE_BASELINE.md
  • Pass-criteria section now correctly states:
    • manifest/source parity is enforced by run.ps1
    • IR/pattern checks are enforced by verify.ps1

Local validation

  • python -m py_compile scripts/rewrite/generate_oracle_vectors.py scripts/rewrite/sleigh_oracle.py
  • python test.py quick

CI reran automatically for this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Second-pass review update (Request changes)

Re-reviewed current main...feature/simd-sse2-phase1 head.

P2 - Unknown provider path now crashes with AttributeError

  • File: scripts/rewrite/generate_oracle_vectors.py (create_provider around lines 229-247)
  • create_provider handles unicorn and sleigh, but no final raise for unknown names.
  • Result: unknown providers return None, later dereferenced as provider.name / provider.emulate.

Repro:
python scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers unicorn,unknown_provider

Observed: crashes with AttributeError: NoneType object has no attribute ... instead of actionable OracleError("Unsupported oracle provider ...").

P3 - Pass-criteria docs still over-attribute parity check to verify script

  • File: docs/REWRITE_BASELINE.md lines 78-82
  • Text says run.ps1 enforces parity before lifting, but the bullets under verify.ps1 still include parity as if verify enforces it.
  • Actual behavior: parity check is in run.ps1; verify.ps1 checks IR existence/pattern matches only.

Please fix both before merge.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Applied the latest review comment (4019055798) in commit 433eb12.

1) Unknown provider now fails with actionable OracleError

  • File: scripts/rewrite/generate_oracle_vectors.py
  • Restored explicit final branch in create_provider(...):
    • raise OracleError(f"Unsupported oracle provider '{name}'")
  • This prevents None from propagating and later crashing with AttributeError.

Validated repro from review:

  • python scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers unicorn,unknown_provider
  • now returns:
    • ERROR: Unsupported oracle provider 'unknown_provider'

2) Baseline docs parity attribution cleaned up

  • File: docs/REWRITE_BASELINE.md
  • Pass criteria now states:
    • parity check belongs to run.ps1
    • verify.ps1 covers IR existence + pattern presence only
  • Removed residual bullet that still implied parity enforcement under verify checks.

Additional validation

  • python -S scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers sleigh (still prints actionable pypcode install guidance)
  • python -m py_compile scripts/rewrite/generate_oracle_vectors.py
  • python test.py quick

CI reran automatically for this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Third-pass review update (Request changes)

Re-reviewed current head (433eb12) and validated findings locally.

P2 - SIMD helpers still reject valid XMM16..XMM31

  • lifter/disasm/CommonDisassembler.hpp limits SIMD fast-path checks to XMM0..XMM15 in:
    • getBiggestEncoding (around lines 69-73)
    • getRegOfSize (around lines 192-196)
    • getRegisterSize (around lines 352-356)
  • But decoders still map higher XMM regs, e.g.:
    • lifter/disasm/IcedDisassembler.hpp maps XMM16
    • lifter/disasm/ZydisDisassembler.hpp maps XMM16

So high XMM registers can fall through to Register::None / size 0 despite being valid decoded registers.

P2 - line_all verifier matching is substring-permissive

  • scripts/rewrite/verify.ps1 checks each token with IndexOf(..., Ordinal), so numeric tokens can match wrong constants.
  • Example: token ", 3" also matches line text containing ", 30".
  • Repro:
    • powershell -NoProfile -Command "('mul i32 %0, 30').IndexOf(', 3',[System.StringComparison]::Ordinal)"
    • returns 10 (match).
  • This can allow semantic regressions to pass for patterns intended to pin exact constants in instruction_microtests.json.

P3 - Pass-criteria docs still overstate parity guarantee

  • docs/REWRITE_BASELINE.md says parity is "exactly one manifest entry".
  • scripts/rewrite/run.ps1 currently checks only missing and extra set membership; it does not reject duplicate sample names.
  • Please either enforce uniqueness in script or relax doc wording.

Please address before merge.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Applied the latest third-pass review findings (4019192306) in commit 25da4f8.

1) SIMD helper range now accepts XMM16..XMM31

  • File: lifter/disasm/CommonDisassembler.hpp
  • Updated fast-path XMM range checks from XMM0..XMM15 to XMM0..XMM31 in:
    • getBiggestEncoding
    • getRegOfSize
    • getRegisterSize
  • This aligns helper behavior with decoder mappings that already include high XMM registers.

2) line_all matcher no longer allows partial numeric token matches

  • File: scripts/rewrite/verify.ps1
  • Added Test-LineTokenMatch helper for line_all token checks.
  • Matching now uses regex-escaped token checks with word-boundary guards for alnum/underscore edges.
  • Prevents false positives like token ", 3" matching ", 30".

Validated repro:

  • powershell -NoProfile -Command "... Regex::IsMatch('mul i32 %0, 30', pattern-for-', 3') ..."
  • now evaluates to False.

3) Baseline parity now enforces uniqueness (matches docs)

  • File: scripts/rewrite/run.ps1
  • Added duplicate-name checks for:
    • source basenames (testcases/rewrite_smoke/*)
    • manifest sample names (instruction_microtests.json)
  • Runner now fails if either side has duplicates, making "exactly one manifest entry" enforceable.

Local validation

  • python test.py quick
  • line_all numeric-token repro check (returns False for ", 3" vs ", 30")

CI reran automatically for this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Fourth-pass review update

Re-reviewed latest head (25da4f8).

I validated the prior blockers are addressed:

  • XMM helper range now covers XMM0..XMM31 in CommonDisassembler.hpp.
  • Unknown providers now fail with explicit OracleError.
  • Missing pypcode path now returns actionable error text.
  • Manifest parity duplicate checks are enforced in run.ps1, and baseline docs were aligned.
  • verify.ps1 token matching now uses boundary-aware matching.

Smoke check run succeeded:
python scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers unicorn,sleigh

Non-blocking follow-up

python -m scripts.rewrite.generate_oracle_vectors --providers sleigh still fails because create_provider() imports sleigh_oracle as a top-level module (from sleigh_oracle import ...).

Given documented entrypoints use script/cmd invocation, I consider this a robustness follow-up rather than a merge blocker.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Applied the latest follow-up review note (4019255752) in commit ba8b3b9.

Fix: module entrypoint now resolves Sleigh provider import correctly

  • File: scripts/rewrite/generate_oracle_vectors.py
  • Updated create_provider("sleigh") import resolution to support both invocation styles:
    • script mode (python scripts/rewrite/generate_oracle_vectors.py ...) via sleigh_oracle
    • module mode (python -m scripts.rewrite.generate_oracle_vectors ...) via package-qualified import path
  • Preserved existing actionable error behavior for:
    • missing pypcode
    • unsupported provider names

Validation

  • python -m scripts.rewrite.generate_oracle_vectors --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers sleigh
    • succeeds (module mode no longer import-fails)
  • python scripts/rewrite/generate_oracle_vectors.py --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers unicorn,unknown_provider
    • still returns actionable ERROR: Unsupported oracle provider ...
  • python -m py_compile scripts/rewrite/generate_oracle_vectors.py
  • python test.py quick

CI reran automatically for this commit.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Fifth-pass review update

Re-reviewed latest head (ba8b3b9) and re-ran targeted checks.

Validation run

  • build_iced/rewrite_microtests.exe -> All instruction microtests passed (including movdqa_xmm0_xmm1_basic, pxor_xmm0_xmm1_basic, pand_xmm0_xmm1_basic, por_xmm0_xmm1_basic).
  • python -m scripts.rewrite.generate_oracle_vectors --seed scripts/rewrite/oracle_seed_vectors.json --out NUL --providers sleigh -> succeeds.

I do not see any new blocking regressions in the currently active concolic/rewrite paths.

Non-blocking follow-up

There is still an architectural mismatch worth tracking: lifterSymbolic only indexes GPR/RIP/RFLAGS, while SSE2 opcodes are now globally wired in x86_64_opcodes.x. If symbolic mode is ever re-enabled for SSE2 inputs, it will need explicit XMM handling (or early rejection) to avoid assertion/index issues.

No additional blockers from this pass.

@NaC-L
Copy link
Copy Markdown
Owner Author

NaC-L commented Mar 8, 2026

Reviewed the latest feedback (4019849504) and confirmed current state.

Latest review status

  • No new blocking regressions identified on head ba8b3b9.
  • Reviewer validation passed for:
    • rewrite microtests (including SSE2 cases)
    • module-mode Sleigh oracle generation (python -m scripts.rewrite.generate_oracle_vectors ... --providers sleigh)

Current PR health

  • build (iced)
  • build (zydis)
  • rewrite-quick-gate
  • rewrite-strict-gate

Noted non-blocking follow-up

  • Symbolic-mode/XMM coverage mismatch remains an architectural follow-up (not a blocker for this PR’s active rewrite/concolic scope).

No additional code changes were required from this latest pass.

@NaC-L NaC-L merged commit c148eba into main Mar 8, 2026
4 checks passed
@NaC-L NaC-L deleted the feature/simd-sse2-phase1 branch March 8, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants