armv8-encode is a Rust project for machine-code analysis, decoding, encoding,
and rewriting on AArch64.
The crate started as a decoder/encoder pair, then grew an analysis layer
(basic blocks, control-flow graphs), then a symbolic rewriter, then full
ELF read+write support including ET_DYN. Today it can take an unmodified
aarch64 .so, edit its .text or .data, append new functions in a
fresh PT_LOAD segment that call existing PLT-bound externs (and, via a
single dlsym anchor, any dynamic symbol the process can see), and
produce a runnable byte stream the dynamic linker accepts.
The primary architectural target is AArch64; the rest of the layering
(container, mc, rewrite) stays format- and architecture-neutral
so other ISAs can plug in later.
- Decode any AArch64 instruction word the imported opcode table covers.
- Encode new instructions from typed templates.
- Linear-sweep or recursive-descent disassemble a code region.
- Build a CFG and reason about basic blocks and control flow.
- Read ELF (.o, .so, executables) and Mach-O (.o) into a neutral container model. ET_DYN/ET_EXEC inputs preserve the full ELF surface needed to round-trip them.
- Write ELF ET_REL (.o) and ET_DYN (.so / PIE executable) byte streams that the system linker / dynamic linker accept.
- Rewrite a section symbolically: lift instructions to an editable IR with Target::Symbol / Target::Block operands, mutate, lay out (with conditional-branch widening), emit, splice back.
- Edit .rodata / .data as a sequence of pointer + bytes items (e.g. swap a function-pointer slot in a vtable).
- Edit .text of a real .so in place via the high-level [BinaryEditor] API.
- Append a new function to an .so in a fresh executable segment, complete with new read-only data, and have the new code call existing extern functions through the PLT.
- Reach any libc symbol via a single dlsym anchor. If the source library imports dlsym (one PLT entry is enough), appended code can call dlsym(RTLD_DEFAULT, "name") to resolve any symbol the dynamic loader can find (printf, strlen, getenv, anything) without needing it to be pre-anchored in the source.
- Force-load another library at this one's load time. add_library_dependency("libfoo.so") appends the name to a rebuilt .dynstr (in the appended segment) and inserts a DT_NEEDED tag in .dynamic. The dynamic linker resolves every DT_NEEDED before firing this library's constructors, so the named library is guaranteed to be present and its symbols reachable through dlsym(RTLD_DEFAULT, ...) by the time any code in this library runs. Pairs naturally with add_initialiser for "load this side library, then run my init code that uses it."
- Run code at library load time. add_initialiser(name, body, position) registers a load-time constructor. Three insertion strategies:
  - InitialiserPosition::First: hijack .init_array[0] so the appended code runs ahead of every other ctor (including CRT helpers like frame_dummy); a wrapper chain-tails to the original ctor so it still runs.
  - InitialiserPosition::Last: hijack the final .init_array slot; same chain-back semantics, but the appended code runs after every other ctor and before only the last one.
  - InitialiserPosition::Append: add a brand-new slot rather than hijacking. The appended code runs after all original ctors as a separate constructor. The new slot plus the extended .rela.dyn (with one new R_AARCH64_RELATIVE per slot) live in the appended segment; .dynamic is patched to point at the rebuilt sections via DT_INIT_ARRAY / DT_INIT_ARRAYSZ / DT_RELA / DT_RELASZ / DT_RELACOUNT. Works on inputs that have no .init_array at all, provided the input has an existing .rela.dyn and at least two unused DT_NULL slots in .dynamic.
- Export the new function as a .dynsym entry resolvable via dlopen + dlsym from any caller. The writer rebuilds .dynsym / .dynstr / .gnu.version and regenerates .gnu.hash from scratch, then points the .dynamic tags at the new copies in the appended segment.
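Regenerating .gnu.hash comes down to the standard GNU symbol hash. The block below is a minimal, self-contained sketch of that hash and of bucket selection. The names gnu_hash and gnu_bucket are illustrative and are not claimed to be the crate's API.

```rust
/// Standard GNU symbol hash: start at 5381, then h = h * 33 + byte for
/// every byte of the symbol name, with 32-bit wrap-around.
pub fn gnu_hash(name: &str) -> u32 {
    name.bytes()
        .fold(5381u32, |h, b| h.wrapping_mul(33).wrapping_add(u32::from(b)))
}

/// Bucket a symbol falls into for a table with `nbuckets` buckets.
/// `nbuckets` must be non-zero (hypothetical helper for illustration).
pub fn gnu_bucket(name: &str, nbuckets: u32) -> u32 {
    gnu_hash(name) % nbuckets
}
```

With a single bucket, every exported symbol lands in bucket 0, so lookup degenerates to one linear chain walk. That costs lookup speed but keeps the regenerated table simple to lay out.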
End-to-end runtime tests confirm rewritten libraries load and run correctly under aarch64 Linux (via QEMU on macOS host).
The crate is split into four layers, bottom-up: container → ISA → mc → rewrite. Each layer only knows about the ones below it.
Path: src/container
Reads Mach-O and ELF object files into a neutral, format-agnostic
model: sections, symbols, relocations, optional DWARF debug info, and
Function views derived from both. The object crate handles format
parsing and gimli handles DWARF; the container layer hides both so
the rest of the crate sees one shape regardless of source.
The layer's input is &[u8]; the output is a
Container ready to feed into disassembly
or rewriting. AArch64-relevant relocations are mapped onto a neutral
enum:
- Branch26, Branch19, Branch14 (PC-relative branches)
- AdrpPage21 (adrp page reference)
- AddPageOffset12 (add immediate companion to adrp)
- LoadStorePageOffset12 { access_width_bytes } (ldr/str companion)
- Absolute (data references / GOT)
- Other(raw_code) (unrecognized, preserved structurally)
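To make the Branch26 kind concrete, here is a hedged sketch of how a 26-bit PC-relative branch field can be patched once the target address is known. apply_branch26 is a hypothetical helper written for this illustration; it is not taken from the crate.

```rust
/// Patch the imm26 field of a `b` / `bl` word so that the instruction at
/// `pc` branches to `target`. Returns None when the displacement is not
/// 4-byte aligned or falls outside the ±128 MiB reach of the field.
pub fn apply_branch26(word: u32, pc: u64, target: u64) -> Option<u32> {
    let limit: i64 = 1 << 27; // 26-bit signed word offset, times 4 bytes
    let disp = target.wrapping_sub(pc) as i64;
    if disp % 4 != 0 || !(-limit..limit).contains(&disp) {
        return None;
    }
    // Keep the opcode bits [31:26], replace the word offset in [25:0].
    let imm26 = ((disp >> 2) as u32) & 0x03ff_ffff;
    Some((word & 0xfc00_0000) | imm26)
}
```

Branch19 and Branch14 follow the same pattern with narrower fields at different bit positions.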
A ContainerKind classifies inputs as
Relocatable / SharedObject / Executable / Other.
Container::to_bytes() dispatches on this:
- Relocatable ELF / Mach-O .o → emitted via object::write::Object. Round-trip is structurally compatible.
- SharedObject / Executable ELF (ET_DYN / ET_EXEC) → emitted via object::write::elf::Writer driven through reserve/write phases by src/container/elf_writer.rs. The companion ElfImage struct captures everything the neutral types deliberately don't model (program headers, .dynamic tag list, .gnu.hash, .gnu.version*, .eh_frame_hdr, build-ID, .interp, per-section sh_offset / sh_size / sh_link / sh_info, PLT stub addresses for every dynsym extern). The writer reproduces the input's layout faithfully: file offsets and section header positions may shift, but program-header virtual addresses, .dynamic tags, and dynsym/PLT-resolved call sites stay valid.
- Anything else → UnsupportedKind error.
When a binary carries .debug_info / __debug_info, the container
also exposes a DwarfInfo with one DwarfFunction per
DW_TAG_subprogram. Container::functions() merges symbol-derived
and DWARF-derived entries — symbols take precedence, DWARF fills in
the gap when the binary is stripped.
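The merge rule can be sketched as follows. This is an illustration only: the FunctionView struct, keying by start address, and merge_functions are assumptions made for the sketch, not the crate's actual types.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the container's function view.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionView {
    pub name: String,
    pub address: u64,
    pub size: u64,
}

/// Merge symbol-derived and DWARF-derived entries by start address.
/// DWARF entries are inserted first, so a symbol at the same address
/// overwrites them; DWARF-only addresses survive and fill the gaps.
pub fn merge_functions(symbols: &[FunctionView], dwarf: &[FunctionView]) -> Vec<FunctionView> {
    let mut by_address: BTreeMap<u64, FunctionView> = BTreeMap::new();
    for function in dwarf {
        by_address.insert(function.address, function.clone());
    }
    for function in symbols {
        by_address.insert(function.address, function.clone());
    }
    // BTreeMap iteration yields the result sorted by address.
    by_address.into_values().collect()
}
```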
Path: src/isa
The ISA layer owns architecture-specific instruction knowledge: raw instruction encodings, opcode tables, operand schemas, operand extraction and insertion, validation rules, aliases and canonical forms, architecture feature/version constraints.
The current AArch64 implementation lives under src/isa/aarch64. It
uses an imported opcode table as the matching foundation, decodes
table operands into typed Rust values (registers, immediates, memory
operands, branch/page targets, vector registers, vector elements,
system operands), implements
InstructionInfo for control-flow
classification, and exposes table-driven encoding alongside helpers
used by the rewrite layer.
Two disassembler entry points live here:
disassemble_bytes does fail-fast linear
sweep (every word must decode), and
disassemble_recursive walks control
flow from a set of entry points and classifies anything it doesn't
reach as data. The latter is what works on real shipped binaries.
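The difference between the two strategies is easiest to see in a toy recursive-descent walker. The sketch below is not the crate's disassemble_recursive: it recognises only b, bl, b.cond and ret by bit pattern and treats every other word as a fall-through instruction, which is a deliberate simplification.

```rust
use std::collections::BTreeSet;

/// Sign-extend the low `bits` bits of `value`.
fn sign_extend(value: u32, bits: u32) -> i64 {
    let shift = 64 - bits;
    ((value as i64) << shift) >> shift
}

/// Follow control flow from `entries` and return every address reached as
/// code. Words that are never reached would be classified as data.
pub fn reachable_code(base: u64, words: &[u32], entries: &[u64]) -> BTreeSet<u64> {
    let end = base + 4 * words.len() as u64;
    let mut seen = BTreeSet::new();
    let mut work: Vec<u64> = entries.to_vec();
    while let Some(pc) = work.pop() {
        // Skip out-of-region, misaligned, or already visited addresses.
        if pc < base || pc >= end || pc % 4 != 0 || !seen.insert(pc) {
            continue;
        }
        let word = words[((pc - base) / 4) as usize];
        let next = pc + 4;
        if word & 0x7c00_0000 == 0x1400_0000 {
            // b / bl: imm26 word offset in bits [25:0]; bit 31 set means bl.
            let target = pc.wrapping_add((sign_extend(word & 0x03ff_ffff, 26) * 4) as u64);
            work.push(target);
            if word >> 31 == 1 {
                work.push(next); // a call returns to the next instruction
            }
        } else if word & 0xff00_0010 == 0x5400_0000 {
            // b.cond: imm19 word offset in bits [23:5]; both edges are live.
            let target = pc.wrapping_add((sign_extend((word >> 5) & 0x7_ffff, 19) * 4) as u64);
            work.push(target);
            work.push(next);
        } else if word & 0xffff_fc1f == 0xd65f_0000 {
            // ret Xn: no successor inside this region.
        } else {
            work.push(next); // everything else falls through
        }
    }
    seen
}
```

A linear sweep over the same words would try to decode the literal sitting between the branch and its target. The walker never visits it, which is why this approach copes with data embedded in text.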
Path: src/mc
The machine-code layer is architecture-neutral. It models decoded code in terms useful for analysis and rewriting:
- the InstructionInfo trait that ISA crates implement so analysis stays generic
- the ControlFlow classification (Fall, Jump, ConditionalJump, Call, Return, IndirectJump, IndirectCall, Trap)
- basic blocks and the ControlFlowGraph built by mc::build_cfg
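As a rough illustration of how a control-flow classification feeds block construction, the sketch below computes basic-block leaders. The variant names mirror the list above, but the payloads, the choice that calls do not end a block, and the block_leaders function are assumptions made for this example rather than the behaviour of mc::build_cfg.

```rust
use std::collections::BTreeSet;

/// Illustrative classification; the address payloads are an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ControlFlow {
    Fall,
    Jump(u64),
    ConditionalJump(u64),
    Call(u64),
    Return,
    IndirectJump,
    IndirectCall,
    Trap,
}

/// Leaders are the first instruction, every direct jump target inside the
/// region, and every instruction that follows a block terminator.
pub fn block_leaders(instructions: &[(u64, ControlFlow)]) -> BTreeSet<u64> {
    let addresses: BTreeSet<u64> = instructions.iter().map(|(a, _)| *a).collect();
    let mut leaders = BTreeSet::new();
    if let Some((first, _)) = instructions.first() {
        leaders.insert(*first);
    }
    for (index, (_, flow)) in instructions.iter().enumerate() {
        let next = instructions.get(index + 1).map(|(address, _)| *address);
        match flow {
            // Calls are assumed to return, so they do not split the block.
            ControlFlow::Fall | ControlFlow::Call(_) | ControlFlow::IndirectCall => {}
            ControlFlow::Jump(target) | ControlFlow::ConditionalJump(target) => {
                if addresses.contains(target) {
                    leaders.insert(*target);
                }
                leaders.extend(next);
            }
            ControlFlow::Return | ControlFlow::IndirectJump | ControlFlow::Trap => {
                leaders.extend(next);
            }
        }
    }
    leaders
}
```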
Path: src/rewrite
The rewrite layer turns a decoded code region into an editable IR
whose PC-relative operands are symbolic: instead of carrying a
hard-coded address, each branch target carries a
Target — a reference to a basic block, an
extern symbol, a constant pool entry, or a literal address. This is
what lets the layout pass move things around freely without
invalidating displacements.
The pipeline is:
bytes ──► sweep ──► instructions ──► CFG
│
▼
RewritePlan::lift
│
▼ edit operations
(mutate operands, blocks, terminators)
│
▼
lay_out(plan, base)
│
▼
emit(plan, layout) ──► bytes
│
▼
commit_to_container(...) ──► Container
│
▼
Container::to_bytes() ──► byte stream
Layout iterates to a fixed point: if an edit pushes a conditional
branch past its pcrel19 (b.cond, cbz, cbnz) or pcrel14
(tbz, tbnz) range, it widens the branch into
<inverted_cond> .Lskip ; b far_target ; .Lskip:, which can in turn
push other branches out of range — repeat until stable.
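The fixed-point iteration can be modelled in a few lines. The sketch below is a simplified stand-in for the real layout pass: the Op type, the reach field and lay_out_to_fixed_point are hypothetical, and the far b emitted by a widened branch is assumed to always be in range.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    /// Non-branch code occupying `size` bytes.
    Plain { size: u64 },
    /// Conditional branch to the op at index `target`. `reach` is the
    /// exclusive forward limit in bytes: 1 << 20 for pcrel19, 1 << 15 for pcrel14.
    CondBranch { target: usize, reach: i64, widened: bool },
}

fn size_of(op: &Op) -> u64 {
    match op {
        Op::Plain { size } => *size,
        // Widened form: inverted short branch plus an unconditional `b`.
        Op::CondBranch { widened, .. } => if *widened { 8 } else { 4 },
    }
}

/// Widen out-of-range conditional branches until a pass changes nothing.
/// Returns the number of layout passes, including the final stable one.
pub fn lay_out_to_fixed_point(ops: &mut [Op]) -> usize {
    let mut passes = 0;
    loop {
        passes += 1;
        // Assign addresses from the current sizes.
        let mut addresses = Vec::with_capacity(ops.len());
        let mut cursor = 0u64;
        for op in ops.iter() {
            addresses.push(cursor);
            cursor += size_of(op);
        }
        let mut changed = false;
        for (index, op) in ops.iter_mut().enumerate() {
            if let Op::CondBranch { target, reach, widened } = op {
                let disp = addresses[*target] as i64 - addresses[index] as i64;
                if !*widened && (disp < -*reach || disp >= *reach) {
                    *widened = true;
                    changed = true;
                }
            }
        }
        if !changed {
            return passes;
        }
    }
}
```

Widening only ever grows an op and is never undone, so the loop terminates after at most one pass per conditional branch plus one.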
Each block stores Vec<RewriteOp> where RewriteOp is either a
single RewriteInstruction or a MacroOp (a fused multi-instruction
idiom). Two macros are recognised today:
- MacroKind::LoadAddress: adrp Rd, page; add Rd, Rd, #lo12 for computing a symbol's absolute address.
- MacroKind::AccessValue: adrp Rd, page; ldr/str Rt, [Rd, #lo12] for loading from / storing to a symbol's address.
Macros are recognised at lift time, edited as a unit, and expanded back to their component instructions on emit.
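Expanding a LoadAddress macro means splitting the final address into a page delta for adrp and a low-12-bit offset for add. The following sketch shows that arithmetic and the two encodings. The function names are illustrative and this is not the crate's emitter.

```rust
/// Split `target` into the adrp page delta (in 4 KiB pages, relative to the
/// page containing `pc`) and the low 12 bits used by the companion instruction.
pub fn split_page_address(pc: u64, target: u64) -> (i64, u32) {
    let page_delta = ((target & !0xfff) as i64).wrapping_sub((pc & !0xfff) as i64) >> 12;
    (page_delta, (target & 0xfff) as u32)
}

/// Encode `adrp Xd, <page>`. Returns None when the register number is
/// invalid or the delta does not fit the signed 21-bit immediate.
pub fn encode_adrp(rd: u32, page_delta: i64) -> Option<u32> {
    let limit: i64 = 1 << 20;
    if rd > 31 || !(-limit..limit).contains(&page_delta) {
        return None;
    }
    let imm = (page_delta as u32) & 0x1f_ffff;
    // immlo sits in bits [30:29], immhi in bits [23:5].
    Some(0x9000_0000 | ((imm & 0x3) << 29) | ((imm >> 2) << 5) | rd)
}

/// Encode `add Xd, Xn, #lo12` (64-bit form, unshifted immediate).
pub fn encode_add_lo12(rd: u32, rn: u32, lo12: u32) -> Option<u32> {
    if rd > 31 || rn > 31 || lo12 > 0xfff {
        return None;
    }
    Some(0x9100_0000 | (lo12 << 10) | (rn << 5) | rd)
}
```

Because both halves are derived from one symbolic target, the pair can be recomputed as a unit whenever layout moves either the code or the symbol.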
The BinaryEditor wraps the lift → edit →
layout → emit → commit pipeline behind a smaller, typed surface.
It splits its operations across two scoped sub-views:
- editor.binary ([BinaryState]): whole-binary methods (append functions, declare dependencies, register exports, etc.).
- editor.text ([LiftedTextSection], optional): section-scoped methods (redirect branches, replace instructions, etc.). Populated by [BinaryEditor::lift_text_section].

use armv8_encode::container::Container;
use armv8_encode::rewrite::{BinaryEditor, Target};
let bytes = std::fs::read("libgreet.so")?;
let container = Container::from_bytes(&bytes)?;
let mut editor = BinaryEditor::for_section(&container, ".text")?;
let printf = editor.binary.symbol_by_name("printf")?;
editor
    .text
    .as_mut()
    .unwrap()
    .redirect_branch_at(0x1234, Target::Symbol(printf))?;
let new_bytes = editor.commit_to_bytes()?;
std::fs::write("libgreet.rewritten.so", new_bytes)?;

When you need to interleave whole-binary and section-scoped edits,
destructure once so both &mut references coexist:

let BinaryEditor { binary, text, .. } = &mut editor;
let text = text.as_mut().unwrap();
let log = binary.add_function("hello_log", body)?;
let target = binary.function_address("greet_double").unwrap();
text.replace_instruction_at(target, /* b log */)?;

The editor proxies the rewrite primitives:
- redirect_branch_at(address, target): change a branch's destination.
- redirect_macro_target_at(address, target): change a macro's target.
- replace_instruction_at(address, instruction): overwrite a singleton instruction.
- insert_after_address(address, instructions): splice in new instructions.
- remove_at_address(address): drop an op.
- add_function(name, instructions) -> SymbolId: append a new function in a fresh PT_LOAD R-X segment past the input's mapped range. Returns a SymbolId callers can pass back as Target::Symbol to redirect existing branches at the new code. The function is registered in the static .symtab only; it isn't visible to dlsym callers.
- add_function_exported(name, instructions) -> SymbolId: same as add_function, plus promotes the new symbol to .dynsym so dlopen / dlsym callers can resolve it by name. The writer rebuilds .dynsym, .dynstr, .gnu.version, and regenerates .gnu.hash (with nbuckets = 1 for layout simplicity), then updates the captured .dynamic tags so the loader follows the new copies (placed in the appended segment). The original sections stay in the file but are ignored at runtime.
- add_data(name, bytes, align) -> SymbolId: append read-only data alongside the new functions in the same segment. The new function can compute the blob's address via the standard adrp + add pair against Target::Symbol(blob_id); the rewriter's macro-fusion pass folds the pair into a LoadAddress macro that resolves at the appended segment's vaddr.
- editor.binary.add_library_dependency(library_name) -> (): force the dynamic linker to load another shared library when this one is loaded. Appends the name to .dynstr (rebuilt in the appended segment) and inserts a DT_NEEDED tag in .dynamic. If the input's .dynamic doesn't have a trailing DT_NULL slot to absorb the new tag in place, .dynamic is automatically relocated into the appended segment with headroom and PT_DYNAMIC is rewritten to point at the new copy. This works on real-world binaries (e.g. Android NDK output) that ship a single trailing DT_NULL.
- add_initialiser(name, body, position) -> SymbolId: register a function that runs at library load time. Three positions:
  - First / Last: hijack an existing .init_array slot via a wrapper that chain-tails to the displaced original ctor. Requires the input to have a non-empty .init_array; returns NoExistingInitArray otherwise.
  - Append: add a brand-new .init_array slot without hijacking. The new slot, plus a rebuilt .rela.dyn with one extra R_AARCH64_RELATIVE, lives in the appended segment, and .dynamic is patched to point at the new copies via DT_INIT_ARRAY / DT_INIT_ARRAYSZ / DT_RELA / DT_RELASZ / DT_RELACOUNT. Returns NoExistingRelaDyn if the input has no .rela.dyn. If the input's .dynamic doesn't have DT_NULL room for the new tags, .dynamic is relocated into the appended segment automatically (same mechanism as add_library_dependency).
- commit() -> Container and commit_to_bytes() -> Vec<u8>: drive the layout/emit/commit pipeline.

commit_to_bytes automatically routes through the right writer
path: in-place when no functions were appended (existing .text
edits stay within the source extent), or the append-PT_LOAD path
when new functions live in a fresh segment.
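The DT_NEEDED handling described for add_library_dependency boils down to a small decision on the tag list. The sketch below models .dynamic as (d_tag, d_val) pairs. The Placement enum and add_needed function are hypothetical, and the real writer also has to rebuild .dynstr, which is omitted here.

```rust
pub const DT_NULL: u64 = 0;
pub const DT_NEEDED: u64 = 1;

#[derive(Debug, PartialEq)]
pub enum Placement {
    /// The tag was written into a spare slot of the existing table.
    InPlace,
    /// No spare slot: the table must be rebuilt elsewhere with headroom.
    NeedsRelocation,
}

/// Try to add a DT_NEEDED tag whose value is an offset into .dynstr.
/// A spare slot exists only when at least two DT_NULL entries trail the
/// table: one is overwritten, the other keeps terminating it.
pub fn add_needed(dynamic: &mut [(u64, u64)], name_offset: u64) -> Placement {
    let trailing_nulls = dynamic
        .iter()
        .rev()
        .take_while(|(tag, _)| *tag == DT_NULL)
        .count();
    if trailing_nulls < 2 {
        return Placement::NeedsRelocation;
    }
    let slot = dynamic.len() - trailing_nulls;
    dynamic[slot] = (DT_NEEDED, name_offset);
    Placement::InPlace
}
```

A table with exactly one trailing DT_NULL therefore always takes the relocation path, which matches the single-DT_NULL case described above.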
For ET_DYN inputs, the reader populates ElfImage.plt_stubs with a
SymbolId → plt_stub_vaddr map by walking .rela.plt.
Container::callable_address_of_symbol
returns the PLT stub address for any extern that has one. Emit folds
Target::Symbol(extern_id) into a direct bl <stub> at write time,
so appended code can call existing PLT-bound externs (puts,
printf, etc.) without adding new dynsym entries.
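A stub map of this kind can be derived from .rela.plt alone if the PLT uses the classic AArch64 layout. The sketch below assumes a 32-byte PLT header followed by 16-byte stubs in relocation order; binaries with BTI or PAC enabled PLTs may use different sizes. It keys the map by dynsym index rather than the crate's SymbolId, and plt_stubs is a hypothetical name.

```rust
use std::collections::BTreeMap;

const RELA_ENTRY_SIZE: usize = 24; // Elf64_Rela: r_offset, r_info, r_addend
const PLT_HEADER_SIZE: u64 = 32; // assumed: classic AArch64 PLT0 size
const PLT_ENTRY_SIZE: u64 = 16; // assumed: classic AArch64 stub size

/// Map dynsym index -> PLT stub address by walking a little-endian
/// `.rela.plt`. The n-th relocation is assumed to own the n-th stub.
pub fn plt_stubs(rela_plt: &[u8], plt_vaddr: u64) -> BTreeMap<u32, u64> {
    rela_plt
        .chunks_exact(RELA_ENTRY_SIZE)
        .enumerate()
        .map(|(index, entry)| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(&entry[8..16]); // r_info field
            let r_info = u64::from_le_bytes(raw);
            let symbol_index = (r_info >> 32) as u32; // ELF64_R_SYM
            let stub = plt_vaddr + PLT_HEADER_SIZE + PLT_ENTRY_SIZE * index as u64;
            (symbol_index, stub)
        })
        .collect()
}
```

With such a map in hand, a call to an extern is an ordinary bl whose displacement is computed against the stub address.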
Direct access to the underlying primitives is still available for callers who need to bypass the editor:
- RewritePlan::lift(cfg, instructions)
- RewritePlan::lift_with_container(cfg, instructions, container)
- RewritePlan::from_instructions(instructions, container): build a plan from a raw instruction list (used by add_function to fuse adrp + add macros in user-supplied bodies).
- lay_out(&plan, base, container) -> Layout
- emit(&plan, &layout, container) -> EmitOutput
- commit_to_container(&container, section, output) -> Container

For data sections there's a parallel API:
- DataSection::lift(container, section_id) -> DataLift
- DataSection::redirect_pointer_at(index, new_target)
- emit_data_section(plan) -> DataEmitOutput
- commit_to_data_container(container, section, output, unhandled)

The examples/ directory contains runnable demonstrations of each
capability:
- examples/dump.rs: inspect any Mach-O / ELF binary. Prints the container header, section table, symbol tables, derived functions, DWARF subprograms, relocation summary, and a symbol-resolved disassembly of every text section. --cfg NAME draws the control-flow graph of a single function as boxed instruction blocks linked by labelled edges.
- examples/elf_inspect.rs: deep ELF surface inventory: program headers, section headers, .dynamic tags, .dynsym, .gnu.version*, .gnu.hash, build-ID, .eh_frame_hdr, .interp. Useful for understanding what an ET_DYN input carries before editing.
- examples/text_edit_so.rs: read libgreet.so, find greet_double's lsl instruction, replace it (changing n*2 to n*4), write the result, re-parse to confirm. Demonstrates the BinaryEditor API end-to-end on an ET_DYN.
- examples/decorate_so.rs: append a new function greet_quintuple to libgreet.so and patch greet_double to tail-call it. Demonstrates add_function + replace_instruction_at for the "decorator" pattern.
- examples/decorate_so_with_log.rs: the most ambitious static-decorator example. Appends two new strings via add_data (the symbol name "puts" and the message), appends a new function via add_function that:
  - resolves puts at runtime via dlsym(RTLD_DEFAULT, "puts") (the only PLT-bound extern the appended code needs is dlsym itself);
  - computes the message address via fused adrp + add against the appended symbol;
  - calls the resolved puts and returns the original n*2 result.

  Patches greet_double to tail-call the new function so each call prints a line and returns its usual answer.
- examples/call_printf_via_dlsym.rs: proves the dlsym anchor is a universal resolver. libgreet.so does not import printf, yet appended code calls printf by going through dlsym(RTLD_DEFAULT, "printf") and invoking the resolved pointer. One PLT anchor (dlsym) subsumes the need for any number of imported externs.
- examples/add_initialiser.rs: appends a function that runs at library load time before the host's main() reaches any libgreet code. Hijacks the last .init_array slot (originally pointing at libgreet's own __attribute__((constructor))), redirects it to a freshly-appended wrapper, and chains the wrapper back to the original ctor, so both run, in user-first order. Verify the result with ./host ctor (expected ctor_marker=17, proving both ran) and ./host (still double=42 offset=107, proving normal functionality is intact).
- examples/export_function.rs: appends a new function greet_quintuple via add_function_exported so it appears in .dynsym and is resolvable via dlopen / dlsym. Demonstrates the .gnu.hash regeneration path; pair with the runtime fixture's host_dlopen binary to verify the export end-to-end.

Run any example with cargo run --example NAME. Examples that
target libgreet.so need the runtime fixture built first; see
tests/elf_runtime/README.md for setup.
236 unit tests cover the decoder, encoder, sweep, recursive descent,
container reader/writer, rewrite IR, data IR, the editor API, and
operand-kind coverage assertions. Fixtures live under
tests/fixtures/aarch64. Each fixture contains source assembly plus
encoded instruction words and otool-format mnemonics; the unit
tests parse those, decode the words, format the result, and compare
against otool. They also exercise the linear sweep, CFG
construction, and rewrite pipelines on the same inputs to catch
regressions end-to-end.
cargo test

For wider local comparison against real Mach-O binaries:

cargo test --test otool_compare -- --ignored --nocapture
ARMV8_COMPARE_BINARY=/path/to/binary cargo test --test otool_compare -- --ignored --nocapture
ARMV8_COMPARE_STRICT=1 cargo test --test otool_compare -- --ignored --nocapture

The tests/elf_runtime/ directory builds a small aarch64 Linux
fixture (libgreet.so + libdep.so + host + host_dlopen) inside a Docker
image, exercises the full read → edit → write → load → run
pipeline, and asserts on host stdout. Seventeen tests cover:
- baseline (sanity: harness works, fixture runs).
- identity round-trip: Container::to_bytes() produces a loadable .so with no edits.
- no-op text rewrite: full lift → emit → commit pipeline with no edits, host still works.
- data-section edit: redirect a function-pointer slot in .data.
- ET_DYN round-trip: read libgreet.so, write it back, host loads and runs against the rewritten copy.
- in-place text edit: patch greet_double's lsl constant via BinaryEditor, host observes new return value.
- appended function: add greet_quintuple via add_function, redirect greet_double to it, host observes new return value.
- appended function resolving extern via dlsym: add greet_log_double that calls dlsym(RTLD_DEFAULT, "puts") then invokes the resolved pointer to print a string before returning n*2. Host observes both behaviours.
- appended function calling unimported extern via dlsym: add greet_printf_double that calls printf (which libgreet.so does not import) by going through dlsym(RTLD_DEFAULT, "printf"). Demonstrates that a single PLT anchor (dlsym) is enough to reach any libc function from appended code.
- exported appended function: add greet_quintuple via add_function_exported, then host_dlopen resolves it by name through the regenerated .gnu.hash and calls it.
- appended initialiser hijacks .init_array (Last): appends an init function that writes a marker, redirects the last .init_array slot to a wrapper around it, chains the wrapper back to the library's original constructor, and verifies via host ctor that both ran (final marker 17 = 0x10 appended | 0x1 chained).
- appended initialiser hijacks .init_array (First): same body, but redirects slot[0] (originally frame_dummy) so the appended code runs ahead of every other library ctor. Marker still ends at 17 because the original ctor still runs from slot[1].
- appended initialiser adds a brand-new .init_array slot (Append): rebuilt .init_array and .rela.dyn live in the appended segment, .dynamic patched accordingly. Marker ends at 16 (greet_ctor sets bit 0x1, then the appended slot overwrites with 0x10 because it runs after the originals rather than via a hijack-and-chain-back).
- forced library load via add_library_dependency: fixture ships an unlinked libdep.so whose ctor sets a marker. Without rewriting, host's dlsym lookup of the marker returns 0 (libdep not loaded). After add_library_dependency injects a DT_NEEDED for libdep into libgreet's .dynamic, the loader pulls libdep in and the marker reads 0xab.
- .dynamic relocation when DT_NULL room is short: synthesises a libgreet variant whose .dynamic has exactly one trailing DT_NULL (matching real-world Android NDK output), calls add_library_dependency, and verifies that .dynamic is relocated to the appended segment, PT_DYNAMIC is rewritten, and the loader still honours the new dep at runtime.
- many-deps via relocation: N=8 add_library_dependency calls on the same one-DT_NULL fixture, verifying that the relocated .dynamic's DT_NULL headroom reserve absorbs all of them without re-relocating.
- PT_PHDR-bearing input: synthesises a PT_PHDR program header on libgreet's container (matching Android NDK output, which ships PT_PHDR by default), calls add_library_dependency, and verifies the writer drops PT_PHDR cleanly and the loader still accepts the file at runtime.

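The marker values in the three initialiser tests follow directly from constructor ordering. The sketch below replays that reasoning on plain strings. It models the fixture as described above (original slots frame_dummy then greet_ctor, greet_ctor ORs in 0x1, the appended body overwrites the marker with 0x10). The type and function names are hypothetical, and the name appended stands in for the new constructor.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Position {
    First,
    Last,
    Append,
}

/// Order in which constructors run after inserting `new_ctor`. A hijacked
/// slot runs the new code and then chains to the ctor it displaced, so the
/// new ctor ends up immediately before that ctor.
pub fn ctor_order<'a>(
    original: &[&'a str],
    new_ctor: &'a str,
    position: Position,
) -> Option<Vec<&'a str>> {
    let mut order = original.to_vec();
    match position {
        Position::Append => order.push(new_ctor),
        Position::First if !order.is_empty() => order.insert(0, new_ctor),
        Position::Last if !order.is_empty() => order.insert(order.len() - 1, new_ctor),
        // First / Last need an existing slot to hijack.
        _ => return None,
    }
    Some(order)
}

/// Replay the fixture's marker logic over a constructor order.
pub fn final_marker(order: &[&str]) -> u32 {
    order.iter().fold(0, |marker, ctor| match *ctor {
        "greet_ctor" => marker | 0x1,
        "appended" => 0x10,
        _ => marker,
    })
}
```

Under this model First and Last both end at 17 because greet_ctor runs after the appended body, while Append ends at 16 because the appended body runs last and overwrites the bit.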
Setup (one-time):
tests/elf_runtime/setup.sh

(Probes for Docker, installs QEMU arm64 binfmt handlers if needed, builds the runtime image. Idempotent.)

Run:

cargo test --test elf_runtime -- --ignored --nocapture

The tests/macho_runtime/ directory mirrors the ELF harness but
runs natively on Apple Silicon — no Docker/QEMU. Build script
uses clang -dynamiclib plus codesign -s - (ad-hoc) so dyld
loads the rewritten dylib. Eight tests cover:
- baseline (sanity: fixture builds, signs, loads, runs).
- ET_DYN-shaped round-trip: read libgreet.dylib, write it back through Container::to_bytes (Phase 1 passthrough writer + ad-hoc re-sign), host loads and runs the rewritten copy with identical stdout.
- in-place __text edit: patch _greet_double's lsl Wd, Wn, #1 to #2 via BinaryEditor::replace_instruction_at, commit through commit_to_bytes, host observes double=84 instead of double=42. Validates that BinaryEditor composes with the Mach-O writer end-to-end.
- appended function: add _greet_quintuple via BinaryEditor::add_function, redirect _greet_double's first instruction to tail-call it. The new function lands in a fresh R-X LC_SEGMENT_64 placed before __LINKEDIT in the file (so codesign's signature extension doesn't swallow it) and at vmaddr past __LINKEDIT's mapped range. Host observes double=105 (21*5), confirming dyld loaded the new segment and PC-relative branches in/out of it resolve.
- appended data referenced by appended function: call add_data with a 4-byte u32 literal, then add_function with a body that loads the literal via adrp + add (macro-fused into a LoadAddress against the data symbol) and ldr w0, [x0]. Patches _greet_double to tail-call the new function. Host observes the loaded value, proving the appended data lives at the expected vaddr in the R-X segment and the appended code reads from it correctly.
- exported appended function: add_function_exported appends _greet_quintuple and registers it for export. The Mach-O writer rebuilds the export trie (LC_DYLD_EXPORTS_TRIE) with the new entry, extends LC_SYMTAB's symbol + string tables, and bumps LC_DYSYMTAB.nextdefsym. host_dlopen looks up the new symbol via dlsym at runtime and calls it; result=35 proves the trie + symtab regeneration is correct.
- forced library load via add_library_dependency: fixture ships an unlinked libdep.dylib whose ctor sets a marker. Without rewriting, host's dlsym lookup of the marker returns 0 (libdep not loaded). After add_library_dependency injects an LC_LOAD_DYLIB into libgreet.dylib's load-command list (using headerpad room reserved at link time via -Wl,-headerpad,0x1000), dyld pulls libdep in alongside libgreet and the marker reads 0xab.
- appended initialiser hijacks __init_offsets: add_initialiser (Mach-O path) appends a wrapper that preserves the dyld-supplied (argc, argv, envp) registers, calls the user body, and chain-tail-calls the original _greet_ctor. The Mach-O __init_offsets section's first slot (4-byte image-base offset) is overridden to point at the wrapper. Host observes ctor_marker=17 (= 0x10 from appended | 0x1 from chained), proving both ran in order. Test also asserts the output has no __APPENDED segment: the wrapper lands in __TEXT free-region padding via Phase 6.5's intra-segment placement, which is required for App Store submissions.

By default the Mach-O writer prefers intra-__TEXT
placement: appended functions / data land in free space
inside the existing __TEXT segment (typically a few KB of
padding between sections + at the segment's tail). This
keeps the output compatible with App Store review (which
rejects dylibs with multiple R-X segments). Operations that
need the __APPENDED-segment fallback —
add_function_exported (rebuilds export trie + symtab,
needs __LINKEDIT shifting) and add_library_dependency
(splices a new load command and shifts content) — flag this
explicitly and produce output that loads correctly on macOS
but won't pass App Store review.
For App Store builds you can call
editor.binary.prohibit_new_segments() to enforce the
constraint statically: any subsequent operation that would
require an __APPENDED segment (incompatible exports,
library deps, or an oversized payload) errors at queue time
with TextEditorError::WouldCreateNewSegment instead of
silently producing an output that won't pass review. The
flag is Mach-O-only; ELF treats it as a no-op since
appended PT_LOAD is the standard pattern there.
add_data writes read-only bytes into the same R-X segment
as add_function. Writable appended data is future work —
macOS rejects RWX mappings, so writable data would need a
separate RW segment.
cargo test --test macho_runtime -- --ignored --nocapture

Requires macOS on aarch64 with Xcode command-line tools
(clang, codesign) on PATH.
use armv8_encode::isa::aarch64::{self, DecodedOperand};

fn branches_to_target(
    base_address: u64,
    words: &[u32],
    target: u64,
) -> Vec<u64> {
    words
        .iter()
        .enumerate()
        .filter_map(|(index, word)| {
            let address = base_address + (index as u64 * 4);
            let instruction = aarch64::decode_instruction(address, *word).ok()?;
            let has_target = instruction.operands.iter().any(|operand| {
                matches!(operand, DecodedOperand::BranchTarget(value) if *value == target)
            });
            has_target.then_some(address)
        })
        .collect()
}

use armv8_encode::container::Container;
use armv8_encode::isa::aarch64;

let bytes = std::fs::read("hello.o")?;
let container = Container::from_bytes(&bytes)?;
for section in container.text_sections() {
    let (base, code) = section.for_disassembly().unwrap();
    let instructions = aarch64::disassemble_bytes(base, code)?;
    println!(
        "{}: {} instructions at {:#x}",
        section.name,
        instructions.len(),
        base
    );
}
for function in container.functions() {
    println!("fn {} @ {:#x} ({} bytes)", function.name, function.address, function.size);
}

use armv8_encode::container::Container;
use armv8_encode::isa::aarch64::{Aarch64Mnemonic, DecodedOperand};
use armv8_encode::rewrite::{BinaryEditor, RewriteInstruction, RewriteOperand};

let bytes = std::fs::read("libgreet.so")?;
let container = Container::from_bytes(&bytes)?;
let mut editor = BinaryEditor::for_section(&container, ".text")?;
// Find an instruction by walking editor.text.instructions(),
// build a replacement, install it.
# let lsl_addr = 0u64;
# let rd = todo!(); let rn = todo!();
let new_lsl = RewriteInstruction {
    mnemonic: Aarch64Mnemonic::Lsl,
    operands: vec![
        RewriteOperand::Decoded(DecodedOperand::Register(rd)),
        RewriteOperand::Decoded(DecodedOperand::Register(rn)),
        RewriteOperand::Decoded(DecodedOperand::Immediate(2)),
    ],
    original_address: Some(lsl_addr),
};
editor
    .text
    .as_mut()
    .unwrap()
    .replace_instruction_at(lsl_addr, new_lsl)?;
let rewritten = editor.commit_to_bytes()?;
std::fs::write("libgreet.rewritten.so", rewritten)?;

use armv8_encode::container::Container;
use armv8_encode::isa::aarch64::{self, Aarch64Mnemonic, DecodedOperand};
use armv8_encode::rewrite::{BinaryEditor, RewriteInstruction, RewriteOperand, Target};

let bytes = std::fs::read("libgreet.so")?;
let container = Container::from_bytes(&bytes)?;
let mut editor = BinaryEditor::for_section(&container, ".text")?;
// Destructure once so both scopes are usable side-by-side.
let BinaryEditor { binary, text, .. } = &mut editor;
let text = text.as_mut().unwrap();
// Resolve an existing PLT-bound extern. The reader populated
// elf_image.plt_stubs at parse time; emit will fold a call to
// this symbol into `bl <plt_stub>`.
let puts = binary.symbol_by_name("puts@GLIBC_2.17")?;
// Append a string in the new segment.
let msg_id = binary.add_data("hello_msg", b"hello from new code\0", 1)?;
// Build a new function (instructions elided; see
// examples/decorate_so_with_log.rs for the full body). The adrp+add
// pair against Target::Symbol(msg_id) fuses into a LoadAddress
// macro; bl Target::Symbol(puts) folds to the existing PLT stub.
# let body: Vec<RewriteInstruction> = vec![];
let log_id = binary.add_function("hello_log", body)?;
// Redirect an existing function to call the new one.
let target = binary.function_address("greet_double").unwrap();
text.replace_instruction_at(
    target,
    RewriteInstruction {
        mnemonic: Aarch64Mnemonic::B,
        operands: vec![RewriteOperand::Branch(Target::Symbol(log_id))],
        original_address: Some(target),
    },
)?;
let rewritten = editor.commit_to_bytes()?;
std::fs::write("libgreet.rewritten.so", rewritten)?;

What's deliberately not yet implemented:
- Length-growing in-place text edits. Inserting instructions into an existing function past its source extent needs the rewrite layer to relocate the function to a new vaddr and update PC-relative addressing. Workaround today: use add_function to put the new code in a fresh segment and tail-call into it.
- Adding new dynsym imports via fresh PLT stubs. We don't synthesise new .rela.plt entries or PLT stubs for imports the source library didn't already carry. In practice this rarely matters: if the source imports dlsym (one PLT entry is enough), appended code can resolve any other libc symbol at runtime via dlsym(RTLD_DEFAULT, "name"); see examples/call_printf_via_dlsym.rs. The "grow the import table" path is still future work, but the dlsym pattern usually obviates the need for it.
- Versioned exports. add_function_exported emits the new symbol with versym = 1 (unversioned/base). Producing a versioned export with a .gnu.version_d definition isn't yet supported.
- .eh_frame_hdr regeneration. When existing functions move (Stage 6.3 grown-text path), the binary-search table inside .eh_frame_hdr becomes stale. Today we copy .eh_frame_hdr verbatim, which is fine for in-place edits and for appending new code (the new code has no FDEs).
- PT_PHDR rewriting isn't done. The append-PT_LOAD writer relocates the program header table to file end and drops any input PT_PHDR rather than rewriting it. The dynamic linker uses e_phoff for the canonical lookup; PT_PHDR is the runtime-introspection convenience copy and bionic / glibc tolerate its absence (dl_iterate_phdr falls back, the libgcc unwinder copes). Required for Android NDK inputs, which carry PT_PHDR by default. Rewriting PT_PHDR to point at the relocated phdr table (Option 2 in the proposal) is the proper fix for completeness but isn't needed in practice today.
- Mach-O ET_DYN rewriting. object::write::macho is much weaker than object::write::elf::Writer; the Mach-O dylib writer is its own substantial project. Mach-O .o round-trip works.
- PE/COFF. Not started.
- DWARF line tables (file/line lookup for arbitrary addresses) and inlined-callsite metadata. Only DW_TAG_subprogram is lifted today.
- Jump-table / vtable / indirect-branch analysis to recover targets recursive descent currently can't follow.
- Branch islands for b / bl displacements beyond ±128 MiB.
- Resolution of Target::Constant (literal-pool layer).

cargo check # fast type-check
cargo test # 236 unit tests
cargo test -- --ignored # also runs the runtime harness (needs Docker)

Correctness matters more than surface area. The project prefers generated or externally validated ISA data over hand-maintained instruction semantics where possible.