feat: 1.3.0 — analysis additions, decompiler polish, CLI improvements#3
Merged
… set
- New flashkit/abc/opcodes.py as single source of truth:
  - 164 OP_UPPERCASE constants covering full AVM2 instruction set
    (previously ~75 in constants.py, rest as hex literals in disasm.py)
  - OPCODE_TABLE maps opcode -> (mnemonic, operand_format)
  - MNEMONIC_TO_OPCODE reverse lookup for future assembler use
- Trim flashkit/abc/constants.py to multinames, namespaces, trait kinds, flags
- Drop duplicate _OPCODE_TABLE + _EXTRA_OPCODES + _build_lookup from disasm.py;
  it now references OPCODE_TABLE directly. Hex-literal opcodes replaced with
  named constants throughout.
- Rename OP_lowercase -> OP_UPPERCASE (~353 callsites) across:
  abc/builder.py, abc/disasm.py, analysis/{call_graph,field_access,references,
  strings,unified}.py, tests/abc/test_disasm.py, tests/analysis/{test_field_access,
  test_strings}.py
- Fix two disasm tests that asserted 0x01 was unknown; 0x01 is OP_BKPT in the
  real AVM2 spec, now correctly recognized. Tests use unassigned 0x0A.
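For readers who have not opened opcodes.py, the table-plus-reverse-lookup shape described above might look roughly like the sketch below. Only a handful of opcodes are shown, the operand-format strings ("u30", "s24") are an assumed encoding, and `mnemonic_for` is a hypothetical helper, not part of the PR.

```python
# Hypothetical miniature of an opcode module; the real one has 164 constants.
OP_BKPT = 0x01
OP_NOP = 0x02
OP_JUMP = 0x10
OP_PUSHSCOPE = 0x30
OP_RETURNVOID = 0x47
OP_GETLEX = 0x60
OP_GETLOCAL_0 = 0xD0

# opcode -> (mnemonic, operand_format); the format strings are an assumption.
OPCODE_TABLE = {
    OP_BKPT: ("bkpt", ""),
    OP_NOP: ("nop", ""),
    OP_JUMP: ("jump", "s24"),
    OP_PUSHSCOPE: ("pushscope", ""),
    OP_RETURNVOID: ("returnvoid", ""),
    OP_GETLEX: ("getlex", "u30"),
    OP_GETLOCAL_0: ("getlocal_0", ""),
}

# Derived from the table so the forward and reverse maps cannot drift apart.
MNEMONIC_TO_OPCODE = {mnemonic: op for op, (mnemonic, _fmt) in OPCODE_TABLE.items()}


def mnemonic_for(opcode: int) -> str:
    """Return the mnemonic, or a placeholder for unassigned opcodes."""
    entry = OPCODE_TABLE.get(opcode)
    return entry[0] if entry else f"unknown_0x{opcode:02X}"
```

Deriving the reverse map from the table keeps a single place to edit when an opcode is added.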
TraitInfo previously stored traits as raw bytes plus name/kind, forcing
callers to re-parse bytes whenever they needed slot_id/method_idx/etc.
Now all structured fields are populated at parse time:
  slot_id, type_name, vindex, vkind   (Slot/Const)
  method_idx, disp_id                 (Method/Getter/Setter)
  class_idx                           (Class)
  function_idx                        (Function)
  attr, metadata                      (all kinds, when ATTR_Metadata)
Round-trip fidelity preserved via a _raw cache:
- Parser stashes original bytes in trait._raw.
- Writer reuses _raw when trait is unmodified (byte-identical output).
- Writer re-serializes from fields when trait is mutated or built from
  scratch (enables AbcBuilder / future AbcEditor to produce correct bytes).
Verified byte-perfect round-trip on a 710KB production ABC.
- Simplify flashkit/info/member_info.py: drop parse_slot_trait /
  parse_method_trait / parse_class_trait helpers and read fields directly.
- Simplify AbcBuilder.trait_slot / trait_method / trait_class: construct
  the dataclass from fields instead of hand-building bytes.
- Replace 3 byte-level test classes in test_member_info.py with 2
  end-to-end tests that verify fields survive parse/write round-trip.
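The message does not say how the writer decides a trait is "unmodified". One way to get that behaviour is to drop the cached bytes on any field assignment, sketched below. `SlotTrait`, `encode_u30`, `write_trait`, and the three-field toy layout are all hypothetical and much simpler than the real ABC trait encoding.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotTrait:
    """Toy trait: structured fields plus an optional cache of the parsed bytes."""
    name_idx: int
    slot_id: int
    _raw: Optional[bytes] = field(default=None, repr=False, compare=False)

    def __setattr__(self, key, value):
        # Once _raw exists, changing any structured field invalidates it.
        if key != "_raw" and "_raw" in self.__dict__:
            object.__setattr__(self, "_raw", None)
        object.__setattr__(self, key, value)


def encode_u30(value: int) -> bytes:
    """Variable-length unsigned encoding, 7 payload bits per byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def write_trait(trait: SlotTrait) -> bytes:
    # Untouched trait: emit the original bytes for a byte-identical round trip.
    if trait._raw is not None:
        return trait._raw
    # Mutated or freshly built trait: serialize from fields (toy layout).
    return encode_u30(trait.name_idx) + bytes([0x00]) + encode_u30(trait.slot_id)


parsed = SlotTrait(name_idx=5, slot_id=1, _raw=b"\x05\x00\x01")
```

Calling `write_trait(parsed)` returns the cached bytes; after `parsed.slot_id = 200` the cache is gone and the bytes are rebuilt from the fields.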
Aligns CONSTANT_/TRAIT_/INSTANCE_/METHOD_/ATTR_ constants with the
already-uppercase OP_ opcode constants. Previously they used the Adobe
spec's mixed-case convention (CONSTANT_QName, TRAIT_Slot, etc.) which
clashed visually with the uppercase opcode style.
Rename mapping (compound names get _ separators):
  CONSTANT_QName            -> CONSTANT_QNAME
  CONSTANT_PackageNamespace -> CONSTANT_PACKAGE_NAMESPACE
  CONSTANT_PrivateNs        -> CONSTANT_PRIVATE_NS
  CONSTANT_TypeName         -> CONSTANT_TYPENAME
  TRAIT_Slot                -> TRAIT_SLOT
  METHOD_HasOptional        -> METHOD_HAS_OPTIONAL
  INSTANCE_ProtectedNs      -> INSTANCE_PROTECTED_NS
  ATTR_Final                -> ATTR_FINAL
  ...(40 constants total, 351 callsites across 17 files)
Downstream callers (bh-deobfuscator, bh-mcp) don't import these names;
only the injector has its own local copies, which stay unchanged.
All 318 tests pass; byte-perfect round-trip on 710KB production ABC.
Adds convenience methods on AbcFile that collapse the "check idx > 0
and < len / fall back to sentinel" pattern used by decompilers,
analyzers, and downstream tools:
Pool accessors (return AVM2 spec sentinel for idx 0 / out of range):
abc.string(idx) -> str ("" on miss)
abc.integer(idx) -> int (0 on miss)
abc.uinteger(idx) -> int (0 on miss)
abc.double(idx) -> float (0.0 on miss)
Namespace accessors:
abc.namespace_name(idx) -> str (resolved string value)
abc.namespace_kind(idx) -> int (kind byte)
Multiname accessors (delegate to flashkit.info.member_info for the
resolution logic so name-resolution stays in one place):
abc.multiname_full(idx) -> "package.Name" or "*"
abc.multiname_name(idx) -> unqualified name, handles TypeName
abc.multiname_type(idx) -> alias of multiname_name for trait types
abc.multiname_namespace(idx) -> package string
abc.multiname_is_attr(idx) -> True for XML @attr forms
abc.multiname_is_runtime(idx) -> True if name/ns need runtime lookup
Imports from info.member_info are lazy (inside methods) to avoid a
circular import between types.py and member_info.py.
All 318 tests pass; real-SWF spot-check returns expected values
(Vector.<int>, flash.display.MovieClip, etc.).
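The "check idx > 0 and < len, else sentinel" pattern that these accessors collapse can be sketched on a stand-in class. `MiniAbc` and its pool attribute names are hypothetical; only the accessor names and sentinels come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class MiniAbc:
    """Stand-in for AbcFile holding just two constant pools.

    Index 0 of every pool is reserved, so real entries start at 1.
    """
    string_pool: list = field(default_factory=lambda: [""])
    integer_pool: list = field(default_factory=lambda: [0])

    def string(self, idx: int) -> str:
        # "" for index 0 or any index outside the pool.
        if 0 < idx < len(self.string_pool):
            return self.string_pool[idx]
        return ""

    def integer(self, idx: int) -> int:
        # 0 for index 0 or any index outside the pool.
        if 0 < idx < len(self.integer_pool):
            return self.integer_pool[idx]
        return 0


abc = MiniAbc(string_pool=["", "MovieClip"], integer_pool=[0, 42])
```

Callers can then write `abc.string(idx)` without guarding the index themselves.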
Creates flashkit/decompile/ with lazy-loaded public API.
- __init__.py: module __getattr__ lazy-loads submodules on first use, so
`import flashkit` stays fast for callers that never decompile anything.
Exposes decompile_method, decompile_method_body, decompile_class,
list_classes, DecompilerCache (all pending implementation).
- helpers.py: pure utility functions used across the decompiler pipeline.
- pop_n: stack pop with underflow-tolerant fallback
- fmt_hex / fmt_hex_const / to_hex_if_int: numeric formatting
- escape_str: AS3 string literal escaping (control chars, U+2028/2029)
- fmt_call / binop / bitwise_binop: expression formatters
- is_type_default / strip_redundant_cast / add_type_cast_if_needed:
type coercion helpers
- has_outer_parens / needs_ternary_wrap / find_op_outside_parens /
wrap_for_logical: precedence/paren-aware string analysis
- expand_multiline_stmt: indent object-literal newlines correctly
- access_modifier: namespace kind -> public/private/protected/internal
- collect_mn_package_namespaces / collect_mn_package_namespaces_typed:
wildcard-import harvesting for the class decompiler
- skip_operands: fast instruction advance for analysis passes
All helpers operate on flashkit's AbcFile directly (no tuple indexing into
the parse tree — uses .multiname_pool[i].kind etc.) and use the UPPERCASE
OP_ and CONSTANT_ constants.
318/318 tests still pass.
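The lazy-loading trick relies on module-level `__getattr__` (PEP 562). In a real package the function would simply be defined at the top of `__init__.py`; the sketch below builds a throwaway module object so it can run standalone. The `_LAZY` map and `make_lazy_module` are hypothetical, and they point at stdlib functions rather than flashkit submodules.

```python
import importlib
import types

# public name -> (module to import on first use, attribute to fetch from it)
_LAZY = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}


def make_lazy_module(name, lazy_map):
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        try:
            target_module, target_attr = lazy_map[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache: later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    return module


demo = make_lazy_module("demo", _LAZY)
```

Nothing is imported until `demo.sqrt` or `demo.dumps` is first touched, which is the property that keeps a bare `import flashkit` cheap.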
The wildcard-import collector used `name[0].isupper()` to distinguish type multinames from property/method multinames. That check rejects obfuscated class names like `_-Sg`, `_-tp`, `_-R3` produced by common AS3 obfuscators, causing their packages to be dropped from the import list and leading to unresolved symbols in decompiled output. Extracts a `_looks_like_type_name()` helper that accepts either an uppercase first character or a leading underscore. Covers both real AS3 conventions and obfuscated production bytecode. Verified against obfuscated symbols `_-Sg`, `_-R3`, `_-tp` and against regular AS3 class names; property/local names still correctly rejected.
Dropped the following functions from helpers.py, all of which used a
"does this name look like a type?" heuristic (uppercase/underscore
prefix check):
- _looks_like_type_name
- collect_mn_package_namespaces
- collect_mn_package_namespaces_typed
- _collect_typename_param
Name-based triage is fundamentally unreliable: production SWFs obfuscate
both type names and member names with the same prefix shapes, so any
pattern check will either wrongly include members or wrongly exclude
types.
Import collection will instead be driven from usage context during the
class decompiler port: when the decompiler is emitting a type annotation,
it knows; when it's emitting a method call, it knows. For the genuinely
ambiguous FINDPROPSTRICT/GETLEX cases with namespace-set multinames, we
can cross-check against the ABC's own class declarations
(instances[].name / .super_name / .interfaces) as structural truth.
Kept `typename_param_indices` as a renamed public helper (was the only
caller of the dropped functions that's still useful on its own). Also
cleaned up now-unused constant imports in helpers.py.
318/318 tests still pass.
First working end-to-end decompiler on feat/decompile. Produces real
structured AS3 (package/class framing, if/else/while/for/switch,
function signatures, typed locals, casts, ternaries) from AVM2
bytecode. Verified on a real obfuscated production SWF with 134
classes — outputs recognizable AS3 with correct types and field
declarations.
New modules:
flashkit/decompile/_adapter.py
AbcView wraps a parsed AbcFile to present pools/methods/traits in
the shape the ported algorithm expects (tuple-shaped multinames,
name_idx/super_idx fields, method_bodies as dict keyed by
method index, aliases like mn_full / ns_kind / type_name).
Keeps the decompiler's algorithm faithful while letting flashkit's
public AbcFile API stay clean.
flashkit/decompile/method.py (~4400 LOC)
MethodDecompiler — stack simulation + CFG-style structuring with
pattern matching for if/else, while, do-while, for, for-in,
for-each, switch, try/catch, ternary, compound assignments,
pre/post inc/dec, short-circuit && / ||.
flashkit/decompile/class_.py (~850 LOC)
AS3Decompiler — orchestrates MethodDecompiler per class, emits
package/class framing, imports (type-driven from opcode semantics),
field declarations, constructor signature, method bodies.
flashkit/decompile/_helpers_full.py (~530 LOC)
Internal helper module supplying the fuller utility surface the
ported algorithm expects. Will be collapsed into helpers.py in a
follow-up once the decompiler is proven stable.
flashkit/decompile/cache.py
DecompilerCache — parses each SWF once; decompile_class /
decompile_method / list_classes take a SWF path.
flashkit/decompile/__init__.py
Public API: decompile_class, decompile_method,
decompile_method_body, list_classes, DecompilerCache. All accept
AbcFile or Workspace. Classes selected by index or by name (short
or fully-qualified; ambiguous short names raise with a hint).
Module __getattr__ keeps import lazy so `import flashkit` stays
fast for callers that never decompile.
Supporting additions:
flashkit/abc/parser.py
New read_s24() primitive (signed 24-bit branch offsets).
flashkit/abc/opcodes.py
New match_local_incdec() + _INC_OPS / _INCDEC_OPS sets. Detects
post/pre increment/decrement patterns after a getlocal.
flashkit/cli/decompile.py
New CLI subcommand:
flashkit decompile FILE --list
flashkit decompile FILE --class NAME
flashkit decompile FILE --class NAME --method NAME
flashkit decompile FILE --all --outdir PATH
tests/decompile/test_decompile.py
Synthetic smoke tests (build-decompile round-trip on minimal
classes, ambiguity resolution). Optional real-SWF tests gated on
FLASHKIT_TEST_SWF env var — local development only, never commits
a SWF.
Test status: 322 pass, 2 skipped (opt-in real-SWF tests).
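AVM2 branch offsets are three little-endian bytes interpreted as a signed value. A minimal reader could look like the sketch below; the `(data, pos) -> (value, new_pos)` signature is an assumption, since the commit does not show the real `read_s24` signature.

```python
def read_s24(data: bytes, pos: int) -> tuple[int, int]:
    """Read a signed 24-bit little-endian integer; return (value, next_pos)."""
    if pos < 0 or pos + 3 > len(data):
        raise ValueError("truncated s24")
    value = int.from_bytes(data[pos:pos + 3], "little", signed=True)
    return value, pos + 3
```

A backward jump therefore decodes to a negative offset, e.g. the bytes `FF FF FF` decode to -1.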
Three algorithmic wins inside the pattern-based control-flow structurer,
replacing per-call linear scans with precomputed tables built once per
method. No hardcoded bounds — all derived from input structure.
- _find_back_goto: O(N) linear scan -> O(log N) bisect into a
label -> list[goto_site_idx] table built once at _structure_flow
entry. On a real production method the hot path dropped from
~9s of scanning to ~0.6s of bisect lookups.
- _fold_while_to_for_recursive: O(N) per-level brace rescan ->
O(1) lookup into a precomputed {open_stmt_idx: close_stmt_idx}
table. Eliminates ~1.1M _count_net_braces calls on a large
method. New _build_brace_close_map() helper runs in a single
linear pass over the statement list.
- _struct_block: memoize by (start, end, id(loop_ctx)). Chained
if/else/elseif cascades repeatedly call _struct_block on the
overlapping tail range [target_pos+1, end) — without this, each
level recomputes the entire tail, producing exponential work in
nesting depth. Memoization turns that into linear (121K calls
dropped to ~3K on the profiled method).
All pre-existing state (loop_label_counter, needs_loop_label) is
saved/restored around each entry so inline-function re-entrancy
(issue #21) is still safe.
Pathological methods whose goto structure doesn't match the recognised
patterns can still produce large output — that is an algorithmic
limitation of pattern-based structuring, not a scan-time issue, and
is the subject of the planned CFG+dominator rewrite.
322 tests pass, real-SWF parse+round-trip still byte-perfect.
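The second optimisation, a precomputed open-to-close brace table built in one pass, can be sketched with a stack. This is an illustration of the idea, not the actual `_build_brace_close_map`: it assumes statements are plain strings and ignores braces inside string literals.

```python
def build_brace_close_map(stmts):
    """Map the index of each statement that opens a block to the index of
    the statement that closes it, in a single pass over the list."""
    close_of = {}
    open_stack = []  # statement indices of currently open braces
    for idx, stmt in enumerate(stmts):
        for ch in stmt:
            if ch == "{":
                open_stack.append(idx)
            elif ch == "}" and open_stack:
                # If one statement opens several braces, the last (outermost)
                # close overwrites earlier ones for that index.
                close_of[open_stack.pop()] = idx
    return close_of


stmts = [
    "while (a) {",
    "if (b) {",
    "x();",
    "}",
    "}",
]
```

After the map is built, finding the end of the block opened at statement `i` is a dictionary lookup instead of a rescan.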
New flashkit.graph package with a linear-time CFG builder:
- BasicBlock / CFG dataclasses with successors, predecessors,
  exception_handlers, and kind metadata.
- build_cfg_from_bytecode(instructions, exceptions) collects leaders
  from branch targets, post-branch offsets, and exception region
  boundaries; slices the instruction stream into blocks; wires
  successors in canonical order (fall-through then branch target for
  conditionals, default-then-cases for lookupswitch); inverts to
  unique-by-identity predecessors.
- Exception handlers are attached to every block whose range lies
  inside [from_offset, to_offset); catch-entry blocks are marked with
  kind="catch_entry". Attachment uses bisect over the sorted block
  start offsets so it is O((B + H) log B).
Testing: 12 synthetic tests cover straight-line, returnvoid/returnvalue/
throw terminators, unconditional jumps, conditional branches, back-edges,
lookupswitch (coalesced and distinct targets), exception regions,
per-instruction block membership, and monotonic block ordering. Opt-in
FLASHKIT_TEST_SWF smoke builds a CFG for every method body in a real
production SWF and asserts successor/predecessor consistency across all
blocks.
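Leader collection, the first step of the builder, can be shown on a toy instruction list. The tuple shape `(offset, size, mnemonic, target)` and both function names are assumptions for illustration; exception-region boundaries and lookupswitch are left out.

```python
import bisect

BRANCHES = {"jump", "iftrue", "iffalse"}
TERMINATORS = {"jump", "returnvoid", "returnvalue", "throw"}


def find_leaders(instructions):
    """Return sorted block-start offsets for a list of
    (offset, size, mnemonic, target_offset_or_None) tuples."""
    if not instructions:
        return []
    leaders = {instructions[0][0]}  # the entry instruction starts a block
    end = instructions[-1][0] + instructions[-1][1]
    for offset, size, mnemonic, target in instructions:
        if mnemonic in BRANCHES:
            leaders.add(target)  # a branch target starts a block
        if mnemonic in BRANCHES or mnemonic in TERMINATORS:
            nxt = offset + size
            if nxt < end:
                leaders.add(nxt)  # so does whatever follows the branch
    return sorted(leaders)


def block_of(leaders, offset):
    """Index of the block containing offset, via bisect on block starts."""
    return bisect.bisect_right(leaders, offset) - 1


code = [
    (0, 1, "getlocal_0", None),
    (1, 4, "iffalse", 9),
    (5, 2, "pushbyte", None),
    (7, 1, "returnvalue", None),
    (8, 1, "nop", None),
    (9, 1, "returnvoid", None),
]
```

The same bisect lookup is what makes attaching exception handlers to blocks logarithmic per query.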
New flashkit/graph/dominators.py with the Cooper-Harvey-Kennedy
iterative algorithm (CHK 2001, §3):
- compute_idom(cfg) -> {block_idx: idom_block_idx}. Entry self-
dominates; unreachable blocks map to themselves so the result is
total over cfg.blocks.
- compute_ipostdom(cfg) -> {block_idx: ipostdom_block_idx}. Runs the
same algorithm on the reversed CFG.
- Single-exit methods: reverse the edges and root at the exit.
- Multi-exit methods: introduce a virtual super-exit (sentinel -1)
that has edges to every real exit; blocks whose only real post-
dominator is the super-exit report -1. Real exits always self-
post-dominate in the returned map.
- Methods with no exit (pure infinite loops): all blocks map to -1.
Reverse-postorder uses an explicit stack instead of recursion so
deep method bodies don't hit Python's recursion limit. The core CHK
loop is factored as _compute_idom_generic so both forward and
reverse-augmented variants share one implementation.
Testing: 11 synthetic tests (trivial, linear chain, diamond, loop,
multi-pred merge, unreachable, single-exit post-dominators, multi-
exit post-dominators, idom-chain-reaches-entry property). Opt-in
FLASHKIT_TEST_SWF smoke validates idom chains on every method body
of a real production SWF.
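For reference, the iterative scheme fits in a few dozen lines when the graph is a plain successor dict. This is a sketch of the published algorithm, not the module's code: it takes `succs` and `entry` instead of a CFG object, and it omits unreachable nodes rather than mapping them to themselves.

```python
def reverse_postorder(succs, entry):
    """Iterative DFS (explicit stack) returning nodes in reverse postorder."""
    visited = {entry}
    postorder = []
    stack = [(entry, iter(succs.get(entry, ())))]
    while stack:
        node, children = stack[-1]
        for nxt in children:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(succs.get(nxt, ()))))
                break
        else:
            postorder.append(node)
            stack.pop()
    postorder.reverse()
    return postorder


def compute_idom(succs, entry):
    """Immediate dominators for nodes reachable from entry."""
    rpo = reverse_postorder(succs, entry)
    index = {node: i for i, node in enumerate(rpo)}
    preds = {node: [] for node in rpo}
    for node in rpo:
        for s in succs.get(node, ()):
            preds[s].append(node)
    idom = {entry: entry}  # the entry dominates itself

    def intersect(a, b):
        # Walk both fingers up the dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for node in rpo[1:]:
            new_idom = None
            for p in preds[node]:
                if p in idom:  # only predecessors processed so far
                    new_idom = p if new_idom is None else intersect(p, new_idom)
            if idom.get(node) != new_idom:
                idom[node] = new_idom
                changed = True
    return idom


diamond = {0: [1, 2], 1: [3], 2: [3], 3: []}
loop = {0: [1], 1: [2], 2: [1, 3], 3: []}
```

Post-dominators follow by running the same function on the reversed edges, rooted at the exit (or a virtual super-exit).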
New flashkit/graph/loops.py:
- Loop dataclass (header, tail, body, exits, parent) and LoopTree with
  top_level_loops() + children_of(loop) accessors.
- find_loops(cfg, idom) identifies back-edges (u -> v where v dominates
  u), groups multiple back-edges sharing a header into one loop (body
  is the union of per-tail backward-BFS regions), and computes exit
  blocks as body members with an outside successor.
- build_loop_tree(loops) wraps the flat list.
- Parent linking by strict subset containment of bodies; smallest
  enclosing ancestor wins. O(L^2) in the loop count, fine because real
  methods have at most a few dozen loops.
Testing: 9 synthetic tests: no-loop CFGs (linear + diamond), single
while loop, self-loop, nested loops, sibling loops, loops merged by
shared header, loop exit detection. Opt-in real-SWF smoke confirms loop
detection terminates on every method body and the structural invariants
(header in body, tail in body, header is a successor of tail) hold
across a real production SWF.
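Back-edge detection and body collection can be sketched on a successor dict plus a precomputed idom map. The names below mirror the description but the signatures are assumptions; the real `find_loops` takes a CFG object and returns Loop dataclasses.

```python
def dominates(idom, a, b):
    """True if a dominates b, following b's idom chain up to the entry."""
    while True:
        if a == b:
            return True
        parent = idom[b]
        if parent == b:  # reached the entry without meeting a
            return False
        b = parent


def find_loops(succs, idom):
    """Return {header: body_set}, merging back-edges that share a header."""
    preds = {}
    for node, outs in succs.items():
        for s in outs:
            preds.setdefault(s, []).append(node)

    loops = {}
    for tail, outs in succs.items():
        for header in outs:
            if tail not in idom or header not in idom:
                continue  # skip unreachable nodes
            if dominates(idom, header, tail):  # tail -> header is a back-edge
                body = loops.setdefault(header, {header})
                work = [tail]
                while work:  # walk predecessors backwards from the tail
                    node = work.pop()
                    if node not in body:
                        body.add(node)
                        work.extend(preds.get(node, ()))
    return loops


def loop_exits(succs, body):
    """Body members that have at least one successor outside the body."""
    return {n for n in body if any(s not in body for s in succs[n])}


succs = {0: [1], 1: [2], 2: [1, 3], 3: []}
idom = {0: 0, 1: 0, 2: 1, 3: 2}  # hand-computed for this graph
```

Because the header is seeded into the body before the walk starts, the backward search stops there and never escapes the loop.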
New flashkit/decompile/ast/ package:
- nodes.py: 30+ dataclass node types covering AS3 statements and
  expressions. Literal, Identifier, MemberAccess, IndexAccess,
  MethodCall, NewExpr, BinaryOp, UnaryOp, TernaryOp, AssignExpr,
  CompoundAssignExpr, CastExpr, IsExpr, AsExpr, TypeofExpr, DeleteExpr,
  InExpr, ArrayLiteral, ObjectLiteral, FunctionExpr; BlockStmt, IfStmt,
  WhileStmt, DoWhileStmt, ForStmt, ForInStmt, ForEachStmt, SwitchStmt,
  TryStmt, ReturnStmt, ThrowStmt, BreakStmt, ContinueStmt, LabeledStmt,
  ExpressionStmt, VarDeclStmt, plus SwitchCase, CatchClause,
  ObjectProperty helper nodes.
- printer.py: AstPrinter.print(node) dispatches on node type via a
  _p_<ClassName> method table. Precedence-driven parenthesisation (no
  defensive parens): each expression knows its precedence, each child is
  emitted in the parent's precedence context, parens only emitted when a
  child's precedence is lower than the context (or equal +
  right-of-left-assoc). Left-assoc (binary ops), right-assoc (=, compound
  assign, ?:) handled explicitly. String literals escape via
  helpers.escape_str. Numbers: NaN/Infinity constants preserved,
  trailing .0 collapsed. 4-space indentation, configurable.
- ``else if`` chains are produced naturally by nesting an IfStmt
  directly in the else slot of its parent (no special "else if"
  flattening pass needed).
- ForStmt init piece is printed without its trailing semicolon when it's
  a VarDeclStmt or ExpressionStmt; the for-header syntax supplies its own
  separators.
Testing: 55 tests covering every node type's print output plus
precedence edge cases (binary-in-binary, ternary-in-binary,
assignment-in-binary, right-assoc chains, else-if chains, deeply nested
member access).
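The parenthesisation rule for left-associative binary operators is compact enough to show in full. The sketch below uses two toy node types and an illustrative precedence table (the numbers are assumptions, only their ordering matters); it is not the AstPrinter code.

```python
from dataclasses import dataclass


@dataclass
class Ident:
    name: str


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


# Higher number binds tighter; values are illustrative.
PRECEDENCE = {"*": 12, "/": 12, "%": 12, "+": 11, "-": 11,
              "<": 9, "==": 8, "&&": 4, "||": 3}


def emit(node, ctx=0, right_side=False):
    """Print node inside a parent whose precedence is ctx.

    Parens are added only when the child binds looser than the parent, or
    binds equally and sits on the right of a left-associative operator.
    """
    if isinstance(node, Ident):
        return node.name
    prec = PRECEDENCE[node.op]
    left = emit(node.left, prec)
    right = emit(node.right, prec, right_side=True)
    text = f"{left} {node.op} {right}"
    if prec < ctx or (prec == ctx and right_side):
        return f"({text})"
    return text


a, b, c = Ident("a"), Ident("b"), Ident("c")
```

With these definitions `a - (b - c)` keeps its parentheses while `(a - b) - c` prints without any.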
New flashkit/decompile/stack.py — one-block AVM2 stack simulator:
- BlockStackSim(abc).run(bb) -> BlockSimResult with statements (side-
effecting AST produced along the way), stack (expressions still
live at block exit), terminator kind, branch_condition (always in
branch-taken-when-truthy form; iffalse is converted by wrapping in
UnaryOp("!")), and switch_targets.
- Handles every opcode family that appears in real compiler output:
push/pop/dup/swap, all locals (including specialised getlocal_0..3),
every binary op and comparison, unary ops, coercion/convert (passed
through as CastExpr so later idiom rewrites can recognise them),
full property access (getlex, findprop, getproperty, setproperty,
initproperty, getslot, setslot, getsuper), call family (callproperty
with/without void, callsuper, call, construct, constructprop,
constructsuper), newarray/newobject pattern assembly from the
preceding pushes, is/as/instanceof/in, typeof, all ifcc compare-
and-branch variants (ifnlt/ifnle/ifngt/ifnge wrapped in !),
lookupswitch, return/throw.
- Unknown or unhandled opcodes are logged at DEBUG level and skipped
so the simulator never crashes on exotic bytecode.
- Scope opcodes (pushscope/popscope/pushwith/getscopeobject/
getglobalscope) don't emit statements — they pop values quietly
since scope state is opaque to the AST.
- findpropstrict + getproperty on the same name collapses to a single
Identifier (the standard AS3 compiler idiom for loading a lexical
name).
Testing: 42 unit tests hitting every opcode family above plus 1 opt-
in real-SWF smoke that exercises every block of every method body
in a production SWF (simulator is allowed to leave expressions on
the stack; it must not crash).
New flashkit/decompile/structure.py — converts CFG + dominators +
loops + per-block AST into a tree of structured AS3 statements.
Algorithm:
- Post-dominator–driven recursive descent (structure_region).
- On a loop header: emit WhileStmt, recurse with stop_at=header so
back-edges cut the inner walk. Classify successors as (body_entry,
exit_block) by loop-body membership; the condition polarity is
decided from which side the exit lives. Fallback for headers that
aren't simple conditionals is ``while (true)`` with the body inlined.
- On a conditional branch: recurse into both arms up to the immediate
post-dominator, build an IfStmt, continue from the post-dom. If
both arms terminate, post-dom is -1 and the inlined form is
naturally produced.
- Straight-line blocks emit their statements and flow to their sole
successor.
- Condition simplification peels nested UnaryOp("!", ...) pairs so
double negation (from iffalse + exit-when-taken) collapses cleanly.
- ``else { if (...) }`` is folded to ``else if (...)`` in _make_if.
Switch reconstruction, exception regions, and irreducibility are
intentionally left for a follow-up — this commit lands the core
control-flow structuring; the rest builds on the same entry point.
Testing: 6 synthetic pipeline tests (straight-line variants, if-
only both-arms-return, if-else, simple while loop) plus 1 opt-in
real-SWF smoke. Real SWF result: 14,984 method bodies structured
in ~6s total, slowest single method 3ms — compare with the existing
pattern-based structurer that hangs on pathological goto chains.
The CFG rewrite meets its primary goal: bounded, algorithmic
structuring on every method in a production SWF.
Switch reconstruction and try/catch emission in the CFG-based structurer.
- Switch: _structure_switch walks block_result.switch_targets (default
  + cases). Each case body is structured up to the switch's immediate
  post-dominator. Shared targets produce a fall-through sequence of case
  labels with an empty body followed by the single shared body. Default
  comes last to match AS3 compiler layout.
- Try/catch: each BasicBlock already carries the ExceptionInfo entries
  that protect it (set during CFG construction). When the structurer
  arrives at a block whose start_offset matches a handler's from_offset,
  _structure_try_region structures the protected body up to the first
  block at or past to_offset, then structures each catch handler's target
  as a CatchClause. Handlers are tracked by id() so a duplicate
  ExceptionInfo record won't be wrapped twice. Catch variables use
  synthetic names (_catch0_ etc.); real names need multiname resolution
  that lives in downstream passes.
- Irreducibility: no new code needed beyond the _emitted cycle-break
  already in structure_region. The real-SWF smoke confirms every method
  body in a production SWF structures in bounded time (slowest 3ms over
  ~15k methods, unchanged from before this commit).
Testing: 2 new synthetic tests (switch with 2 cases + default; try/catch
wrapping a protected block) + 1 opt-in real-SWF regression test
validating these changes don't make any method slower than the previous
baseline.
New flashkit/decompile/patterns.py — a pipeline of local rewrites
that fold compiler-produced shapes into idiomatic AS3:
- _CollapseDoubleNot: ``!!x -> x``. Runs first so later passes see
canonical condition shapes.
- _CompoundAssign: ``x = x op y -> x op= y`` when op is one of
+ - * / % & | ^ << >> >>>. Uses a conservative lvalue equality
check (identical Identifier names or matching MemberAccess chains).
- _TernaryFromIf: ``if (c) { t = x } else { t = y } -> t = c ? x : y``
when both arms are a single-assignment block to the same lvalue.
- _ForFromWhile: ``init; while (cond) { body; step }`` rewritten as
``for (init; cond; step) { body }`` when the trailing step is a
compound/simple assignment on a variable referenced by the
condition.
Generic _Transform visitor walks dataclass fields via reflection,
recursing into Node-typed fields and list/tuple containers, so new
pattern classes only override visit_<Name> for the nodes they care
about. The pipeline is idempotent by construction and verified by a
test.
Also fixed AstPrinter._p_CompoundAssignExpr to append ``=`` to the
op when it's missing — the compound-assign pass builds these with
bare op strings ("+", "*", ...) while the existing AST tests used
"+=" style pre-combined strings. The printer now normalises both.
Testing: 10 pattern tests covering compound assign (3 variants +
negative test), double-not collapse, ternary from if/else (+
negative test), for-loop detection (+ negative test), and pipeline
idempotence.
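The reflection-driven visitor and two of the passes can be sketched on a reduced node set. Class names follow the description above, but the node definitions, the bottom-up rewrite order, and the `apply_patterns` body are assumptions; structural dataclass equality stands in for the lvalue equality check.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Node:
    pass


@dataclass(frozen=True)
class Identifier(Node):
    name: str


@dataclass(frozen=True)
class UnaryOp(Node):
    op: str
    operand: Node


@dataclass(frozen=True)
class BinaryOp(Node):
    op: str
    left: Node
    right: Node


@dataclass(frozen=True)
class AssignExpr(Node):
    target: Node
    value: Node


@dataclass(frozen=True)
class CompoundAssignExpr(Node):
    op: str
    target: Node
    value: Node


class Transform:
    """Rewrite children first, then offer the node to visit_<ClassName>."""

    def visit(self, node):
        if isinstance(node, (list, tuple)):
            return type(node)(self.visit(item) for item in node)
        if not isinstance(node, Node):
            return node  # plain str/int field values pass through
        rebuilt = {f.name: self.visit(getattr(node, f.name)) for f in fields(node)}
        node = replace(node, **rebuilt)
        hook = getattr(self, f"visit_{type(node).__name__}", None)
        return hook(node) if hook else node


class CollapseDoubleNot(Transform):
    def visit_UnaryOp(self, node):
        inner = node.operand
        if node.op == "!" and isinstance(inner, UnaryOp) and inner.op == "!":
            return inner.operand
        return node


class CompoundAssign(Transform):
    OPS = {"+", "-", "*", "/", "%", "&", "|", "^", "<<", ">>", ">>>"}

    def visit_AssignExpr(self, node):
        value = node.value
        if (isinstance(value, BinaryOp) and value.op in self.OPS
                and value.left == node.target):
            return CompoundAssignExpr(value.op, node.target, value.right)
        return node


PIPELINE = (CollapseDoubleNot(), CompoundAssign())


def apply_patterns(node):
    for rewrite in PIPELINE:
        node = rewrite.visit(node)
    return node


x, y = Identifier("x"), Identifier("y")
```

A new pass only needs a subclass with the relevant `visit_<Name>` hooks; traversal is inherited.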
MethodDecompiler.decompile now runs the CFG-based pipeline end to
end:
decode_instructions -> build_cfg_from_bytecode
-> compute_idom / compute_ipostdom -> find_loops
-> BlockStackSim -> structure_method -> apply_patterns
-> AstPrinter
The old 4500-line pattern-based body is gone. Public surface
(MethodDecompiler class, decompile method signature with indent /
class_idx / is_static / class_name kwargs) is preserved so
AS3Decompiler in class_.py, DecompilerCache, and the CLI work
unchanged.
Output shape: the printer emits a BlockStmt wrapped in {} with a
4-space indent. _reindent_body strips the outer braces and
re-indents the statements to the caller's requested indent so the
class decompiler can embed the body inside the function signature
it builds itself. Trivial bodies (empty, or a single returnvoid
that the sim dropped) return "".
The ABC argument may be either a raw AbcFile or the internal
_adapter.AbcView used throughout class_.py; _raw_abc unwraps the
adapter when present since the stack simulator reads pools via the
flashkit-native accessors. _get_body handles the dict-vs-list
shape difference in the same way.
Also removed _GLOBAL_FUNCTIONS (only used internally by the old
pipeline).
Verified on a real production SWF: 14,984 method bodies structure
in bounded time (slowest single method ~3ms). Sample class-level
decompiles produce valid AS3 source with structured control flow
in a few milliseconds per class, no hangs. All 470 existing tests
still pass.
Four small rewrites that tighten the generated AS3 source:
- Parameter names: BlockStackSim takes optional param_count and
local0_name. Locals 1..param_count render as _arg_1.._arg_N (the
AS3 compiler parameter convention) while higher registers stay as
_loc{N}_. Local 0 is configurable so static methods show the
class name in place of `this`. MethodDecompiler wires param_count
through from the method's MethodInfo.
- setproperty + findpropstrict same-name collapse: mirrors the
existing getproperty idiom. `findpropstrict foo; x; setproperty foo`
was printing as `foo.foo = x` — now prints as `foo = x`.
- Inline-else-after-terminator: `if (c) { return 1 } else { return 0 }`
becomes `if (c) { return 1 } return 0`. Only fires when the then
branch provably can't fall through (return/throw/break/continue,
or a nested if where both arms are terminating) and the else is
a plain block (else-if chains are left alone).
- Trailing bare return strip: every AS3 function has an implicit
void return, so `{ stmt; return; }` is noise. The final `return;`
is dropped from the outermost method body block only — nested
returns inside if/switch/while arms are load-bearing and kept.
Result on a real production SWF (sampling ANE_RawData):
Before:
    var sExtensionContext = {
        return;
    }
    public function Init(_loc1_) {
        sExtensionContext.sExtensionCtx = ExtensionContext.create("..")
        if (sExt) {
            return sExt.call("..", _loc1_);
        } else {
            return false;
        }
    }
After:
    // [trivial cinit block vanishes]
    public function Init(_arg_1) {
        sExtensionContext = ExtensionContext.create("..")
        if (sExt) {
            return sExt.call("..", _arg_1);
        }
        return false;
    }
Testing: 10 new tests (3 stack-sim param-name, 1 setproperty
collapse, 3 else-inline, 3 trailing-return). All 480 unit tests
pass, 8 opt-in smokes skipped. Real-SWF smoke unchanged: 14984
method bodies structure in ~6s, slowest 3ms.
…exception view offsets
Three related fixes for bogus output on real SWFs:
- BlockStackSim ran every block with an empty entry stack, so any
conditional whose operand was pushed in a predecessor fell back
to Identifier("_unknown"). Replaced the per-block pass with a
forward dataflow driver (_simulate_all_blocks): RPO traversal,
entry stack = meet of predecessor exit stacks, fixpoint in 1-2
passes for reducible CFGs. Disagreeing stack slots become
synthetic _sN_bM identifiers instead of _unknown.
- resolve_multiname returned "multiname[N]" for the four late-bound
kinds (RTQNAME_L, RTQNAME_LA, MULTINAME_L, MULTINAME_LA). Those
names have no string in the pool by design (the name comes off
the runtime stack) — return "*" to match AVM2 convention.
- _ExceptionView exposed from_pos/to_pos but graph.cfg and
decompile.structure read from_offset/to_offset, crashing every
try/catch method with AttributeError. Renamed the view properties
to match; no other callers used the old names.
Also exposes reverse_postorder in graph.dominators (was private).
Measured on 2016 Brawlhalla (352 classes) and 10-Apr-2026 obfuscated
SWF (826 classes across 12 ABC blocks): zero _unknown, zero
multiname[N], zero AttributeErrors.
decompile_method emitted synthetic _arg_1, _arg_2, … for every parameter regardless of whether the method's METHOD_HAS_PARAM_NAMES flag was set and the MethodInfo.param_names table was populated. When an SWF ships with debug info, the original parameter names are right there in the string pool — ignoring them produces output that reads wrong next to FFDec / disasm. Walk MethodInfo.param_names first, fall back to the _arg_N naming only when the slot is 0 (unset) or points past the string pool.
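The fallback rule reads naturally as a small function. `resolve_param_names` and its arguments are hypothetical; the real code walks `MethodInfo.param_names` against the ABC string pool.

```python
def resolve_param_names(name_indices, string_pool, param_count):
    """Prefer debug names from the string pool; fall back to _arg_N when the
    slot is 0 (unset), missing, or points past the pool."""
    names = []
    for i in range(param_count):
        idx = name_indices[i] if i < len(name_indices) else 0
        if 0 < idx < len(string_pool):
            names.append(string_pool[idx])
        else:
            names.append(f"_arg_{i + 1}")
    return names


pool = ["", "width", "height"]
```

For example, with indices `[1, 0, 99]` the first parameter keeps its debug name and the other two fall back to `_arg_2` and `_arg_3`.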
Every analysis index (call_graph, strings, references, field_access, unified, method_fingerprint) wrapped scan_relevant_opcodes / decode_instructions in a bare ``except Exception`` that silently dropped the whole method body. A corrupt body disappeared from the index with no trace — callers had no way to know results were incomplete. Narrow to (ABCParseError, IndexError, ValueError) — the real failure modes from the bytecode decoder — and log at debug so someone chasing "why isn't this call showing up" can flip logging on and see the reason. A genuinely unknown exception will now propagate, which is the right behaviour for bugs in our own code. method.py keeps its two broad catches but with reason comments explaining the policy: decompile() surfaces any pipeline failure as a comment in the output so batch decompiles don't abort, and _ast_equal falls back to identity when custom __eq__ raises.
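The narrowed handler looks roughly like the sketch below. `ABCParseError` here is a local stand-in for the library's exception, and `index_bodies` / `scan_stub` are hypothetical names used only to make the example runnable.

```python
import logging

log = logging.getLogger("flashkit_sketch")


class ABCParseError(Exception):
    """Local stand-in for the library's parse error type."""


def index_bodies(bodies, scan):
    """Scan each method body; skip (and log) only known decoder failures."""
    index = {}
    for method_idx, code in bodies.items():
        try:
            index[method_idx] = scan(code)
        except (ABCParseError, IndexError, ValueError) as exc:
            # Lazy %-formatting: arguments are only rendered if DEBUG is on.
            log.debug("skipping method %d: %s", method_idx, exc)
    return index


def scan_stub(code):
    # Toy scanner: rejects empty bodies, otherwise returns the byte count.
    if not code:
        raise ValueError("empty method body")
    return len(code)
```

An unexpected exception type, such as a `KeyError` from a bug in the scanner, is no longer swallowed and reaches the caller.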
…s.py
_helpers_full.py was a 542-line near-duplicate of helpers.py: all public functions have leading-underscore siblings reachable only through ``from ._helpers_full import *`` in class_.py. Three checkers (check_mn_ns_set, check_typename_param, check_mn_ns_set_typed) had no public twin; promote them with proper docstrings, keep the behaviour-dependent uppercase-first heuristic but flag it as scheduled for replacement by a structural check. class_.py now uses explicit named imports against helpers.py for every symbol it needs, so there is no more namespace pollution from star imports. The opcode import also drops ``from ..abc.opcodes import *`` in favour of the 37 names it actually references. While here: drop the duplicate ``logger`` that shadowed the module-level ``log``, and rewrite the two log call sites to use lazy %-formatting instead of f-strings inside the log call (the formatter only evaluates arguments when the level is enabled).
…rkspace
ClassGraph.from_workspace called ReferenceIndex.from_workspace, which re-ran a full scan over every method body, duplicating the work ``Workspace._ensure_indexes()`` had just done via build_all_indexes. Accessing ``workspace.class_graph`` therefore paid for two full bytecode passes. Use the already-built ``workspace.reference_index`` instead. On a real SWF with 350 classes and ~10K methods, the class-graph accessor now finishes in half the time.
Sixteen string-manipulation functions in decompile/helpers.py had no callers anywhere in the library. They were built for the legacy pre-CFG decompilation path that was retired when structure.py / stack.py / patterns.py / ast/printer.py landed — those work on typed AST nodes instead of stitching output strings by hand. Delete: pop_n, fmt_hex, to_hex_if_int, fmt_call, binop, bitwise_binop, is_type_default, strip_redundant_cast, add_type_cast_if_needed, has_outer_parens, needs_ternary_wrap, find_op_outside_parens, wrap_for_logical, expand_multiline_stmt, typename_param_indices, and check_mn_ns_set (its _typed counterpart is the one class_.py actually calls). Survivors — INDENT_UNIT, fmt_hex_const, escape_str, access_modifier, skip_operands, check_typename_param, check_mn_ns_set_typed — keep the same signatures; class_.py and ast/printer.py are unchanged. 609 → 245 lines. Tests: 480 pass / 8 skip.
Before: StringIndex.from_workspace / FieldAccessIndex.from_workspace / ReferenceIndex.from_workspace / InheritanceGraph.from_workspace all ran their own full bytecode scan — duplicating the work build_all_indexes had already done and left cached on the Workspace. Any external caller using the classmethod was paying the extra pass. Rewrite each as a thin accessor that returns the workspace's cached index. The authoritative build still lives in Workspace._ensure_indexes (which calls build_all_indexes + InheritanceGraph.from_classes). ReferenceIndex.from_classes_and_abc stays as-is — it's the only documented entry point for building a ReferenceIndex without a Workspace, used by tests that construct ABC directly. CallGraph.from_workspace stays as the real builder because CallGraph isn't part of build_all_indexes — Workspace.call_graph calls from_workspace as its cached accessor. references.py drops unused opcode imports (OP_NEWCLASS, and the opcode subset that only fed the duplicate _REF_SCAN_OPS table).
``flashkit disasm`` had only raw pool indices in its operand column (``getlex 591`` instead of ``getlex DevSettings``), which made the output near-useless. Switch to ``resolve_instructions`` by default; ``--raw`` keeps the old behaviour for anyone who actually wanted pool indices (e.g. building a decoder). New ``flashkit pool`` subcommand dumps the ABC constant pools: multinames, namespaces, namespace-sets, ints, uints, doubles. Takes a ``--search`` substring filter and ``--abc-index`` when the SWF has multiple DoABC blocks. Matches the studio's Strings / Multinames views.
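To illustrate the difference between the resolved and ``--raw`` operand columns, here is a minimal sketch. ``format_instruction`` and the ``multinames`` table are hypothetical names made up for this example; this is not the ``resolve_instructions`` implementation.

```python
def format_instruction(
    mnemonic: str,
    operand: int,
    names: dict[int, str],
    raw: bool = False,
) -> str:
    """Render one instruction, resolving the operand unless ``raw`` is set."""
    if raw or operand not in names:
        # Raw mode, or nothing to resolve against: show the pool index.
        return f"{mnemonic} {operand}"
    return f"{mnemonic} {names[operand]}"


# Hypothetical multiname pool: index -> resolved name.
multinames = {591: "DevSettings"}

resolved = format_instruction("getlex", 591, multinames)
unresolved = format_instruction("getlex", 591, multinames, raw=True)
```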
The name ``ClassInfo`` existed in two places: ``flashkit.abc.types`` (the raw static half of an ABC class, paired with InstanceInfo) and ``flashkit.info.class_info`` (the fully resolved class model downstream code uses). Wildcard imports from either package shadowed the other. Rename the ABC-level type to ``AbcClassInfo`` everywhere it's constructed or annotated. Leave ``ClassInfo = AbcClassInfo`` as a legacy alias in both ``flashkit.abc.types`` and ``flashkit.abc`` so existing downstream imports keep working; new code should use the unambiguous name.
list_classes previously returned list[dict] — the key set
(index, name, package, full_name, super, is_interface, trait_count)
was an undocumented contract between the decompiler, the cache, the
CLI, and every downstream consumer.
Promote it to a frozen dataclass ClassSummary with typed attributes.
Kept dict-style access (c["name"], c.get("index"), c.keys()) via
__getitem__ / get / keys so every existing ``c["name"]`` call site
keeps working without modification — this is an additive migration,
not a breaking one.
CONSTANT_QNAME, TRAIT_METHOD, ATTR_METADATA, INSTANCE_INTERFACE, METHOD_HAS_PARAM_NAMES, and the rest were reachable only via ``flashkit.abc.constants.X``. Downstream code (tests, bh-mcp, bh-deobfuscator) that needed to interpret TraitInfo.kind or MultinameInfo.kind had to reach into a submodule by path, which is fragile. Add a curated ``__all__`` to ``flashkit.abc.constants`` listing all 37 public constants, then re-export them at ``flashkit.abc``. Both the old ``flashkit.abc.constants.CONSTANT_QNAME`` path and the new ``flashkit.abc.CONSTANT_QNAME`` path work.
Six files mixed the PEP 604 ``X | None`` syntax with the older ``Optional[X]`` import. Python ≥3.10 is the minimum supported version (pyproject.toml), so there's no reason to keep the ``typing.Optional`` import around. Rewrite every annotation and drop the now-unused imports. Docstring prose that used "Optional" as an English word is left alone.
DecompilerCache was listed in the public ``__all__`` of ``flashkit.decompile`` with zero test coverage. Add a suite that:
- Builds a minimal synthetic SWF on disk via SwfBuilder + AbcBuilder.
- Exercises list_classes / decompile_class / decompile_method.
- Verifies the typed ClassSummary rows are returned (not raw dicts).
- Verifies missing class and missing method raise KeyError.
- Verifies repeat calls reuse the cached entry object.
- Verifies that bumping the file's mtime allocates a new entry under a new (path, mtime) key instead of serving a stale parse.
- Verifies pathlib.Path is accepted, not just str.
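The cache behaviour those tests pin down can be sketched as below. ``MtimeKeyedCache`` and ``demo`` are hypothetical names for illustration; the real DecompilerCache may be structured differently.

```python
import os
import tempfile
from pathlib import Path


class MtimeKeyedCache:
    """Cache parse results under (resolved path, mtime) keys."""

    def __init__(self, parse):
        self._parse = parse
        self._entries = {}

    def get(self, path):
        p = Path(path).resolve()  # accepts str or pathlib.Path
        key = (str(p), p.stat().st_mtime_ns)
        if key not in self._entries:
            # First sighting of this (path, mtime) pair: parse and remember.
            self._entries[key] = self._parse(p)
        return self._entries[key]


def demo():
    """Return (repeat call reused entry, mtime bump reused entry)."""
    with tempfile.TemporaryDirectory() as tmp:
        swf = Path(tmp) / "sample.swf"
        swf.write_bytes(b"FWS")  # placeholder bytes, not a valid SWF
        cache = MtimeKeyedCache(lambda p: {"size": p.stat().st_size})

        first = cache.get(swf)
        again = cache.get(str(swf))  # same file passed as str

        # Push the modification time one second forward.
        st = swf.stat()
        os.utime(swf, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        fresh = cache.get(swf)
        return first is again, fresh is first
```

Keying on the mtime means a rebuilt file never collides with its stale parse; the trade-off in this sketch is that old entries stay in memory until evicted by some other policy.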
Expansion branch added cross-block stack dataflow, late-bound multiname resolution, exception-view fix, constant pool re-export, ClassSummary dataclass, DecompilerCache test coverage, analysis-layer narrowing, CLI pool + disasm resolution — enough shape change to warrant a minor bump from 1.2. Add the missing project metadata a 1.x package should ship with: author table (bitalizer), repository / issues URLs, and pytest-cov under the dev extra so ``pip install -e .[dev]`` gives the full test toolkit.
…ind helper
Five additions under flashkit/analysis/ that slot in next to the existing ReferenceIndex / CallGraph / StringIndex / FieldAccessIndex:
- liveness.method_liveness: per-method register read/write summary with first-write / last-read offsets. Feeds rename heuristics that promote ``_loc3_`` → ``count`` when a register has a single write and many reads.
- const_args.ConstArgIndex: records literal arguments observed at every call site. Cheap backward walk from each call opcode picking up immediate push* values; non-literal pushes stop the walk. distinct_arg_values(target, slot) returns the set of literal values any caller passed, which is the "flag enum detector" signal.
- dead_code.find_dead_classes / find_dead_methods / entrypoint_candidates: heuristics over the already-built ReferenceIndex, InheritanceGraph, and CallGraph. Doesn't scan any new bytecode. Entry-point detection flags classes whose ancestor chain reaches Sprite / MovieClip / DisplayObject / EventDispatcher.
- complexity.cfg_complexity / method_complexity: McCabe cyclomatic complexity (E − N + 2) straight off the CFG. One-liner using the already-built graph module.
- helpers.build_class_name_set: structural replacement candidate for the name[0].isupper() heuristic in check_mn_ns_set_typed. Walks every trait and returns the string-pool indices that name real TRAIT_CLASS traits. Exposed for downstream code; class_.py still uses the existing heuristic unchanged (the structural variant can be adopted incrementally).
Also bump the version-pinned tests to 1.3.0. 501 pass / 8 skip.
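As a worked example of the E − N + 2 formula, here is a small sketch over a plain successor map. The graph representation and the ``cyclomatic_complexity`` name are assumptions for illustration; flashkit's ``cfg_complexity`` operates on its own CFG type. The formula as used here assumes a single connected CFG.

```python
def cyclomatic_complexity(successors: dict[str, list[str]]) -> int:
    """McCabe complexity E - N + 2 for a single connected control-flow graph."""
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)  # include blocks that only appear as targets
    edges = sum(len(targets) for targets in successors.values())
    return edges - len(nodes) + 2


# Straight-line code: one edge, two blocks.
straight = {"entry": ["exit"], "exit": []}

# if/else: the entry block branches two ways and both arms rejoin.
if_else = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

# while loop: the header either enters the body or leaves; the body jumps back.
loop = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
```

Each additional decision point adds one to the result, so the straight-line graph scores 1 and the two branching graphs score 2.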
The dev extra now installs pytest-cov, which leaves a ``.coverage`` SQLite snapshot in the repo root after any ``pytest --cov`` run. Add it plus the rest of the standard coverage artifact patterns (``.coverage.*``, ``htmlcov/``, ``coverage.xml``, ``*.cover``) so they never get accidentally committed.
README:
- Document the AS3 decompiler API (decompile_class / decompile_method / decompile_method_body / list_classes / ClassSummary / DecompilerCache), which had no coverage at all in 1.2's README.
- Add the new CLI subcommands: ``flashkit decompile`` (was missing) and ``flashkit pool`` (new in 1.3). Note that ``flashkit disasm`` now resolves operand names by default with ``--raw`` for opt-out.
- Add a "Deeper analysis" subsection covering liveness, const-args, dead-code, entry-point detection, and cyclomatic complexity.
- Document the package-level AVM2 constants re-export (``from flashkit.abc import CONSTANT_QNAME, TRAIT_METHOD, ...``).
- Add ``decompile/`` and ``graph/`` to the project structure list.
CONTRIBUTING:
- Project layout no longer mentions the non-existent ``search/`` package and now lists every analysis module that actually exists, plus ``decompile/`` and ``graph/``.
- Document the ``FLASHKIT_TEST_SWF`` env var for opt-in real-SWF tests and the ``--cov`` flag for coverage runs.
pyproject.toml:
- Fix project.urls: the repo lives at ``bitalizer/pyflashkit`` (the pip package name), not ``bitalizer/flashkit``. Same for the README clone URL.
Each subcommand now reads cleaner under ``--help``:
- The big nine commands (decompile, disasm, pool, strings, classes, fields, extract, plus the top-level help) get an Examples block via ``epilog=`` + ``RawDescriptionHelpFormatter``. Three to five invocations each, enough to anchor the syntax without becoming a wall of text.
- ``--class CLASS`` instead of ``--class CLASS_NAME``, ``--method METHOD`` instead of ``--method METHOD_NAME``, ``--method-index N``, ``--outdir DIR``, ``--field NAME``. The dest-derived metavars argparse generates by default were noisy.
- ``tags`` and ``extract`` corrected from "SWF file" to "SWF or SWZ file": both load through the workspace which accepts both formats. ``build`` keeps "SWF file" because it legitimately rejects SWZ at runtime.
No behavioural change: every flag and positional still parses identically; this is help-text only.
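The argparse pieces named above fit together roughly like this. It is a sketch, not the project's CLI module: the ``dest`` names, help strings, and the ``game.swf`` file name in the examples are assumptions, while the flags themselves are the ones this PR lists for ``flashkit decompile``.

```python
import argparse

# Shown verbatim under --help because of RawDescriptionHelpFormatter.
EXAMPLES = """\
examples:
  flashkit decompile game.swf --list
  flashkit decompile game.swf --class Main
  flashkit decompile game.swf --all --outdir out/
"""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="flashkit")
    sub = parser.add_subparsers(dest="command", required=True)

    dec = sub.add_parser(
        "decompile",
        help="decompile classes to AS3",
        epilog=EXAMPLES,
        # Keeps the line breaks in the epilog instead of re-wrapping it.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    dec.add_argument("file", help="path to the input file")
    dec.add_argument("--list", action="store_true", help="list classes")
    # An explicit metavar replaces the dest-derived CLASS_NAME / METHOD_NAME.
    dec.add_argument("--class", dest="class_name", metavar="CLASS")
    dec.add_argument("--method", dest="method_name", metavar="METHOD")
    dec.add_argument("--all", action="store_true", help="decompile everything")
    dec.add_argument("--outdir", metavar="DIR")
    return parser


args = build_parser().parse_args(["decompile", "game.swf", "--class", "Main"])
```

Changing ``metavar`` only affects the help output; the parsed attribute is still named after ``dest``, which is why the commit can describe itself as help-text only.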
Summary
Bumps pyflashkit to 1.3.0. Adds an AS3 decompiler, expands the analysis layer, polishes the public API.
Decompiler
Replaces the legacy string-stitching path with a CFG-based pipeline:
Public API:
- ``decompile_method``, ``decompile_method_body``, ``decompile_class``
- ``DecompilerCache`` for repeated lookups against the same SWF (mtime-keyed)
- ``flashkit decompile --list / --class / --method / --all --outdir`` CLI

The new ``flashkit/graph/`` package (CFG + dominator/post-dominator trees + natural loop detection) is independently useful.
Decompiler fixes
- … ``_unknown``. Forward dataflow with predecessor-stack ``meet``, fixpoint in 1–2 passes for reducible CFGs.
- Late-bound multinames (``MULTINAME_L`` / ``MULTINAME_LA`` / ``RTQNAME_L`` / ``RTQNAME_LA``) resolve to ``*`` instead of ``multiname[N]`` placeholders.
- ``_ExceptionView`` exposes ``from_offset`` / ``to_offset`` matching the CFG / structurer field names; try/catch methods no longer crash with ``AttributeError``.
- … ``MethodInfo.param_names`` when the debug flag is set instead of always emitting ``_arg_N``.
- ``setproperty`` collapse, else-after-return inline, trailing return strip.

Analysis
New modules under ``flashkit/analysis/``:
- ``liveness``: per-method register read/write summary with counts, first-write / last-read offsets.
- ``const_args``: call-site constant-argument inference. ``distinct_arg_values(target, slot)`` gives the set of literal values any caller passed (flag-enum signal).
- ``dead_code``: ``find_dead_classes``, ``find_dead_methods``, ``entrypoint_candidates`` (Sprite/MovieClip/EventDispatcher subclasses).
- ``complexity``: McCabe cyclomatic complexity (``E - N + 2``) over the existing CFG.
- ``helpers.build_class_name_set``: structural alternative to the ``name[0].isupper()`` heuristic.

Existing indexes:
- ``from_workspace`` factories now delegate to the workspace's cached index (no extra bytecode pass).
- Narrowed ``except`` types in every analysis index: ``(ABCParseError, IndexError, ValueError)`` instead of bare ``Exception``. Corrupt method bodies log at debug instead of disappearing silently.
- ``ClassGraph.from_workspace`` reuses the already-built ``ReferenceIndex`` instead of re-scanning every body.

ABC layer
- ``flashkit/abc/opcodes.py``: extracted module, 161 UPPERCASE opcode constants, single source of truth.
- ``TraitInfo`` with parsed fields (``slot_id``, ``method_idx``, ``type_name``, …) and byte-perfect round-trip via ``_raw`` cache.
- … ``AbcFile`` (``abc.string(i)``, ``abc.multiname_full(i)``, ``abc.namespace_kind(i)``).
- ``flashkit.abc``: ``from flashkit.abc import CONSTANT_QNAME, TRAIT_METHOD, ...`` works without reaching into ``.constants``.

CLI
- ``flashkit pool``: dump multinames / namespaces / namespace-sets / ints / uints / doubles, with ``--search`` filter and ``--abc-index`` for multi-block SWFs.
- ``flashkit disasm`` resolves operands by default (``getlex DevSettings``, ``pushstring "noScale"``); ``--raw`` opts out for pool-index debugging.

Breaking changes
- ``flashkit.abc.ClassInfo`` renamed to ``AbcClassInfo``: clears the name collision with ``flashkit.info.ClassInfo``. Legacy alias kept.
- ``flashkit.decompile.list_classes`` returns ``list[ClassSummary]`` dataclasses. Subscript access (``c["name"]``) still works.
- ``flashkit.abc.constants``: UPPERCASE rename for every structural constant (``CONSTANT_QNAME``, ``TRAIT_METHOD``, ``ATTR_OVERRIDE``, …).

Cleanup
- Removed ``_helpers_full.py`` (~540 lines) and 16 dead string-manipulation helpers in ``helpers.py`` left over from the pre-CFG path. ~900 lines net removed.
- ``Optional[X]`` → ``X | None`` everywhere (Python ≥3.10).
- ``DecompilerCache`` now has test coverage (was 0% in 1.2 with the symbol in ``__all__``).

Tests
(Was 480 / 8 on 1.2.0.)
Test plan
- ``pip install -e .[dev]`` from a clean venv
- ``python -m pytest``
- ``flashkit info <SWF>``: sanity check
- ``flashkit decompile <SWF> --class <Name>``: verify AS3 source output
- ``flashkit pool <SWF> multinames -s <substring>``: new subcommand
- ``flashkit disasm <SWF> --class <Name>``: verify resolved operands