patchwork

Python source obfuscator. Give it a .py file, get back a single self-contained .py that does the same thing but is hostile to read or debug.

Each build is unique by default. Pass --seed N if you want a reproducible one.

Version 0.8 adds a normalization layer that lowers f-strings and simple sequence match statements before the rest of the pipeline. This lets literal fragments pass through string protection and lets Abyss cover functions that previously fell back because of generator expressions or simple match/case syntax.

Version 0.7 hardens Abyss with a second sealed-packet layer: protected VM instructions, constants, globals, locals, and entry offsets are individually masked inside the encrypted Abyss asset, and the runtime decodes one tagged packet at a time instead of loading a clean VM bytecode stream.

Version 0.6 adds optional Abyss VM protection for selected functions. Protected function bodies are lowered into encrypted VM assets before CPython compilation, then executed by a generated per-build runtime instead of being restored as ordinary Python code objects.

Version 0.5 adds a shared encoded literal pool, randomized decode paths, more integer rewrite forms, and a reserved-import regression fix for reproducible stress-tested builds.

Version 0.4 improves literal obfuscation with split/shuffled encrypted string and bytes payloads while preserving deterministic builds under --seed.

Version 0.3 adds advanced accountability and reproducibility workflows for legitimate software-protection use: config files, effective-config dumps, keep-name files, dry runs, stats JSON, HTML reports, manifest verification, output budget gates, a version flag, and CI metadata that matches the supported Python versions.

0.8 Improvements

  • F-string literals are lowered into ordinary formatting and joining calls before string encryption, so fragments such as separators, prefixes, and format specs no longer bypass literal protection.
  • Simple sequence match cases are lowered into guarded if chains before obfuscation, allowing the normal transforms and Abyss to process their literal text and branch structure.
  • Abyss now supports simple single-generator generator expressions and list comprehensions by lowering them into VM-managed loops.
  • The public reverse-engineering sample now protects rotate_name, compact_checksum, describe_amounts, and build_report under broad --abyss with no skipped functions.

0.7 Improvements

  • Abyss assets now use a second internal sealing layer after the outer asset cipher: every VM instruction is stored as a masked packet with per-build, per-position byte transforms and split shares.
  • Protected function bytecode is merged into one packet stream with sealed entry offsets, so decrypted Abyss assets no longer expose one clean b list per function.
  • Constants, globals, locals, and packet operands are sealed separately; a decrypted Abyss JSON asset does not expose protected strings or opcode names.
  • The generated runtime decodes and verifies one packet at a time with per-build tags, then discards the decoded instruction after dispatch.
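
The sealed-packet idea can be sketched roughly as follows. This is only an illustration of position-dependent masking, split shares, and per-packet tags; the helper names, the hash choices, the 4-byte tag, and the toy instruction bytes are all assumptions, not Patchwork's actual packet format.

```python
import hashlib
import hmac
import random


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _mask(build_key, position, n):
    # Hypothetical per-build, per-position mask derived from a build key.
    return hashlib.shake_256(build_key + position.to_bytes(4, "big")).digest(n)


def _tag(build_key, position, plain):
    # Hypothetical short tag binding the instruction to its position.
    return hashlib.sha256(build_key + position.to_bytes(4, "big") + plain).digest()[:4]


def seal(build_key, position, plain, rng):
    masked = _xor(plain, _mask(build_key, position, len(plain)))
    share_a = rng.randbytes(len(masked))   # random share
    share_b = _xor(masked, share_a)        # share_a ^ share_b == masked
    return share_a, share_b, _tag(build_key, position, plain)


def unseal(build_key, position, packet):
    share_a, share_b, tag = packet
    plain = _xor(_xor(share_a, share_b), _mask(build_key, position, len(share_a)))
    if not hmac.compare_digest(tag, _tag(build_key, position, plain)):
        raise ValueError("packet tag mismatch")
    return plain


rng = random.Random(7)
BUILD_KEY = rng.randbytes(16)
PROGRAM = [b"\x10\x00", b"\x22\x05", b"\x31"]  # toy instruction bytes
PACKETS = [seal(BUILD_KEY, i, ins, rng) for i, ins in enumerate(PROGRAM)]
```

A runtime built this way would call unseal for the packet at the current program counter, dispatch it, and drop the plaintext, so the whole program never exists decoded at once.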

0.6 Improvements

  • --abyss and --abyss-functions add an opt-in virtualized protection tier for supported pure-ish functions.
  • Abyss-protected functions are replaced with wrappers before compile(...); their real bodies are stored as encrypted data for a generated stack VM, not as marshaled CPython function code objects.
  • The generated VM uses per-build randomized opcode values, shuffled dispatch block order, encrypted constant/code assets, and preservation of referenced globals so targeted functions still run under identifier renaming.
  • Unsupported syntax is refused for explicitly named Abyss functions and skipped during broad --abyss auto-protection, keeping the default profile stable.
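
For readers who have not seen a virtualizing obfuscator before, the toy below shows the general idea of a stack VM whose opcode numbering changes per build. It is a minimal sketch under assumptions: the five instruction names, the tuple-based program format, and build_vm are invented here and say nothing about how Abyss encodes or encrypts its assets.

```python
import random


def build_vm(seed):
    """Return (opcode table, interpreter) with a per-seed opcode numbering."""
    rng = random.Random(seed)
    names = ["PUSH", "LOAD", "ADD", "MUL", "RET"]
    op = dict(zip(names, rng.sample(range(256), len(names))))

    def run(program, args):
        stack, pc = [], 0
        while True:
            code, operand = program[pc]
            pc += 1
            if code == op["PUSH"]:
                stack.append(operand)          # push a constant
            elif code == op["LOAD"]:
                stack.append(args[operand])    # push an argument by index
            elif code == op["ADD"]:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif code == op["MUL"]:
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif code == op["RET"]:
                return stack.pop()
            else:
                raise ValueError(f"unknown opcode {code}")

    return op, run


def compile_scale(op):
    # Hand-compiled body of: def scale(x, y): return x * 2 + y
    return [
        (op["LOAD"], 0), (op["PUSH"], 2), (op["MUL"], None),
        (op["LOAD"], 1), (op["ADD"], None), (op["RET"], None),
    ]


op_a, run_a = build_vm(seed=1)
op_b, run_b = build_vm(seed=2)
```

Both builds compute the same result, but a program compiled against one opcode table is meaningless to an interpreter built from another seed.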

0.5 Improvements

  • String and bytes literals are now stored in a shared encoded pool and decoded by index, so transformed call sites no longer carry the full encrypted payload argument list inline.
  • Literal pool entries keep the previous per-literal keying, rotation, and shuffled chunk reconstruction, then add randomized XOR decoder variants.
  • Integer constants have four more rewrite families: bitwise inversion, split-by-mask recomposition, divmod-style reconstruction, and indexed noise tables.
  • Reserved "from module import name" bindings now stay unaliased when preserved by --keep or annotation handling, avoiding false renames in future annotation-heavy code.
  • Regression coverage now includes direct transform tests for pooled literals, the added number rewrites, and preserved import bindings.
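
A shared pool decoded by index could look something like the sketch below. It combines a per-literal XOR key, a byte rotation, and shuffled chunks, then rebuilds the literal from a pool index. The entry layout, the 8-byte key, and the names encode_literal, decode_literal, and lit are assumptions for illustration, not the pool format Patchwork emits.

```python
import random


def encode_literal(data, rng):
    key = bytes(rng.randrange(1, 256) for _ in range(8))       # per-literal key
    xored = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    rot = rng.randrange(len(xored)) if xored else 0
    rotated = xored[rot:] + xored[:rot]                        # rotate left by rot
    n = min(len(rotated), rng.randint(2, 4)) or 1
    size = -(-len(rotated) // n) if rotated else 1             # ceil division
    chunks = [rotated[i:i + size] for i in range(0, len(rotated), size)]
    order = list(range(len(chunks)))
    rng.shuffle(order)
    # chunks are stored shuffled; order[pos] is the original index of slot pos
    return {"key": key, "rot": rot, "order": order,
            "chunks": [chunks[i] for i in order]}


def decode_literal(entry):
    ordered = [b""] * len(entry["chunks"])
    for pos, original_index in enumerate(entry["order"]):
        ordered[original_index] = entry["chunks"][pos]
    rotated = b"".join(ordered)
    cut = len(rotated) - entry["rot"]
    xored = rotated[cut:] + rotated[:cut]                      # undo the rotation
    key = entry["key"]
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(xored))


rng = random.Random(12345)
POOL = [encode_literal(s, rng)
        for s in (b"hello", "naïve ünïcode".encode("utf-8"), b"")]


def lit(index):
    # A call site carries only the index, not the payload.
    return decode_literal(POOL[index])
```

With this shape, a call site shrinks to something like lit(0), and the payload bytes live in one place.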

0.3 Improvements

  • --version for scriptable tool identification.
  • --config PATH for JSON build profiles.
  • --dump-config PATH for saving the effective merged configuration.
  • --keep-file PATH for larger preserve-name lists.
  • --dry-run for audit/config planning without writing obfuscated output.
  • --stats-json PATH for machine-readable size/hash/build stats.
  • --report PATH for standalone HTML audit/build reports.
  • --verify-manifest PATH for checking manifest input/output hashes.
  • --max-output-bytes, --max-ratio, and --max-review-indicators gates.
  • Python support metadata and CI aligned to Python 3.10+.

What it actually does

Source-level rewrites first:

  • Identifiers get renamed to confusable junk like _OoIl01l1OOIlO0. Variables, function names, classes, import aliases, exception bindings.
  • String and bytes literals get XOR'd with random per-literal keys, rotated, split into shuffled chunks, stored in a shared encoded pool, and replaced with indexed calls to randomized reconstructing decrypt helpers.
  • Integer literals get rewritten as random equivalent expressions: XOR cancellations, sum displacements, shift round-trips, int.from_bytes decodes, affine identities, bitwise inversions, mask recomposition, divmod rebuilds, and indexed noise tables.
  • Bitwise ops get pushed through MBA identities (a^b becomes (a|b) - (a&b) and similar).
  • if and while tests get wrapped with always-true tautologies anchored on a runtime seed (s*(s+1) % 2 == 0 and friends).
  • Random dead-branch blocks get inserted with realistic-looking code, gated by an always-false runtime predicate so they never actually run.
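
The identities behind a few of these rewrites are easy to check by hand. The sketch below verifies the MBA form of XOR, the s*(s+1) tautology, and three of the integer rewrite families. rewrite_int is a hypothetical helper that returns expression text; it illustrates the idea and is not Patchwork's transform.

```python
import random


def xor_via_mba(a, b):
    # Identity from the list above: a ^ b == (a | b) - (a & b).
    return (a | b) - (a & b)


def opaque_true(s):
    # s and s + 1 are consecutive integers, so their product is always even.
    return s * (s + 1) % 2 == 0


def rewrite_int(n, rng):
    """Return source text for an expression that evaluates to n."""
    k = rng.randrange(1, 1 << 16)
    form = rng.choice(["xor", "sum", "shift"])
    if form == "xor":
        return f"(({n ^ k}) ^ {k})"            # XOR cancellation
    if form == "sum":
        return f"(({n + k}) - {k})"            # sum displacement
    shift = rng.randrange(1, 8)
    return f"((({n}) << {shift}) >> {shift})"  # shift round-trip


rng = random.Random(0)
samples = [rewrite_int(42, rng) for _ in range(3)]
```

Each string in samples evaluates back to 42, and all three identities hold for negative Python ints as well because Python integers are arbitrary precision.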

Then it compiles to bytecode and gets weirder:

  • Each user function and method's compiled code object is pulled out, encrypted into its own blob, and replaced in the marshaled module with a kind-matching stub (function / generator / coroutine / async-generator). On first call, the stub finds itself via frame inspection, decrypts its blob, swaps its own __code__ in place, and re-invokes. So a static dump of the marshaled module just shows function shells.
  • The whole module then goes through marshal -> zlib -> several rounds of cipher: XOR-CTR with a SHA-256 keystream, byte permutation, bit rotation. Order and keys are random per build.
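
The marshal -> zlib -> cipher-rounds pipeline can be approximated in a few dozen lines. The sketch below uses the three round types named above with a random order and random parameters per seed. How Patchwork derives keys, stores the layer list, or structures its keystream is not documented here, so those details (the counter encoding, the 16-byte key, the (kind, param) layer list) are assumptions.

```python
import hashlib
import marshal
import random
import zlib


def keystream(key, n):
    # SHA-256 in counter mode: hash(key || counter) blocks, truncated to n bytes.
    out, counter = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def xor_ctr(data, key):
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def _indices(n, seed):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx


def permute(data, seed):
    return bytes(data[i] for i in _indices(len(data), seed))


def unpermute(data, seed):
    out = bytearray(len(data))
    for pos, i in enumerate(_indices(len(data), seed)):
        out[i] = data[pos]
    return bytes(out)


def rotate_bits(data, r):
    # Rotate every byte left by r bits (1 <= r <= 7).
    return bytes(((b << r) | (b >> (8 - r))) & 0xFF for b in data)


def pack(source, seed, rounds=3):
    rng = random.Random(seed)
    blob = zlib.compress(marshal.dumps(compile(source, "<payload>", "exec")))
    layers = []
    for _ in range(rounds):
        kind = rng.choice(["xor", "perm", "rot"])
        if kind == "xor":
            param = rng.randbytes(16)
            blob = xor_ctr(blob, param)
        elif kind == "perm":
            param = rng.getrandbits(32)
            blob = permute(blob, param)
        else:
            param = rng.randint(1, 7)
            blob = rotate_bits(blob, param)
        layers.append((kind, param))
    return blob, layers


def unpack(blob, layers):
    for kind, param in reversed(layers):
        if kind == "xor":
            blob = xor_ctr(blob, param)      # XOR is its own inverse
        elif kind == "perm":
            blob = unpermute(blob, param)
        else:
            blob = rotate_bits(blob, 8 - param)
    return marshal.loads(zlib.decompress(blob))
```

Usage is a round trip: blob, layers = pack("result = 6 * 7", seed=1), then exec(unpack(blob, layers), namespace). None of the rounds is strong on its own; the point is that every layer has to be undone in the right order before marshal can read anything.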

Then the loader:

  • Two stages. Stage 1 is the file you ship, a small mangled script that decrypts and execs Stage 2. Stage 2 is compiled, marshaled, multi-cipher encrypted, and embedded as bytes inside Stage 1. It holds the lazy blob table, the resolver, and the user payload.
  • Both stages run anti-debug: sys.gettrace, sys.getprofile, sys.modules checks for pdb/bdb/pydevd/debugpy/etc, PYTHONBREAKPOINT env check, an audit hook that blocks subsequent imports of debugger modules and any call to sys.settrace/sys.setprofile, and a frame-stack walk looking for debugger frames.
  • Large bytes literals get split into 4-9 random chunks, shuffled, and reassembled at runtime, so the payload doesn't show up as one continuous blob in the source.
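
The anti-debug probes listed above map onto standard-library calls. The sketch below collects a few of them into a read-only check and shows what an audit hook along those lines might look like. The function names, the module tuple, and the choice to treat any PYTHONBREAKPOINT value as an indicator are assumptions; the hook is defined but deliberately not installed, because audit hooks cannot be removed once added.

```python
import os
import sys

DEBUGGER_MODULES = ("pdb", "bdb", "pydevd", "debugpy")


def debug_indicators():
    """Return a list of human-readable reasons a debugger may be attached."""
    found = []
    if sys.gettrace() is not None:
        found.append("trace function installed")
    if sys.getprofile() is not None:
        found.append("profile function installed")
    for name in DEBUGGER_MODULES:
        if name in sys.modules:
            found.append(f"module loaded: {name}")
    if os.environ.get("PYTHONBREAKPOINT"):
        found.append("PYTHONBREAKPOINT is set")
    return found


def block_debug_hook(event, args):
    # Shape of an audit hook that refuses tracing and debugger imports.
    if event in ("sys.settrace", "sys.setprofile"):
        raise RuntimeError("tracing is blocked")
    if event == "import" and args and args[0].split(".")[0] in DEBUGGER_MODULES:
        raise ImportError(f"import of {args[0]} is blocked")


# To activate the hook for the rest of the process (irreversible):
# sys.addaudithook(block_debug_hook)
```

Checks like these are simple to find and patch out, which is why they sit inside the encrypted stages rather than in plain view.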

Requirements

Python 3.10 or newer. Stdlib only, no third-party packages.

The output has to run on the same Python major.minor you built it on, because the marshal format is version-bound.
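
If you ship builds to machines you don't control, it can help to record the build interpreter and compare it at deploy time. The sketch below is one way to do that with the standard library; build_tag and check_runtime are hypothetical helpers and not part of Patchwork.

```python
import importlib.util
import sys


def build_tag():
    # major.minor plus the bytecode magic number of the running interpreter.
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "magic": importlib.util.MAGIC_NUMBER.hex(),
    }


def check_runtime(tag):
    current = build_tag()
    if current != tag:
        raise RuntimeError(
            f"built for Python {tag['python']}, running on {current['python']}"
        )
```

Call build_tag() on the build machine, store the result next to the obfuscated file, and call check_runtime(tag) on the target before running it.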

Install

git clone https://github.com/bikini/patchwork.git
cd patchwork

That's it. If you want the patchwork console script on your PATH:

pip install -e .

Use it

python -m patchwork myscript.py
python myscript_obf.py

Custom output, reproducible build, more cipher layers:

python -m patchwork myscript.py -o protected.py --seed 12345 --layers 5

From Python:

from patchwork import Obfuscator, obfuscate_file

obfuscate_file('myscript.py', 'protected.py', seed=12345)

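obf = Obfuscator(seed=12345, layers=4, stage2_layers=4)
# Read the source with a context manager so the file handle is closed promptly.
with open('myscript.py', encoding='utf-8') as f:
    out = obf.obfuscate(f.read())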

Flags

python -m patchwork INPUT [-o OUTPUT] [options]

  --version                 print Patchwork version and exit
  -o, --output PATH         output path (default: <input>_obf.py)
  --config PATH             load JSON build options
  --dump-config PATH        write the effective merged config JSON
  --verify-manifest PATH    verify input/output hashes in a manifest and exit
  --seed INT                RNG seed for reproducible builds
  --layers INT              cipher layers around the user payload (default 3)
  --stage2-layers INT       cipher layers around stage 2 (default 3)
  --keep NAME               identifier to leave un-renamed (repeatable)
  --keep-file PATH          read names to preserve from a text file
  --no-rename               turn off identifier renaming
  --no-encrypt-strings      turn off string/bytes literal encryption
  --no-obfuscate-numbers    turn off integer literal obfuscation
  --no-opaque               turn off opaque predicate injection
  --no-mba                  turn off Mixed Boolean-Arithmetic
  --no-junk                 turn off junk dead-branch injection
  --no-lazy                 turn off lazy per-function encryption
  --no-anti-debug           turn off runtime anti-debug probes
  --abyss                   virtualize eligible functions into encrypted VM assets
  --abyss-functions NAMES   virtualize only these functions, comma-separated or repeatable
  --no-lower-fstrings       leave f-strings as JoinedStr nodes
  --no-lower-match          leave match/case statements intact
  --audit-only              analyze the input and exit without writing output
  --audit-json PATH         write static audit metadata as JSON
  --manifest PATH           write build manifest with hashes/options/audit data
  --report PATH             write a standalone HTML audit/build report
  --stats-json PATH         write input/output stats JSON
  --strict-audit            refuse inputs that contain sensitive API indicators
  --dry-run                 show audit/config plan without writing output
  --max-output-bytes INT    refuse output larger than this size
  --max-ratio FLOAT         refuse output above this expansion ratio
  --max-review-indicators N refuse if review indicators exceed this count
  -q, --quiet               quiet mode

Audit and Manifest Workflow

Audit a file without generating an obfuscated output:

python -m patchwork app.py --audit-only --audit-json evidence/app.audit.json

Generate a manifest alongside a reproducible build:

python -m patchwork app.py --seed 12345 --manifest evidence/app.manifest.json --stats-json evidence/app.stats.json --report evidence/app.report.html

Verify a manifest later:

python -m patchwork --verify-manifest evidence/app.manifest.json

Use a JSON build profile and dump the effective merged config:

{
  "layers": 4,
  "stage2_layers": 4,
  "rename": true,
  "keep": ["public_api"]
}

python -m patchwork app.py --config patchwork.json --keep-file keep-names.txt --dump-config evidence/effective-config.json

Refuse to transform files that contain review indicators such as dynamic execution, process spawning, or sensitive standard-library imports:

python -m patchwork app.py --strict-audit

The manifest records the input and output SHA-256 hashes, Python version, Patchwork version, build seed, selected options, size/hash stats, and static audit metadata.
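
--verify-manifest does the hash check for you. If you want to cross-check hashes independently, for example in a separate CI job, a sketch like the one below works with any expected digests you supply. The manifest's JSON field names are not documented in this README, so this example takes plain dicts keyed by hypothetical labels ("input", "output") instead of parsing the manifest.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 16):
    # Stream the file so large outputs are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(expected, paths):
    """expected maps label -> hex digest; paths maps the same labels -> file path."""
    return all(sha256_file(paths[label]) == expected[label] for label in expected)
```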

Python API

from patchwork import Obfuscator
from patchwork.audit import analyze_source, verify_manifest

obf = Obfuscator(
    seed=42,
    rename=True,
    encrypt_strings=True,
    obfuscate_numbers=True,
    opaque_predicates=True,
    mba=True,
    junk_branches=True,
    lazy_funcs=True,
    anti_debug=True,
    abyss=False,
    abyss_functions=[],
    lower_fstrings=True,
    lower_match=True,
    layers=3,
    stage2_layers=3,
    keep={'public_api_name'},
)
output_source = obf.obfuscate(input_source)

obfuscate(src, **kwargs) is the one-shot version. obfuscate_file(in_path, out_path=None, **kwargs) reads, obfuscates, writes, and returns the output path.

Stuff that'll trip you up

Python version matters. Marshaled bytecode is tied to whatever Python major.minor built it. Build on 3.11, run on 3.11. Mismatch and it won't load.

One file at a time. This obfuscates a single module. For a package, run patchwork on each .py. Names that cross module boundaries (one obfuscated file importing another) keep their original spelling because we can't see across files.
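
For a package, a small driver script can do the per-file runs. The sketch below only uses the -o and --seed flags documented above and writes into a mirrored output tree so module filenames stay importable. plan_package_build and run_package_build are hypothetical helpers, and names that cross module boundaries would still need --keep.

```python
import subprocess
import sys
from pathlib import Path


def plan_package_build(src_root, out_root, seed=None):
    """Return one patchwork command line per .py file under src_root."""
    src_root, out_root = Path(src_root), Path(out_root)
    commands = []
    for path in sorted(src_root.rglob("*.py")):
        target = out_root / path.relative_to(src_root)
        cmd = [sys.executable, "-m", "patchwork", str(path), "-o", str(target)]
        if seed is not None:
            cmd += ["--seed", str(seed)]
        commands.append(cmd)
    return commands


def run_package_build(src_root, out_root, seed=None):
    for cmd in plan_package_build(src_root, out_root, seed):
        # Create the mirrored directory before patchwork writes into it.
        Path(cmd[cmd.index("-o") + 1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Splitting planning from running makes it easy to print the commands first and check them before anything is written.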

Attribute access is left alone. obj.x doesn't get rewritten. The x could be addressing a stdlib method, a third-party API, anything. Class methods and class-body attributes are auto-skipped from renaming for the same reason. If you have other names that get accessed externally, throw them in --keep.

Generators, coroutines, async generators, decorators. All work. Stubs are kind-matched so you don't get flag-mismatch warnings. Decorators that call user functions at def time work too, because lazy resolution kicks in on first call.
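
The first-call swap is easier to picture with a toy version. In the sketch below a stub decodes a stored code object, assigns it to its own __code__, and re-invokes itself. It is simplified on purpose: the stub finds itself by global name rather than frame inspection, the "encryption" is a single-byte XOR over zlib-compressed marshal data, and area and _real_area are invented names.

```python
import marshal
import zlib


def _real_area(w, h):
    return w * h


KEY = 0x5A
# Build step: serialize the real code object and hide it behind a toy cipher.
BLOB = bytes(b ^ KEY for b in zlib.compress(marshal.dumps(_real_area.__code__)))


def area(w, h):
    # Stub with the same signature and kind (plain function) as the real body.
    code = marshal.loads(zlib.decompress(bytes(b ^ KEY for b in BLOB)))
    area.__code__ = code   # swap in place; later calls never see the stub
    return area(w, h)      # re-invoke, now running the real code
```

Because the function object itself is unchanged, anything that captured a reference to it at def time (a decorator, a registry, a callback table) still ends up calling the real body.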

sys.argv[0] will point at the obfuscated file, not the original.

This is not a cryptographic guarantee. Anything that runs on a target machine can eventually be reverse-engineered. A motivated person with a debugger can patch out the anti-debug checks, dump the unmarshaled code objects, and decompile back to source. What this buys you is depth: every layer has to be unwrapped before structure appears, and what you'd recover is heavily mangled. It's a speed bump, not a vault.

Examples

python -m patchwork examples/hello.py
python examples/hello_obf.py
# Hello, World! (from patchwork)
# number: 42
# computed: 285

examples/stress.py is the torture test: decorators, generators, classes with super(), match statements, walrus, exceptions, *args/**kwargs, globals. If something doesn't work, it'll usually break here first.

examples/modern.py covers newer Python constructs: dataclass, structural pattern matching with capture binders, mapping/list/star patterns, async functions, context managers, properties, f-strings with format specs, bytes, closures, and nonlocal.

examples/future_annotations.py covers from __future__ import annotations and verifies that observable annotation strings are preserved.

examples/literals.py covers Unicode strings, long strings, byte payloads, and marker strings. The regression suite checks both behavior and that marker strings do not appear as cleartext in generated output.

Tests

python tests/test_obfuscator.py
python tests/test_audit.py
python tests/test_stress_matrix.py

The test suite runs every example through the obfuscator at multiple seeds, executes the original and obfuscated versions, and checks that stdout matches byte-for-byte. It also confirms that different seeds produce different output and that the same seed produces stable output.
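
If you want the same kind of check for your own scripts, the core of a differential test is small. The sketch below runs two files under the current interpreter and compares raw stdout bytes. run_script and outputs_match are hypothetical helper names and are not taken from the test suite.

```python
import subprocess
import sys


def run_script(path):
    # check=True raises if the script exits non-zero; stdout is returned as bytes.
    result = subprocess.run(
        [sys.executable, str(path)], capture_output=True, check=True
    )
    return result.stdout


def outputs_match(original, obfuscated):
    return run_script(original) == run_script(obfuscated)
```

For example, outputs_match("examples/hello.py", "examples/hello_obf.py") after a build. Comparing bytes rather than decoded text catches encoding and line-ending differences too.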

The stress matrix generates deterministic Python programs with arithmetic, bitwise operations, pattern matching, closures, comprehensions, Unicode strings, bytes, and future annotations. It then runs them through multiple obfuscation seeds and option combinations, including disabled lazy loading, disabled renaming, disabled string encryption, and disabled opaque/junk transforms.

Layout

patchwork/
├── pyproject.toml
├── requirements.txt
├── README.md
├── .gitignore
├── examples/
│   ├── hello.py
│   ├── fizzbuzz.py
│   ├── classes.py
│   ├── stress.py
│   ├── modern.py
│   ├── future_annotations.py
│   └── literals.py
├── patchwork/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py
│   ├── core.py
│   ├── crypto.py
│   ├── packer.py
│   ├── lazy.py
│   ├── loader.py
│   ├── util.py
│   └── transforms/
│       ├── identifiers.py
│       ├── strings.py
│       ├── numbers.py
│       ├── opaque.py
│       ├── mba.py
│       └── junk.py
└── tests/
    ├── test_obfuscator.py
    ├── test_audit.py
    └── test_stress_matrix.py

License: MIT.

About

A polymorphic multi-stage Python obfuscator that mangles source via AST rewrites, encrypts each function's bytecode into a separate lazily-decrypted blob, and wraps everything in a multi-cipher anti-debug loader.
