"Is this code AI-generated?" — Finally, a tool that answers that.
A local, offline CLI that detects AI-generated code and identifies which model wrote it. No API keys. No cloud. No telemetry. Just heuristics and math.
```
$ code-provenance scan ./src/auth.ts

Code Provenance v0.1.0
──────────────────────────────
📊 src/auth.ts (146 lines)
   Lines 1-146: 🤖 AI-generated (87%)  Claude-style patterns

Summary: 100% AI-generated | 0% Human-written
Confidence: HIGH
```
Companies are banning AI code. Universities are failing students. Open-source projects are rejecting AI PRs. Everyone asks "was this written by AI?" — nobody has a good answer.
Code Provenance answers it. Locally. In milliseconds.
```
npm install -g code-provenance
```

Or run without installing:

```
npx code-provenance scan ./src/
```

```
code-provenance scan ./src/auth.ts
code-provenance scan ./src/
```

```
Code Provenance v0.1.0
──────────────────────────────
📂 src (18 files, 2307 lines)

🤖 100% AI  src/core/analyzer.ts (claude)
🤖 100% AI  src/core/confidence.ts (claude)
🤖 100% AI  src/core/scanner.ts
🤖  87% AI  src/core/segmenter.ts (claude)
🤖  85% AI  src/detectors/comment-patterns.ts (claude)
🤖  75% AI  src/detectors/entropy.ts (claude)
🤖  65% AI  src/cli.ts
⚠️  34% AI  src/detectors/model-signatures.ts (claude)
👤   0% AI  src/reports/terminal-report.ts
👤   0% AI  src/reports/markdown-report.ts
...
──────────────────────────────
Summary: 67% AI-generated | 6% Human-written (18 files, 2307 lines)
Scanned in 178ms
```
Yes — we scanned our own source code. It detected that Claude wrote most of it. That's the kind of honesty you're getting.
```
# Machine-readable JSON (for CI/CD pipelines)
code-provenance scan ./src/ --json

# Audit-ready markdown report
code-provenance scan ./src/auth.ts --format markdown

# Colored terminal output (default)
code-provenance scan ./src/auth.ts --format terminal
```

| Code | Meaning |
|---|---|
| 0 | No AI-generated code detected |
| 1 | AI-generated code detected |
| 2 | Error (file not found, binary file, etc.) |
Use in CI/CD:

```
code-provenance scan ./src/ --json || echo "AI code detected!"
```

Zero LLMs. Zero API calls. Pure offline heuristics + statistical analysis.
| Engine | What it detects | Signal |
|---|---|---|
| Entropy | Shannon entropy per code window | AI code is more predictable (lower entropy) |
| Comment Patterns | Pre-function comments, step-by-step, obvious explanations | AI over-explains; humans under-comment |
| Naming Patterns | Generic vs domain-specific identifiers | AI uses data, result, value; humans use domain terms |
| Structural | Import ordering, function length uniformity, human markers | AI is unnaturally consistent; humans leave TODOs and HACKs |
| Model Signatures | Claude/GPT/Copilot-specific coding patterns | Each model has a fingerprint |
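To make the entropy engine's signal concrete, here is a minimal TypeScript sketch of Shannon entropy computed per 20-line window (the window size comes from the architecture diagram below). This is illustrative only, not the tool's actual implementation; the function names `shannonEntropy` and `windowEntropy` are invented for this example.

```typescript
// Sketch of the entropy signal: measure Shannon entropy (bits per character)
// over sliding 20-line windows. Per the table above, unusually low entropy
// (more predictable text) is one signal of generated code.

const WINDOW_LINES = 20;

/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  const counts = new Map<string, number>();
  for (const ch of text) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / text.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

/** Entropy for each 20-line window of a source file. */
function windowEntropy(source: string): number[] {
  const lines = source.split("\n");
  const results: number[] = [];
  for (let i = 0; i < lines.length; i += WINDOW_LINES) {
    const window = lines.slice(i, i + WINDOW_LINES).join("\n");
    if (window.length > 0) results.push(shannonEntropy(window));
  }
  return results;
}
```

A real detector would compare these per-window values against a baseline distribution rather than using raw thresholds.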
Code Provenance doesn't just detect AI — it tells you which AI:
| Model | Key Patterns |
|---|---|
| Claude | Functional style, const over let, import type, immutability patterns |
| GPT | Verbose pre-function comments ("This function does X"), step-by-step explanations |
| Copilot | Short completions (5-15 lines), no surrounding comments, utility functions |
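The attribution step can be pictured as scoring per-model signature hits, along these lines. The regexes below are loose illustrations derived from the table's pattern descriptions (`import type`, verbose pre-function comments, etc.), not the tool's real signature set, and `attributeModel` is a hypothetical name.

```typescript
// Illustrative sketch: count how many model-specific surface patterns
// appear in a file and attribute it to the highest-scoring model.

type Model = "claude" | "gpt" | "copilot";

const SIGNATURES: Record<Model, RegExp[]> = {
  claude: [/\bimport type\b/, /\bconst\b/, /\breadonly\b/],
  gpt: [/\/\/ This function /i, /\/\/ Step \d+/i],
  copilot: [/^\/\/ TODO/m],
};

/** Return the model with the most signature hits, or null if none match. */
function attributeModel(source: string): Model | null {
  let best: Model | null = null;
  let bestScore = 0;
  for (const [model, patterns] of Object.entries(SIGNATURES) as [Model, RegExp[]][]) {
    const score = patterns.filter((p) => p.test(source)).length;
    if (score > bestScore) {
      bestScore = score;
      best = model;
    }
  }
  return best;
}
```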
```
┌─────────────┐      ┌──────────────┐      ┌────────────────┐
│  Parsers    │───>│  5 Detectors │───>│   Segmenter    │
│ TS / Generic│      │  (parallel)  │      │   (20-line     │
│  20+ langs  │      │              │      │   windows)     │
└─────────────┘      └──────────────┘      └───────┬────────┘
                                                   │
                     ┌──────────────┐      ┌───────▼────────┐
                     │    Model     │───>│   Confidence   │
                     │ Attribution  │      │  Calibration   │
                     └──────────────┘      └───────┬────────┘
                                                   │
                                           ┌───────▼────────┐
                                           │     Report     │
                                           │ Terminal/JSON/ │
                                           │    Markdown    │
                                           └────────────────┘
```
| Full AST Parsing | Generic Heuristics |
|---|---|
| TypeScript, JavaScript, JSX, TSX | Python, Ruby, Rust, Go, Java, C#, Kotlin, C, C++, Swift, Lua, Shell, PHP, R, Scala, Zig, Vue, Svelte |
The TypeScript parser uses the real TypeScript compiler API for accurate function/import/comment extraction. All other languages use a regex-based generic parser that still provides solid detection.
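For languages without AST support, a regex-based generic parser can still extract the features the detectors need (comment lines, function starts). The sketch below is an assumption about how such a parser might look; `genericParse` and its simplified patterns are invented for illustration.

```typescript
// Hedged sketch of a generic (non-AST) parser: pull out comment lines and
// function-like lines with regexes that cover several languages' syntax.

interface GenericParse {
  commentLines: number[];   // 1-based line numbers of comments
  functionLines: number[];  // 1-based line numbers of function definitions
}

function genericParse(source: string): GenericParse {
  const commentRe = /^\s*(\/\/|#|--)/;                  // //, #, -- comments
  const functionRe = /^\s*(def |fn |func |function\b)/; // common fn keywords
  const commentLines: number[] = [];
  const functionLines: number[] = [];
  source.split("\n").forEach((line, i) => {
    if (commentRe.test(line)) commentLines.push(i + 1);
    if (functionRe.test(line)) functionLines.push(i + 1);
  });
  return { commentLines, functionLines };
}
```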
- Single file scan: < 200ms (typically 10-50ms)
- Directory scan (18 files, 2300 lines): < 200ms
- No network calls. No disk cache. No startup penalty.
From our constitution:
> "Better to report 'unknown' than to falsely accuse human code of being AI-generated."
The tool uses conservative thresholds. It will say "unknown" before it risks a false positive. When it says "AI-generated" — it means it.
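The conservative-thresholding idea reduces to a banded classifier: a high bar before labeling code "AI-generated", a low bar for "human-written", and everything in between reported as unknown. The cutoffs below are made up for illustration; the tool's actual calibration is not published here.

```typescript
// Sketch of conservative thresholding: prefer "unknown" over a false positive.
// The 0.75 / 0.15 cutoffs are assumed values, not the tool's real calibration.

type Verdict = "ai-generated" | "human-written" | "unknown";

function classify(aiProbability: number): Verdict {
  if (aiProbability >= 0.75) return "ai-generated"; // high bar before accusing
  if (aiProbability <= 0.15) return "human-written";
  return "unknown"; // ambiguous middle band stays unlabeled
}
```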
```
npm test
```

18 integration tests covering:
- AI detection (Claude, GPT, Copilot styles)
- Zero false positives on human-written code
- Model attribution accuracy
- All 3 output formats
- Generic parser (Python fixtures)
- Deterministic results
- Performance budgets
- Edge cases
```
git clone https://github.com/DilawarShafiq/code-provenance.git
cd code-provenance
npm install
npm run build
npm test

# Try it
node dist/cli.js scan src/
```

- AI detection (5 engines)
- Model attribution (Claude/GPT/Copilot)
- Directory scanning
- JSON/Markdown/Terminal output
- 20+ language support
- N-gram frequency detector
- `.codeprovenanceconfig` file
- GitHub Action
- VS Code extension
- Code theft detection
- License violation scanning
- Git blame integration
MIT
Muhammad Dilawar Shafiq (Dilawar Gopang) GitHub | Email
Every line of code has a story. We tell it.