Vulpine compiles human-facing Python into a compact, model-friendly representation (mined AST macros), then expands it back. It is an experiment in a bidirectional compiler layer for coding LLMs.
- Blog: Vulpine — experiment in compiler layer for coding LLMs
- Code: https://github.com/MehulG/vulpine
Status: Experimental research code — not a production compiler.
On a held-out FastAPI eval corpus (~13k files, 2,000 mined macros, cl100k_base):
| Metric | Result |
|---|---|
| Token reduction (weighted net, drop top 100 outliers) | ~13.8% |
| AST roundtrip success | ~99.8% |
Charts and tables live on the blog. Reproduce numbers locally: docs/reproduce_blog_benchmarks.md.
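To make the "AST roundtrip success" metric concrete, here is a minimal sketch of one way such a check could be computed, using only the standard-library `ast` module. The function names `same_ast` and `roundtrip_success_rate` are hypothetical and are not taken from the repository; the actual comparison logic lives under test/ and may differ.

```python
import ast


def same_ast(original_src: str, roundtrip_src: str) -> bool:
    """Return True when both sources parse to structurally identical ASTs.

    Hypothetical helper: the repository's own comparison may be stricter
    or looser than this.
    """
    try:
        original_tree = ast.parse(original_src)
        roundtrip_tree = ast.parse(roundtrip_src)
    except SyntaxError:
        # A file that no longer parses counts as a roundtrip failure.
        return False
    # ast.dump omits line/column attributes by default, so whitespace and
    # comment differences do not affect the comparison.
    return ast.dump(original_tree) == ast.dump(roundtrip_tree)


def roundtrip_success_rate(pairs) -> float:
    """Fraction of (original, roundtrip) source pairs with matching ASTs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(same_ast(o, r) for o, r in pairs) / len(pairs)
```

Comparing dumps rather than raw text means a roundtrip that only changes formatting or drops comments still counts as a success, which is usually what "AST roundtrip" is taken to mean.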
human Python → model-facing form → LLM → human Python
Source code is optimized for people. LLMs need meaning with fewer, cleaner tokens. Vulpine mines repeated AST patterns from real code, replaces matches with compact macro tokens (often single-token Unicode names), and leaves everything else as ordinary Python. See docs/architecture.md.
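As a rough illustration of what "mining repeated AST patterns" can mean, the sketch below counts statement shapes after replacing variable names and constants with placeholders. This is an assumption-laden toy, not Vulpine's miner: the normalization scheme, the `mine_patterns` name, and the `min_count` parameter are all hypothetical. See docs/architecture.md and mine/ for the real approach.

```python
import ast
import copy
from collections import Counter


class _Normalize(ast.NodeTransformer):
    """Replace names and constants with placeholders (illustrative scheme)."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=None), node)


def mine_patterns(sources, min_count=2):
    """Count normalized statement shapes across sources.

    Returns (shape, count) pairs, most frequent first, keeping only
    shapes seen at least min_count times.
    """
    counts = Counter()
    for src in sources:
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.stmt):
                # Work on a copy so normalization does not mutate the
                # tree that ast.walk is still traversing.
                shape = ast.dump(_Normalize().visit(copy.deepcopy(node)))
                counts[shape] += 1
    return [(s, n) for s, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    corpus = ["x = foo(1)\ny = bar(2)\n", "z = baz(3)\n"]
    for shape, n in mine_patterns(corpus):
        print(n, shape[:60])
```

In this toy, attribute and function names are left intact, so `app.get(...)` and `app.post(...)` would count as different shapes. A frequent shape is the kind of candidate that could then be assigned a compact macro token.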
git clone https://github.com/MehulG/vulpine.git
cd vulpine
uv sync
./examples/try_vulpine.sh
Uses examples/fixtures/ and examples/preprocessed.mini.json; no GitHub fetch required.
cp prep_ds/.env.blog.example prep_ds/.env # set GITHUB_TOKEN
cp experiments/fastapi_800_200_20260529.env.example .env
# fetch → mine → vulpanize (see docs)
Full runbook: docs/reproduce_blog_benchmarks.md
Nothing under runs/ is committed; you build artifacts locally.
files.db (SQLite)
→ mine/main.py → normalized_samples.json
→ unicode_token_tool/ → unicode_tokens.json
→ vulpanizer/preprocessor.py → preprocessed.json
→ vulpanizer/main.py → *.macro
→ devulpanize/main.py → roundtrip *.py
→ test/compare_*.py → reports/
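Because artifacts are built locally and each stage consumes the previous stage's output, it can help to see which stage to run next. The sketch below is a hypothetical helper, not part of the repository: it assumes the JSON and SQLite artifacts all land in one run directory, which may not match the actual layout under runs/.

```python
from pathlib import Path

# (stage, artifact) pairs taken from the pipeline listing above.
# Assumption: all artifacts sit directly inside a single run directory.
STAGES = [
    ("prep_ds/", "files.db"),
    ("mine/main.py", "normalized_samples.json"),
    ("unicode_token_tool/", "unicode_tokens.json"),
    ("vulpanizer/preprocessor.py", "preprocessed.json"),
]


def next_stage(run_dir):
    """Return the first stage whose artifact is missing, or None if all exist."""
    run_dir = Path(run_dir)
    for stage, artifact in STAGES:
        if not (run_dir / artifact).exists():
            return stage
    return None
```

For example, calling `next_stage` on a directory that contains only files.db returns `"mine/main.py"`.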
| Path | Role |
|---|---|
| mine/ | Pattern mining from corpus |
| unicode_token_tool/ | Macro id token pool |
| vulpanizer/ | Vulpanize (compile to macro form) |
| devulpanize/ | Devulpanize (expand back) |
| prep_ds/ | Build files.db from GitHub |
| test/ | Token/structure comparison utilities |
| examples/ | Fixtures and quick smoke script |
| docs/ | Architecture, reproduction, limitations |
| experiments/ | Blog pipeline env file (pair with prep_ds/.env.blog.example for the fetch stage) |
| scripts/ | Optional benchmark helpers |
Each stage has .env.example files. For the blog run, prefer sourcing experiments/fastapi_800_200_20260529.env.example. Never commit .env — see SECURITY.md.
- examples/README.md
- mine/README.md
- unicode_token_tool/README.md
- vulpanizer/vulpanize/README.md
- devulpanize/README.md
- test/README.md
- prep_ds/README.md
- scripts/README.md
uv sync --extra dev
uv run python -m unittest discover -s test -p 'test_*.py'
./examples/try_vulpine.sh
See CONTRIBUTING.md.
- Code: this repository (MIT License) — CITATION.cff
MIT — see LICENSE.