MehulG/vulpine

Vulpine

Vulpine compiles human-facing Python into a compact, model-friendly representation (mined AST macros), then expands it back. It is an experiment in a bidirectional compiler layer for coding LLMs.

Status: Experimental research code — not a production compiler.

TL;DR (blog benchmarks)

On a held-out FastAPI eval corpus (~13k files, 2,000 mined macros, cl100k_base):

Metric                                                  Result
Token reduction (weighted net, drop top 100 outliers)   ~13.8%
AST roundtrip success                                   ~99.8%

Charts and tables live on the blog. Reproduce numbers locally: docs/reproduce_blog_benchmarks.md.
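
To make the "AST roundtrip success" metric concrete, here is a minimal sketch of one way such a check could be written with the standard library. It is not taken from the Vulpine code base: the names ast_equal and roundtrip_rate are hypothetical, and the assumption is that a roundtrip counts as successful when the original and the expanded file parse to the same AST, ignoring comments and formatting.

```python
import ast


def ast_equal(original: str, roundtripped: str) -> bool:
    """Return True when both sources parse to the same AST.

    ast.dump() leaves out line and column attributes by default, so
    whitespace and comment differences do not affect the comparison.
    """
    try:
        left = ast.parse(original)
        right = ast.parse(roundtripped)
    except SyntaxError:
        # A file that no longer parses counts as a failed roundtrip.
        return False
    return ast.dump(left) == ast.dump(right)


def roundtrip_rate(pairs) -> float:
    """Fraction of (original, roundtripped) source pairs with equal ASTs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(ast_equal(a, b) for a, b in pairs) / len(pairs)
```

For example, ast_equal("x = 1  # note\n", "x=1\n") holds because only the formatting differs, while ast_equal("x = 1\n", "x = 2\n") does not.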

Why two representations?

human Python  →  model-facing form  →  LLM  →  human Python

Source code is optimized for people. LLMs need meaning with fewer, cleaner tokens. Vulpine mines repeated AST patterns from real code, replaces matches with compact macro tokens (often single-token Unicode names), and leaves everything else as ordinary Python. See docs/architecture.md.
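
As an illustration of what "mining repeated AST patterns" can mean, the sketch below counts statement shapes after blanking out variable names and literal values. It is a simplified stand-in, not the algorithm in mine/: the Normalize class, the mine_patterns function, and the min_count parameter are hypothetical, and function, attribute, and argument names are deliberately left untouched to keep the example short.

```python
import ast
import copy
from collections import Counter


class Normalize(ast.NodeTransformer):
    """Blank out variable names and literal values so only the shape remains."""

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_Constant(self, node):
        node.value = None
        node.kind = None
        return node


def mine_patterns(sources, min_count=2):
    """Count normalized statement shapes; keep those seen at least min_count times."""
    counts = Counter()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.stmt):
                # Normalize a copy so the tree being walked is left untouched.
                shape = ast.dump(Normalize().visit(copy.deepcopy(node)))
                counts[shape] += 1
    return [(shape, n) for shape, n in counts.most_common() if n >= min_count]
```

With the inputs "a = foo(1)\nb = bar(2)\n" and "c = baz('x')\n", all three assignments collapse to a single shape with a count of 3, which is the kind of repetition a macro could stand in for.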

Quick try (~5 minutes)

git clone https://github.com/MehulG/vulpine.git
cd vulpine
uv sync
./examples/try_vulpine.sh

Uses examples/fixtures/ and examples/preprocessed.mini.json — no GitHub fetch required.

Reproduce blog benchmarks (~hours)

cp prep_ds/.env.blog.example prep_ds/.env    # set GITHUB_TOKEN
cp experiments/fastapi_800_200_20260529.env.example .env
# fetch → mine → vulpanize (see docs)

Full runbook: docs/reproduce_blog_benchmarks.md

Nothing under runs/ is committed; you build artifacts locally.

Pipeline overview

files.db (SQLite)
  → mine/main.py              → normalized_samples.json
  → unicode_token_tool/       → unicode_tokens.json
  → vulpanizer/preprocessor.py → preprocessed.json
  → vulpanizer/main.py        → *.macro
  → devulpanize/main.py       → roundtrip *.py
  → test/compare_*.py         → reports/
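
The vulpanize and devulpanize stages can be pictured with a toy example. The sketch below is an assumption-laden illustration rather than the format the repository writes to *.macro files: the MACROS table, the token "⟦M1⟧", and the compact and expand functions are all hypothetical, matching is limited to whole top-level statements, and ast.unparse requires Python 3.9 or later.

```python
import ast

# Hypothetical macro table: one token standing in for one statement body.
MACROS = {"⟦M1⟧": "if __name__ == '__main__':\n    main()"}


def compact(source, macros):
    """Replace top-level statements that match a macro body with its token.

    Returns one chunk per top-level statement; unmatched statements stay
    as ordinary Python text.
    """
    # Canonicalize macro bodies the same way as the input so quoting and
    # spacing differences do not prevent a match.
    canon = {
        ast.unparse(ast.parse(body).body[0]): token
        for token, body in macros.items()
    }
    chunks = []
    for stmt in ast.parse(source).body:
        text = ast.unparse(stmt)
        chunks.append(canon.get(text, text))
    return chunks


def expand(chunks, macros):
    """Turn macro tokens back into Python source."""
    return "\n".join(macros.get(chunk, chunk) for chunk in chunks)
```

Given a file that imports sys and ends with the usual main guard, compact returns ["import sys", "⟦M1⟧"], and expand yields source that parses to the same AST as the input. Working with a list of chunks instead of a joined string avoids mistaking a line inside a multi-line statement for a macro token.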

Repository layout

Path                  Role
mine/                 Pattern mining from corpus
unicode_token_tool/   Macro id token pool
vulpanizer/           Vulpanize (compile to macro form)
devulpanize/          Devulpanize (expand back)
prep_ds/              Build files.db from GitHub
test/                 Token/structure comparison utilities
examples/             Fixtures and quick smoke script
docs/                 Architecture, reproduction, limitations
experiments/          Blog pipeline env + prep_ds/.env.blog.example for fetch
scripts/              Optional benchmark helpers

Configuration

Each stage has .env.example files. For the blog run, prefer sourcing experiments/fastapi_800_200_20260529.env.example. Never commit .env — see SECURITY.md.

Component docs

Development

uv sync --extra dev
uv run python -m unittest discover -s test -p 'test_*.py'
./examples/try_vulpine.sh

See CONTRIBUTING.md.

Cite

License

MIT — see LICENSE.
