Vulpine

Vulpine compiles human-facing Python into a compact, model-friendly representation (mined AST macros), then expands it back. It is an experiment in a bidirectional compiler layer for coding LLMs.

Blog: Vulpine — experiment in compiler layer for coding LLMs
Code: https://github.com/MehulG/vulpine

Status: Experimental research code — not a production compiler.

TL;DR (blog benchmarks)

On a held-out FastAPI evaluation corpus (~13k files, 2,000 mined macros, tokens counted with the cl100k_base tokenizer):

Metric	Result
Token reduction (weighted net, excluding the top 100 outliers)	~13.8%
AST roundtrip success	~99.8%
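How the headline number is aggregated matters. The sketch below is one plausible reading of "weighted net, excluding the top 100 outliers", not the blog's definition. It assumes "weighted" means a corpus-level ratio (larger files count more) and that "outliers" are the files with the largest per-file reduction. `weighted_net_reduction` is a hypothetical helper, not part of the repository; the authoritative procedure is in docs/reproduce_blog_benchmarks.md.

```python
from typing import List, Sequence, Tuple


def weighted_net_reduction(
    pairs: Sequence[Tuple[int, int]],
    drop_top: int = 0,
) -> float:
    """Return 1 - sum(macro tokens) / sum(original tokens).

    `pairs` holds (original_tokens, macro_tokens) per file.
    ASSUMPTION: outliers are the files with the largest per-file reduction
    ratio; the `drop_top` of them are excluded before aggregating.
    """
    kept: List[Tuple[int, int]] = [p for p in pairs if p[0] > 0]  # skip empty files
    kept.sort(key=lambda p: 1 - p[1] / p[0], reverse=True)
    kept = kept[drop_top:]
    total_original = sum(orig for orig, _ in kept)
    total_macro = sum(macro for _, macro in kept)
    if total_original == 0:
        return 0.0
    # Net: a file whose macro form grew pulls the overall reduction down.
    return 1 - total_macro / total_original


# Illustrative counts only, not benchmark data.
counts = [(100, 80), (200, 190), (50, 10)]
print(weighted_net_reduction(counts))              # all files
print(weighted_net_reduction(counts, drop_top=1))  # without the biggest win
```

Per-file counts could come from tiktoken, for example `len(tiktoken.get_encoding("cl100k_base").encode(text))` applied to both the original file and its macro form.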

Charts and tables are on the blog. To reproduce the numbers locally, follow docs/reproduce_blog_benchmarks.md.

Why two representations?

human Python  →  model-facing form  →  LLM  →  human Python

Source code is optimized for people, while an LLM benefits from the same meaning expressed in fewer, cleaner tokens. Vulpine mines repeated AST patterns from real code, replaces each match with a compact macro token (often a single-token Unicode name), and leaves everything else as ordinary Python. See docs/architecture.md.
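To make the compile/expand idea concrete, here is a toy, hand-written macro built only on Python's standard `ast` module (Python 3.9+ for `ast.unparse`). Real Vulpine macros are mined rather than written by hand; the macro id `Ω`, the classes `ToyVulpanize`/`ToyDevulpanize`, and `roundtrips` are hypothetical names for this sketch, not the repository's API. The roundtrip check compares `ast.dump` output, which is one reasonable way to define "AST roundtrip success".

```python
import ast

# Hypothetical macro id: GREEK CAPITAL LETTER OMEGA, a legal Python identifier.
MACRO = "\u03a9"


def _is_not_found(node: ast.AST) -> bool:
    """Match HTTPException(status_code=404, detail=<expr>) exactly."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "HTTPException"
        and not node.args
        and len(node.keywords) == 2
        and node.keywords[0].arg == "status_code"
        and isinstance(node.keywords[0].value, ast.Constant)
        and node.keywords[0].value.value == 404
        and node.keywords[1].arg == "detail"
    )


class ToyVulpanize(ast.NodeTransformer):
    """Compile: replace each matching call with MACRO(<detail>)."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        if _is_not_found(node):
            return ast.Call(
                func=ast.Name(id=MACRO, ctx=ast.Load()),
                args=[node.keywords[1].value],
                keywords=[],
            )
        return node


class ToyDevulpanize(ast.NodeTransformer):
    """Expand: turn MACRO(<detail>) back into the full HTTPException call."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == MACRO
            and len(node.args) == 1
            and not node.keywords
        ):
            return ast.Call(
                func=ast.Name(id="HTTPException", ctx=ast.Load()),
                args=[],
                keywords=[
                    ast.keyword(arg="status_code", value=ast.Constant(value=404)),
                    ast.keyword(arg="detail", value=node.args[0]),
                ],
            )
        return node


def transform(source: str, transformer: ast.NodeTransformer) -> str:
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


def roundtrips(source: str) -> bool:
    """True if compile-then-expand yields an AST identical to the input's."""
    macro_form = transform(source, ToyVulpanize())
    restored = transform(macro_form, ToyDevulpanize())
    return ast.dump(ast.parse(restored)) == ast.dump(ast.parse(source))


src = '''
def get_item(item):
    if item is None:
        raise HTTPException(status_code=404, detail="Item not found")
    return item
'''
print(transform(src, ToyVulpanize()))
print(roundtrips(src))
```

In the macro form the `raise` line becomes `raise Ω('Item not found')`, while the surrounding code stays ordinary Python, mirroring the "leave everything else as is" design.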

Quick try (~5 minutes)

git clone https://github.com/MehulG/vulpine.git
cd vulpine
uv sync
./examples/try_vulpine.sh

The script uses examples/fixtures/ and examples/preprocessed.mini.json, so no GitHub fetch is required.

Reproduce blog benchmarks (~hours)

cp prep_ds/.env.blog.example prep_ds/.env    # set GITHUB_TOKEN
cp experiments/fastapi_800_200_20260529.env.example .env
# fetch → mine → vulpanize (see docs)

Full runbook: docs/reproduce_blog_benchmarks.md

Nothing under runs/ is committed; you build artifacts locally.

Pipeline overview

files.db (SQLite)
  → mine/main.py              → normalized_samples.json
  → unicode_token_tool/       → unicode_tokens.json
  → vulpanizer/preprocessor.py → preprocessed.json
  → vulpanizer/main.py        → *.macro
  → devulpanize/main.py       → roundtrip *.py
  → test/compare_*.py         → reports/
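The mining stage is where repeated patterns come from. A minimal sketch of the general idea follows, assuming a deliberately simple approach (abstract string literals into a placeholder, then count identical statement shapes across files); mine/main.py may work very differently, and `AbstractStrings`, `mine_patterns`, and the `_S` placeholder are hypothetical.

```python
import ast
import copy
from collections import Counter
from typing import Iterable, List, Tuple


class AbstractStrings(ast.NodeTransformer):
    """Replace every plain string literal with the placeholder name `_S`."""

    def visit_JoinedStr(self, node: ast.JoinedStr) -> ast.AST:
        # Leave f-strings intact; their parts must stay Constant nodes.
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if isinstance(node.value, str):
            return ast.Name(id="_S", ctx=ast.Load())
        return node


def mine_patterns(sources: Iterable[str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Count statement shapes across files; keep those seen >= min_count times."""
    counts: Counter = Counter()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.stmt):
                # Work on a copy so the tree being walked is never mutated.
                shape = AbstractStrings().visit(copy.deepcopy(node))
                counts[ast.unparse(shape)] += 1
    return [(shape, n) for shape, n in counts.most_common() if n >= min_count]


corpus = [
    'raise HTTPException(status_code=404, detail="User not found")',
    'if missing:\n    raise HTTPException(status_code=404, detail="Item not found")',
    "x = 1",
]
for shape, n in mine_patterns(corpus):
    print(n, shape)
```

Each surviving shape is a candidate macro: its placeholders become the macro's arguments, and its frequency times its token cost hints at how much replacing it would save.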

Repository layout

Path	Role
mine/	Pattern mining from corpus
unicode_token_tool/	Macro id token pool
vulpanizer/	Vulpanize (compile to macro form)
devulpanize/	Devulpanize (expand back)
prep_ds/	Build `files.db` from GitHub
test/	Token/structure comparison utilities
examples/	Fixtures and quick smoke script
docs/	Architecture, reproduction, limitations
experiments/	Blog pipeline env file (the fetch step uses prep_ds/.env.blog.example)
scripts/	Optional benchmark helpers
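A macro id has to survive Python's parser unchanged. The sketch below shows the kind of filtering a macro id token pool might need, assuming these criteria rather than taking them from unicode_token_tool/. One criterion is a firm Python rule: identifiers are NFKC-normalized when parsed (PEP 3131), so a character that changes under NFKC would come back as a different name. `build_token_pool` and `fake_count` are hypothetical.

```python
import unicodedata
from typing import Callable, Iterable, List, Set


def build_token_pool(
    candidates: Iterable[str],
    count_tokens: Callable[[str], int],
    corpus_chars: Set[str],
    size: int,
) -> List[str]:
    """Pick up to `size` characters usable as single-token macro ids."""
    pool: List[str] = []
    for ch in candidates:
        if len(pool) >= size:
            break
        if not ch.isidentifier():
            continue  # must parse as a Python name
        if unicodedata.normalize("NFKC", ch) != ch:
            continue  # the parser would rewrite it to a different name
        if ch in corpus_chars:
            continue  # already used in real code, so expansion would be ambiguous
        if count_tokens(ch) != 1:
            continue  # costs more than one token, so little benefit
        pool.append(ch)
    return pool


def fake_count(s: str) -> int:
    # Stand-in tokenizer for illustration: pretend CYRILLIC ZHE costs 2 tokens.
    return 2 if s == "\u0416" else 1


# Candidates: "a", GREEK OMEGA, MICRO SIGN, "+", CYRILLIC ZHE.
print(build_token_pool(["a", "\u03a9", "\u00b5", "+", "\u0416"], fake_count, {"a"}, size=10))
```

With a real tokenizer, `count_tokens` could be `lambda s: len(enc.encode(s))` where `enc = tiktoken.get_encoding("cl100k_base")`, and candidates could be drawn from ranges such as `map(chr, range(0x370, 0x400))`.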

Configuration

Each stage has its own .env.example file. For the blog run, prefer sourcing experiments/fastapi_800_200_20260529.env.example. Never commit .env; see SECURITY.md.

Component docs

Development

uv sync --extra dev
uv run python -m unittest discover -s test -p 'test_*.py'
./examples/try_vulpine.sh
