# codebase-analyzer

A language-agnostic CLI tool that traverses any codebase, generates structured descriptions of every file using a local LLM via Ollama, validates those descriptions through a quorum process, and outputs 1:1 markdown files.

## Features
- Language-agnostic — ships with profiles for Python, JavaScript/TypeScript, Java, Go, Ruby, Rust, PHP, and more
- Local-first — uses Ollama for all file analysis, no API keys required for core functionality
- Quorum validation — two independent LLM passes with a judge pass to ensure accuracy
- Resumable — SQLite-backed state means you can stop and resume at any time
- Relationship mapping — optional frontier model integration for cross-file dependency analysis
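The quorum flow above can be pictured roughly as follows; `analyze` and `judge` are hypothetical stand-ins for the actual LLM passes, not the tool's real API:

```python
from typing import Optional

def analyze(file_text: str, pass_id: int) -> dict:
    # Stand-in for one independent Ollama analysis pass; the real tool
    # sends a system prompt plus the file and parses a JSON response.
    return {"purpose": "parses CLI flags", "exports": ["main"]}

def judge(first: dict, second: dict) -> bool:
    # Stand-in for the judge pass: do the two descriptions agree?
    return first["exports"] == second["exports"]

def quorum(file_text: str, max_retries: int = 3) -> Optional[dict]:
    """Two independent passes per attempt; a judge pass arbitrates.

    Returns an accepted description, or None to flag the file for
    later frontier-model resolution.
    """
    for _ in range(max_retries):
        a, b = analyze(file_text, 1), analyze(file_text, 2)
        if judge(a, b):
            return a
    return None
```

Descriptions are accepted only when the independent passes agree; persistent disagreement lands the file in the flagged set described below.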
## Installation

```bash
pip install codebase-analyzer
```

For development:

```bash
git clone https://github.com/avanrossum/codebase-analyzer.git
cd codebase-analyzer
pip install -e ".[dev]"
```

## Requirements

- Python 3.9+
- A local LLM server running with a capable model. Supported backends:
- Context window: 8192 tokens minimum, 16384+ recommended. The tool sends a ~500 token system prompt plus the full file content, and the model must generate a structured JSON response. Files that exceed the context window will error and be skipped.
- JSON output quality matters. The model must reliably produce valid JSON. Larger models (30B+) perform significantly better at this than smaller ones.
- Default model: `qwen3:32b-q5_K_M` (Ollama). Override with `--model`.
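A rough pre-flight check for the context-window requirement might look like this; the ~4-characters-per-token heuristic and the output reserve are illustrative assumptions, not figures from the tool:

```python
def fits_context(file_text: str, context_tokens: int = 8192,
                 prompt_tokens: int = 500, output_reserve: int = 1024) -> bool:
    # Rough heuristic: ~4 characters per token for typical source code.
    estimated_file_tokens = len(file_text) // 4
    # The prompt, the file, and room for the JSON response must all fit.
    return prompt_tokens + estimated_file_tokens + output_reserve <= context_tokens
```

Files that would fail a check like this are the ones the tool reports as exceeding the context window and skips.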
## Performance

- Each file requires a minimum of 3 LLM calls (two analysis passes + one quorum judge), plus retries on disagreement. A 100-file repo means 300+ inference calls.
- Default concurrency is 1 (single-GPU safe). For multi-GPU setups, increase it with `--concurrency`.
- Large codebases can take hours on consumer hardware. The tool is fully resumable — stop and restart at any time.
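The call count scales linearly with repository size, as a quick back-of-the-envelope sketch shows (the disagreement rate here is an illustrative assumption):

```python
def minimum_calls(file_count: int) -> int:
    # 2 analysis passes + 1 quorum judge per file, before any retries.
    return file_count * 3

def estimated_calls(file_count: int, disagree_rate: float = 0.1,
                    max_retries: int = 3) -> int:
    # Each disagreeing file re-runs the 3-call cycle up to max_retries times.
    retries = int(file_count * disagree_rate) * 3 * max_retries
    return minimum_calls(file_count) + retries
```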
## Quick Start

```bash
# Analyze a repository (auto-detects language profiles)
codebase-analyzer analyze /path/to/repo --output ./analysis

# Check progress
codebase-analyzer status ./analysis

# Resume an interrupted run (just re-run the same command)
codebase-analyzer analyze /path/to/repo --output ./analysis
```

## Options

```bash
# Explicit language profiles
codebase-analyzer analyze /path/to/repo --output ./analysis --profiles python,web,config

# Custom profile file
codebase-analyzer analyze /path/to/repo --output ./analysis --profile-file ./my-project.yaml

# Include all text files
codebase-analyzer analyze /path/to/repo --output ./analysis --all-text-files

# Override model and concurrency
codebase-analyzer analyze /path/to/repo --output ./analysis \
  --model qwen3:32b-q5_K_M \
  --ollama-url http://localhost:11434 \
  --max-retries 3 \
  --max-file-size 100000 \
  --concurrency 1
```
## Authentication

If your LLM server requires authentication, pass a bearer token via `--api-token` or the `LLM_API_TOKEN` environment variable:

```bash
# Remote LLM server with authentication
codebase-analyzer analyze /path/to/repo --output ./analysis \
  --ollama-url https://your-server.example.com \
  --model your-model-name \
  --api-token $LLM_API_TOKEN
```

Here are some options for managing the token securely:
**macOS Keychain** (recommended on Mac — encrypted at rest, never in a plaintext file):

```bash
# Store once
security add-generic-password -a "$USER" -s "llm-api-token" -w "your-token-here"

# Retrieve into an env var
export LLM_API_TOKEN=$(security find-generic-password -a "$USER" -s "llm-api-token" -w)

# Or use an alias in ~/.zshrc
alias lm-token='export LLM_API_TOKEN=$(security find-generic-password -a "$USER" -s "llm-api-token" -w)'
```

**1Password / Bitwarden CLI** (best for multi-machine setups):
```bash
export LLM_API_TOKEN=$(op read "op://Private/LLM Server/token")  # 1Password
export LLM_API_TOKEN=$(bw get password "llm-api-token")          # Bitwarden
```

**direnv** (per-project, auto-loads when you cd into the project):

```bash
# .envrc in project root — make sure .envrc is in your .gitignore
export LLM_API_TOKEN="your-token"
```

Avoid putting tokens directly in `~/.zshrc` or `~/.bashrc` — they're unencrypted, easy to accidentally commit, and visible to any process that reads your shell config.
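Under the hood, bearer-token auth is just an HTTP header; a minimal sketch of building such a request (the endpoint and payload shape are assumptions about an Ollama-compatible server, not the tool's actual internals):

```python
import json
import os
import urllib.request

def build_request(url: str, payload: dict) -> urllib.request.Request:
    # Mirrors --api-token / LLM_API_TOKEN: attach the token, if set,
    # as a standard "Authorization: Bearer <token>" header.
    headers = {"Content-Type": "application/json"}
    token = os.environ.get("LLM_API_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
```

Any server or reverse proxy that checks a bearer header (nginx, Caddy, a cloud gateway) can sit in front of Ollama and enforce this.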
## Relationship Mapping

After analysis completes, optionally map cross-file relationships:

```bash
# Via Claude API (automated)
codebase-analyzer relationships ./analysis --api-key $ANTHROPIC_API_KEY

# Export prompt for Claude Code (interactive)
codebase-analyzer relationships ./analysis --export-prompt
```

## Resolving Flagged Files

Files that fail quorum after retries can be resolved with a frontier model:
```bash
# Via Claude API
codebase-analyzer resolve-flagged ./analysis --api-key $ANTHROPIC_API_KEY

# Export for manual review
codebase-analyzer resolve-flagged ./analysis --export-prompt
```

## Output Structure

```
analysis/
  files/                 # 1:1 markdown files mirroring repo structure
    path/to/module.py.md
  flagged/               # files that failed quorum (JSON with full history)
    path/to/problem.py.json
  relationships/         # cross-file dependency maps (if generated)
    _index.md
    module_map.md
  analyzer_state.db      # SQLite state for resume capability
  run_report.md          # summary statistics
```
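The resume behavior implied by `analyzer_state.db` can be pictured as a simple per-file status table; this schema is an illustration, not the tool's actual one:

```python
import sqlite3

def open_state(db_path: str = ":memory:") -> sqlite3.Connection:
    # One row per discovered file; status drives the resume logic.
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS files (
            path   TEXT PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'pending'  -- pending | done | flagged
        )
    """)
    return conn

def pending(conn: sqlite3.Connection) -> list:
    # On resume, only files not yet marked done are re-queued.
    return [row[0] for row in
            conn.execute("SELECT path FROM files WHERE status = 'pending'")]
```

Because completed files are recorded as each one finishes, re-running the same `analyze` command simply picks up whatever is still pending.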
## Optional Extras

The core analysis pipeline requires only Ollama. For automated relationship mapping and flagged-file resolution via the Claude API:

```bash
pip install "codebase-analyzer[api]"
```

## License

MIT