wikifi

wikifi walks a legacy codebase and writes a technology-agnostic wiki of what the system does — domains, entities, flows, integrations, and cross-cutting concerns — extracted from the source with citations back to the lines that prove it.

The output is what a migration team needs to re-implement the system on a fresh stack from the wiki alone, without recreating the legacy structure in a new language.

For the full rationale and content contract, see VISION.md. To see the output, browse .wikifi/ — wikifi run against its own source.

Quickstart

# 1. Install in the target project
uv add wikifi

# 2. Scaffold .wikifi/ and config
uv run wikifi init

# 3. Walk the codebase
uv run wikifi walk

LLM Config

.wikifi/config.toml

provider = "anthropic" # openai | local(default)
model = "claude-sonnet-4-6" 
# ollama_host = "http://localhost:11434"

By default wikifi runs against a local Ollama server (Qwen 3 27B at the highest reasoning level the model exposes) — no cloud dependency, no API key, no data leaving the machine. Hosted Anthropic and OpenAI backends are opt-in.
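
For a sense of what that local round trip looks like, here is a minimal sketch using the ollama Python client; the model name is a placeholder, and this is not necessarily how wikifi wires its provider internally.

from ollama import Client

client = Client(host="http://localhost:11434")   # matches the commented ollama_host above
response = client.chat(
    model="qwen3:32b",  # placeholder tag; use whichever local model you have pulled
    messages=[{"role": "user", "content": "Summarize what this file does."}],
)
print(response["message"]["content"])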

What you get

A .wikifi/ directory in the target repo containing the synthesized wiki. The on-disk layout is at the implementor's discretion; the content contract is fixed and lives in VISION.md:

  • Primary capture (extracted from source) — domains & subdomains, intent, capabilities, entities, integrations, external dependencies, cross-cutting concerns, hard specifications, and inline schematics.
  • Derivative capture (synthesized from the aggregate) — personas, user stories, and 10,000-foot diagrams produced after primary content is complete.

Every claim in the wiki carries a numbered citation back to a SourceRef (file + line range + content fingerprint). Conflicting evidence across files is preserved in a "Conflicts in source" block rather than silently resolved.
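
One way such a record might be shaped, as a rough Python sketch; the field names are assumptions for illustration, not wikifi's actual definitions.

from dataclasses import dataclass

@dataclass
class SourceRef:
    path: str          # file the evidence came from
    line_start: int    # cited line range
    line_end: int
    fingerprint: str   # content hash, so a stale citation is detectable

@dataclass
class Claim:
    text: str               # the statement made in the wiki
    citation: SourceRef     # numbered reference back to the source
    conflicts: list[str]    # competing evidence, preserved rather than silently resolved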

CLI

Command         Purpose
wikifi init     One-time setup. Scaffolds .wikifi/ and local config.
wikifi walk     Walks the target codebase and produces the wiki.
wikifi report   Coverage + quality report (per-section file counts, findings, body sizes).
wikifi chat     Interactive REPL for iterative exploration of the wiki and the source.

walk flags:

  • --no-cache — force a clean re-walk; drops the on-disk extraction + aggregation caches.
  • --review — run the critic + reviser loop on derivative sections (personas, user stories, diagrams).
  • --provider {ollama|anthropic|openai} — override the configured provider for this walk.

report --score runs the critic on every populated section for a 0–10 quality score.

Providers

The LLM backend is reached through a provider abstraction; swapping it never touches the rest of the system.

  • OllamaProvider — default. Local server, no cloud dependency.
  • AnthropicProvider — WIKIFI_PROVIDER=anthropic. Uses prompt caching with cache_control: ephemeral on the system prompt so the multi-KB extraction prompt is paid for once across hundreds of per-file calls.
  • OpenAIProvider — WIKIFI_PROVIDER=openai. Relies on OpenAI's automatic prefix caching and routes the think knob to reasoning_effort on o* / gpt-5 reasoning models.
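
The seam between the walk and the backend might look roughly like the sketch below. The Protocol and its method name are assumptions; the Anthropic call illustrates the cache_control: ephemeral marker on the system prompt, which is the real Anthropic prompt-caching mechanism.

from typing import Protocol

import anthropic


class LLMProvider(Protocol):
    def complete(self, system: str, prompt: str) -> str: ...


class AnthropicSketch:
    """Illustrative only: marks the large, unchanging system prompt as cacheable."""

    def __init__(self, model: str = "claude-sonnet-4-6") -> None:  # model id mirrors the config above
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.model = model

    def complete(self, system: str, prompt: str) -> str:
        message = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            # The large extraction prompt is marked cacheable so repeated
            # per-file calls reuse it instead of re-paying for it.
            system=[{"type": "text", "text": system,
                     "cache_control": {"type": "ephemeral"}}],
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text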

How the walk works

The walk has four responsibilities, in order (a rough sketch of the loop follows this list):

  1. Introspect — review the target's root structure (manifests, top-level layout, gitignore signals) and decide which paths carry production source worth analyzing. The walk that follows is deterministic; the agent does not re-pick scope mid-walk.
  2. Filter — recognize and skip unstructured or near-empty files (stub __init__, empty fixtures, generated lockfiles) before they reach the agent. Empty input must never stall the walk.
  3. Extract — for each in-scope file, extract structured findings against the primary capture sections in VISION.md. Each finding carries a SourceRef for downstream citation.
  4. Synthesize — primary sections are aggregated from the per-file findings into an EvidenceBundle (body + claims + contradictions). Derivative sections (personas, user stories, diagrams) are produced after primary content is complete, never inferred from a single file.
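
The rough sketch referenced above; every name in it is a placeholder for illustration, not wikifi's actual API.

def walk(repo_root, provider):
    scope = introspect(repo_root)               # 1. decide once which paths carry production source
    notes = []
    for path in scope:
        if is_unstructured_or_empty(path):      # 2. skip stubs, empty fixtures, lockfiles
            continue
        notes.extend(extract(provider, path))   # 3. structured findings, each carrying a SourceRef
    primary = [aggregate(section, notes) for section in PRIMARY_SECTIONS]  # 4. EvidenceBundles
    derivative = synthesize_derivative(primary)  # personas, user stories, diagrams come last
    return primary, derivative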

Supporting machinery:

  • Repo graph (wikifi/repograph.py) — regex-driven static analysis builds an import / reference graph and classifies each file's FileKind (application code, SQL, OpenAPI, Protobuf, GraphQL, migration, other). Each file's neighborhood is injected into the extraction prompt so per-file findings can describe cross-file flows.
  • Specialized extractors (wikifi/specialized/) — schema files (SQL, OpenAPI, Protobuf, GraphQL, migrations) bypass the LLM and run through deterministic parsers. Structured findings reach the same notes store, so the rest of the pipeline is unchanged.
  • Content-addressed cache (wikifi/cache.py) — extraction findings are keyed by (rel_path, sha256(file_bytes)); aggregation bodies are keyed by a hash of the section's notes payload. Re-walks skip every file whose fingerprint hasn't changed; resumability after a crash is a free property of the same cache (see the keying sketch after this list).
  • Critic + reviser (wikifi/critic.py) — opt-in via walk --review. Scores derivative sections against their brief and upstream evidence, identifies unsupported claims, and re-synthesizes when the score is below threshold. Only accepts a revision if it scores at least as well as the original.
  • Coverage + quality report (wikifi/report.py) — wikifi report produces a per-section view of files contributing, finding count, body size, and (with --score) critic-derived quality scores.
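
The cache keying described above might reduce to something like this; the helper names are illustrative.

import hashlib
import json

def extraction_key(rel_path: str, file_bytes: bytes) -> str:
    # (rel_path, sha256(file_bytes)): an unchanged file hits the cache on a re-walk
    return f"{rel_path}:{hashlib.sha256(file_bytes).hexdigest()}"

def aggregation_key(section: str, notes_payload: list[dict]) -> str:
    # aggregation bodies are keyed by a hash of the section's notes payload
    blob = json.dumps(notes_payload, sort_keys=True).encode()
    return f"{section}:{hashlib.sha256(blob).hexdigest()}"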

Configuration

wikifi reads configuration from environment variables. At minimum:

  • the LLM provider id and model identifier
  • the local Ollama endpoint (when using the default provider)
  • bounds on file size and stripped-content size, so unstructured or oversized files never reach the agent
  • the agent's thinking / reasoning level — defaults to the highest the chosen model supports

A .env.example will land once the surface is finalized.
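
Until it does, reading those values might look something like the sketch below; apart from WIKIFI_PROVIDER, which appears in the Providers section above, every variable name here is a hypothetical placeholder.

import os

provider = os.environ.get("WIKIFI_PROVIDER", "ollama")                        # default: local Ollama
model = os.environ.get("WIKIFI_MODEL", "")                                     # hypothetical name
ollama_host = os.environ.get("WIKIFI_OLLAMA_HOST", "http://localhost:11434")   # hypothetical name
max_file_bytes = int(os.environ.get("WIKIFI_MAX_FILE_BYTES", "200000"))        # hypothetical name and bound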

Tech stack

  • Python 3.12+, packaged with uv
  • Local LLM via Ollama as the default runtime; thinking-capable model at the highest available reasoning level
  • Provider abstraction — Ollama default; hosted Anthropic and OpenAI slot in without touching the rest of the system
  • ruff as the single tool for lint and format
  • pytest + pytest-cov for tests (≥85% coverage gate)
  • GitHub Actions for CI

Development

make hooks       # one-time: enables .githooks/ pre-commit + pre-push
uv sync          # install dependencies
make test        # run the test suite

See CLAUDE.md for the full development process — commands, code rules, agent workflow, and debug escalation.

Distribution

wikifi ships as a Python library (PyPI / private index) and operates as a CLI invoked from a target project rather than as a server.

License

MIT.
