Generate repo-specific AI context for Codex, Claude Code, Copilot, and Cursor.
Turn any repository into canonical AI-readable project context.
RepoCanon is a Python CLI that analyzes a local codebase and generates project-specific instruction files for AI coding tools from a single internal repo model.
Instead of manually maintaining separate context for different tools, RepoCanon infers your repo’s structure, commands, conventions, and boundaries, then generates outputs such as:
- `AGENTS.md`
- `CLAUDE.md`
- Copilot repository instructions
- Cursor project rules
The goal is simple: make AI coding tools behave like they already understand your repo.
AI coding tools are useful, but they usually guess:
- where things live
- how the repo is structured
- which commands to run
- what patterns are preferred
- what boundaries should not be crossed
RepoCanon reduces that guesswork by turning repo-specific knowledge into maintainable instruction files.
RepoCanon:
- analyzes a local repository
- detects languages, frameworks, commands, and topology
- infers conventions and architectural boundaries
- builds a normalized project model
- generates tool-specific AI context files from that model
RepoCanon is deterministic-first. It does not require an LLM to work.
- Codex via `AGENTS.md`
- Claude Code via `CLAUDE.md`
- GitHub Copilot via `.github/copilot-instructions.md` (and optional path-scoped files)
- Cursor via `.cursor/rules/*.mdc`
```shell
pip install repocanon
```

Requires Python 3.11+.
```shell
# 1. Analyze the current repo and persist a normalized model.
repocanon analyze .

# 2. Inspect what was inferred and how confident RepoCanon is.
repocanon audit .

# 3. Preview generated outputs without touching the filesystem.
repocanon preview .

# 4. Write the generated files into the repo (defaults to all targets).
repocanon generate .
```

You can also generate one or more specific targets:

```shell
repocanon generate . -t agents
repocanon generate . -t claude -t copilot

repocanon list-targets  # see what's available
```

Need machine-readable output? Append `--json` to `analyze`, `audit`, or `diff`.
Need to undo? `repocanon clean .` (or scope it with `-t copilot`) removes only
RepoCanon-authored files (those that carry the generator header marker).
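The marker check behind `clean` can be sketched like this (a minimal sketch; the exact header text used below is an assumption, the real marker may differ):

```python
from pathlib import Path

# Hypothetical marker string; RepoCanon's actual generator header may differ.
MARKER = "generated-by: repocanon"

def is_repocanon_authored(path: Path) -> bool:
    """Return True only if the file carries the generator header marker."""
    try:
        head = path.read_text(encoding="utf-8")[:200]  # marker lives near the top
    except (OSError, UnicodeDecodeError):
        return False
    return MARKER in head

# clean deletes only files where this returns True; hand-written files are skipped.
```

The point of the marker is that deletion is opt-in by provenance: a file you wrote yourself can never match, so `clean` cannot destroy it.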
A real run produces files like:
```
AGENTS.md
CLAUDE.md
.cursor/rules/project-overview.mdc
.cursor/rules/commands-and-validation.mdc
.cursor/rules/code-style-and-conventions.mdc
.cursor/rules/architecture-boundaries.mdc
.github/copilot-instructions.md
.github/instructions/tests.instructions.md
```
See docs/samples/ for sample generated files from the bundled fixture repos.
RepoCanon has three layers:
The first layer scans the local repo and extracts:
- languages
- frameworks
- package managers
- commands
- configs
- directory structure
- file patterns
The second layer infers patterns such as:
- test layout (centralized vs colocated)
- frontend/backend split
- monorepo structure (apps/packages/libs/services)
- architectural boundaries
- naming conventions
- preferred libraries
- common anti-pattern risks (e.g. editing existing migrations)
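As a rough illustration of this kind of inference, a test-layout heuristic might look like the following simplified sketch (illustrative only, not RepoCanon's actual implementation):

```python
def infer_test_layout(paths: list[str]) -> str:
    """Classify test layout as 'centralized', 'colocated', or 'unknown'."""
    # A file counts as a test if its basename mentions "test".
    test_files = [p for p in paths if "test" in p.rsplit("/", 1)[-1].lower()]
    if not test_files:
        return "unknown"
    # Majority vote: most test files under a top-level tests/ dir => centralized.
    centralized = sum(1 for p in test_files if p.startswith(("tests/", "test/")))
    return "centralized" if centralized > len(test_files) / 2 else "colocated"
```

Real inference would weigh more signals (pytest config, framework defaults), but the shape is the same: visible file facts in, a named convention plus a confidence out.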
The third layer maps the normalized project model into tool-specific outputs.
That means the same repo understanding can be reused across multiple AI coding tools.
- deterministic first
- local-first (no telemetry, no network calls)
- tool-agnostic core
- small, readable outputs
- no generic filler — every section is grounded in repo facts
- explicit uncertainty when confidence is low
- human-editable generated files (sections between `<!-- repocanon:manual:* -->` markers survive regeneration)
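A sketch of how such marker-delimited sections could be preserved across regeneration (the paired closing-marker syntax below is an assumption, not RepoCanon's documented format):

```python
import re

# Assumed paired form: <!-- repocanon:manual:NAME --> ... <!-- /repocanon:manual:NAME -->
# (the closing-tag syntax is a guess; the real markers may pair differently).
BLOCK = re.compile(
    r"<!-- repocanon:manual:(?P<name>[\w-]+) -->"
    r"(?P<body>.*?)"
    r"<!-- /repocanon:manual:(?P=name) -->",
    re.DOTALL,
)

def manual_sections(text: str) -> dict[str, str]:
    """Collect hand-edited blocks so they can be re-injected after regeneration."""
    return {m["name"]: m["body"] for m in BLOCK.finditer(text)}
```

Regeneration can then render the file fresh and splice each named body back into its marker pair, so your edits survive even when the surrounding generated text changes.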
Analyze the repository and write a normalized model to `.repocanon/project-model.json`.
Generate output for the selected targets (defaults to all).
Useful flags:
- `-t, --target`: repeat to pick targets (`agents`, `claude`, `copilot`, `cursor`, or `all`)
- `--dry-run`: render and report without writing or persisting the model
- `--output-dir`: write into a sibling directory (path-traversal-safe)
- `--force`: replace files even if RepoCanon would otherwise preserve manual edits
Print generated output to the terminal without writing files. Same `-t` flag as `generate`.
List every target the current build supports.
Delete RepoCanon-authored files for the selected targets. Files without the
RepoCanon header marker (i.e. anything you wrote yourself) are skipped. Pair
with --dry-run to see what would happen.
Show inferred conventions, rationale, and confidence levels. Pass --json to
emit the audit as JSON for piping into other tools.
Compare the current repo scan with the saved model and report meaningful changes
(specific commands added/removed, package list deltas, structural fingerprint
changes). Pass --json for a machine-readable diff.
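The command-delta part of `diff` can be sketched as a plain set comparison (a simplified illustration with hypothetical field names, not the actual implementation):

```python
def command_delta(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report commands added, removed, or changed between two scans.

    Keys are command names (e.g. "test"), values are the command strings.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Present in both scans but with a different command string.
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Because the saved model is plain JSON, a diff like this stays cheap and deterministic: no re-inference is needed to tell you that `lint` disappeared or `test` now takes different flags.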
Create a local RepoCanon config file at .repocanon/config.toml.
RepoCanon stores project config in:
.repocanon/config.toml
Example:
```toml
[project]
name = "my-repo"

[scan]
include = ["src/**", "app/**", "packages/**"]
exclude = ["node_modules/**", ".next/**", "dist/**", "build/**"]

[generate]
targets = ["agents", "claude", "copilot", "cursor"]
safe_overwrite = true
```

The package is laid out as:

```
repocanon/
├── analyzer/    # deterministic repo scanning + inference
├── models/      # Pydantic v2 project model
├── generators/  # one module per AI target
├── output/      # writers, preview, diff
├── report/      # audit + summary tables
└── cli.py       # Typer entry point
```
The analyzer is a straight pipeline: file inventory → manifest parsing → framework/package-manager detection → command extraction → topology + conventions → final ProjectModel. Generators only consume that model — they never touch the filesystem.
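In miniature, that pipeline contract looks something like this (the field names and detection rules are illustrative stand-ins for the real Pydantic `ProjectModel`):

```python
from dataclasses import dataclass, field

# Simplified stand-in for the Pydantic v2 ProjectModel (fields are illustrative).
@dataclass
class ProjectModel:
    languages: list[str] = field(default_factory=list)
    commands: dict[str, str] = field(default_factory=dict)

def build_model(paths: list[str], manifests: dict[str, dict]) -> ProjectModel:
    """Each stage only adds to the model; generators later read it untouched."""
    model = ProjectModel()
    # Stage: language detection from the file inventory.
    if any(p.endswith(".py") for p in paths):
        model.languages.append("python")
    # Stage: command extraction from an already-parsed manifest.
    scripts = manifests.get("package.json", {}).get("scripts", {})
    model.commands.update(scripts)
    return model
```

Keeping generators off the filesystem is what makes `preview` and `--dry-run` trivially safe: rendering is a pure function of the model.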
RepoCanon is inference-based. It can detect a lot, but not everything.
It may be less accurate when:
- the repo is highly unconventional
- conventions are implicit rather than visible in files
- commands live outside standard manifests
- architecture is unclear from structure alone
When confidence is low, RepoCanon says so rather than inventing detail.
- more framework detectors (Django, Rails, .NET, Spring, etc.)
- stronger monorepo inference (Bazel, Pants, Nx graph)
- better path-scoped output generation
- safer merge/update behavior for edited generated files
- optional LLM-assisted summarization (off by default)
- additional target formats
Could you just write these files by hand? You can. But in practice:
- they drift out of date
- they are inconsistent across tools
- they are often generic
- they rarely reflect the actual repo structure
RepoCanon keeps those files grounded in the codebase.
RepoCanon is intentionally a many-to-one-to-many pipeline:
```
repo files ─┐                              ┌─► AGENTS.md (Codex)
            ├─► analyzer ─► ProjectModel ──┼─► CLAUDE.md (Claude Code)
manifests ──┘                              ├─► copilot-instructions (Copilot)
                                           └─► .cursor/rules/*.mdc (Cursor)
```
The analyzer collapses everything it sees into a single normalized ProjectModel (Pydantic v2). That model is the only thing target generators read; they never touch the filesystem. This gives RepoCanon two important properties:
- One source of truth. Languages, frameworks, commands, conventions, anti-patterns, and architecture boundaries all live in one place. Adding a new target means writing a new generator that consumes the same model — not re-implementing detection.
- Idiomatic outputs per tool. Each generator picks the parts of the model that make sense for its target and renders them in that tool's idiom: a verbose AGENTS.md for Codex, a terse CLAUDE.md for Claude Code, a repo-wide instructions file (plus optional path-scoped ones) for Copilot, and a small set of focused `.mdc` rule files for Cursor.
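The one-model, many-generators shape can be sketched as follows (generator names and renderings here are illustrative, not RepoCanon's real templates):

```python
def gen_agents(model: dict) -> str:
    # Verbose rendering: full headed sections.
    return "# AGENTS\n\nCommands:\n" + "\n".join(f"- {c}" for c in model["commands"])

def gen_claude(model: dict) -> str:
    # Terse rendering: same facts, different idiom.
    return "commands: " + ", ".join(model["commands"])

GENERATORS = {"agents": gen_agents, "claude": gen_claude}

def render(model: dict, targets: list[str]) -> dict[str, str]:
    """Every target consumes the same model; none re-runs detection."""
    return {t: GENERATORS[t](model) for t in targets}
```

Adding a target is then a one-function change: register a new renderer in the table and the analyzer, audit, and diff machinery all apply to it for free.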
The same model also powers audit, diff, and preview, so you can verify what RepoCanon inferred before any file is written.
Contributions are welcome. See CONTRIBUTING.md for local setup, tests, and development workflow.
MIT