A terminal-native agentic coding runtime that adapts to your model, not the other way around.
Codii is a self-contained terminal tool that turns any local or cloud language model into a working coding agent. Point it at an Ollama server, an LM Studio instance, a vLLM deployment, or OpenRouter — Codii probes the model's real capabilities, selects the right scaffolding strategy, and runs a full agentic loop that reads, edits, and executes code in your codebase.
No cloud subscription required for local models. No vendor lock-in.
Agentic coding runtimes were designed around frontier models — large, well-behaved, with native function calling, reliable JSON output, and enormous context windows. Smaller open-weight models don't share those properties, and behavior degrades as a result: the model narrates what it would do instead of doing it. Files aren't created. Commands aren't run. The loop doesn't close.
The cause isn't model quality. It's scaffolding mismatch.
Research consistently shows that varying the scaffolding around a fixed model produces larger performance swings than varying the model inside a fixed scaffold. Codii acts on this finding. Before any agentic action is taken, it probes the connected model across four dimensions, builds a capability fingerprint, and selects a system prompt, tool-call format, and guard-rail configuration tuned to that specific model's measured behavior.
The harness adapts to the model.
Recommended (isolates codii from your project environments):
```bash
pipx install codii
```

Alternative:

```bash
pip install codii
```

Both produce a `codii` command available immediately in your terminal.
Python 3.11 or later required.
Optional — serve mode (Anthropic Messages API shim backed by your local model):
pip install "codii[serve]"First run — interactive backend setup wizard:
codiiCodii auto-detects any locally-running Ollama or LM Studio instance, lists available models, probes the selected one, and drops you into a session.
Specify a backend explicitly:
```bash
codii --backend ollama
codii --backend openrouter
codii --backend vllm
```

Connect to any OpenAI-compatible endpoint:

```bash
codii connect http://localhost:8000
```

Other useful commands:
```bash
codii probe              # Re-run capability probe for the current model
codii decisions          # View the decision log from the last session
codii replay             # List and replay past session transcripts
codii fingerprint show   # Display the current model fingerprint
codii fingerprint edit   # Open fingerprint in $EDITOR for manual tuning
codii fingerprint list   # List all stored fingerprints
codii serve              # Start Anthropic Messages API shim (requires [serve])
```

Multi-backend support
Ollama (default :11434), LM Studio (default :1234), vLLM, OpenRouter, and any generic OpenAI-compatible endpoint. The first-run wizard auto-detects local backends. OpenRouter setup includes a masked API key prompt.
Automatic capability probing

Four sequential probes run before the first session; results are cached per model:

- Tool-call format detection — sends a test tool definition and classifies the model's raw response format (`gemma4_tokens`, `hermes_xml`, `qwen_json`, `mistral_json`, `openai_json`, or none)
- Effective context window measurement — needle-in-haystack test at 25%, 50%, and 90% of the model's claimed context depth
- Reasoning token support detection — identifies `<think>`, `<thinking>`, `<|begin_of_thought|>`, and similar delimiters
- Structured output reliability — five JSON-schema requests scored as a 0–1 reliability fraction
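Probe results are stored as a per-model fingerprint in `~/.codii/fingerprints/`. As a rough illustration of what the four probes produce (the field names below are invented for this sketch, not the actual schema), a fingerprint captures something like:

```python
# Illustrative sketch only; the real fingerprint schema lives in the codii source.
from dataclasses import dataclass

@dataclass
class CapabilityFingerprint:
    model: str                             # e.g. "qwen2.5-coder:14b"
    tool_call_format: str                  # "qwen_json", "hermes_xml", ..., or "none"
    effective_context: int                 # measured window in tokens, not the claimed one
    reasoning_delimiters: list[str]        # e.g. ["<think>"]
    structured_output_reliability: float   # 0.0-1.0 fraction over five JSON-schema probes

fp = CapabilityFingerprint(
    model="qwen2.5-coder:14b",
    tool_call_format="qwen_json",
    effective_context=24576,
    reasoning_delimiters=["<think>"],
    structured_output_reliability=0.8,
)
```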
Three-tier adaptive scaffolding

System prompt, tool format, and guard rails are selected from the fingerprint. Tiers adapt dynamically within a session on repeated parse failures or successes.
| Tier | Condition | Tool Format |
|---|---|---|
| 1 | Native tool calling AND reliability ≥ 0.9 AND context ≥ 16k tokens | Native JSON function definitions |
| 2 | Structured output reliability ≥ 0.5 | Prompt-engineered XML with inline examples |
| 3 | Structured output reliability < 0.5 | Guided JSON, one call at a time |
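In sketch form, the tier rule in the table reduces to a few comparisons. This is an illustration only (field names follow the hypothetical fingerprint sketch above; the shipped selector also reacts to in-session parse history, per the note above):

```python
def select_tier(fp: "CapabilityFingerprint") -> int:
    """Map a capability fingerprint to a scaffolding tier, per the table above."""
    if (fp.tool_call_format != "none"            # has native tool calling
            and fp.structured_output_reliability >= 0.9
            and fp.effective_context >= 16_000):
        return 1  # native JSON function definitions
    if fp.structured_output_reliability >= 0.5:
        return 2  # prompt-engineered XML with inline examples
    return 3      # guided JSON, one call at a time
```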
Four interactive modes
- Auto (default) — full tool suite, all guards active
- Edit — scoped file editing with `edit_file` discipline enforced
- Plan — LLM generates a numbered plan; keyboard UI to approve, edit individual steps inline, or reject before execution
- Chat — read-only Q&A with two-phase workspace indexing
Cycle through modes with Shift+Tab.
Complete tool suite
| Tool | Description |
|---|---|
| `read_file` | UTF-8 file read, sandboxed to workspace |
| `write_file` | Atomic write via temp-file swap (new files only) |
| `edit_file` | `old_str` → `new_str` replacement with two-pass whitespace-tolerant matching |
| `bash` | Shell execution (PowerShell on Windows, sh on Unix); timeout 60–300s; blocks interactive commands |
| `list_dir` | Directory listing; skips `.git`, `node_modules`, `__pycache__`, `.venv` |
| `spawn_subagent` | Delegates subtasks to built-in researcher / reviewer / planner agents with restricted tool sets |
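The two-pass matching behind `edit_file` can be sketched as: try an exact match first, then retry with runs of whitespace collapsed so indentation drift still matches. A minimal illustration of the idea, not the shipped implementation:

```python
import re

def find_match(content: str, old_str: str) -> int:
    """Locate old_str in content: exact match first, then whitespace-tolerant."""
    # Pass 1: exact substring match.
    idx = content.find(old_str)
    if idx != -1:
        return idx
    # Pass 2: treat any run of whitespace as equivalent, so minor
    # indentation or line-wrapping differences still match.
    pattern = r"\s+".join(re.escape(part) for part in old_str.split())
    m = re.search(pattern, content)
    return m.start() if m else -1
```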
Slash commands with autocomplete dropdown
/help /clear /compact /context /init /index /chat /edit /plan /scope /auto /exit
Dynamic Context Block (DCB)

An ephemeral user-role message injected before each inner LLM call. It contains the active tool list, a tier-specific format hint, the last 10 session actions, and the current plan state. It is never stored in history and never consumes persistent context.
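"Ephemeral" here means the block is appended to the request payload, never to the stored history. Roughly, as a sketch of the idea rather than codii's actual internals (`render_dcb` and `session` are hypothetical names):

```python
def build_request(history: list[dict], dcb_text: str) -> list[dict]:
    """Send history plus an ephemeral context block; history itself is untouched."""
    return history + [{"role": "user", "content": dcb_text}]

messages = build_request(session.history, render_dcb(session))
# session.history is never mutated, so the DCB never accumulates across turns.
```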
WorkflowLock
A state machine that arms when read_file is called and forces the next tool call to be edit_file or write_file on the same path. Prevents the common drift pattern where a model reads a file and then moves on without making the edit.
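A minimal sketch of the arm-and-force pattern (illustrative; the real state machine lives inside AgentCore):

```python
class WorkflowLock:
    """Armed by read_file; the next tool call must edit or write the same path."""

    def __init__(self) -> None:
        self.armed_path: str | None = None

    def on_tool_call(self, tool: str, path: str | None) -> bool:
        if self.armed_path is not None:
            ok = tool in ("edit_file", "write_file") and path == self.armed_path
            self.armed_path = None
            return ok  # False -> the guard injects a corrective directive
        if tool == "read_file":
            self.armed_path = path  # arm: an edit on this path must follow
        return True
```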
Weak Model Bridge
Detects when a smaller model writes correct code as a prose text block instead of a tool call, extracts the code, and injects a precision edit_file directive so the edit lands regardless.
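The core move is recovering a fenced code block from a prose reply so it can be re-issued as a tool call. In sketch form (illustrative only):

```python
import re

CODE_FENCE = re.compile(r"```[\w+-]*\n(.*?)```", re.DOTALL)

def extract_code_as_text(reply: str) -> str | None:
    """Return the first fenced code block a model wrote as prose, if any."""
    m = CODE_FENCE.search(reply)
    return m.group(1) if m else None

# The real bridge goes further: it diffs the extracted code against the file
# the model had read and injects a precise edit_file(old_str, new_str) directive.
```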
Circuit Breaker

Detects stuck patterns — the same read-only tool called on the same file repeatedly with no substantive progress — and injects a redirection to break the loop.
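Detection can be as simple as counting consecutive identical read-only calls; a toy version of the idea (threshold and bookkeeping are illustrative):

```python
from collections import deque

class CircuitBreaker:
    """Trip when the same read-only call repeats with nothing else in between."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.recent: deque[tuple[str, str]] = deque(maxlen=threshold)

    def record(self, tool: str, path: str) -> bool:
        """Return True if the loop looks stuck and needs a redirection."""
        self.recent.append((tool, path))
        return (len(self.recent) == self.threshold
                and len(set(self.recent)) == 1)
```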
Session-wide codebase cache
Chat mode caches file contents and a codebase summary per session. Edit mode preloads scoped files into the conversation so the model can call edit_file directly without a redundant read.
Context auto-compaction
When conversation history reaches 75% of the effective context window, a summarization pass compresses old turns. A cooldown prevents thrashing after low-value compactions. Also available on demand with /compact.
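The trigger itself is a simple threshold against the measured (not claimed) window; a sketch:

```python
COMPACT_AT = 0.75  # fraction of the effective context window

def should_compact(used_tokens: int, effective_context: int,
                   cooldown_active: bool) -> bool:
    """Compact when history crosses 75% of the window, unless cooling down."""
    return not cooldown_active and used_tokens >= COMPACT_AT * effective_context
```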
Decision logging
Every guard evaluation, tier adaptation, parse attempt, repair heuristic, and tool execution is recorded to ~/.codii/sessions/<id>/decisions.jsonl. Review with codii decisions.
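Because the log is plain JSONL, it is also easy to inspect outside of `codii decisions` (the record fields shown in the comment are illustrative, not the actual schema):

```python
import json
from pathlib import Path

# Assumes session ids sort chronologically; picks the latest session directory.
session_dir = sorted(Path.home().glob(".codii/sessions/*"))[-1]
for line in (session_dir / "decisions.jsonl").read_text().splitlines():
    event = json.loads(line)
    # e.g. {"kind": "guard", "name": "workflow_lock", ...} (field names illustrative)
    print(event.get("kind"), event)
```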
Session replay
Full conversation transcripts are saved per session and can be replayed with codii replay.
Beautiful terminal UI

Rich library panels, spinning probe loader, live token counter (used / cap with percentage), keyboard-navigable menus, and masked password input for OpenRouter setup.
| Family | Format Detected | Typical Tier |
|---|---|---|
| Gemma 4 | `gemma4_tokens` | 1 |
| Qwen / QwQ (Qwen2.5, Qwen3) | `qwen_json` | 1 or 2 |
| Mistral / Devstral / Mixtral | `mistral_json` | 1 or 2 |
| Hermes (NousResearch) | `hermes_xml` | 2 |
| LLaMA / LLaMA-3 | `openai_json` | 1 or 2 |
| DeepSeek | `openai_json` | 1 |
| Phi | detected by name | 2 or 3 |
| Any OpenAI-compatible | native | 1 |
| Other / unknown | fallback parser | 2 or 3 |
If a model emits an unrecognized format, Codii falls back to the generic parser, which applies multiple repair heuristics before giving up.
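Typical repair heuristics of this kind include stripping code fences, slicing out the first balanced-looking JSON span, and trimming trailing commas. A generic sketch of the approach (not codii's exact heuristics):

```python
import json
import re

def repair_json(raw: str) -> dict | None:
    """Best-effort recovery of a JSON tool call from messy model output."""
    candidates = [raw]
    fenced = re.search(r"```(?:json)?\n(.*?)```", raw, re.DOTALL)  # strip code fences
    if fenced:
        candidates.append(fenced.group(1))
    braced = re.search(r"\{.*\}", raw, re.DOTALL)                  # first {...} span
    if braced:
        candidates.append(braced.group(0))
    for text in candidates:
        text = re.sub(r",\s*([}\]])", r"\1", text)                 # drop trailing commas
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None  # all heuristics exhausted; the parser gives up
```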
Auto mode

The full agent loop. All tools available, all guards active. Best for open-ended "implement this feature" or "fix this bug" requests where Codii should determine the steps autonomously.

Chat mode

Read-only. Only `read_file` and `list_dir` are available. On first invocation Codii indexes the workspace: reads the most important files, builds a codebase summary, and caches it for the session. Subsequent questions are answered from the cache without re-reading. Best for "how does X work" or "where is Y defined" questions.

Edit mode

Scoped file editing. Mention files in your prompt or use `/scope` to restrict the edit surface. Scoped files are preloaded into the conversation so the model can call `edit_file` directly without a redundant read. Best for targeted, surgical changes to known files.

Plan mode

Two-phase execution. First, Codii generates a numbered step-by-step plan and presents it in a keyboard-navigable UI — approve, edit individual steps inline, or reject. After approval, execution begins with the active plan shown in the Dynamic Context Block at every step. Best for multi-file refactors or complex tasks where you want to review before any files change.
| Command | Description |
|---|---|
| `/help` | Show available commands |
| `/clear` | Clear conversation history (preserves session metadata) |
| `/compact` | Summarize history to reclaim context tokens |
| `/context` | Show token usage breakdown (used / cap / %) |
| `/init` | Generate CODII.md — LLM-analyzed project documentation |
| `/index` | Re-index workspace files for Chat mode |
| `/chat` | Switch to Chat mode |
| `/edit` | Switch to Edit mode |
| `/plan` | Switch to Plan mode |
| `/plan edit` | Open the current plan for inline step editing |
| `/scope` | Show or update the current edit scope |
| `/auto` | Toggle auto-approval (skip per-tool confirmation prompts) |
| `/exit` | Exit the session |
| Command | Description |
|---|---|
| `codii` | Start a session (first-run setup if unconfigured) |
| `codii probe` | Re-run the capability probe for the current model |
| `codii connect <url>` | Connect to a new backend, list models, probe |
| `codii decisions` | Show the decision log from the most recent session |
| `codii replay` | List and replay past session transcripts |
| `codii fingerprint show` | Display the current model fingerprint |
| `codii fingerprint edit` | Open the fingerprint in `$EDITOR` for manual tuning |
| `codii fingerprint list` | List all stored fingerprints |
| `codii serve` | Start Anthropic Messages API shim (requires `[serve]` extra) |
Global flags: `--backend`, `--auto`

`codii probe` additionally accepts: `--backend`, `--endpoint`, `--model`, `--verbose`
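`codii serve` exposes an Anthropic Messages-style endpoint backed by your local model. As a usage sketch only (both the `/v1/messages` route and the `localhost:8080` address are assumptions here; check the output of `codii serve` for the real address):

```python
import json
import urllib.request

# Assumptions: the shim mirrors Anthropic's /v1/messages route and listens on
# localhost:8080. Neither is confirmed by this README; verify against codii serve.
req = urllib.request.Request(
    "http://localhost:8080/v1/messages",
    data=json.dumps({
        "model": "local",  # placeholder model name
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Explain this repo."}],
    }).encode(),
    headers={"content-type": "application/json"},
)
print(json.load(urllib.request.urlopen(req)))
```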
```
codii (CLI)
│
├── BackendAdapter ── Ollama / vLLM / LM Studio / OpenRouter / Generic OpenAI-compat
│
├── Probe Pipeline ── tool_call → context_window → reasoning → structured_output
│   └── CapabilityFingerprint ── stored in ~/.codii/fingerprints/
│
├── Scaffolding Selector ── picks Tier 1 / 2 / 3 from fingerprint
│   └── tier{1,2,3}.txt ── system prompts per tier
│
├── Parser Dispatch ── gemma4 / qwen / hermes / mistral / generic
│
└── Session (TAOR Loop)
    ├── AgentCore ── Think → Execute → Verify per turn
    │   ├── WorkflowLock ── arm on read_file, force next call to edit/write
    │   ├── Weak Model Bridge ── extract code-as-text → inject edit_file
    │   ├── Circuit Breaker ── detect and interrupt stuck patterns
    │   └── ContextInjector ── Dynamic Context Block (ephemeral, not stored)
    │
    ├── Session State
    │   ├── conversation history
    │   ├── reads_this_session (read-before-write guard)
    │   ├── action_log (last 20 entries)
    │   ├── file_cache (chat indexing + edit preloading)
    │   └── PlanState (step index + DCB rendering)
    │
    └── Tools
        read_file / write_file / edit_file / bash / list_dir / spawn_subagent
```
Everything is sandboxed to the workspace directory. Tools reject paths that escape via symlinks or .. traversal. Sensitive paths (.env, .ssh, .aws, .gnupg, credentials) are blocked regardless of workspace location.
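The standard technique for such a sandbox is to resolve the candidate path (which collapses `..` segments and follows symlinks) and verify it still lands under the workspace root. A sketch of that technique, not codii's exact code:

```python
from pathlib import Path

BLOCKED = {".env", ".ssh", ".aws", ".gnupg", "credentials"}

def safe_path(workspace: Path, candidate: str) -> Path:
    """Resolve candidate inside workspace; reject escapes and sensitive names."""
    resolved = (workspace / candidate).resolve()   # collapses ".." and symlinks
    resolved.relative_to(workspace.resolve())      # raises ValueError if outside
    if any(part in BLOCKED for part in resolved.parts):
        raise PermissionError(f"blocked sensitive path: {resolved}")
    return resolved
```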
v0.1.0 — current release.
The probe pipeline, fingerprint system, tier-based scaffolding, and agentic core are stable. All four modes (auto, chat, edit, plan), the full slash command set, and the terminal UI are implemented and working.
With capable models (31B+ quantized, or cloud models via OpenRouter): the agent loop is reliable. WorkflowLock, the Circuit Breaker, and the Dynamic Context Block collectively keep the model on track through long multi-step tasks.
With smaller models (7B, 2B): models generally produce correct code but can struggle with consistent tool-call formatting. The Weak Model Bridge handles the most common failure mode (code written as a text block) but doesn't resolve every formatting failure. Tier 2 and Tier 3 scaffolding improve reliability significantly for these models.
Active development continues. The public API is not yet stable.
```bash
git clone <repo-url>
cd codii
pip install -e ".[dev]"
```

```bash
pytest                    # run tests
ruff check src/ tests/    # lint
ruff format src/          # format
mypy --strict src/codii   # type check
```

Open an issue to discuss large changes before submitting a PR. If a PR changes behavior described in SYSTEM_DESIGN_DOCUMENT.md, update the document in the same commit.
Codii makes no outbound requests except to your configured endpoint. No usage data, no error reporting, no analytics. For local backends, all traffic stays on your machine. Verifiable by reading src/codii/connection/ — the only HTTP client in the project.
Local models, full agentic coding, no subscription.