A terminal-native agentic coding runtime that adapts to your model, not the other way around.
Codii is a self-contained terminal tool that turns any local or cloud language model into a working coding agent. Point it at an Ollama server, an LM Studio instance, a vLLM deployment, or OpenRouter — Codii probes the model's real capabilities, selects the right scaffolding strategy, and runs a full agentic loop that reads, edits, and executes code in your codebase.
No cloud subscription required for local models. No vendor lock-in.
Agentic coding runtimes were designed around frontier models — large, well-behaved, with native function calling, reliable JSON output, and enormous context windows. Smaller open-weight models don't share those properties, and behavior degrades as a result: the model narrates what it would do instead of doing it. Files aren't created. Commands aren't run. The loop doesn't close.
The cause isn't model quality. It's scaffolding mismatch.
Research consistently shows that varying the scaffolding around a fixed model produces larger performance swings than varying the model inside a fixed scaffold. Codii acts on this finding. Before any agentic action is taken, it probes the connected model across four dimensions, builds a capability fingerprint, and selects a system prompt, tool-call format, and guard-rail configuration tuned to that specific model's measured behavior.
The harness adapts to the model.
Recommended (isolates codii from your project environments):
```bash
pipx install codii
```

Alternative:

```bash
pip install codii
```

Both produce a `codii` command available immediately in your terminal.
Python 3.11 or later required.
Optional — serve mode (Anthropic Messages API shim backed by your local model):
pip install "codii[serve]"First run — interactive backend setup wizard:
codiiCodii auto-detects any locally-running Ollama or LM Studio instance, lists available models, probes the selected one, and drops you into a session.
Specify a backend explicitly:
```bash
codii --backend ollama
codii --backend openrouter
codii --backend vllm
```

Connect to any OpenAI-compatible endpoint:

```bash
codii connect http://localhost:8000
```

Other useful commands:
```bash
codii probe              # Re-run capability probe for the current model
codii decisions          # View the decision log from the last session
codii replay             # List and replay past session transcripts
codii fingerprint show   # Display the current model fingerprint
codii fingerprint edit   # Open fingerprint in $EDITOR for manual tuning
codii fingerprint list   # List all stored fingerprints
codii serve              # Start Anthropic Messages API shim (requires [serve])
```

Multi-backend support
Ollama (default :11434), LM Studio (default :1234), vLLM, OpenRouter, and any generic OpenAI-compatible endpoint. The first-run wizard auto-detects local backends. OpenRouter setup includes a masked API key prompt.
Automatic capability probing

Four sequential probes run before the first session; results are cached per model:

- Tool-call format detection — sends a test tool definition and classifies the model's raw response format (`gemma4_tokens`, `hermes_xml`, `qwen_json`, `mistral_json`, `openai_json`, or none)
- Effective context window measurement — needle-in-haystack test at 25%, 50%, and 90% of the model's claimed context depth
- Reasoning token support detection — identifies `<think>`, `<thinking>`, `<|begin_of_thought|>`, and similar delimiters
- Structured output reliability — five JSON-schema requests scored as a 0–1 reliability fraction
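Probe results are stored as a per-model fingerprint in `~/.codii/fingerprints/`. As a rough illustration of what the four probes produce (the field names below are invented for this sketch, not the actual schema), a fingerprint captures something like:

```python
# Illustrative sketch only; the real fingerprint schema lives in the codii source.
from dataclasses import dataclass

@dataclass
class CapabilityFingerprint:
    model: str                             # e.g. "qwen2.5-coder:14b"
    tool_call_format: str                  # "qwen_json", "hermes_xml", ..., or "none"
    effective_context: int                 # measured window in tokens, not the claimed one
    reasoning_delimiters: list[str]        # e.g. ["<think>"]
    structured_output_reliability: float   # 0.0-1.0 fraction over five JSON-schema probes

fp = CapabilityFingerprint(
    model="qwen2.5-coder:14b",
    tool_call_format="qwen_json",
    effective_context=24576,
    reasoning_delimiters=["<think>"],
    structured_output_reliability=0.8,
)
```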
Three-tier adaptive scaffolding

System prompt, tool format, and guard rails are selected from the fingerprint. Tiers adapt dynamically within a session on repeated parse failures or successes.
| Tier | Condition | Tool Format |
|---|---|---|
| 1 | Native tool calling AND reliability ≥ 0.9 AND context ≥ 16k tokens | Native JSON function definitions |
| 2 | Structured output reliability ≥ 0.5 | Prompt-engineered XML with inline examples |
| 3 | Structured output reliability < 0.5 | Guided JSON, one call at a time |
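In sketch form, the tier rule in the table reduces to a few comparisons. This is an illustration only (field names follow the hypothetical fingerprint sketch above; the shipped selector also reacts to in-session parse history, per the note above):

```python
def select_tier(fp: "CapabilityFingerprint") -> int:
    """Map a capability fingerprint to a scaffolding tier, per the table above."""
    if (fp.tool_call_format != "none"            # has native tool calling
            and fp.structured_output_reliability >= 0.9
            and fp.effective_context >= 16_000):
        return 1  # native JSON function definitions
    if fp.structured_output_reliability >= 0.5:
        return 2  # prompt-engineered XML with inline examples
    return 3      # guided JSON, one call at a time
```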
Four interactive modes
- Auto (default) — full tool suite, all guards active
- Edit — scoped file editing with `edit_file` discipline enforced
- Plan — LLM generates a numbered plan; keyboard UI to approve, edit individual steps inline, or reject before execution
- Chat — read-only Q&A with two-phase workspace indexing
Cycle through modes with Shift+Tab.
Complete tool suite
| Tool | Description |
|---|---|
| `read_file` | UTF-8 file read, sandboxed to workspace |
| `write_file` | Atomic write via temp-file swap (new files only) |
| `edit_file` | `old_str` → `new_str` replacement with two-pass whitespace-tolerant matching |
| `bash` | Shell execution (PowerShell on Windows, sh on Unix); timeout 60–300s; blocks interactive commands |
| `list_dir` | Directory listing; skips `.git`, `node_modules`, `__pycache__`, `.venv` |
| `spawn_subagent` | Delegates subtasks to built-in researcher / reviewer / planner agents with restricted tool sets |
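The two-pass matching behind `edit_file` can be sketched as: try an exact match first, then retry with runs of whitespace collapsed so indentation drift still matches. A minimal illustration of the idea, not the shipped implementation:

```python
import re

def find_match(content: str, old_str: str) -> int:
    """Locate old_str in content: exact match first, then whitespace-tolerant."""
    # Pass 1: exact substring match.
    idx = content.find(old_str)
    if idx != -1:
        return idx
    # Pass 2: treat any run of whitespace as equivalent, so minor
    # indentation or line-wrapping differences still match.
    pattern = r"\s+".join(re.escape(part) for part in old_str.split())
    m = re.search(pattern, content)
    return m.start() if m else -1
```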
Slash commands with autocomplete dropdown
/help /clear /compact /context /init /index /chat /edit /plan /scope /auto /exit
Dynamic Context Block (DCB)

An ephemeral user-role message injected before each inner LLM call. It contains the active tool list, a tier-specific format hint, the last 10 session actions, and the current plan state. It is never stored in history and never consumes persistent context.
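"Ephemeral" here means the block is appended to the request payload, never to the stored history. Roughly, as a sketch of the idea rather than codii's actual internals (`render_dcb` and `session` are hypothetical names):

```python
def build_request(history: list[dict], dcb_text: str) -> list[dict]:
    """Send history plus an ephemeral context block; history itself is untouched."""
    return history + [{"role": "user", "content": dcb_text}]

messages = build_request(session.history, render_dcb(session))
# session.history is never mutated, so the DCB never accumulates across turns.
```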
WorkflowLock
A state machine that arms when read_file is called and forces the next tool call to be edit_file or write_file on the same path. Prevents the common drift pattern where a model reads a file and then moves on without making the edit.
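A minimal sketch of the arm-and-force pattern (illustrative; the real state machine lives inside AgentCore):

```python
class WorkflowLock:
    """Armed by read_file; the next tool call must edit or write the same path."""

    def __init__(self) -> None:
        self.armed_path: str | None = None

    def on_tool_call(self, tool: str, path: str | None) -> bool:
        if self.armed_path is not None:
            ok = tool in ("edit_file", "write_file") and path == self.armed_path
            self.armed_path = None
            return ok  # False -> the guard injects a corrective directive
        if tool == "read_file":
            self.armed_path = path  # arm: an edit on this path must follow
        return True
```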
Weak Model Bridge
Detects when a smaller model writes correct code as a prose text block instead of a tool call, extracts the code, and injects a precision edit_file directive so the edit lands regardless.
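The core move is recovering a fenced code block from a prose reply so it can be re-issued as a tool call. In sketch form (illustrative only):

```python
import re

CODE_FENCE = re.compile(r"```[\w+-]*\n(.*?)```", re.DOTALL)

def extract_code_as_text(reply: str) -> str | None:
    """Return the first fenced code block a model wrote as prose, if any."""
    m = CODE_FENCE.search(reply)
    return m.group(1) if m else None

# The real bridge goes further: it diffs the extracted code against the file
# the model had read and injects a precise edit_file(old_str, new_str) directive.
```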
Circuit Breaker

Detects stuck patterns — the same read-only tool called on the same file repeatedly with no substantive progress — and injects a redirection to break the loop.
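Detection can be as simple as counting consecutive identical read-only calls; a toy version of the idea (threshold and bookkeeping are illustrative):

```python
from collections import deque

class CircuitBreaker:
    """Trip when the same read-only call repeats with nothing else in between."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.recent: deque[tuple[str, str]] = deque(maxlen=threshold)

    def record(self, tool: str, path: str) -> bool:
        """Return True if the loop looks stuck and needs a redirection."""
        self.recent.append((tool, path))
        return (len(self.recent) == self.threshold
                and len(set(self.recent)) == 1)
```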
Session-wide codebase cache
Chat mode caches file contents and a codebase summary per session. Edit mode preloads scoped files into the conversation so the model can call edit_file directly without a redundant read.
Context auto-compaction
When conversation history reaches 75% of the effective context window, a summarization pass compresses old turns. A cooldown prevents thrashing after low-value compactions. Also available on demand with /compact.
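The trigger itself is a simple threshold against the measured (not claimed) window; a sketch:

```python
COMPACT_AT = 0.75  # fraction of the effective context window

def should_compact(used_tokens: int, effective_context: int,
                   cooldown_active: bool) -> bool:
    """Compact when history crosses 75% of the window, unless cooling down."""
    return not cooldown_active and used_tokens >= COMPACT_AT * effective_context
```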
Decision logging
Every guard evaluation, tier adaptation, parse attempt, repair heuristic, and tool execution is recorded to ~/.codii/sessions/<id>/decisions.jsonl. Review with codii decisions.
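Because the log is plain JSONL, it is also easy to inspect outside of `codii decisions` (the record fields shown in the comment are illustrative, not the actual schema):

```python
import json
from pathlib import Path

# Assumes session ids sort chronologically; picks the latest session directory.
session_dir = sorted(Path.home().glob(".codii/sessions/*"))[-1]
for line in (session_dir / "decisions.jsonl").read_text().splitlines():
    event = json.loads(line)
    # e.g. {"kind": "guard", "name": "workflow_lock", ...} (field names illustrative)
    print(event.get("kind"), event)
```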
Session replay
Full conversation transcripts are saved per session and can be replayed with codii replay.
Beautiful terminal UI

Rich library panels, spinning probe loader, live token counter (used / cap with percentage), keyboard-navigable menus, and masked password input for OpenRouter setup.
| Family | Format Detected | Typical Tier |
|---|---|---|
| Gemma 4 | `gemma4_tokens` | 1 |
| Qwen / QwQ (Qwen2.5, Qwen3) | `qwen_json` | 1 or 2 |
| Mistral / Devstral / Mixtral | `mistral_json` | 1 or 2 |
| Hermes (NousResearch) | `hermes_xml` | 2 |
| LLaMA / LLaMA-3 | `openai_json` | 1 or 2 |
| DeepSeek | `openai_json` | 1 |
| Phi | detected by name | 2 or 3 |
| Any OpenAI-compatible | native | 1 |
| Other / unknown | fallback parser | 2 or 3 |
If a model emits an unrecognized format, Codii falls back to the generic parser, which applies multiple repair heuristics before giving up.
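Typical repair heuristics of this kind include stripping code fences, slicing out the first balanced-looking JSON span, and trimming trailing commas. A generic sketch of the approach (not codii's exact heuristics):

```python
import json
import re

def repair_json(raw: str) -> dict | None:
    """Best-effort recovery of a JSON tool call from messy model output."""
    candidates = [raw]
    fenced = re.search(r"```(?:json)?\n(.*?)```", raw, re.DOTALL)  # strip code fences
    if fenced:
        candidates.append(fenced.group(1))
    braced = re.search(r"\{.*\}", raw, re.DOTALL)                  # first {...} span
    if braced:
        candidates.append(braced.group(0))
    for text in candidates:
        text = re.sub(r",\s*([}\]])", r"\1", text)                 # drop trailing commas
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None  # all heuristics exhausted; the parser gives up
```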
Auto mode

The full agent loop. All tools available, all guards active. Best for open-ended "implement this feature" or "fix this bug" requests where Codii should determine the steps autonomously.

Chat mode

Read-only. Only `read_file` and `list_dir` are available. On first invocation Codii indexes the workspace: reads the most important files, builds a codebase summary, and caches it for the session. Subsequent questions are answered from the cache without re-reading. Best for "how does X work" or "where is Y defined" questions.

Edit mode

Scoped file editing. Mention files in your prompt or use `/scope` to restrict the edit surface. Scoped files are preloaded into the conversation so the model can call `edit_file` directly without a redundant read. Best for targeted, surgical changes to known files.

Plan mode

Two-phase execution. First, Codii generates a numbered step-by-step plan and presents it in a keyboard-navigable UI — approve, edit individual steps inline, or reject. After approval, execution begins with the active plan shown in the Dynamic Context Block at every step. Best for multi-file refactors or complex tasks where you want to review before any files change.
| Command | Description |
|---|---|
| `/help` | Show available commands |
| `/clear` | Clear conversation history (preserves session metadata) |
| `/compact` | Summarize history to reclaim context tokens |
| `/context` | Show token usage breakdown (used / cap / %) |
| `/init` | Generate CODII.md — LLM-analyzed project documentation |
| `/index` | Re-index workspace files for Chat mode |
| `/chat` | Switch to Chat mode |
| `/edit` | Switch to Edit mode |
| `/plan` | Switch to Plan mode |
| `/plan edit` | Open the current plan for inline step editing |
| `/scope` | Show or update the current edit scope |
| `/auto` | Toggle auto-approval (skip per-tool confirmation prompts) |
| `/exit` | Exit the session |
| Command | Description |
|---|---|
| `codii` | Start a session (first-run setup if unconfigured) |
| `codii probe` | Re-run the capability probe for the current model |
| `codii connect <url>` | Connect to a new backend, list models, probe |
| `codii decisions` | Show the decision log from the most recent session |
| `codii replay` | List and replay past session transcripts |
| `codii fingerprint show` | Display the current model fingerprint |
| `codii fingerprint edit` | Open the fingerprint in `$EDITOR` for manual tuning |
| `codii fingerprint list` | List all stored fingerprints |
| `codii serve` | Start Anthropic Messages API shim (requires `[serve]` extra) |
Global flags: `--backend`, `--auto`

`codii probe` additionally accepts: `--backend`, `--endpoint`, `--model`, `--verbose`
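`codii serve` exposes an Anthropic Messages-style endpoint backed by your local model. As a usage sketch only (both the `/v1/messages` route and the `localhost:8080` address are assumptions here; check the output of `codii serve` for the real address):

```python
import json
import urllib.request

# Assumptions: the shim mirrors Anthropic's /v1/messages route and listens on
# localhost:8080. Neither is confirmed by this README; verify against codii serve.
req = urllib.request.Request(
    "http://localhost:8080/v1/messages",
    data=json.dumps({
        "model": "local",  # placeholder model name
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Explain this repo."}],
    }).encode(),
    headers={"content-type": "application/json"},
)
print(json.load(urllib.request.urlopen(req)))
```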
```
codii (CLI)
│
├── BackendAdapter ── Ollama / vLLM / LM Studio / OpenRouter / Generic OpenAI-compat
│
├── Probe Pipeline ── tool_call → context_window → reasoning → structured_output
│   └── CapabilityFingerprint ── stored in ~/.codii/fingerprints/
│
├── Scaffolding Selector ── picks Tier 1 / 2 / 3 from fingerprint
│   └── tier{1,2,3}.txt ── system prompts per tier
│
├── Parser Dispatch ── gemma4 / qwen / hermes / mistral / generic
│
└── Session (TAOR Loop)
    ├── AgentCore ── Think → Execute → Verify per turn
    │   ├── WorkflowLock ── arm on read_file, force next call to edit/write
    │   ├── Weak Model Bridge ── extract code-as-text → inject edit_file
    │   ├── Circuit Breaker ── detect and interrupt stuck patterns
    │   └── ContextInjector ── Dynamic Context Block (ephemeral, not stored)
    │
    ├── Session State
    │   ├── conversation history
    │   ├── reads_this_session (read-before-write guard)
    │   ├── action_log (last 20 entries)
    │   ├── file_cache (chat indexing + edit preloading)
    │   └── PlanState (step index + DCB rendering)
    │
    └── Tools
        read_file / write_file / edit_file / bash / list_dir / spawn_subagent
```
Everything is sandboxed to the workspace directory. Tools reject paths that escape via symlinks or .. traversal. Sensitive paths (.env, .ssh, .aws, .gnupg, credentials) are blocked regardless of workspace location.
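The standard technique for such a sandbox is to resolve the candidate path (which collapses `..` segments and follows symlinks) and verify it still lands under the workspace root. A sketch of that technique, not codii's exact code:

```python
from pathlib import Path

BLOCKED = {".env", ".ssh", ".aws", ".gnupg", "credentials"}

def safe_path(workspace: Path, candidate: str) -> Path:
    """Resolve candidate inside workspace; reject escapes and sensitive names."""
    resolved = (workspace / candidate).resolve()   # collapses ".." and symlinks
    resolved.relative_to(workspace.resolve())      # raises ValueError if outside
    if any(part in BLOCKED for part in resolved.parts):
        raise PermissionError(f"blocked sensitive path: {resolved}")
    return resolved
```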
v0.1.0 — current release.
The probe pipeline, fingerprint system, tier-based scaffolding, and agentic core are stable. All four modes (auto, chat, edit, plan), the full slash command set, and the terminal UI are implemented and working.
With capable models (31B+ quantized, or cloud models via OpenRouter): the agent loop is reliable. WorkflowLock, the Circuit Breaker, and the Dynamic Context Block collectively keep the model on track through long multi-step tasks.
With smaller models (7B, 2B): models generally produce correct code but can struggle with consistent tool-call formatting. The Weak Model Bridge handles the most common failure mode (code written as a text block) but doesn't resolve every formatting failure. Tier 2 and Tier 3 scaffolding improve reliability significantly for these models.
Active development continues. The public API is not yet stable.
```bash
git clone <repo-url>
cd codii
pip install -e ".[dev]"
```

```bash
pytest                    # run tests
ruff check src/ tests/    # lint
ruff format src/          # format
mypy --strict src/codii   # type check
```

Open an issue to discuss large changes before submitting a PR. If a PR changes behavior described in SYSTEM_DESIGN_DOCUMENT.md, update the document in the same commit.
Codii makes no outbound requests except to your configured endpoint. No usage data, no error reporting, no analytics. For local backends, all traffic stays on your machine. Verifiable by reading src/codii/connection/ — the only HTTP client in the project.
Local models, full agentic coding, no subscription.