Codii — Local Models with Agentic Power

A terminal-native agentic coding runtime that adapts to your model, not the other way around.

PyPI version Python 3.11+ License: MIT


What is Codii?

Codii is a self-contained terminal tool that turns any local or cloud language model into a working coding agent. Point it at an Ollama server, an LM Studio instance, a vLLM deployment, or OpenRouter — Codii probes the model's real capabilities, selects the right scaffolding strategy, and runs a full agentic loop that reads, edits, and executes code in your codebase.

No cloud subscription required for local models. No vendor lock-in.


Why Codii Exists

Agentic coding runtimes were designed around frontier models — large, well-behaved, with native function calling, reliable JSON output, and enormous context windows. Smaller open-weight models don't share those properties, and behavior degrades as a result: the model narrates what it would do instead of doing it. Files aren't created. Commands aren't run. The loop doesn't close.

The cause isn't model quality. It's scaffolding mismatch.

Research consistently shows that varying the scaffolding around a fixed model produces larger performance swings than varying the model inside a fixed scaffold. Codii acts on this finding. Before any agentic action is taken, it probes the connected model across four dimensions, builds a capability fingerprint, and selects a system prompt, tool-call format, and guard-rail configuration tuned to that specific model's measured behavior.

The harness adapts to the model.
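A capability fingerprint along these lines could capture the four probed dimensions. This is an illustrative sketch only — the field names, types, and example values are assumptions, not Codii's actual schema:

```python
from dataclasses import dataclass


@dataclass
class CapabilityFingerprint:
    # Illustrative shape only; field names are assumptions, not Codii's API.
    model: str
    tool_call_format: str                  # e.g. "qwen_json", "hermes_xml", "none"
    effective_context: int                 # tokens, from needle-in-haystack probing
    reasoning_delimiters: list[str]        # e.g. ["<think>"] or [] if unsupported
    structured_output_reliability: float   # 0.0-1.0 from JSON-schema probes


# Hypothetical result for a mid-size local model:
fp = CapabilityFingerprint(
    model="qwen2.5-coder:14b",
    tool_call_format="qwen_json",
    effective_context=16384,
    reasoning_delimiters=[],
    structured_output_reliability=0.8,
)
```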


Installation

Recommended (isolates codii from your project environments):

pipx install codii

Alternative:

pip install codii

Both produce a codii command available immediately in your terminal.

Python 3.11 or later required.

Optional — serve mode (Anthropic Messages API shim backed by your local model):

pip install "codii[serve]"

Quick Start

First run — interactive backend setup wizard:

codii

Codii auto-detects any locally running Ollama or LM Studio instance, lists available models, probes the selected one, and drops you into a session.

Specify a backend explicitly:

codii --backend ollama
codii --backend openrouter
codii --backend vllm

Connect to any OpenAI-compatible endpoint:

codii connect http://localhost:8000

Other useful commands:

codii probe                  # Re-run capability probe for the current model
codii decisions              # View the decision log from the last session
codii replay                 # List and replay past session transcripts
codii fingerprint show       # Display the current model fingerprint
codii fingerprint edit       # Open fingerprint in $EDITOR for manual tuning
codii fingerprint list       # List all stored fingerprints
codii serve                  # Start Anthropic Messages API shim (requires [serve])

Key Features

Multi-backend support: Ollama (default :11434), LM Studio (default :1234), vLLM, OpenRouter, and any generic OpenAI-compatible endpoint. The first-run wizard auto-detects local backends. OpenRouter setup includes a masked API key prompt.

Automatic capability probing: four sequential probes run before the first session, and results are cached per model:

  • Tool-call format detection — sends a test tool definition and classifies the model's raw response format (gemma4_tokens, hermes_xml, qwen_json, mistral_json, openai_json, or none)
  • Effective context window measurement — needle-in-haystack test at 25%, 50%, and 90% of the model's claimed context depth
  • Reasoning token support detection — identifies <think>, <thinking>, <|begin_of_thought|>, and similar delimiters
  • Structured output reliability — five JSON-schema requests scored as a 0–1 reliability fraction
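The structured-output probe can be pictured as a simple scoring pass. The sketch below only checks that each raw response parses as a JSON object; Codii's real probe sends five JSON-schema requests and presumably also validates against the schema, so this is a simplified assumption:

```python
import json


def score_structured_output(responses: list[str]) -> float:
    """Score raw model responses as the fraction that parse to valid JSON
    objects. Simplified stand-in: parseability only, not schema conformance."""
    if not responses:
        return 0.0
    ok = 0
    for text in responses:
        try:
            parsed = json.loads(text)
            if isinstance(parsed, dict):  # require an object, not a bare array
                ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(responses)
```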

Three-tier adaptive scaffolding: system prompt, tool format, and guard rails are selected from the fingerprint. Tiers adapt dynamically within a session on repeated parse failures or successes.

| Tier | Condition | Tool Format |
|------|-----------|-------------|
| 1 | Native tool calling AND reliability ≥ 0.9 AND context ≥ 16k tokens | Native JSON function definitions |
| 2 | Structured output reliability ≥ 0.5 | Prompt-engineered XML with inline examples |
| 3 | Structured output reliability < 0.5 | Guided JSON, one call at a time |
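The tier rules above can be sketched as a small selection function. The thresholds come from the table; the function itself is illustrative, not Codii's internal API:

```python
def select_tier(native_tools: bool, reliability: float, context_tokens: int) -> int:
    """Pick a scaffolding tier from fingerprint data (sketch of the table rules)."""
    if native_tools and reliability >= 0.9 and context_tokens >= 16_000:
        return 1  # native JSON function definitions
    if reliability >= 0.5:
        return 2  # prompt-engineered XML with inline examples
    return 3      # guided JSON, one call at a time
```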

Four interactive modes

  • Auto (default) — full tool suite, all guards active
  • Edit — scoped file editing with edit_file discipline enforced
  • Plan — LLM generates a numbered plan; keyboard UI to approve, edit individual steps inline, or reject before execution
  • Chat — read-only Q&A with two-phase workspace indexing

Cycle through modes with Shift+Tab.

Complete tool suite

| Tool | Description |
|------|-------------|
| read_file | UTF-8 file read, sandboxed to workspace |
| write_file | Atomic write via temp-file swap (new files only) |
| edit_file | old_str → new_str replacement with two-pass whitespace-tolerant matching |
| bash | Shell execution (PowerShell on Windows, sh on Unix); timeout 60–300s; blocks interactive commands |
| list_dir | Directory listing; skips .git, node_modules, __pycache__, .venv |
| spawn_subagent | Delegates subtasks to built-in researcher / reviewer / planner agents with restricted tool sets |

Slash commands with autocomplete dropdown: /help /clear /compact /context /init /index /chat /edit /plan /scope /auto /exit

Dynamic Context Block (DCB): an ephemeral user-role message injected before each inner LLM call. It contains the active tool list, a tier-specific format hint, the last 10 session actions, and the current plan state. It is never stored in history and never consumes persistent context.

WorkflowLock: a state machine that arms when read_file is called and forces the next tool call to be edit_file or write_file on the same path. Prevents the common drift pattern where a model reads a file and then moves on without making the edit.

Weak Model Bridge: detects when a smaller model writes correct code as a prose text block instead of a tool call, extracts the code, and injects a precision edit_file directive so the edit lands regardless.

Circuit Breaker: detects stuck patterns — the same read-only tool called on the same file repeatedly with no substantive progress — and injects a redirection to break the loop.
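One way to detect that pattern is a sliding window over recent read-only calls; the specific heuristic and threshold below are assumptions, not Codii's implementation:

```python
from collections import deque


class CircuitBreaker:
    """Trip when the same read-only tool hits the same file `threshold`
    times in a row with nothing else in between (assumed heuristic)."""

    READ_ONLY = {"read_file", "list_dir"}

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.recent: deque = deque(maxlen=threshold)

    def record(self, tool: str, path: str) -> bool:
        """Record a tool call; return True if a stuck pattern is detected."""
        if tool not in self.READ_ONLY:
            self.recent.clear()  # a mutating call counts as real progress
            return False
        self.recent.append((tool, path))
        return (len(self.recent) == self.threshold
                and len(set(self.recent)) == 1)
```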

Session-wide codebase cache: Chat mode caches file contents and a codebase summary per session. Edit mode preloads scoped files into the conversation so the model can call edit_file directly without a redundant read.

Context auto-compaction: when conversation history reaches 75% of the effective context window, a summarization pass compresses old turns. A cooldown prevents thrashing after low-value compactions. Also available on demand with /compact.
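The trigger condition reduces to a simple threshold check against the probed effective window (the 75% figure is from the README; the function is an illustrative sketch):

```python
def should_compact(history_tokens: int, effective_context: int,
                   threshold: float = 0.75) -> bool:
    """Return True once history reaches the compaction threshold
    (default 75%) of the effective context window."""
    return history_tokens >= threshold * effective_context
```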

Decision logging: every guard evaluation, tier adaptation, parse attempt, repair heuristic, and tool execution is recorded to ~/.codii/sessions/<id>/decisions.jsonl. Review with codii decisions.

Session replay: full conversation transcripts are saved per session and can be replayed with codii replay.

Beautiful terminal UI: Rich library panels, spinning probe loader, live token counter (used / cap with percentage), keyboard-navigable menus, and masked password input for OpenRouter setup.


Supported Model Families

| Family | Format Detected | Typical Tier |
|--------|-----------------|--------------|
| Gemma 4 | gemma4_tokens | 1 |
| Qwen / QwQ (Qwen2.5, Qwen3) | qwen_json | 1 or 2 |
| Mistral / Devstral / Mixtral | mistral_json | 1 or 2 |
| Hermes (NousResearch) | hermes_xml | 2 |
| LLaMA / LLaMA-3 | openai_json | 1 or 2 |
| DeepSeek | openai_json | 1 |
| Phi | detected by name | 2 or 3 |
| Any OpenAI-compatible | native | 1 |
| Other / unknown | fallback parser | 2 or 3 |

If a model emits an unrecognized format, Codii falls back to the generic parser, which applies multiple repair heuristics before giving up.


Modes

Auto (default)

The full agent loop. All tools available, all guards active. Best for open-ended "implement this feature" or "fix this bug" requests where Codii should determine the steps autonomously.

Chat (/chat)

Read-only. Only read_file and list_dir are available. On first invocation Codii indexes the workspace: reads the most important files, builds a codebase summary, and caches it for the session. Subsequent questions answer from the cache without re-reading. Best for "how does X work" or "where is Y defined" questions.

Edit (/edit)

Scoped file editing. Mention files in your prompt or use /scope to restrict the edit surface. Scoped files are preloaded into the conversation so the model can call edit_file directly without a redundant read. Best for targeted, surgical changes to known files.

Plan (/plan)

Two-phase execution. First, Codii generates a numbered step-by-step plan and presents it in a keyboard-navigable UI — approve, edit individual steps inline, or reject. After approval, execution begins with the active plan shown in the Dynamic Context Block at every step. Best for multi-file refactors or complex tasks where you want to review before any files change.


Slash Commands

| Command | Description |
|---------|-------------|
| /help | Show available commands |
| /clear | Clear conversation history (preserves session metadata) |
| /compact | Summarize history to reclaim context tokens |
| /context | Show token usage breakdown (used / cap / %) |
| /init | Generate CODII.md — LLM-analyzed project documentation |
| /index | Re-index workspace files for Chat mode |
| /chat | Switch to Chat mode |
| /edit | Switch to Edit mode |
| /plan | Switch to Plan mode |
| /plan edit | Open the current plan for inline step editing |
| /scope | Show or update the current edit scope |
| /auto | Toggle auto-approval (skip per-tool confirmation prompts) |
| /exit | Exit the session |

CLI Reference

| Command | Description |
|---------|-------------|
| codii | Start a session (first-run setup if unconfigured) |
| codii probe | Re-run the capability probe for the current model |
| codii connect <url> | Connect to a new backend, list models, probe |
| codii decisions | Show the decision log from the most recent session |
| codii replay | List and replay past session transcripts |
| codii fingerprint show | Display the current model fingerprint |
| codii fingerprint edit | Open the fingerprint in $EDITOR for manual tuning |
| codii fingerprint list | List all stored fingerprints |
| codii serve | Start Anthropic Messages API shim (requires [serve] extra) |

Global flags: --backend, --auto

codii probe additionally accepts: --backend, --endpoint, --model, --verbose


Architecture Overview

codii (CLI)
  │
  ├── BackendAdapter ── Ollama / vLLM / LM Studio / OpenRouter / Generic OpenAI-compat
  │
  ├── Probe Pipeline ── tool_call → context_window → reasoning → structured_output
  │       └── CapabilityFingerprint ── stored in ~/.codii/fingerprints/
  │
  ├── Scaffolding Selector ── picks Tier 1 / 2 / 3 from fingerprint
  │       └── tier{1,2,3}.txt ── system prompts per tier
  │
  ├── Parser Dispatch ── gemma4 / qwen / hermes / mistral / generic
  │
  └── Session (TAOR Loop)
        ├── AgentCore ── Think → Execute → Verify per turn
        │     ├── WorkflowLock ── arm on read_file, force next call to edit/write
        │     ├── Weak Model Bridge ── extract code-as-text → inject edit_file
        │     ├── Circuit Breaker ── detect and interrupt stuck patterns
        │     └── ContextInjector ── Dynamic Context Block (ephemeral, not stored)
        │
        ├── Session State
        │     ├── conversation history
        │     ├── reads_this_session (read-before-write guard)
        │     ├── action_log (last 20 entries)
        │     ├── file_cache (chat indexing + edit preloading)
        │     └── PlanState (step index + DCB rendering)
        │
        └── Tools
              read_file / write_file / edit_file / bash / list_dir / spawn_subagent

Everything is sandboxed to the workspace directory. Tools reject paths that escape via symlinks or .. traversal. Sensitive paths (.env, .ssh, .aws, .gnupg, credentials) are blocked regardless of workspace location.
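A sandbox check of this shape covers both escape routes the README names: symlink and .. traversal are handled by resolving the path first, and sensitive names are rejected by component. This is a simplified sketch of the idea, not Codii's actual implementation:

```python
from pathlib import Path

# Names blocked regardless of workspace location (list per the README).
SENSITIVE = {".env", ".ssh", ".aws", ".gnupg", "credentials"}


def is_path_allowed(workspace: str, target: str) -> bool:
    """Return True if `target` stays inside `workspace` after resolving
    symlinks and '..' components, and touches no sensitive name."""
    ws = Path(workspace).resolve()
    resolved = (ws / target).resolve()  # collapses ".." and follows symlinks
    if not resolved.is_relative_to(ws):
        return False  # path escapes the workspace
    return not any(part in SENSITIVE for part in resolved.parts)
```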


Project Status

v0.1.0 — current release.

The probe pipeline, fingerprint system, tier-based scaffolding, and agentic core are stable. All four modes (auto, chat, edit, plan), the full slash command set, and the terminal UI are implemented and working.

With capable models (31B+ quantized, or cloud models via OpenRouter): the agent loop is reliable. WorkflowLock, the Circuit Breaker, and the Dynamic Context Block collectively keep the model on track through long multi-step tasks.

With smaller models (7B, 2B): models generally produce correct code but can struggle with consistent tool-call formatting. The Weak Model Bridge handles the most common failure mode (code written as a text block) but doesn't resolve every formatting failure. Tier 2 and Tier 3 scaffolding improve reliability significantly for these models.

Active development continues. The public API is not yet stable.


Contributing

git clone <repo-url>
cd codii
pip install -e ".[dev]"
pytest                           # run tests
ruff check src/ tests/           # lint
ruff format src/                 # format
mypy --strict src/codii          # type check

Open an issue to discuss large changes before submitting a PR. If a PR changes behavior described in SYSTEM_DESIGN_DOCUMENT.md, update the document in the same commit.


No Telemetry

Codii makes no outbound requests except to your configured endpoint. No usage data, no error reporting, no analytics. For local backends, all traffic stays on your machine. This is verifiable by reading src/codii/connection/, which contains the project's only HTTP client.


Local models, full agentic coding, no subscription.
