AI coding agent optimized for small LLMs (8B-35B parameters)
SmallCode is a terminal-native coding agent designed from the ground up to extract useful work from local models (8B-35B) running on consumer hardware. Tools like OpenCode assume frontier models with 128k+ context and perfect tool calling. SmallCode instead compensates for the limitations of small models with context budgeting, forgiving tool-call parsing, decomposed planning, and patch-based editing.
Recommended model size: 8B-35B parameters. Smaller models (≤4B) struggle with multi-step tool use and lose context across turns. Larger models (>35B) don't need SmallCode's adaptations and are better served by tools designed for frontier models.
| | OpenCode | SmallCode |
|---|---|---|
| Target | Frontier models (Claude, GPT-5) | 8B-35B local models |
| Context | Dumps everything | Budget-managed, summarized |
| Tool calling | Assumes reliable JSON | Forgiving multi-format parser |
| Planning | Single-shot | TODO-file decomposed steps |
| Editing | Full file write | Search-and-replace patch |
| Privacy | API calls to cloud | Fully local, no network needed |
# Install globally via npm
npm install -g smallcode
# Or run directly with npx
npx smallcode
# Start in your project directory
cd my-project
smallcode
Pre-compiled tarballs for Windows, macOS, and Linux are built on every release. They bundle Node.js plus all native addons, so you never need node-gyp or C++ build tools.
| Platform | One‑line install |
|---|---|
| Linux / macOS | bash <(curl -fsSL https://raw.githubusercontent.com/Doorman11991/smallcode/main/install.sh) |
| Windows | iwr -Uri https://raw.githubusercontent.com/Doorman11991/smallcode/main/install.ps1 -UseBasicParsing \| iex |
The install script downloads the correct tarball for your platform, extracts it to ~/.smallcode, and adds it to your PATH. Run smallcode --help to verify.
SmallCode includes BoneScript and budget-aware-mcp as dependencies — everything installs in one go.
- Node.js 18+ (LTS recommended — 20.x or 22.x have prebuilt binaries for SQLite)
- A local LLM server (LM Studio, Ollama, or any OpenAI-compatible endpoint)
Optional (for code graph + FTS5 memory search):
- better-sqlite3 needs native compilation if prebuilt binaries aren't available for your Node version
- Prebuilt binaries exist for Node LTS (20.x, 22.x) on Linux/macOS/Windows, so no build tools are needed there
- If you're on a non-LTS Node (23+, 25+), you'll need:
  - Linux: python3, make, gcc/g++ (sudo apt install build-essential python3 or pacman -S base-devel python)
  - macOS: Xcode Command Line Tools (xcode-select --install)
  - Windows: Visual Studio Build Tools with "Desktop development with C++" workload, or npm install -g windows-build-tools
- If the build fails, SmallCode still works. It falls back to JSON-based memory automatically
Create a .env file in your project root:
# Required
SMALLCODE_MODEL=your-model-name
SMALLCODE_BASE_URL=http://localhost:1234/v1
# Optional: escalation (auto-fallback to cloud on hard fail)
# ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# DEEPSEEK_API_KEY=sk-...
See .env.example for all options. Also supports smallcode.toml for backwards compatibility.
SmallCode is built with a modular architecture:
bin/
├── smallcode.js Entry point, agent loop, TUI orchestration (1570 lines)
├── config.js Config loading, endpoint detection, auth headers
├── executor.js Tool execution (all 18 tools)
├── tools.js Tool definitions + 2-stage routing
├── mcp_bridge.js Built-in code graph MCP communication
├── model_client.js LLM API calls, streaming, validation
├── governor.js Tool scoring, verification, decompose
├── escalation.js Cloud model fallback (Claude/OpenAI/DeepSeek)
├── commands.js TUI slash commands
├── tui.js Classic TUI renderer
└── bonescript_guide.js BoneScript syntax reference
src/
├── api/index.js Programmatic API (require('smallcode'))
├── tui/fullscreen.js Fullscreen alternate-buffer TUI
├── plugins/loader.js Plugin system
├── plugins/skills.js Skill system
├── tools/ Tool routing, MCP client, validators
├── governor/ Early-stop detection, verifier, tool scorer
├── model/ Multi-model profiles + routing
└── session/ Persistence, undo, sharing, references
SmallCode's intelligence is declared in MarrowScript and compiled to a production runtime. One 50-line .marrow declaration generates 1400+ lines of TypeScript with caching, retry, validation, traces, and budget enforcement — all for free.
prompt classify_task_type(user_message: string) {
model: TinyClassifier
timeout: 3s
cache: { key: hash(user_message), ttl: 10m }
retry: { max_attempts: 2, backoff: fixed, interval: 100ms }
constraints: [output in ["coding", "editing", "search", ...]]
}
The compiled cognition layer provides:
- Prompt caching: 0ms on cache hit, content-hash keys with TTL
- Structured traces: trace_id/span_id for every LLM call (enable with SMALLCODE_COGNITION_LOG=stderr)
- Tier-based routing: trivial tasks → tiny model, complex tasks → medium model
- Token budgets: per-cost-class enforcement, never overspend
- Validation + repair: schema checks with auto-retry on malformed output
For Node.js/TypeScript backends, SmallCode uses BoneScript — write ONE .bone file and compile it to a complete project (routes, auth, DB, events, migrations, SDK, admin panel, Docker, CI). Reduces 8-15 tool calls to 1-2, dramatically improving reliability with small models.
When the local model hard fails after retry + decompose, SmallCode can optionally escalate to a stronger cloud model (Claude, OpenAI, DeepSeek). Fully opt-in — requires an API key. Session-limited to prevent runaway costs.
Escalation targets (cloud, used only on hard fail):
- Claude Sonnet 4.5 / 4.6, Haiku 4.5
- GPT-5.4 Mini / Nano
- DeepSeek V4 / V4 Pro / V4 Flash
Never exceeds your model's context window. Tool results capped at 4k chars, mid-turn eviction drops old results when context grows too large, and semantic compression summarizes history instead of dropping it.
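The capping and eviction described above can be sketched roughly as follows. This is a minimal illustration, not SmallCode's implementation: the function names, the stub text, and the use of character counts instead of tokens are all assumptions.

```javascript
// Hypothetical helpers; names and eviction policy are illustrative assumptions.
const MAX_TOOL_RESULT_CHARS = 4000; // "capped at 4k chars" from the text above

// Truncate a single tool result and note how much was dropped.
function capToolResult(text, limit = MAX_TOOL_RESULT_CHARS) {
  if (text.length <= limit) return text;
  const dropped = text.length - limit;
  return `${text.slice(0, limit)}\n[truncated ${dropped} chars]`;
}

// Replace the oldest tool results with a short stub until the history fits.
// Size is measured in characters here; a real budget would count tokens.
function evictOldToolResults(messages, budgetChars) {
  const stub = '[tool result evicted]';
  const out = messages.map((m) => ({ ...m })); // do not mutate the caller's history
  let total = out.reduce((sum, m) => sum + m.content.length, 0);
  for (const m of out) {
    if (total <= budgetChars) break;
    if (m.role !== 'tool' || m.content.length <= stub.length) continue;
    total -= m.content.length - stub.length;
    m.content = stub;
  }
  return out;
}
```

Evicting oldest-first keeps the most recent tool output intact, which is usually what the model needs for its next step.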
Halves the schema context overhead. Model picks a category (read/write/search/run/plan) first, then gets only relevant tool schemas. Critical for models with 8-16k context.
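A rough sketch of the two-stage idea, assuming a hypothetical category map. The category names come from the paragraph above and the tool names from this README's tool table, but the grouping and function names are invented for illustration.

```javascript
// Hypothetical grouping of tools into the five categories named above.
const TOOL_CATEGORIES = {
  read: ['read_file', 'find_files'],
  write: ['write_file', 'patch'],
  search: ['search', 'graph_search'],
  run: ['bash'],
  plan: ['memory_load', 'memory_remember'],
};

// Stage 1: the model only sees the category names, not any tool schema.
function categoryPrompt() {
  return `Pick one category: ${Object.keys(TOOL_CATEGORIES).join(', ')}`;
}

// Stage 2: send only the schemas that belong to the chosen category.
function schemasFor(category, allSchemas) {
  const names = TOOL_CATEGORIES[category];
  if (!names) return allSchemas; // unknown category: fall back to the full list
  return allSchemas.filter((schema) => names.includes(schema.name));
}
```

The saving comes from stage 2: the model never sees schemas for categories it did not choose.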
Detects repetition loops, patch spirals (stuck on corrupted file → forces rewrite), and greeting regression (model lost context → re-injects task). Saves tokens and time.
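As one illustration of the repetition check, the sketch below flags a loop when the last few tool calls are identical. The function name and the threshold of 3 are assumptions; the patch-spiral and greeting-regression detectors are not covered here.

```javascript
// Hypothetical detector: true when the last `threshold` calls are identical.
// Calls are compared by tool name plus JSON-serialized arguments, so two
// calls with the same keys in a different order count as different.
function isRepetitionLoop(calls, threshold = 3) {
  if (calls.length < threshold) return false;
  const key = (call) => `${call.name}:${JSON.stringify(call.args)}`;
  const tail = calls.slice(-threshold).map(key);
  return tail.every((k) => k === tail[0]);
}
```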
Small models produce messy output. SmallCode parses tool calls from JSON, YAML, XML, Hermes format, or plain text. Auto-repairs common mistakes (wrong param names, type mismatches).
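A minimal sketch of forgiving parsing, covering only two of the formats listed above (Hermes-style `<tool_call>` tags and bare JSON embedded in prose). The alias table and function names are hypothetical, and YAML, XML, and plain-text parsing are left out.

```javascript
// Hypothetical alias table for repairing commonly mis-named parameters.
const PARAM_ALIASES = { file: 'path', filename: 'path', file_path: 'path', text: 'content' };

function repairArgs(args) {
  const fixed = {};
  for (const [key, value] of Object.entries(args || {})) {
    fixed[PARAM_ALIASES[key] || key] = value;
  }
  return fixed;
}

function tryJson(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Returns { name, args } or null when no tool call can be recovered.
function parseToolCall(output) {
  const candidates = [];
  // Hermes-style wrapper: <tool_call>{...}</tool_call>
  const hermes = output.match(/<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/);
  if (hermes) candidates.push(hermes[1]);
  // Bare JSON surrounded by chatter: take the outermost {...} span.
  const first = output.indexOf('{');
  const last = output.lastIndexOf('}');
  if (first !== -1 && last > first) candidates.push(output.slice(first, last + 1));

  for (const candidate of candidates) {
    const obj = tryJson(candidate);
    if (obj && typeof obj.name === 'string') {
      // Accept the argument object under any of several common keys.
      return { name: obj.name, args: repairArgs(obj.arguments || obj.args || obj.parameters) };
    }
  }
  return null;
}
```

Trying the strictest format first and falling through to looser ones keeps well-formed output on the fast path.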
Search-and-replace as the primary edit primitive. Small models can't reliably reproduce entire files — they truncate, hallucinate, or drift. patch is safer and more context-efficient.
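The edit primitive can be sketched as a pure function over file content. This is an illustration only: the function name and the rule that the search string must match exactly once are assumptions, not a description of the patch tool's actual behavior.

```javascript
// Hypothetical search-and-replace edit. Refuses ambiguous or missing matches
// instead of guessing, so a bad patch never silently corrupts the file.
function applyPatch(content, oldStr, newStr) {
  if (oldStr === '') return { ok: false, error: 'old_str is empty' };
  const first = content.indexOf(oldStr);
  if (first === -1) return { ok: false, error: 'old_str not found' };
  if (content.indexOf(oldStr, first + 1) !== -1) {
    return { ok: false, error: 'old_str matches more than once' };
  }
  const patched = content.slice(0, first) + newStr + content.slice(first + oldStr.length);
  return { ok: true, content: patched };
}
```

The model only has to reproduce the few lines it wants to change, which is where the context saving comes from.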
Complex tasks get decomposed into atomic steps. The model reads a TODO file each turn to know where it is. Each step is validated (lint/compile) before moving on.
Per-model configuration: context length, tool format (native/hermes/json/xml/text), chat template, strengths/weaknesses. Auto-adapts prompting strategy.
Persistent scratchpad that survives across turns. Compensates for limited reasoning depth — the model can write notes to itself.
bash calls share a long-lived shell process so cd, env vars, and shell variables persist across calls. Without this, every bash call is a fresh process, breaking multi-step tasks like "cd src then run pytest". Optional cwd-containment refuses any cd (or pushd/chdir/sub-shell escape) that would leave the project root. Disable with SMALLCODE_SHELL_PERSIST=false.
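The cwd-containment check might look something like the sketch below, which uses Node's path module to decide whether a cd target stays under the project root. The function name is hypothetical, and this version does not resolve symlinks, which a real check would probably need to do.

```javascript
const path = require('node:path');

// Hypothetical check: would `cd target` from `cwd` stay inside `projectRoot`?
function isInsideRoot(projectRoot, cwd, target) {
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(cwd, target);
  const rel = path.relative(root, resolved);
  // A relative path that climbs upward (or is absolute, e.g. another drive
  // on Windows) means the target is outside the root.
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  return !escapes;
}
```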
Modern reasoning models (Qwen3, DeepSeek R1, GPT-5 reasoning) can spend thousands of tokens "thinking" about trivial tasks. SmallCode caps thinking budget per call (Anthropic budget_tokens, OpenAI reasoning_effort, Qwen enable_thinking, DeepSeek style — all set defensively) and hard-truncates oversize thinking blocks before they enter conversation history. Configure with SMALLCODE_THINKING_BUDGET=2000 (default), or SMALLCODE_THINKING_DISABLE=true to turn off entirely.
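The hard-truncation step can be illustrated as below. The sketch assumes thinking is delimited by `<think>` tags and measures the budget in characters instead of tokens; the function name and marker text are invented for the example.

```javascript
// Hypothetical truncation of oversize thinking blocks before they are
// stored in conversation history. Blocks under the limit pass through as-is.
function truncateThinking(text, maxChars = 8000) {
  return text.replace(/<think>([\s\S]*?)<\/think>/g, (whole, body) => {
    if (body.length <= maxChars) return whole;
    return `<think>${body.slice(0, maxChars)}\n[thinking truncated]</think>`;
  });
}
```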
Drop short reference notes into a knowledge/ directory and the most relevant ones get injected into the system prompt based on keyword overlap with your message. Designed for small models that benefit from algorithm cheat sheets or syntax reminders inline. See knowledge/README.md for the format. Configurable budget (default 1500 tokens) via SMALLCODE_KNOWLEDGE_MAX_TOKENS.
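Keyword-overlap selection under a token budget could be sketched as follows. The note shape ({ name, text }), the tokenizer, and the four-characters-per-token estimate are assumptions made for the example; the 1500-token default is the one stated above.

```javascript
// Lowercased word set; words shorter than 3 characters are ignored.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9_]{3,}/g) || []);
}

// Hypothetical selector: rank notes by shared words with the user message,
// then take them greedily until the token budget is used up.
function selectNotes(message, notes, maxTokens = 1500) {
  const query = tokenize(message);
  const scored = notes
    .map((note) => {
      let overlap = 0;
      for (const word of tokenize(note.text)) {
        if (query.has(word)) overlap += 1;
      }
      return { note, overlap };
    })
    .filter((entry) => entry.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap);

  const picked = [];
  let used = 0;
  for (const { note } of scored) {
    const cost = Math.ceil(note.text.length / 4); // rough chars-to-tokens estimate
    if (used + cost > maxTokens) continue;
    picked.push(note);
    used += cost;
  }
  return picked;
}
```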
Tracks which paths the model has read this session. First write_file to an existing unread file is refused with a hint to read_file first; second attempt allowed (so legitimate full-replace intents succeed). New files always permitted. patch counts as a read. Disable with SMALLCODE_WRITE_GUARD=false.
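The refuse-once behavior described above fits in a small class. This is a sketch under assumptions: the class and method names are invented, and the caller is expected to report whether the file already exists.

```javascript
// Hypothetical write guard: refuse the first blind overwrite, allow the second.
class WriteGuard {
  constructor() {
    this.seen = new Set();    // paths read (or patched) this session
    this.refused = new Set(); // paths already refused once
  }

  markRead(path) {
    this.seen.add(path);
  }

  // `exists` says whether the file is already on disk; new files always pass.
  checkWrite(path, exists) {
    if (!exists || this.seen.has(path) || this.refused.has(path)) {
      return { allowed: true };
    }
    this.refused.add(path);
    return { allowed: false, hint: `read_file ${path} before overwriting it` };
  }
}
```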
Identical pure-tool calls within a sliding window are short-circuited with a cached result instead of re-executing. Only applies to read-only tools (read_file, search, graph_search, etc.) — never to anything with side effects. Saves both context and latency on small models that loop. Disable with SMALLCODE_DEDUP=false.
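A sliding-window cache for read-only calls might be sketched like this. The class name, window size, and the choice to clear the cache after any side-effecting call are assumptions, not documented behavior.

```javascript
// Read-only tools named in the text above; anything else is treated as impure.
const PURE_TOOLS = new Set(['read_file', 'search', 'graph_search', 'find_files']);

// Hypothetical dedup cache over the last `windowSize` pure calls.
class DedupCache {
  constructor(windowSize = 8) {
    this.windowSize = windowSize;
    this.entries = []; // [{ key, result }], oldest first
  }

  run(name, args, execute) {
    if (!PURE_TOOLS.has(name)) {
      // Assumption: a side effect may have changed files, so drop cached reads.
      this.entries = [];
      return { result: execute(), cached: false };
    }
    const key = `${name}:${JSON.stringify(args)}`;
    const hit = this.entries.find((entry) => entry.key === key);
    if (hit) return { result: hit.result, cached: true };

    const result = execute();
    this.entries.push({ key, result });
    if (this.entries.length > this.windowSize) this.entries.shift();
    return { result, cached: false };
  }
}
```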
Automated capture of "what was tried, what worked, what failed" per task. Stored as searchable memory objects in the existing memory MCP module so they flow through FTS5 + staleness-decay loading on future tasks rather than always hogging context. The model learns from past sessions: it sees that pip install failed last time on this Python version, or that npm test hangs without --run. Disable with SMALLCODE_EVIDENCE_DISABLE=true.
For multi-step tasks (refactors, multi-file features, multi-imperative prompts), SmallCode asks the model to emit a numbered plan FIRST, then re-injects that plan as an anchor on subsequent turns. Reduces drift on long traces — the model can't "forget" step 3 by the time it finishes step 1. Heuristic-based — simple tasks like "create hello.py" don't trigger planning. Configure with SMALLCODE_PLAN=true|false.
Before each agent turn, SmallCode opens a file snapshot checkpoint. Every write_file and patch records its pre-edit content. If validation hard-fails and all retries are exhausted, set SMALLCODE_SNAPSHOT_AUTO_ROLLBACK=true to automatically revert all edits in the turn back to the checkpoint state. All snapshots persisted to .smallcode/snapshots/ for manual audit. Disable with SMALLCODE_SNAPSHOT=false.
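The checkpoint-and-rollback flow can be illustrated in memory. The sketch below uses a Map in place of the filesystem and invented names; persisting to .smallcode/snapshots/ is not shown.

```javascript
// Hypothetical per-turn snapshot. `files` is a Map of path -> content that
// stands in for the filesystem in this example.
class TurnSnapshot {
  constructor() {
    this.before = new Map(); // path -> pre-edit content, or null if the file was new
  }

  // Call before each write or patch. Only the first pre-edit state per path
  // is kept, so rollback returns to the start of the turn.
  record(path, contentOrNull) {
    if (!this.before.has(path)) this.before.set(path, contentOrNull);
  }

  rollback(files) {
    for (const [path, content] of this.before) {
      if (content === null) files.delete(path);
      else files.set(path, content);
    }
    this.before.clear();
  }
}
```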
Detects your project's test command from config files (package.json, pytest.ini, pyproject.toml, Cargo.toml, go.mod, pom.xml, etc.) and injects it into the system prompt once. The model knows how to run tests without wasting tool calls on discovery. Also surfaces in AUTO-VALIDATE fix prompts. Override with SMALLCODE_TEST_RUNNER=<cmd> or disable with SMALLCODE_TEST_DISABLE=true.
On first turn, scans the workspace and injects a compact project summary: runtime + version, package manager, framework (Next.js/FastAPI/Express/Django/React/Vue/…), entry point, and build/test/run commands. Covers Node, Python, Rust, Go, .NET, Java, Ruby. Eliminates the 3-5 tool calls small models usually spend figuring out what kind of project they're in. Disable with SMALLCODE_BOOTSTRAP=false.
When the improvement loop retries a failed edit, each attempt uses a different temperature so it doesn't produce the same broken output three times. Attempt 1 lowers temperature (deterministic fix), attempt 2 raises it (explore alternatives), attempt 3 returns to base. Delta defaults to 0.15. Disable with SMALLCODE_TEMP_ADAPT=false.
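The per-attempt schedule above reduces to a small function. The name is hypothetical, and the clamp to the range 0 to 2 is an assumption added so the result stays a valid sampling temperature.

```javascript
// Hypothetical schedule: attempt 1 lowers, attempt 2 raises, attempt 3 uses base.
function retryTemperature(base, attempt, delta = 0.15) {
  const offsets = { 1: -delta, 2: +delta };
  const raw = base + (offsets[attempt] || 0);
  // Round away floating-point noise, then clamp to a valid range (assumed 0..2).
  return Math.min(2, Math.max(0, Number(raw.toFixed(2))));
}
```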
Tracks consecutive failures per tool within a session. Tools that fail 3+ times in a row are soft-demoted (moved to the back of the schema list). Tools that fail 5+ times are dropped from the schema entirely for the session. Prevents the model from looping on a broken MCP server or a search that keeps returning nothing. Resets between runs. Disable with SMALLCODE_TRUST_DECAY=false.
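A sketch of consecutive-failure tracking with the 3 and 5 thresholds given above. It reads "soft-demoted" as "moved to the end of the schema list", which is an interpretation, and the class and method names are invented.

```javascript
// Hypothetical trust tracker keyed by tool name.
class ToolTrust {
  constructor(demoteAt = 3, dropAt = 5) {
    this.demoteAt = demoteAt;
    this.dropAt = dropAt;
    this.failures = new Map(); // tool name -> consecutive failure count
  }

  // A success resets the streak; a failure extends it.
  record(name, ok) {
    this.failures.set(name, ok ? 0 : this.streak(name) + 1);
  }

  streak(name) {
    return this.failures.get(name) || 0;
  }

  // Dropped tools disappear; demoted tools move behind the healthy ones.
  visibleTools(names) {
    const kept = names.filter((n) => this.streak(n) < this.dropAt);
    const healthy = kept.filter((n) => this.streak(n) < this.demoteAt);
    const demoted = kept.filter((n) => this.streak(n) >= this.demoteAt);
    return [...healthy, ...demoted];
  }
}
```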
A new code_intel tool routing category detects semantic code questions ("how does X work", "what calls Y", "who inherits from Z"). Routes exclusively to [graph_search, explain_symbol, read_file, find_files, search] — skipping write/run tools. Placed before search in the priority order so inheritance/callers questions get the right tools without any write noise.
When a bash command exits non-zero, diagnoseError() makes a quick LLM call to classify the error type (syntax|runtime|permission|notfound|timeout|unknown), locate the relevant file/line, and emit a one-line fix suggestion. The structured hint is prepended as [ERROR-DIAGNOSIS] to the tool result so the model has typed, located context to act on immediately. Cached 5 min. TTL configurable.
decomposeTask() replaces the hand-rolled pickDecomposeStrategy() regex when a file keeps failing after all retries. The LLM selects a strategy (split_file|one_error_at_a_time|rewrite_section|extract_function) with a reason and concrete 2-3 sentence instruction. Falls back to the regex governor. Cached 5 min.
When 3 or more files are edited in a single agent turn, coordinateMultiFileEdit() injects a [MULTI-FILE-EDIT] header listing all files that need changes. Keeps small models from forgetting file 3 while editing file 2. De-duplicates: only injects once per turn even if called repeatedly.
When patch fails because old_str no longer exists in the file, semanticMerge() asks the model to merge the intended change into the current file content. Returns the complete corrected file. Replaces the hard error with a recovery attempt. TTL 1 min (content-specific).
AdaptiveModelRouter in src/model/adaptive_router.js tracks per-model call/fail counts. When the primary model's failure rate exceeds 0.3 (medium) or 0.6 (strong), chatCompletion automatically overrides body.model with SMALLCODE_MODEL_MEDIUM or SMALLCODE_MODEL_STRONG. Requires at least 3 calls before routing decisions activate. Reset via router.reset().
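The thresholds above can be illustrated with a small stand-in class. This is not the AdaptiveModelRouter source: the class name, constructor shape, and method names other than reset() are assumptions, and it tracks a single primary model for brevity.

```javascript
// Hypothetical failure-rate router using the 0.3 / 0.6 thresholds and the
// 3-call minimum described above.
class FailureRateRouter {
  constructor({ primary, medium, strong, minCalls = 3 }) {
    this.primary = primary;
    this.medium = medium;
    this.strong = strong;
    this.minCalls = minCalls;
    this.reset();
  }

  record(ok) {
    this.calls += 1;
    if (!ok) this.fails += 1;
  }

  // Returns the model name to use for the next request.
  pick() {
    if (this.calls < this.minCalls) return this.primary;
    const rate = this.fails / this.calls;
    if (rate > 0.6 && this.strong) return this.strong;
    if (rate > 0.3 && this.medium) return this.medium;
    return this.primary;
  }

  reset() {
    this.calls = 0;
    this.fails = 0;
  }
}
```

Requiring a minimum number of calls avoids escalating on a single early failure.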
# Optional: configure fallback models for adaptive routing
SMALLCODE_MODEL_MEDIUM=qwen2.5-coder:32b
SMALLCODE_MODEL_STRONG=gpt-4o
Run the included benchmark suite against any local model to measure pass rate across small coding tasks. Three suites: smoke (5 trivial tasks, ~30s), polyglot-mini (19 tasks across Python/JS/TS/Bash/Markdown/JSON), tool-use (10 multi-step tool sequencing tasks). Results persisted to .smallcode/benchmarks/.
npm run bench:smoke
npm run bench:polyglot
npm run bench:tools
| Command | Description |
|---|---|
| /quit, /q | Exit SmallCode |
| /clear | Reset conversation |
| /stats | Show session statistics |
| /tokens | Detailed token usage report |
| /budget | Context window budget + visual bar |
| /trace | List/show/export execution traces |
| /eval | Run prompt evaluation suites |
| /memory | Show working memory |
| /plan | Show current task plan |
| /model | Show/switch model |
| /profile | Show detected model profile + routing mode |
| /cognition | Show MarrowScript cognition layer status |
| /mcp | Show connected external MCP servers |
| /skill | Manage reusable skills |
| /plugin | Install/manage plugins |
| /sessions | List/resume saved sessions |
| /help | Show all commands |
SmallCode tracks token usage and execution traces automatically:
- Token Monitor: Every LLM call records prompt/completion tokens. View with /tokens.
- Context Budget: Visual indicator of context window usage. View with /budget.
- Execution Traces: Every agent turn is recorded to .smallcode/traces/. View with /trace list.
- Trace-to-Test: Generate regression tests from traces with /trace test <id>.
- Prompt Evaluations: Measure classifier accuracy and tool selection with /eval classify_accuracy.
# Run evaluations from CLI
smallcode --eval classify_accuracy
smallcode --eval tool_selection
Use SmallCode as a library in your own tools, CI pipelines, or TypeScript frameworks:
const { SmallCode } = require('smallcode');
// await is not allowed at the top level of a CommonJS module, so wrap the calls
async function main() {
  const agent = new SmallCode({
    model: 'gemma-4-e4b',
    baseUrl: 'http://localhost:1234/v1',
  });
  // Subscribe to events before running so none are missed
  agent.on('tool_start', ({ name, args }) => console.log(`Using: ${name}`));
  agent.on('tool_end', ({ name, ms }) => console.log(`Done: ${name} (${ms}ms)`));
  agent.on('error', (err) => console.error(err));
  // Run a task
  const result = await agent.run("create hello.py that prints hello world");
  console.log(result.filesCreated); // ['hello.py']
  console.log(result.toolCalls.length); // 1
  console.log(result.success); // true
}
main().catch(console.error);
agent.run() returns a structured RunResult with: response text, tool call records, files created/edited, token usage, duration, and success status.
| Tool | Description |
|---|---|
| bone_compile | Compile .bone to full backend project |
| bone_check | Validate .bone file (type errors, constraints) |
| list_projects | List all indexed projects with stats |
| graph_search | Code graph symbol search |
| explain_symbol | Full symbol explanation (callers, callees) |
| read_file | Read file contents |
| write_file | Create/overwrite files |
| patch | Search-and-replace edit |
| bash | Run shell commands |
| search | Regex search (ripgrep) |
| find_files | Glob file search |
| memory_load | Load relevant project memory |
| memory_remember | Save knowledge to memory |
| web_search | Search the web via DuckDuckGo (requires SMALLCODE_WEB_BROWSE=true) |
| web_fetch | Fetch and extract text from a URL (requires SMALLCODE_WEB_BROWSE=true) |
SmallCode includes Playwright with stealth mode for undetected web browsing. Disabled by default — enable for medium/large models (20B+) that can synthesize web context effectively:
# In your .env
SMALLCODE_WEB_BROWSE=true
When enabled, the model can search the web and fetch documentation during tasks. Uses headless Chromium with anti-detection to avoid CAPTCHAs and bot blocks. Falls back to simple HTTP fetch if Playwright isn't available.
MIT