SmallCode

AI coding agent optimized for small LLMs (8B-35B parameters)

SmallCode is a terminal-native coding agent designed from the ground up to extract useful work from local models (8B-35B) running on consumer hardware. While tools like OpenCode assume frontier models with 128k+ context and perfect tool calling, SmallCode compensates for the limitations of small models through intelligent architecture.

Recommended model size: 8B-35B parameters. Smaller models (≤4B) struggle with multi-step tool use and lose context across turns. Larger models (>35B) don't need SmallCode's adaptations and are better served by tools designed for frontier models.

Why SmallCode?

	OpenCode	SmallCode
Target	Frontier models (Claude, GPT-5)	8B-35B local models
Context	Dumps everything	Budget-managed, summarized
Tool calling	Assumes reliable JSON	Forgiving multi-format parser
Planning	Single-shot	TODO-file decomposed steps
Editing	Full file write	Search-and-replace patch
Privacy	API calls to cloud	Fully local, no network needed

Quick Start

# Install globally via npm
npm install -g smallcode

# Or run directly with npx
npx smallcode

# Start in your project directory
cd my-project
smallcode

Prebuilt Binaries (no Node.js needed)

Pre-compiled tarballs for Windows, macOS, and Linux are built on every release — they bundle Node.js plus all native addons so you never need node-gyp or C++ build tools.

Platform	One‑line install
Linux / macOS	`bash <(curl -fsSL https://raw.githubusercontent.com/Doorman11991/smallcode/main/install.sh)`
Windows	`iwr -Uri https://raw.githubusercontent.com/Doorman11991/smallcode/main/install.ps1 -UseBasicParsing \| iex`

The install script downloads the correct tarball for your platform, extracts it to ~/.smallcode, and adds it to your PATH. Run smallcode --help to verify.

SmallCode includes BoneScript and budget-aware-mcp as dependencies — everything installs in one go.

Requirements

Node.js 18+ (LTS recommended — 20.x or 22.x have prebuilt binaries for SQLite)
A local LLM server (LM Studio, Ollama, or any OpenAI-compatible endpoint)

Optional (for code graph + FTS5 memory search):

better-sqlite3 needs native compilation if prebuilt binaries aren't available for your Node version
Prebuilt binaries exist for Node LTS (20.x, 22.x) on Linux/macOS/Windows. no build tools needed
If you're on a non-LTS Node (23+, 25+), you'll need:
- Linux: python3, make, gcc/g++ (sudo apt install build-essential python3 or pacman -S base-devel python)
- macOS: Xcode Command Line Tools (xcode-select --install)
- Windows: Visual Studio Build Tools with "Desktop development with C++" workload, or npm install -g windows-build-tools
If build fails, SmallCode still works — it falls back to JSON-based memory automatically

Configuration

Create a .env file in your project root:

# Required
SMALLCODE_MODEL=your-model-name
SMALLCODE_BASE_URL=http://localhost:1234/v1

# Optional: escalation (auto-fallback to cloud on hard fail)
# ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# DEEPSEEK_API_KEY=sk-...

See .env.example for all options. Also supports smallcode.toml for backwards compatibility.

Architecture

SmallCode is built with a modular architecture:

bin/
├── smallcode.js        Entry point, agent loop, TUI orchestration (1570 lines)
├── config.js           Config loading, endpoint detection, auth headers
├── executor.js         Tool execution (all 18 tools)
├── tools.js            Tool definitions + 2-stage routing
├── mcp_bridge.js       Built-in code graph MCP communication
├── model_client.js     LLM API calls, streaming, validation
├── governor.js         Tool scoring, verification, decompose
├── escalation.js       Cloud model fallback (Claude/OpenAI/DeepSeek)
├── commands.js         TUI slash commands
├── tui.js              Classic TUI renderer
└── bonescript_guide.js BoneScript syntax reference

src/
├── api/index.js        Programmatic API (require('smallcode'))
├── tui/fullscreen.js   Fullscreen alternate-buffer TUI
├── plugins/loader.js   Plugin system
├── plugins/skills.js   Skill system
├── tools/              Tool routing, MCP client, validators
├── governor/           Early-stop detection, verifier, tool scorer
├── model/              Multi-model profiles + routing
└── session/            Persistence, undo, sharing, references

Key Features

MarrowScript Cognition Layer

SmallCode's intelligence is declared in MarrowScript and compiled to a production runtime. One 50-line .marrow declaration generates 1400+ lines of TypeScript with caching, retry, validation, traces, and budget enforcement — all for free.

prompt classify_task_type(user_message: string) {
  model: TinyClassifier
  timeout: 3s
  cache: { key: hash(user_message), ttl: 10m }
  retry: { max_attempts: 2, backoff: fixed, interval: 100ms }
  constraints: [output in ["coding", "editing", "search", ...]]
}

The compiled cognition layer provides:

Prompt caching — 0ms on cache hit, content-hash keys with TTL
Structured traces — trace_id/span_id for every LLM call (enable with SMALLCODE_COGNITION_LOG=stderr)
Tier-based routing — trivial tasks → tiny model, complex tasks → medium model
Token budgets — per-cost-class enforcement, never overspend
Validation + repair — schema checks with auto-retry on malformed output

BoneScript Integration

For Node.js/TypeScript backends, SmallCode uses BoneScript — write ONE .bone file and compile it to a complete project (routes, auth, DB, events, migrations, SDK, admin panel, Docker, CI). Reduces 8-15 tool calls to 1-2, dramatically improving reliability with small models.

Model Escalation

When the local model hard fails after retry + decompose, SmallCode can optionally escalate to a stronger cloud model (Claude, OpenAI, DeepSeek). Fully opt-in — requires an API key. Session-limited to prevent runaway costs.

Escalation targets (cloud, used only on hard fail):

Claude Sonnet 4.5 / 4.6, Haiku 4.5
GPT-5.4 Mini / Nano
DeepSeek V4 / V4 Pro / V4 Flash

Context Budget Engine

Never exceeds your model's context window. Tool results capped at 4k chars, mid-turn eviction drops old results when context grows too large, and semantic compression summarizes history instead of dropping it.

2-Stage Tool Routing

Halves the schema context overhead. Model picks a category (read/write/search/run/plan) first, then gets only relevant tool schemas. Critical for models with 8-16k context.

Early-Stop Detection

Detects repetition loops, patch spirals (stuck on corrupted file → forces rewrite), and greeting regression (model lost context → re-injects task). Saves tokens and time.

Forgiving Tool Call Parser

Small models produce messy output. SmallCode parses tool calls from JSON, YAML, XML, Hermes format, or plain text. Auto-repairs common mistakes (wrong param names, type mismatches).

Patch-First Editing

Search-and-replace as the primary edit primitive. Small models can't reliably reproduce entire files — they truncate, hallucinate, or drift. patch is safer and more context-efficient.

TODO-Driven Planning

Complex tasks get decomposed into atomic steps. The model reads a TODO file each turn to know where it is. Each step is validated (lint/compile) before moving on.

Model Profiles

Per-model configuration: context length, tool format (native/hermes/json/xml/text), chat template, strengths/weaknesses. Auto-adapts prompting strategy.

Working Memory

Persistent scratchpad that survives across turns. Compensates for limited reasoning depth — the model can write notes to itself.

Persistent Shell Sessions

bash calls share a long-lived shell process so cd, env vars, and shell variables persist across calls. Without this, every bash call is a fresh process, breaking multi-step tasks like "cd src then run pytest". Optional cwd-containment refuses any cd (or pushd/chdir/sub-shell escape) that would leave the project root. Disable with SMALLCODE_SHELL_PERSIST=false.

Thinking Budget Control

Modern reasoning models (Qwen3, DeepSeek R1, GPT-5 reasoning) can spend thousands of tokens "thinking" about trivial tasks. SmallCode caps thinking budget per call (Anthropic budget_tokens, OpenAI reasoning_effort, Qwen enable_thinking, DeepSeek style — all set defensively) and hard-truncates oversize thinking blocks before they enter conversation history. Configure with SMALLCODE_THINKING_BUDGET=2000 (default), or SMALLCODE_THINKING_DISABLE=true to turn off entirely.

Knowledge Injection

Drop short reference notes into a knowledge/ directory and the most relevant ones get injected into the system prompt based on keyword overlap with your message. Designed for small models that benefit from algorithm cheat sheets or syntax reminders inline. See knowledge/README.md for the format. Configurable budget (default 1500 tokens) via SMALLCODE_KNOWLEDGE_MAX_TOKENS.

Read-Before-Write Guard

Tracks which paths the model has read this session. First write_file to an existing unread file is refused with a hint to read_file first; second attempt allowed (so legitimate full-replace intents succeed). New files always permitted. patch counts as a read. Disable with SMALLCODE_WRITE_GUARD=false.

Tool-Call Deduplication

Identical pure-tool calls within a sliding window are short-circuited with a cached result instead of re-executing. Only applies to read-only tools (read_file, search, graph_search, etc.) — never to anything with side effects. Saves both context and latency on small models that loop. Disable with SMALLCODE_DEDUP=false.

Evidence Store

Automated capture of "what was tried, what worked, what failed" per task. Stored as searchable memory objects in the existing memory MCP module so they flow through FTS5 + staleness-decay loading on future tasks rather than always hogging context. The model learns from past sessions: it sees that pip install failed last time on this Python version, or that npm test hangs without --run. Disable with SMALLCODE_EVIDENCE_DISABLE=true.

Plan-Then-Execute Mode

For multi-step tasks (refactors, multi-file features, multi-imperative prompts), SmallCode asks the model to emit a numbered plan FIRST, then re-injects that plan as an anchor on subsequent turns. Reduces drift on long traces — the model can't "forget" step 3 by the time it finishes step 1. Heuristic-based — simple tasks like "create hello.py" don't trigger planning. Configure with SMALLCODE_PLAN=true|false.

Snapshot & Auto-Rollback

Before each agent turn, SmallCode opens a file snapshot checkpoint. Every write_file and patch records its pre-edit content. If validation hard-fails and all retries are exhausted, set SMALLCODE_SNAPSHOT_AUTO_ROLLBACK=true to automatically revert all edits in the turn back to the checkpoint state. All snapshots persisted to .smallcode/snapshots/ for manual audit. Disable with SMALLCODE_SNAPSHOT=false.

Test-Runner Auto-Discovery

Detects your project's test command from config files (package.json, pytest.ini, pyproject.toml, Cargo.toml, go.mod, pom.xml, etc.) and injects it into the system prompt once. The model knows how to run tests without wasting tool calls on discovery. Also surfaces in AUTO-VALIDATE fix prompts. Override with SMALLCODE_TEST_RUNNER=<cmd> or disable with SMALLCODE_TEST_DISABLE=true.

Bootstrap Detection

On first turn, scans the workspace and injects a compact project summary: runtime + version, package manager, framework (Next.js/FastAPI/Express/Django/React/Vue/…), entry point, and build/test/run commands. Covers Node, Python, Rust, Go, .NET, Java, Ruby. Eliminates the 3-5 tool calls small models usually spend figuring out what kind of project they're in. Disable with SMALLCODE_BOOTSTRAP=false.

Adaptive Retry Temperature

When the improvement loop retries a failed edit, each attempt uses a different temperature so it doesn't produce the same broken output three times. Attempt 1 lowers temperature (deterministic fix), attempt 2 raises it (explore alternatives), attempt 3 returns to base. Delta defaults to 0.15. Disable with SMALLCODE_TEMP_ADAPT=false.

Per-Tool Trust Score Decay

Tracks consecutive failures per tool within a session. Tools that fail 3+ times in a row are soft-demoted (schema list back). Tools that fail 5+ times are dropped from the schema entirely for the session. Prevents the model from looping on a broken MCP server or a search that keeps returning nothing. Resets between runs. Disable with SMALLCODE_TRUST_DECAY=false.

Code Intel Category (Rank 2)

A new code_intel tool routing category detects semantic code questions ("how does X work", "what calls Y", "who inherits from Z"). Routes exclusively to [graph_search, explain_symbol, read_file, find_files, search] — skipping write/run tools. Placed before search in the priority order so inheritance/callers questions get the right tools without any write noise.

Error Diagnosis (Rank 4)

When a bash command exits non-zero, diagnoseError() makes a quick LLM call to classify the error type (syntax|runtime|permission|notfound|timeout|unknown), locate the relevant file/line, and emit a one-line fix suggestion. The structured hint is prepended as [ERROR-DIAGNOSIS] to the tool result so the model has typed, located context to act on immediately. Cached 5 min. TTL configurable.

Decompose Task (Rank 5)

decomposeTask() replaces the hand-rolled pickDecomposeStrategy() regex when a file keeps failing after all retries. The LLM selects a strategy (split_file|one_error_at_a_time|rewrite_section|extract_function) with a reason and concrete 2-3 sentence instruction. Falls back to the regex governor. Cached 5 min.

Multi-File Edit Coordination (Rank 6)

When 3 or more files are edited in a single agent turn, coordinateMultiFileEdit() injects a [MULTI-FILE-EDIT] header listing all files that need changes. Keeps small models from forgetting file 3 while editing file 2. De-duplicates: only injects once per turn even if called repeatedly.

Semantic Merge (Rank 7)

When patch fails because old_str no longer exists in the file, semanticMerge() asks the model to merge the intended change into the current file content. Returns the complete corrected file. Replaces the hard error with a recovery attempt. TTL 1 min (content-specific).

Adaptive Model Select (Rank 8)

AdaptiveModelRouter in src/model/adaptive_router.js tracks per-model call/fail counts. When the primary model's failure rate exceeds 0.3 (medium) or 0.6 (strong), chatCompletion automatically overrides body.model with SMALLCODE_MODEL_MEDIUM or SMALLCODE_MODEL_STRONG. Requires at least 3 calls before routing decisions activate. Reset via router.reset().

# Optional: configure fallback models for adaptive routing
SMALLCODE_MODEL_MEDIUM=qwen2.5-coder:32b
SMALLCODE_MODEL_STRONG=gpt-4o

Benchmark Harness

Run the included benchmark suite against any local model to measure pass rate across small coding tasks. Three suites: smoke (5 trivial tasks, ~30s), polyglot-mini (19 tasks across Python/JS/TS/Bash/Markdown/JSON), tool-use (10 multi-step tool sequencing tasks). Results persisted to .smallcode/benchmarks/.

npm run bench:smoke
npm run bench:polyglot
npm run bench:tools

Commands

Command	Description
`/quit`, `/q`	Exit SmallCode
`/clear`	Reset conversation
`/stats`	Show session statistics
`/tokens`	Detailed token usage report
`/budget`	Context window budget + visual bar
`/trace`	List/show/export execution traces
`/eval`	Run prompt evaluation suites
`/memory`	Show working memory
`/plan`	Show current task plan
`/model`	Show/switch model
`/profile`	Show detected model profile + routing mode
`/cognition`	Show MarrowScript cognition layer status
`/mcp`	Show connected external MCP servers
`/skill`	Manage reusable skills
`/plugin`	Install/manage plugins
`/sessions`	List/resume saved sessions
`/help`	Show all commands

Observability

SmallCode tracks token usage and execution traces automatically:

Token Monitor — Every LLM call records prompt/completion tokens. View with /tokens.
Context Budget — Visual indicator of context window usage. View with /budget.
Execution Traces — Every agent turn is recorded to .smallcode/traces/. View with /trace list.
Trace-to-Test — Generate regression tests from traces: /trace test <id>.
Prompt Evaluations — Measure classifier accuracy and tool selection: /eval classify_accuracy.

# Run evaluations from CLI
smallcode --eval classify_accuracy
smallcode --eval tool_selection

Programmatic API

Use SmallCode as a library in your own tools, CI pipelines, or TypeScript frameworks:

const { SmallCode } = require('smallcode');

const agent = new SmallCode({
  model: 'gemma-4-e4b',
  baseUrl: 'http://localhost:1234/v1',
});

// Run a task
const result = await agent.run("create hello.py that prints hello world");
console.log(result.filesCreated);  // ['hello.py']
console.log(result.toolCalls.length);  // 1
console.log(result.success);  // true

// Subscribe to events
agent.on('tool_start', ({ name, args }) => console.log(`Using: ${name}`));
agent.on('tool_end', ({ name, ms }) => console.log(`Done: ${name} (${ms}ms)`));
agent.on('error', (err) => console.error(err));

Returns a structured RunResult with: response text, tool call records, files created/edited, token usage, duration, and success status.

Tools

Tool	Description
`bone_compile`	Compile .bone to full backend project
`bone_check`	Validate .bone file (type errors, constraints)
`list_projects`	List all indexed projects with stats
`graph_search`	Code graph symbol search
`explain_symbol`	Full symbol explanation (callers, callees)
`read_file`	Read file contents
`write_file`	Create/overwrite files
`patch`	Search-and-replace edit
`bash`	Run shell commands
`search`	Regex search (ripgrep)
`find_files`	Glob file search
`memory_load`	Load relevant project memory
`memory_remember`	Save knowledge to memory
`web_search`	Search the web via DuckDuckGo (requires `SMALLCODE_WEB_BROWSE=true`)
`web_fetch`	Fetch and extract text from a URL (requires `SMALLCODE_WEB_BROWSE=true`)

Web Browsing

SmallCode includes Playwright with stealth mode for undetected web browsing. Disabled by default — enable for medium/large models (20B+) that can synthesize web context effectively:

# In your .env
SMALLCODE_WEB_BROWSE=true

When enabled, the model can search the web and fetch documentation during tasks. Uses headless Chromium with anti-detection to avoid CAPTCHAs and bot blocks. Falls back to simple HTTP fetch if Playwright isn't available.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
.github/workflows		.github/workflows
.qartez		.qartez
.smallcode/plugins/hello-plugin		.smallcode/plugins/hello-plugin
bench		bench
bin		bin
extensions		extensions
knowledge		knowledge
marrow		marrow
profiles		profiles
src		src
tmpsmallcode-artifactslinux		tmpsmallcode-artifactslinux
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
.npmignore		.npmignore
BONESCRIPT_INTEGRATION.md		BONESCRIPT_INTEGRATION.md
CHANGELOG.md		CHANGELOG.md
COMPARISON.md		COMPARISON.md
LICENSE		LICENSE
PLAN.md		PLAN.md
PLAN_MARROWSCRIPT.md		PLAN_MARROWSCRIPT.md
README.md		README.md
README_zh-CN.md		README_zh-CN.md
SECURITY.md		SECURITY.md
build.js		build.js
install.ps1		install.ps1
install.sh		install.sh
mcp.json		mcp.json
package-lock.json		package-lock.json
package.json		package.json
smallcode.toml		smallcode.toml

Folders and files

Latest commit

History

Repository files navigation

SmallCode

Why SmallCode?

Quick Start

Prebuilt Binaries (no Node.js needed)

Requirements

Configuration

Architecture

Key Features

MarrowScript Cognition Layer

BoneScript Integration

Model Escalation

Context Budget Engine

2-Stage Tool Routing

Early-Stop Detection

Forgiving Tool Call Parser

Patch-First Editing

TODO-Driven Planning

Model Profiles

Working Memory

Persistent Shell Sessions

Thinking Budget Control

Knowledge Injection

Read-Before-Write Guard

Tool-Call Deduplication

Evidence Store

Plan-Then-Execute Mode

Snapshot & Auto-Rollback

Test-Runner Auto-Discovery

Bootstrap Detection

Adaptive Retry Temperature

Per-Tool Trust Score Decay

Code Intel Category (Rank 2)

Error Diagnosis (Rank 4)

Decompose Task (Rank 5)

Multi-File Edit Coordination (Rank 6)

Semantic Merge (Rank 7)

Adaptive Model Select (Rank 8)

Benchmark Harness

Commands

Observability

Programmatic API

Tools

Web Browsing

License

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages