Luna Code is an agentic coding assistant that runs entirely on OpenRouter. Point it at any model OpenRouter supports with just an API key and a model id — no other configuration required.
It's built for fast codebase navigation, efficient agentic sessions, and prompt-cache hit optimization so long sessions stay cheap and fast.
- One-line setup. Paste an OpenRouter API key, pick a model, go.
- Three modes
  - Standard: approve each file edit and shell command before it runs.
  - Auto: runs edits and commands autonomously without prompting (commands in `lunacode.alwaysDenyCommands` are still hard-blocked).
  - Plan: read-only research + planning. The agent investigates the code and proposes a concrete plan, but can't edit files or run mutating commands.
- Session history. Every conversation is saved per-workspace. Click the history button in the header to browse, reload, or delete prior sessions.
- Usage & cost analytics. A live meter shows session-total cost plus the last turn's tokens and cache-hit rate. The usage window (bar-chart button) reports spend and tokens over the last 30 / 60 / 90 days, with a daily cost chart, daily token chart, and a per-model cost/usage breakdown.
- Refined "thinking" UX. While the model reasons, an animated Thinking… indicator sits just above the composer; when it finishes it collapses to a quiet "Thought for Ns" marker — no noisy, expandable reasoning blocks.
- Agentic tool loop. Reads files, lists/globs/greps the workspace, runs builds & tests, checks language-server diagnostics, and edits code — looping until the task is done. Independent read-only lookups batched into one response run in parallel, and after every edit the file's language-server errors are auto-appended to the tool result so the model fixes breakage without an extra round-trip.
- Explore sub-agent. An `explore` tool delegates open-ended research ("how does auth work here?") to a disposable sub-agent with its own context (`lunacode.subagentModel`, cheap model recommended). Only the digest returns to the main conversation, keeping it small and cache-friendly.
- Surgical reads. A `file_outline` tool (language-server symbols with line ranges) plus `read_file` offset/limit paging, so the agent pulls 40 relevant lines instead of whole files.
- Project memory. A `LUNA.md` at the workspace root is loaded into the system prompt every session; the agent is instructed to record durable conventions and gotchas there.
- Turn checkpoints. Files changed by an agent turn are snapshotted; an ↩ revert chip in the meter restores the last turn's edits (stack of 10).
- MCP servers. Connect stdio Model Context Protocol servers via `lunacode.mcpServers` (or the settings GUI); their tools appear to the agent as `mcp__<server>__<tool>` with approval gating in Standard mode.
- Cache-warmth dot. The meter shows whether the provider prompt cache is still warm (~5 min TTL); a cold cache means the next message re-writes it at full input price. Optional pre-warm (`lunacode.prewarmCache`) writes the cache when a session opens so the first message starts warm.
- Mid-turn steering. Messages sent while the agent is working are injected into the running task at the next step, with no waiting for the turn to end.
- Live task checklist. For multi-step work the agent maintains a visible plan (`set_tasks`) rendered above the composer with per-step progress.
- Eager tool execution. Read-only tool calls start running while the model is still streaming the rest of its response, overlapping generation with I/O.
- Resilient streaming. Automatic retry with backoff on 429/5xx before any tokens arrive, a hung-stream watchdog, and optional fallback models (`lunacode.fallbackModels`) via OpenRouter routing; a status line notes when a fallback served the response.
- @-file mentions. Type `@` in the composer for a fuzzy file picker; the chosen path is inserted so the agent reads exactly the file you meant.
- Turn review & commit. The `±` chip shows side-by-side diffs of the last turn's edits with one-click `git` commit (message generated by the cheap summarizer model); `↻` / `✎` chips retry or edit-and-resend your last message.
- Context inspector. Click the session cost in the meter to see exactly what's in the context window: totals vs budget, system-prompt size, the largest items, and the estimated cost of the next (cached) call.
- Multi-file patches. An `apply_patch` tool edits many files in one model round-trip instead of one call per file.
- Background processes. `start_process` / `read_process` / `stop_process` let the agent run a dev server, probe it, read the logs, and iterate.
- Session budget guardrail. `lunacode.sessionBudgetUsd` pauses the agent (even in Auto mode) and asks before spending past your limit.
- Editor-aware. Each message can carry your active file + selection (`lunacode.includeActiveFile`); the right-click menu adds Fix Problems in This File, Refactor Selection…, and Explain Selection; and every diagnostic's lightbulb offers Fix with Luna Code. Multi-root workspaces pick their working folder via Select Working Folder.
- Slash commands. `/commit`, `/review`, `/tests` built in, plus your own templates via `lunacode.customCommands`, with autocomplete in the composer.
- Image paste. Paste screenshots into the composer (up to 3, multimodal models via OpenRouter).
- Worktree sandbox. `lunacode.worktreeMode` runs the agent in a separate git worktree; merge or discard its changes via the command palette.
- Format after edit. Optionally run the workspace formatter on every file the agent touches (`lunacode.formatAfterEdit`).
- Calm, readable streaming. Scrolling up pauses auto-follow; long code blocks are height-capped with click-to-expand; a live `~N tok` counter shows progress during long silent generations; and an actions menu (⋯) gathers review/revert/retry/edit/export with plain-text labels.
- Live tool output. Commands stream their stdout into the tool card as they run (last few lines, click for the full log), background processes show their startup output, and the explore sub-agent's lookups stream into its card so its research is visible.
- Monorepo memory. Nested `LUNA.md` files in subdirectories load alongside the root one, each labeled with its path.
- Cache-hit optimized. A stable system-prompt prefix plus rolling `cache_control` breakpoints maximize provider prompt caching (Anthropic / Gemini via OpenRouter; automatic for OpenAI). The composer shows a live cache hit % and token/cost meter.
- Context management. Cache-aware compaction: history stays append-only (so prompt-cache hits keep landing) until a price-aware budget is hit. The budget is sized from the model's context window and its input price, so a fully cached turn stays under a target cost (`lunacode.autoBudgetCarryCostUsd`). A compaction event then supersedes stale duplicate file reads and replaces the oldest turns with a structured checkpoint summary written by a cheap summarizer model (`lunacode.summarizerModel`), driving the context down to a floor (`lunacode.compactionTargetRatio`) so events stay rare.
- Settings GUI. A gear button in the panel header opens an in-chat settings sheet (models, context/cost budgets, generation, privacy routing, and command allow/deny lists) with instant apply and two-way sync with VS Code's settings editor.
- Modern UI. A clean neutral-dark interface with purple accents, streaming responses, collapsible reasoning, tool cards, and inline diff approvals.
- Open anywhere. Use Luna Code in the Activity Bar sidebar or pop it out into an editor tab (button in the panel title bar). To dock it on the right like Claude Code, drag the Luna Code icon into the Secondary Side Bar — VS Code remembers the placement. All surfaces share one live session.
- Private by default. Every request sends OpenRouter `provider.data_collection: "deny"`, so traffic is only routed to providers that do not store or train on your prompts. An optional stricter Zero-Data-Retention (ZDR) mode is available.
- Secure. Your API key is stored in VS Code's encrypted `SecretStorage`, never in settings or files.
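
The "Resilient streaming" item above says retries happen on 429/5xx responses and only before any tokens have arrived. A minimal sketch of that decision follows. The function names (`shouldRetry`, `backoffDelayMs`), the attempt cap, and the delay constants are all hypothetical, chosen for illustration; the README does not state Luna Code's actual values or whether it adds jitter.

```typescript
// Hypothetical retry policy. Every name and constant here is an
// illustrative assumption, not taken from Luna Code's source.
const MAX_ATTEMPTS = 4;     // assumed cap on total attempts
const BASE_DELAY_MS = 500;  // assumed delay before the first retry
const MAX_DELAY_MS = 8000;  // assumed ceiling for any single delay

/**
 * Retry only on rate limits (429) or server errors (5xx), and only while
 * nothing has been streamed yet, so partial output is never duplicated.
 * `attempt` is 1-based: the attempt that just failed.
 */
function shouldRetry(status: number, tokensReceived: number, attempt: number): boolean {
  const retryableStatus = status === 429 || (status >= 500 && status <= 599);
  return retryableStatus && tokensReceived === 0 && attempt < MAX_ATTEMPTS;
}

/** Exponential backoff without jitter: 500, 1000, 2000, ... capped at MAX_DELAY_MS. */
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
}
```

Under these assumptions, a 503 on the first attempt with no tokens received would be retried after 500 ms, while the same error after a few tokens have streamed would be surfaced instead of retried.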
- Build the extension: run `npm install`, then `npm run compile`.
- Press F5 in VS Code to launch the Extension Development Host.
- Click the Luna Code icon in the Activity Bar.
- Click Set OpenRouter API Key and paste your key (`sk-or-v1-…`).
- Click the model chip in the header to pick a model (or browse all OpenRouter models).
- Type a request and hit Enter.
| Shortcut | Action |
|---|---|
| `Ctrl/Cmd + Shift + L` | Focus the Luna Code chat |
| `Ctrl/Cmd + Shift + K` | Add the current editor selection to chat |
All settings live under the `lunacode.*` namespace (Settings → Extensions → Luna Code):
| Setting | Default | Description |
|---|---|---|
| `lunacode.model` | `deepseek/deepseek-v4-flash` | OpenRouter model id (use the picker / Browse all for current ids). |
| `lunacode.baseUrl` | `https://openrouter.ai/api/v1` | API base URL (override for proxies). |
| `lunacode.defaultMode` | `standard` | `standard`, `auto`, or `plan`. |
| `lunacode.maxTokens` | `0` | Max tokens per turn. `0` = use the model's full output limit (avoids truncating large `write_file` calls). |
| `lunacode.temperature` | `0` | Sampling temperature. |
| `lunacode.enablePromptCaching` | `true` | Insert `cache_control` breakpoints. |
| `lunacode.dataCollection` | `deny` | `deny` routes only to providers that don't store/train on prompts; `allow` permits all. |
| `lunacode.zeroDataRetention` | `false` | Stricter: only route to Zero-Data-Retention endpoints. |
| `lunacode.maxContextTokens` | `180000` | Budget before older context is compacted. |
| `lunacode.autoApproveCommands` | common read-only cmds | Auto-approved even in Standard mode. |
| `lunacode.alwaysDenyCommands` | destructive cmds | Always blocked, any mode. |
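
The defaults in the table can be written down as a typed object, which makes the meaning of `maxTokens: 0` concrete. This is an illustrative sketch only: the `LunaSettings` interface and the `resolveSettings` and `effectiveMaxTokens` helpers are hypothetical names, they cover just the scalar settings listed above, and clamping to the model's output limit is an assumption. Inside the extension the values would presumably be read through VS Code's configuration API instead.

```typescript
type Mode = "standard" | "auto" | "plan";

// Hypothetical shape mirroring the scalar rows of the settings table.
interface LunaSettings {
  model: string;
  baseUrl: string;
  defaultMode: Mode;
  maxTokens: number; // 0 = use the model's full output limit
  temperature: number;
  enablePromptCaching: boolean;
  dataCollection: "deny" | "allow";
  zeroDataRetention: boolean;
  maxContextTokens: number;
}

// Default values copied from the table above.
const DEFAULTS: LunaSettings = {
  model: "deepseek/deepseek-v4-flash",
  baseUrl: "https://openrouter.ai/api/v1",
  defaultMode: "standard",
  maxTokens: 0,
  temperature: 0,
  enablePromptCaching: true,
  dataCollection: "deny",
  zeroDataRetention: false,
  maxContextTokens: 180000,
};

/** Overlay user-provided values on the documented defaults. */
function resolveSettings(overrides: Partial<LunaSettings>): LunaSettings {
  return { ...DEFAULTS, ...overrides };
}

/** Assumed rule: 0 means "no cap"; otherwise never exceed the model's limit. */
function effectiveMaxTokens(settings: LunaSettings, modelOutputLimit: number): number {
  return settings.maxTokens === 0
    ? modelOutputLimit
    : Math.min(settings.maxTokens, modelOutputLimit);
}
```

For example, `resolveSettings({ defaultMode: "plan" })` keeps every other default, and `effectiveMaxTokens` of that result with a model limit of 8192 yields 8192 because `maxTokens` is still `0`.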
| Tool | Mutating | Purpose |
|---|---|---|
| `read_file` | no | Read a file (with paging). |
| `list_dir` | no | List a directory. |
| `glob` | no | Find files by glob pattern. |
| `grep` | no | Regex search across the workspace. |
| `get_diagnostics` | no | Read language-server errors/warnings. |
| `write_file` | yes | Create/overwrite a file. |
| `edit_file` | yes | Exact-string targeted edit. |
| `run_command` | yes | Run a shell command (PowerShell on Windows, `sh` elsewhere). |
Mutating tools are hidden entirely in Plan mode and gated by approval in Standard mode.
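
The gating rule above can be sketched in a few lines. The `ToolSpec` type and the `visibleTools` and `needsApproval` helpers are hypothetical names used for illustration. The sketch also deliberately ignores `lunacode.autoApproveCommands` and `lunacode.alwaysDenyCommands`, which would further refine how `run_command` is handled.

```typescript
type Mode = "standard" | "auto" | "plan";

interface ToolSpec {
  name: string;
  mutating: boolean;
}

// The eight tools from the table above, with their "Mutating" column.
const TOOLS: ToolSpec[] = [
  { name: "read_file", mutating: false },
  { name: "list_dir", mutating: false },
  { name: "glob", mutating: false },
  { name: "grep", mutating: false },
  { name: "get_diagnostics", mutating: false },
  { name: "write_file", mutating: true },
  { name: "edit_file", mutating: true },
  { name: "run_command", mutating: true },
];

/** Tools advertised to the model: Plan mode hides every mutating tool. */
function visibleTools(mode: Mode): ToolSpec[] {
  return mode === "plan" ? TOOLS.filter((t) => !t.mutating) : TOOLS;
}

/** Only Standard mode asks the user before a mutating tool runs. */
function needsApproval(mode: Mode, tool: ToolSpec): boolean {
  return mode === "standard" && tool.mutating;
}
```

Hiding the tools in Plan mode, rather than rejecting calls to them, means the model never sees a definition it is not allowed to use.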
OpenRouter forwards `cache_control` breakpoints to providers that support prompt caching. Luna Code:
- Keeps the system prompt + tool definitions byte-stable across a session and marks the end of the system prompt as a cache breakpoint.
- Places a rolling breakpoint on the latest message each request so the entire accumulated conversation becomes a cached prefix for the next call.
- Only ever appends to the message list, never reorders, so prefixes stay valid for cache reuse.
For OpenAI models, caching is automatic and these hints are safely ignored.
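
The breakpoint placement described above can be sketched as a pure function over the message list. The `Message` and `TextPart` types and the `withCacheBreakpoints` helper are hypothetical and simplified to text-only content. The `cache_control: { type: "ephemeral" }` marker on a content part follows the Anthropic-style format that OpenRouter accepts. Keeping exactly two markers is an assumption of this sketch; the real implementation may keep more.

```typescript
interface CacheControl {
  type: "ephemeral";
}

interface TextPart {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface Message {
  role: "system" | "user" | "assistant";
  content: TextPart[];
}

/**
 * Return a copy of `messages` with two breakpoints: one at the end of the
 * system prompt and one on the latest message. Order is never changed,
 * and the input array is left untouched.
 */
function withCacheBreakpoints(messages: Message[]): Message[] {
  // Copy every part without its marker, so stale breakpoints from
  // earlier requests are dropped and the rolling one can move forward.
  const copy: Message[] = messages.map((m) => ({
    role: m.role,
    content: m.content.map((p): TextPart => ({ type: "text", text: p.text })),
  }));

  // Mark the last content part of a message as a cache breakpoint.
  const mark = (m: Message | undefined): void => {
    if (!m || m.content.length === 0) return;
    m.content[m.content.length - 1].cache_control = { type: "ephemeral" };
  };

  mark(copy.filter((m) => m.role === "system")[0]); // stable prefix
  mark(copy[copy.length - 1]);                      // rolling breakpoint
  return copy;
}
```

Because the function only annotates messages and never reorders or rewrites their text, the prefix sent on one request remains a byte-for-byte prefix of the next.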
- `src/`
  - `extension.ts`: activation + commands
  - `config.ts`: settings + SecretStorage for the API key
  - `modes.ts`: Standard / Auto / Plan definitions
  - `openrouter/`
    - `client.ts`: streaming Chat Completions client
    - `types.ts`: message + `cache_control` types
  - `agent/`
    - `agent.ts`: the agentic tool loop
    - `systemPrompt.ts`: system prompt (stable cache prefix)
    - `contextManager.ts`: cache breakpoints + context compaction
    - `tools/`: read/write/edit/list/glob/grep/run/diagnostics
  - `webview/`
    - `provider.ts`: webview host + approval bridge
    - `protocol.ts`: host <-> webview message types
    - `ui/`: the webview front-end (`main.ts`, `markdown.ts`)
- `media/`
  - `webview.css`: the dark-purple theme
MIT