Hard enforcement for local LLMs running on the Pi coding agent.
Local models (35B and under) spiral, forget, and write 800-line files in one shot. PiForge physically prevents that — at the API boundary, not the prompt level — and gives the model an external brain via `.think/` files that survive context compression.
Tested with qwen3.6-35b-a3b at Q2_K_XL quantization via LM Studio on macOS. Yes — a 2-bit quantized model doing structured multi-file coding, codebase distillation, and tool-call workflows. The guard stack makes that possible.
| Extension | What it enforces | Default |
|---|---|---|
| `incremental-guard.ts` | Rejects write/edit calls > 100 lines or 6000 chars — forces skeleton → edit workflow | on |
| `thinking-guard.ts` | Injects correction when thinking block > 2000 chars — stops reasoning spirals | on |
| `context-monitor.ts` | Steers model to write state files at 65% context, urgent at 80% | on |
| `analysis-guard.ts` | Forces findings to `.think/step-NNN.md` when response > 1000 chars with no file write | on |
| `state-guard.ts` | Blocks source reads until `_state.md` is read; forces updates every 5 turns | on |
| `first-prompt.ts` | Appends "plan in steps, implement one at a time" to first prompt — preventive, zero context overhead | on |
| `plan-clarify.ts` | Intercepts `_plan.md` writes — forces model to ask ≤3 clarifying questions before any code | off |
| `knowledge-injector.ts` | Isolated LLM call selects relevant `~/.pi/knowledge/` files, saves manifest, auto re-injects after compaction. `/forget` to remove. | off |
These are hard — the model cannot bypass them. incremental-guard and knowledge-injector physically reject tool calls. The others inject steering messages before the next LLM call.
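For a concrete picture of the hard path, here is a minimal sketch of the incremental-guard check. The hook signature and tool-call shape are hypothetical stand-ins for Pi's extension API; the steering guards follow the same pattern but return a message to inject instead of an error.

```typescript
// Minimal sketch of the incremental-guard idea. The ToolCall shape and
// the veto convention are illustrative, not Pi's real extension API.
const MAX_LINES = 100;
const MAX_CHARS = 6000;

interface ToolCall {
  name: string; // e.g. "write" or "edit"
  args: { path: string; content: string };
}

// Returning an error string vetoes the call at the API boundary; the
// model receives it as a failed tool result and must retry smaller.
function guardToolCall(call: ToolCall): string | null {
  if (call.name !== "write" && call.name !== "edit") return null;
  const lines = call.args.content.split("\n").length;
  const chars = call.args.content.length;
  if (lines > MAX_LINES || chars > MAX_CHARS) {
    return (
      `Rejected: ${lines} lines / ${chars} chars exceeds the ` +
      `${MAX_LINES}-line / ${MAX_CHARS}-char limit. Write a skeleton ` +
      `first, then fill it in with small incremental edits.`
    );
  }
  return null; // within limits: the call goes through untouched
}
```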
plan-clarify and knowledge-injector are disabled by default — enable per session with `/piforge enable <name>`. Use `/piforge` to see status.
A local model with 50k context can't hold a real codebase. Reading files one by one is slow, burns context, and the model forgets file #1 by the time it reads file #10. Distill solves this by building compressed versions of the entire codebase at multiple zoom levels — like Google Maps for your code.
The idea: You distill your codebase once. This creates three levels of compressed summaries, all mirroring the original folder structure:
```
Source (100%)  →  L1 (~50%)  →  L2 (~25%)  →  L3 (~12%)
  full code       key logic     signatures    one-liners
```
When Pi needs to understand the codebase, it doesn't read source files. It queries the right zoom level:
- L3 — "What modules exist? What's the architecture?" — fits in a few hundred tokens
- L2 — "How does the auth system work?" — function signatures, key relationships
- L1 — "Show me the output pipeline logic" — detailed summaries with key code preserved
Pi zooms in only when needed. Most questions are answered at L2/L3 without ever reading source. When Pi does need the actual code, it knows exactly which file to open because L2 already told it where things live.
How it works: Crawls the directory, builds an import graph, topologically sorts files, and processes each file via isolated sub-Pi calls — the main session LLM stays idle and clean. The distilled knowledge persists across sessions.
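As an illustration of the ordering step, a depth-first topological sort over the import graph might look like this. A sketch only: building the `imports` map and issuing the sub-Pi calls are assumed to happen elsewhere.

```typescript
// Sketch of the distill ordering step: process dependencies before the
// files that import them, so each summary can lean on prior summaries.
// `imports` maps a file to the files it imports.
function topoSortFiles(imports: Map<string, string[]>): string[] {
  const order: string[] = [];
  const seen = new Set<string>();

  function visit(file: string): void {
    if (seen.has(file)) return; // already ordered, or part of a cycle
    seen.add(file);
    for (const dep of imports.get(file) ?? []) visit(dep);
    order.push(file); // emitted only after all its dependencies
  }

  for (const file of imports.keys()) visit(file);
  return order;
}

// Example: auth.ts imports db.ts, so db.ts is summarized first.
topoSortFiles(new Map([
  ["src/auth.ts", ["src/db.ts"]],
  ["src/db.ts", []],
])); // → ["src/db.ts", "src/auth.ts"]
```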
| Extension | What it does | Default |
|---|---|---|
| `distill.ts` | `/distill` command + `distill_codebase` LLM-callable tool | on |
| `distill-query.ts` | `/l1`, `/l2`, `/l3` query commands + `/distill-status` | on |
| `explore.ts` | `/explore` + `explore_codebase` tool (superseded by distill-query) | off |
| `distill-awareness.ts` | Session-start context injection (superseded by distill-query) | off |
Additional features:
- Purpose-driven notes: `--purpose "how does auth work?"` takes notes on each file during distillation, then synthesizes a comprehensive answer
- LLM-callable tool: Pi can call `distill_codebase` autonomously — no slash command needed
- Single file support: distill one large file with automatic chunking
- Auto-detect level: point at `.think/distill/L1/` and it auto-outputs L2
- Resume support: `--resume` continues interrupted distillation
```
/distill [path]                       # distill directory (default: .)
/distill [path] --purpose "question"  # distill + take notes on question
/distill --resume                     # resume interrupted run
/distill --level 2                    # compress L1 → L2
/distill [path] --ratio 30            # aggressive compression (30%)
/l1 "how does auth work?"             # query L1 summaries directly
/l2 "what modules exist?"             # query L2 summaries directly
/l3 "high-level architecture?"        # query L3 summaries directly
/distill-status                       # show coverage per level
```
Output structure:
```
.think/distill/
├── manifest.json        ← state: files, progress per level, config
├── distill.log          ← append-only log
├── L1/                  ← mirrors source folder structure, ~50% of source
│   └── src/
│       └── auth.ts.md
├── L2/                  ← same structure, ~25% of source
│   └── src/
│       └── auth.ts.md
├── notes/               ← purpose-driven findings (optional)
│   ├── auth-notes.md
│   └── auth-notes-answer.md
└── tmp/                 ← prompt files (auto-cleaned)
```
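The manifest schema isn't spelled out here; as a rough mental model, its shape is something like the hypothetical TypeScript interface below (all field names invented for illustration).

```typescript
// Hypothetical shape of .think/distill/manifest.json. The README only
// states that it tracks files, per-level progress, and config; the
// field names here are illustrative, not the actual schema.
interface DistillManifest {
  config: {
    ratio: number;    // target compression, e.g. 50 for L1
    purpose?: string; // set when --purpose was given
  };
  files: string[];    // source files in topologically sorted order
  progress: Partial<Record<"L1" | "L2" | "L3", string[]>>; // completed files per level
}
```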
| Extension | What it does | Default |
|---|---|---|
| `session-manager.ts` | Auto-creates isolated `.think/` per Pi terminal instance via symlinks | on |
Every time you open a new Pi terminal, session-manager creates a fresh directory under `.think-sessions/` and points the `.think/` symlink to it. The model always writes to `.think/` — same hardcoded path, zero tokens wasted on session management.
```
.think-sessions/
  session-001/    ← first Pi tab's state
  session-002/    ← second Pi tab's state
  session-003/    ← third Pi tab's state
.think/ → .think-sessions/session-003/   ← symlink to active session
```
If `.think/` already exists as a real directory (from before the extension), it gets migrated automatically into session-001.
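A minimal sketch of that flow using Node's fs API, with locking and error handling elided (the function name is illustrative):

```typescript
import * as fs from "node:fs";

// Sketch of the session-manager flow: one fresh state directory per
// Pi instance, with .think/ as a symlink to the active one.
function activateNewSession(root = "."): string {
  const sessionsDir = `${root}/.think-sessions`;
  const link = `${root}/.think`;
  fs.mkdirSync(sessionsDir, { recursive: true });

  // Migrate a pre-existing real .think/ directory into session-001.
  if (fs.existsSync(link) && !fs.lstatSync(link).isSymbolicLink()) {
    fs.renameSync(link, `${sessionsDir}/session-001`);
  }

  // Next session number = number of existing sessions + 1.
  const n = fs.readdirSync(sessionsDir).length + 1;
  const session = `${sessionsDir}/session-${String(n).padStart(3, "0")}`;
  fs.mkdirSync(session);

  // Re-point the symlink; the model keeps writing to the same path.
  try { fs.unlinkSync(link); } catch { /* no previous link */ }
  fs.symlinkSync(session, link, "dir");
  return session;
}
```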
Commands: `/sessions` (list all), `/resume` (list + pick), `/resume session-001` (switch directly — injects a steer to read `_state.md`)
| Extension | What it does | Default |
|---|---|---|
| `purpose-anchor.ts` | Captures session purpose from first prompt, re-injects purpose + state after compaction | on |
When context gets compacted, Pi can lose track of the original goal. purpose-anchor solves this:
- Saves the first user prompt to `.think/_purpose.md`
- Hooks into Pi's `session_compact` event
- After compaction, steers Pi to re-read `.think/_state.md` and `_summary.md`
- Pi re-orients and continues without drift
Commands: `/purpose` (view/set), `/purpose-clear` (reset)
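In sketch form, with hook names and the injectSteer callback as illustrative stand-ins for Pi's actual extension API:

```typescript
import * as fs from "node:fs";

// Sketch of the purpose-anchor pattern. Hook names are hypothetical;
// the README only confirms Pi emits a session_compact event.
const PURPOSE_FILE = ".think/_purpose.md";

export function onFirstPrompt(prompt: string): void {
  // Capture the goal once, before any compaction can erase it.
  if (!fs.existsSync(PURPOSE_FILE)) {
    fs.writeFileSync(PURPOSE_FILE, `# Session purpose\n\n${prompt}\n`);
  }
}

export function onSessionCompact(injectSteer: (msg: string) => void): void {
  const purpose = fs.existsSync(PURPOSE_FILE)
    ? fs.readFileSync(PURPOSE_FILE, "utf8")
    : "(no purpose recorded)";
  // Steer the model back to its external brain after compaction.
  injectSteer(
    `Context was compacted. Original purpose:\n${purpose}\n` +
    `Re-read .think/_state.md and .think/_summary.md before continuing.`
  );
}
```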
`incremental-codegen` — a `SKILL.md` that teaches the model the skeleton → edit workflow. Works alongside the hard guards.
`knowledge/` — inference-time context injection with zero context pollution.
On turn 1, knowledge-injector makes an isolated LLM call using Pi's own model and endpoint. It passes the user's prompt + the knowledge filenames and asks "which are relevant?". The selection reasoning happens in that isolated call — it never appears in Pi's conversation history. Only the selected file content gets injected as a steer.
This means: smart semantic selection (the LLM knows the task), zero reasoning trace in context.
```
user prompt → isolated call → selects files → injects content only → Pi's main LLM call
```
Selected filenames are saved to `.think/_knowledge-manifest.md`. After compaction or session restart, the extension reads the manifest, rebuilds the content from source files, and re-injects automatically — zero LLM cost, no re-selection needed. Use `/forget <name>` to remove knowledge mid-session.
Code writes are blocked until `.think/_knowledge.md` is written — proof the model absorbed the knowledge.
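The selection call itself can be a single request to the same OpenAI-compatible endpoint Pi already uses. A sketch assuming LM Studio's default server on :1234, with the model name and prompt wording as placeholders:

```typescript
// Sketch of the isolated selection call against an OpenAI-compatible
// server. The endpoint, model name, and prompt are illustrative.
async function selectKnowledgeFiles(
  userPrompt: string,
  filenames: string[],
): Promise<string[]> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model",
      messages: [{
        role: "user",
        content:
          `Task: ${userPrompt}\n` +
          `Available knowledge files: ${filenames.join(", ")}\n` +
          `Reply with only the relevant filenames, comma-separated, or "none".`,
      }],
    }),
  });
  const data = await res.json();
  const reply: string = data.choices[0].message.content;
  // The reasoning stays in this throwaway call; only the contents of
  // the selected files are later injected into Pi's real conversation.
  return filenames.filter((name) => reply.includes(name));
}
```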
Included samples:
- `svelte5-gotchas.md` — Svelte 5 runes failure patterns
- `astro-gotchas.md` — Astro islands, client directives, frontmatter pitfalls
Add your own — name by tech, keep under 500 tokens, failures only:
```
~/.pi/knowledge/
├── astro-gotchas.md
├── svelte5-gotchas.md
├── react-hooks.md
└── ...
```
`project-template/AGENTS.md` — drop into any project. Tells the model to use the `.think/` external-brain workflow: scan the knowledge folder at session start, read `_state.md` first, write one step file per turn, update state after every action.
```
git clone https://github.com/yourusername/piforge
cd piforge
bash install.sh
```

Then:
- Start LM Studio, load your model, start the server on `:1234`
- Edit `~/.pi/agent/models.json` — set the model `id` to match your LM Studio model
- Copy `project-template/AGENTS.md` into any project you work on
- Run `pi` from your project directory
On startup you should see:
```
incremental-guard active (max 100 lines / 6000 chars per write/edit)
thinking-guard active (max 2000 chars / 60 lines of thinking per turn)
context-monitor active — warn at 65%, urgent at 80% (window: XXXXX tokens)
analysis-guard active (triggers on responses >1000 chars with no file write)
session-manager: session-001 — .think/ ready
```
- Pi coding agent — `npm install -g @mariozechner/pi-coding-agent`
- LM Studio with a model loaded and server running on `:1234`
- Node.js ≥ 20
Recommended model: qwen3.6-35b-a3b at Q2_K_XL quantization (Unsloth). Runs on consumer hardware via LM Studio.
We develop and test PiForge at Q2_K_XL — the most aggressive quantization level. Results at 2-bit are already surprisingly good; at higher-precision quants they only get better.
Also tested with qwen3-coder-30b-a3b-instruct. Should work with any OpenAI-compatible local server.
Add this in LM Studio → Model → System Prompt:
```
CRITICAL OUTPUT RULE: You MUST NEVER write more than 2000 tokens in a single tool call.

When generating a new file:
- First call: write ONLY the <head> and <style> section
- Second call: use bash to append the <body> HTML: cat >> file.html << 'CHUNK'
- Third call: use bash to append the <script> section
- NEVER put an entire HTML file in one write call

When the file would be large, ALWAYS use multiple bash append calls.
DO NOT OVERTHINK. Short thinking is better than long thinking.
```
Note: the Pi `incremental-guard` extension enforces this at the API layer regardless — the system prompt is a soft nudge on top.
| Parameter | Value | Notes |
|---|---|---|
| Temperature | 0.58 | Focused but not robotic |
| Response length limit | 2000 tokens | Backstop — guards are the real enforcement |
| Top-K sampling | 30 | Narrows token selection |
| Repeat penalty | 1.1 | Mild reduction of token-level loops |
| Top-P sampling | 0.95 | Standard nucleus sampling |
| Min-P sampling | 0.08 | Cuts low-probability tail tokens |
The response length limit is not always respected by local models — treat it as a last-resort backstop, not primary enforcement. The guard stack handles the real enforcement.
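For reference, the same settings as an OpenAI-style request fragment. Keep in mind that top_k, repeat_penalty, and min_p are llama.cpp-style extras rather than standard OpenAI fields; whether a given server honors them over the API varies, so setting them in the LM Studio UI is the safer route.

```typescript
// The recommended settings expressed as a request-body fragment.
// Field names follow llama.cpp-style servers; LM Studio normally takes
// these from the model's settings UI instead of per-request options.
const samplingConfig = {
  temperature: 0.58,
  max_tokens: 2000,    // soft backstop only; the guard stack is the real limit
  top_p: 0.95,
  top_k: 30,
  repeat_penalty: 1.1,
  min_p: 0.08,
};
```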
Cloud models (GPT-4, Claude, Gemini) self-regulate well enough that you don't need enforcement. Local 35B models don't — they ignore prompt rules, spiral in reasoning loops, and produce truncated garbage when they try to write large files.
The existing local LLM tooling (Cline, Roo, etc.) is designed for cloud models and just pointed at local endpoints. PiForge is built specifically for the constraints of local inference:
- Hard limits at the API layer, not suggestions in a prompt
- External memory via `.think/` files — the model writes everything to disk instead of holding it in context
- Distillation — build a knowledge base from a codebase once, reference it across sessions without re-reading source files
A scalpel isn't better than a chainsaw because it's sharper — it's better because you're doing surgery, not cutting trees.
PiForge doesn't make a Q2 quantized model smart. It removes every decision the model is bad at, until what remains is a narrow set of small, recoverable tasks it can do reliably. The right tool constrained to the right task performs well regardless of raw capability.
See PI-SETUP.md for the complete reference — every config option, tuning guide, benchmark results, and troubleshooting section.
```
piforge/
├── README.md
├── install.sh                    ← run this first
├── PI-SETUP.md                   ← full reference guide
├── distill-v2-plan.md            ← distill design document
├── distill-v2-implementation.md  ← distill implementation spec
├── extensions/
│   ├── incremental-guard.ts      ← blocks oversized write/edit calls
│   ├── thinking-guard.ts         ← stops reasoning spirals
│   ├── context-monitor.ts        ← warns before context degrades
│   ├── analysis-guard.ts         ← forces analysis to disk
│   ├── token-counter.ts          ← tracks tokens + Gemini cost comparison
│   ├── first-prompt.ts           ← injects planning instruction into first prompt only
│   ├── plan-clarify.ts           ← clarifying questions after _plan.md (off by default)
│   ├── knowledge-injector.ts     ← isolated LLM call selects knowledge files (off by default)
│   ├── state-guard.ts            ← blocks reads until _state.md read, forces updates
│   ├── piforge-manager.ts        ← /piforge command to toggle extensions
│   ├── distill.ts                ← /distill + distill_codebase tool
│   ├── distill-query.ts          ← /l1 /l2 /l3 direct level queries + /distill-status
│   ├── explore.ts                ← /explore + explore_codebase tool (off by default)
│   ├── distill-awareness.ts      ← session-start awareness (off by default)
│   ├── purpose-anchor.ts         ← anti-drift: re-injects purpose after compaction
│   └── session-manager.ts        ← per-tab .think/ isolation via symlinks
├── knowledge/
│   ├── README.md                 ← how to write knowledge files
│   ├── svelte5-gotchas.md        ← Svelte 5 runes failure patterns
│   └── astro-gotchas.md          ← Astro islands + client directives failure patterns
├── skills/
│   └── incremental-codegen/
│       └── SKILL.md              ← soft-enforcement skill
├── config/
│   ├── models.json               ← LM Studio provider config template
│   ├── settings.json             ← Pi global settings
│   └── piforge.json              ← extension toggles (plan-clarify + knowledge-injector off by default)
└── project-template/
    └── AGENTS.md                 ← drop in any project
```