Merged
14 changes: 6 additions & 8 deletions README.md
@@ -75,12 +75,12 @@ ShellForge is a **governed agent runtime** — not an agent framework, not an or
It sits between any agent driver and the real world. The agent decides what it wants to do. ShellForge decides whether it's allowed.

```
Agent Driver (Crush, Claude Code, Copilot CLI)
Agent Driver (Goose, Claude Code, Copilot CLI)
→ ShellForge Governance (allow / deny / correct)
→ Your Environment (files, shell, git)
```

**The core insight:** ShellForge's value is governance, not the agent loop. [Crush](https://github.com/charmbracelet/crush) handles agent execution. [Dagu](https://github.com/dagu-org/dagu) handles workflow orchestration. ShellForge wraps them all with [AgentGuard](https://github.com/AgentGuardHQ/agentguard) policy enforcement on every tool call.
**The core insight:** ShellForge's value is governance, not the agent loop. [Goose](https://block.github.io/goose) handles local agent execution. [Dagu](https://github.com/dagu-org/dagu) handles workflow orchestration. ShellForge wraps them all with [AgentGuard](https://github.com/AgentGuardHQ/agentguard) policy enforcement on every tool call.

---

@@ -90,7 +90,7 @@ Agent Driver (Crush, Claude Code, Copilot CLI)
|-------|---------|--------------|
| **Infer** | [Ollama](https://ollama.com) | Local LLM inference (Metal GPU on Mac) |
| **Optimize** | [RTK](https://github.com/rtk-ai/rtk) | Token compression — 70-90% reduction on shell output |
| **Execute** | [Crush](https://github.com/charmbracelet/crush) | Go-native AI coding agent (TUI + headless) |
| **Execute** | [Goose](https://block.github.io/goose) | AI coding agent with native Ollama support (headless) |
| **Orchestrate** | [Dagu](https://github.com/dagu-org/dagu) | YAML DAG workflows with scheduling and web UI |
| **Govern** | [AgentGuard](https://github.com/AgentGuardHQ/agentguard) | Policy enforcement on every action — allow/deny/correct |
| **Sandbox** | [OpenShell](https://github.com/NVIDIA/OpenShell) | Kernel-level isolation (Docker on macOS) |
@@ -100,7 +100,6 @@ Agent Driver (Crush, Claude Code, Copilot CLI)
shellforge status
# Ollama running (qwen3:30b loaded)
# RTK v0.4.2
# Crush v1.0.0
# AgentGuard enforce mode (5 rules)
# Dagu connected (web UI at :8080)
# OpenShell Docker sandbox active
@@ -113,7 +112,7 @@ shellforge status

| Command | Description |
|---------|-------------|
| `shellforge run <driver> "prompt"` | Run a governed agent (claude, copilot, codex, gemini, crush) |
| `shellforge run <driver> "prompt"` | Run a governed agent (goose, claude, copilot, codex, gemini) |
| `shellforge setup` | Install Ollama, create governance config, verify stack |
| `shellforge agent "prompt"` | Run a governed agent — every tool call checked |
| `shellforge qa [dir]` | QA analysis — find test gaps and issues |
@@ -134,7 +133,6 @@ shellforge run claude "review this code"
shellforge run codex "generate tests"
shellforge run copilot "update docs"
shellforge run gemini "security audit"
shellforge run crush "analyze test gaps"
```

Orchestrate multiple drivers in a single Dagu DAG:
@@ -156,8 +154,8 @@ See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.
└────────────────────┬──────────────────────────────┘
│ task
┌────────────────────▼──────────────────────────────┐
Crush (Execution Engine) │
│ Agent loop · Tool calling · TUI + headless
Goose (Execution Engine) │
│ Agent loop · Tool calling · Ollama-native
│ Uses Ollama for inference │
└────────────────────┬──────────────────────────────┘
│ tool call
25 changes: 13 additions & 12 deletions docs/architecture.md
@@ -8,19 +8,19 @@ ShellForge is a single Go binary (~7.5MB) that provides governed local AI agent

```
┌─────────────────────────────────────────────┐
│ Layer 8: OpenShell (Kernel Sandbox) │ NVIDIA Landlock/Seccomp
│ Layer 8: OpenShell (Kernel Sandbox) │ Docker/Colima isolation
├─────────────────────────────────────────────┤
│ Layer 7: DefenseClaw (Supply Chain) │ Cisco AI BoM Scanner
├─────────────────────────────────────────────┤
│ Layer 6: DeepAgents (Multi-Agent) │ LangChain orchestration
│ Layer 6: Dagu (Orchestration) │ YAML DAG workflows + web UI
├─────────────────────────────────────────────┤
│ Layer 5: OpenCode (AI Coding) Go CLI, native tools
│ Layer 5: Goose / OpenCode (Execution) │ Primary local agent driver
├─────────────────────────────────────────────┤
│ Layer 4: AgentGuard (Governance Kernel) │ Policy enforcement
├─────────────────────────────────────────────┤
│ Layer 3: TurboQuant (Quantization) │ KV cache optimization
│ Layer 3: TurboQuant (Quantization) │ KV cache optimization (optional)
├─────────────────────────────────────────────┤
│ Layer 2: RTK (Token Compression) │ Auto-compress I/O
│ Layer 2: RTK (Token Compression) │ Auto-compress I/O (optional)
├─────────────────────────────────────────────┤
│ Layer 1: Ollama (Local LLM) │ Metal GPU on Mac
└─────────────────────────────────────────────┘
@@ -47,26 +47,27 @@ internal/

ShellForge uses a pluggable engine system:

1. **OpenCode** (preferred) — subprocess, `--non-interactive` mode, governance-wrapped
2. **DeepAgents** — subprocess, Node.js/Python SDK, governance-wrapped
3. **Native** (fallback) — built-in multi-turn loop with Ollama + tool calling
1. **Goose (Block)** (preferred local driver) — subprocess, native Ollama support, SHELL wrapped via `govern-shell.sh`
2. **OpenCode** (alternative) — subprocess, `--non-interactive` mode, governance-wrapped
3. **DeepAgents** (alternative) — subprocess, Node.js/Python SDK, governance-wrapped
4. **Native** (fallback) — built-in multi-turn loop with Ollama + tool calling

The engine selection is automatic based on what's installed.
The engine selection is automatic based on what's installed. Use `shellforge run goose` for local models, or `shellforge agent` for the native loop.
Copilot AI Mar 28, 2026

This text says engine selection is automatic based on what’s installed, but the CLI requires the user to explicitly choose a driver via `shellforge run <driver>` (it doesn’t auto-pick Goose/OpenCode/etc.). Please adjust the wording to match the current behavior (e.g., “ShellForge can run different engines; select one with `shellforge run <driver>`”).

Suggested change
The engine selection is automatic based on what's installed. Use `shellforge run goose` for local models, or `shellforge agent` for the native loop.
ShellForge can run different engines; select one explicitly with `shellforge run <driver>`. For example, use `shellforge run goose` for local models, or `shellforge agent` for the native loop.

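The reviewer's point is that selection is explicit: the user names a driver and ShellForge runs that one, with no priority-ordered fallback. A minimal Go sketch of that behavior follows; `resolveDriver` and `supportedDrivers` are hypothetical names, and the driver list is copied from the README command table rather than from the CLI source.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supportedDrivers is a hypothetical allowlist built from the driver names in
// the README command table; the real CLI may accept a different set.
var supportedDrivers = map[string]bool{
	"goose":   true,
	"claude":  true,
	"copilot": true,
	"codex":   true,
	"gemini":  true,
}

// resolveDriver validates the driver the user named explicitly. There is no
// fallback: an unknown name is an error, never an auto-detected substitute.
func resolveDriver(name string) (string, error) {
	d := strings.ToLower(strings.TrimSpace(name))
	if supportedDrivers[d] {
		return d, nil
	}
	names := make([]string, 0, len(supportedDrivers))
	for n := range supportedDrivers {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic error message
	return "", fmt.Errorf("unknown driver %q (supported: %s)", name, strings.Join(names, ", "))
}

func main() {
	for _, arg := range []string{"goose", "crush"} {
		d, err := resolveDriver(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("selected driver:", d)
	}
}
```

Running it selects `goose` and rejects `crush`, which this PR removes from the driver list.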

## Governance Flow

```
User Request → Engine (OpenCode/DeepAgents/Native)
User Request → Engine (Goose/OpenCode/DeepAgents/Native)
→ Tool Call → Governance Check (agentguard.yaml)
→ ALLOW → Execute Tool → Return Result
→ DENY → Log Violation → Block Execution
→ DENY → Log Violation → Correction Feedback → Retry
```

## Data Flow

1. User invokes `./shellforge qa` (or agent, report, scan)
2. CLI loads `agentguard.yaml` governance policy
3. Detects available engine (OpenCode > DeepAgents > Native)
3. Detects available engine (Goose > OpenCode > DeepAgents > Native)
4. Engine sends prompt to Ollama (via RTK for token compression)
Comment on lines 68 to 71
Copilot AI Mar 28, 2026

Data Flow step 3 describes automatic engine detection and a priority order (Goose > OpenCode > DeepAgents > Native). The current CLI flow doesn’t auto-select an engine; it runs the driver the user specifies. Please either describe the explicit selection (`shellforge run goose|claude|...`) or document where this auto-detection happens if it exists.

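The DENY branch of the Governance Flow (log the violation, send correction feedback, retry) amounts to a bounded loop around the policy check. The sketch below assumes a toy `evaluate` rule, a hypothetical `ToolCall` type, and a `proposeFn` callback standing in for the engine; none of these are ShellForge's actual API, and the retry limit is an assumption because the diagram does not specify one.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision is a simplified stand-in for the governance engine's result.
type Decision struct {
	Allowed bool
	Reason  string
}

// ToolCall is a hypothetical, minimal tool invocation.
type ToolCall struct {
	Tool    string
	Command string
}

// evaluate is a toy policy: deny any shell command whose first token is "rm".
// Real rules come from agentguard.yaml.
func evaluate(c ToolCall) Decision {
	fields := strings.Fields(c.Command)
	if c.Tool == "run_shell" && len(fields) > 0 && fields[0] == "rm" {
		return Decision{Allowed: false, Reason: "destructive rm is not allowed"}
	}
	return Decision{Allowed: true}
}

// proposeFn stands in for the engine: given the last denial reason (empty on
// the first attempt), it proposes the next tool call.
type proposeFn func(feedback string) ToolCall

// runGoverned: ALLOW returns the call for execution; DENY records the
// violation, feeds the reason back, and retries up to maxAttempts times.
func runGoverned(propose proposeFn, maxAttempts int) (ToolCall, []string, error) {
	var violations []string
	feedback := ""
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		call := propose(feedback)
		d := evaluate(call)
		if d.Allowed {
			return call, violations, nil
		}
		violations = append(violations, fmt.Sprintf("attempt %d: %q denied: %s", attempt, call.Command, d.Reason))
		feedback = d.Reason
	}
	return ToolCall{}, violations, fmt.Errorf("denied after %d attempts", maxAttempts)
}

func main() {
	// Toy agent that switches to a dry run once it receives feedback.
	agent := func(feedback string) ToolCall {
		if feedback == "" {
			return ToolCall{Tool: "run_shell", Command: "rm -r build"}
		}
		return ToolCall{Tool: "run_shell", Command: "git clean -n"}
	}
	call, violations, err := runGoverned(agent, 3)
	fmt.Println(call.Command, violations, err)
}
```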
5. LLM responds with tool calls
6. Each tool call passes through governance check
16 changes: 16 additions & 0 deletions docs/roadmap.md
@@ -97,6 +97,22 @@ Foundation types exist (`internal/action/`, `internal/orchestrator/`, `internal/

---

## Bug Backlog (Open Issues)

Bugs identified during v0.6.x development. Fix before v1.0.

| Issue | Package | Severity | Description |
|-------|---------|----------|-------------|
| [#69](https://github.com/AgentGuardHQ/shellforge/issues/69) | `agentguard.yaml` | High | Governance gap: plain `rm` and `rm -r` bypass `no-destructive-rm` policy |
| [#67](https://github.com/AgentGuardHQ/shellforge/issues/67) | `scripts/govern-shell.sh` | Medium | Fragile `sed`-based JSON parsing — denial reason extraction can fail or corrupt |
| [#65](https://github.com/AgentGuardHQ/shellforge/issues/65) | `internal/scheduler` | Medium | `os.WriteFile` error silently ignored — audit log loss |
| [#63](https://github.com/AgentGuardHQ/shellforge/issues/63) | `internal/normalizer` | Medium | `classifyShellRisk` prefix match too broad — `catalog_tool` classified as read-only |
| [#62](https://github.com/AgentGuardHQ/shellforge/issues/62) | `cmd/shellforge` | Medium | `cmdEvaluate` ignores JSON unmarshal error — malformed input defaults to allow |
| [#61](https://github.com/AgentGuardHQ/shellforge/issues/61) | `internal/intent` | Low | Dead code in `flattenParams` — first assignment immediately overwritten |
| [#60](https://github.com/AgentGuardHQ/shellforge/issues/60) | all packages | High | Zero test coverage — critical for a governance runtime |

---

## Stack (as of v0.6.1)

| Component | Role | Status |
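Two backlog items, #62 and #63, are fail-open bugs: a malformed request defaults to allow, and a bare prefix match classifies `catalog_tool` as read-only. A minimal sketch of fail-closed alternatives follows, assuming hypothetical `isReadOnly`, `decodeRequest`, and `evalRequest` names and an illustrative command allowlist; the real `classifyShellRisk` and `cmdEvaluate` signatures are not shown in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// readOnlyCommands is an illustrative allowlist, not the one in internal/normalizer.
var readOnlyCommands = map[string]bool{"cat": true, "ls": true, "grep": true, "head": true}

// isReadOnly compares the whole first token instead of a string prefix, so
// "catalog_tool" no longer matches "cat" (the failure mode in #63).
func isReadOnly(command string) bool {
	fields := strings.Fields(command)
	return len(fields) > 0 && readOnlyCommands[fields[0]]
}

// evalRequest is a hypothetical input shape for an evaluate command.
type evalRequest struct {
	Tool   string            `json:"tool"`
	Params map[string]string `json:"params"`
}

// decodeRequest fails closed: malformed or incomplete JSON is an error rather
// than a zero-value request that could fall through to "allow" (#62).
func decodeRequest(raw []byte) (evalRequest, error) {
	var req evalRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return evalRequest{}, fmt.Errorf("invalid evaluate input: %w", err)
	}
	if req.Tool == "" {
		return evalRequest{}, fmt.Errorf("invalid evaluate input: missing tool")
	}
	return req, nil
}

func main() {
	fmt.Println(isReadOnly("cat README.md"), isReadOnly("catalog_tool --sync"))
	if _, err := decodeRequest([]byte(`{"tool": `)); err != nil {
		fmt.Println("deny:", err)
	}
}
```

The caller is expected to treat any `decodeRequest` error as a deny.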
3 changes: 3 additions & 0 deletions internal/governance/engine.go
@@ -69,6 +69,8 @@ Policies: cfg.Policies,
}, nil
}

// Evaluate checks a tool call against all policies and returns an allow/deny Decision.
// In enforce mode, deny policies block execution. In monitor mode, they log only.
func (e *Engine) Evaluate(tool string, params map[string]string) Decision {
for _, p := range e.Policies {
if e.matches(p, tool, params) {
@@ -98,6 +100,7 @@ Mode: e.Mode,
}
}

// GetTimeout returns the first policy-level timeout in seconds, or 300 if none is set.
func (e *Engine) GetTimeout() int {
for _, p := range e.Policies {
if p.Timeout > 0 {
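The new GoDoc on `Evaluate` and `GetTimeout` describes two behaviors: deny policies block only in enforce mode, and the timeout falls back to 300 seconds. A simplified sketch of both follows, assuming a hypothetical `Policy` shape that matches on tool name alone; the real `Evaluate` takes a params map and the real `matches(p, tool, params)` also receives the call's parameters.

```go
package main

import "fmt"

// Policy is a simplified, hypothetical policy shape; real policies come from
// agentguard.yaml.
type Policy struct {
	Name    string
	Tool    string // tool the policy applies to
	Action  string // "deny" or "allow"
	Timeout int    // seconds; 0 means unset
}

// Decision carries the outcome and the mode it was produced under.
type Decision struct {
	Allowed    bool
	PolicyName string
	Reason     string
	Mode       string
}

// Engine holds the mode ("enforce" or "monitor") and the ordered policies.
type Engine struct {
	Mode     string
	Policies []Policy
}

// Evaluate returns a decision for the first matching deny policy. Only
// monitor mode lets a matched deny through; any other mode blocks it.
func (e *Engine) Evaluate(tool string) Decision {
	for _, p := range e.Policies {
		if p.Tool == tool && p.Action == "deny" {
			return Decision{
				Allowed:    e.Mode == "monitor",
				PolicyName: p.Name,
				Reason:     "matched deny policy " + p.Name,
				Mode:       e.Mode,
			}
		}
	}
	return Decision{Allowed: true, Mode: e.Mode}
}

// GetTimeout returns the first positive policy timeout, or 300 seconds.
func (e *Engine) GetTimeout() int {
	for _, p := range e.Policies {
		if p.Timeout > 0 {
			return p.Timeout
		}
	}
	return 300
}

func main() {
	policies := []Policy{
		{Name: "no-write", Tool: "write_file", Action: "deny"},
		{Name: "shell-timeout", Tool: "run_shell", Action: "allow", Timeout: 60},
	}
	for _, mode := range []string{"enforce", "monitor"} {
		e := &Engine{Mode: mode, Policies: policies}
		fmt.Printf("%s: %+v timeout=%ds\n", mode, e.Evaluate("write_file"), e.GetTimeout())
	}
}
```

Treating any mode other than `monitor` as blocking is a design choice of this sketch: an unrecognized mode fails closed.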
9 changes: 9 additions & 0 deletions internal/logger/logger.go
@@ -45,6 +45,8 @@ entries []Entry
logFile *os.File
)

// Init opens a JSONL log file under outputDir named "<agent>-<timestamp>.jsonl".
// Must be called before any log functions; call Close when done.
Copilot AI Mar 28, 2026

The `Init` GoDoc says it “Must be called before any log functions,” but the logger functions still work without `Init` (they print to stdout and store in-memory entries; they just won’t write a JSONL file because `logFile` is nil). Please reword this to reflect the actual behavior (e.g., “Call `Init` to enable JSONL file logging; otherwise logs are stdout/in-memory only”).

Suggested change
// Must be called before any log functions; call Close when done.
// Call Init before logging to enable JSONL file output; otherwise logs are stdout/in-memory only. Call Close when done.

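The reviewer's reading of the logger can be made concrete: every call prints and appends to the in-memory slice, and only a successful `Init` adds JSONL file output. The sketch below is a trimmed, hypothetical version of the package state (`entries`, `logFile`) with a reduced `Entry` type; the timestamp format in the file name is an assumption. Unlike the pattern flagged in #65, it reports write errors instead of dropping them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Entry is a trimmed-down, hypothetical log record.
type Entry struct {
	Timestamp string `json:"timestamp"`
	Agent     string `json:"agent"`
	Message   string `json:"message"`
}

var (
	entries []Entry  // in-memory session log, always populated
	logFile *os.File // nil until Init succeeds
)

// Init enables JSONL output by opening <agent>-<timestamp>.jsonl in outputDir.
func Init(outputDir, agent string) error {
	if err := os.MkdirAll(outputDir, 0o755); err != nil {
		return err
	}
	name := fmt.Sprintf("%s-%s.jsonl", agent, time.Now().UTC().Format("20060102T150405Z"))
	f, err := os.Create(filepath.Join(outputDir, name))
	if err != nil {
		return err
	}
	logFile = f
	return nil
}

// Close closes the JSONL file, if one is open.
func Close() {
	if logFile != nil {
		logFile.Close()
		logFile = nil
	}
}

// record keeps every entry in memory and appends a JSONL line only after Init.
func record(e Entry) {
	entries = append(entries, e)
	if logFile == nil {
		return // no Init: stdout and in-memory only
	}
	b, err := json.Marshal(e)
	if err == nil {
		_, err = logFile.Write(append(b, '\n'))
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "log write failed:", err) // surface, don't drop
	}
}

// Agent prints an info line and records it.
func Agent(agent, message string) {
	fmt.Printf("[%s] %s\n", agent, message)
	record(Entry{Timestamp: time.Now().UTC().Format(time.RFC3339), Agent: agent, Message: message})
}

func main() {
	Agent("qa", "before Init: stdout + memory")
	dir, err := os.MkdirTemp("", "sf-logs")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)
	if err := Init(dir, "qa"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	Agent("qa", "after Init: also written to JSONL")
	Close()
	fmt.Println("entries in memory:", len(entries))
}
```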
func Init(outputDir, agent string) error {
if err := os.MkdirAll(outputDir, 0o755); err != nil {
return err
@@ -59,6 +61,7 @@ logFile = f
return nil
}

// Close flushes and closes the current log file.
func Close() {
if logFile != nil {
logFile.Close()
@@ -74,6 +77,7 @@ logFile.WriteString("\n")
}
}

// Governance logs a governance evaluation result to stdout and the JSONL log.
func Governance(agent, tool string, params map[string]string, allowed bool, policyName, reason string) {
status := "allow"
Comment on lines +80 to 82
Copilot AI Mar 28, 2026

The `Governance` GoDoc says it logs to stdout and the JSONL log, but JSONL output only happens if `Init` has been called (`logFile != nil`). Consider clarifying that file logging is conditional so callers don’t assume a file is always written.

if !allowed {
@@ -99,6 +103,7 @@ Decision: &DecisionLog{Allowed: allowed, PolicyName: policyName, Reason: reason
})
}

// ToolResult logs the outcome of a tool execution to stdout and the JSONL log.
func ToolResult(agent, tool string, success bool, output string) {
icon := "✓"
if !success {
@@ -123,6 +128,7 @@ Message: truncate(output, 200),
})
}

// Agent logs a free-form info message from the named agent.
func Agent(agent, message string) {
fmt.Printf("[%s] %s\n", agent, message)
record(Entry{
@@ -133,6 +139,7 @@ Message: message,
})
}

// ModelCall logs token usage and latency for an Ollama inference call.
func ModelCall(agent string, promptTokens, responseTokens int, durationMs int64) {
record(Entry{
Timestamp: time.Now().UTC().Format(time.RFC3339),
@@ -143,6 +150,7 @@ Duration: durationMs,
})
}

// Error logs an error message to stderr and the JSONL log.
func Error(agent, message string) {
fmt.Fprintf(os.Stderr, "[%s] ERROR: %s\n", agent, message)
record(Entry{
@@ -153,6 +161,7 @@ Message: message,
})
}

// GetEntries returns all log entries recorded in this session (in-memory only).
func GetEntries() []Entry { return entries }

func summarize(params map[string]string) string {
3 changes: 3 additions & 0 deletions internal/tools/tools.go
@@ -80,6 +80,8 @@ func ExecuteDirect(tool string, params map[string]string, timeoutSec int) Result
}

// Execute runs a tool call through governance, then executes if allowed.
// Execute evaluates the tool call against governance policy and, if allowed, runs it.
// This is the fully governed path; use ExecuteDirect when governance is already checked.
func Execute(engine *governance.Engine, agent, tool string, params map[string]string) Result {
decision := engine.Evaluate(tool, params)
Comment on lines 82 to 86
Copilot AI Mar 28, 2026

The doc comment for `Execute` is now redundant: the first line already provides a valid GoDoc summary, and the following two lines repeat the same information. Consider collapsing this into a single concise GoDoc block (keeping the extra detail about `ExecuteDirect` without duplicating the summary).

Suggested change
// Execute runs a tool call through governance, then executes if allowed.
// Execute evaluates the tool call against governance policy and, if allowed, runs it.
// This is the fully governed path; use ExecuteDirect when governance is already checked.
func Execute(engine *governance.Engine, agent, tool string, params map[string]string) Result {
decision := engine.Evaluate(tool, params)
// Execute runs a tool call through governance and, if allowed, executes it.
// This is the fully governed path; use ExecuteDirect when governance is already checked.
func Execute(engine *governance.Engine, agent, tool string, params map[string]string) Result {
decision := engine.Evaluate(tool, params)

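The distinction the reviewer wants the GoDoc to keep, governed `Execute` versus unchecked `ExecuteDirect`, can be shown with a small interface. This is a sketch under assumptions: `Evaluator` and `denyTools` are hypothetical, the real `Execute` also takes an agent name and logs the decision, and the real `ExecuteDirect` takes a timeout.

```go
package main

import "fmt"

// Result is a simplified stand-in for the real tools.Result.
type Result struct {
	Success bool
	Output  string
	Error   string
}

// Decision is a simplified stand-in for the governance decision.
type Decision struct {
	Allowed bool
	Reason  string
}

// Evaluator is the one method of the governance engine this sketch needs.
type Evaluator interface {
	Evaluate(tool string, params map[string]string) Decision
}

// denyTools is a toy evaluator that denies a fixed set of tool names.
type denyTools map[string]bool

func (d denyTools) Evaluate(tool string, _ map[string]string) Decision {
	if d[tool] {
		return Decision{Allowed: false, Reason: "tool " + tool + " is denied by policy"}
	}
	return Decision{Allowed: true}
}

// ExecuteDirect runs a tool with no policy check; here it only echoes.
func ExecuteDirect(tool string, params map[string]string) Result {
	return Result{Success: true, Output: fmt.Sprintf("%s(%s)", tool, params["path"])}
}

// Execute is the fully governed path: evaluate first, then run only if allowed.
func Execute(ev Evaluator, tool string, params map[string]string) Result {
	d := ev.Evaluate(tool, params)
	if !d.Allowed {
		return Result{Success: false, Error: "blocked: " + d.Reason}
	}
	return ExecuteDirect(tool, params)
}

func main() {
	ev := denyTools{"write_file": true}
	fmt.Printf("%+v\n", Execute(ev, "read_file", map[string]string{"path": "go.mod"}))
	fmt.Printf("%+v\n", Execute(ev, "write_file", map[string]string{"path": "go.mod"}))
}
```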
logger.Governance(agent, tool, params, decision.Allowed, decision.PolicyName, decision.Reason)
@@ -224,6 +226,7 @@ return Result{Success: true, Output: output}
}

// FormatForPrompt returns tool descriptions for the system prompt.
Copilot AI Mar 28, 2026

`FormatForPrompt` has two consecutive GoDoc summary lines that both start with the function name and say essentially the same thing. Please remove the duplication and keep a single, precise summary (with any extra detail in subsequent sentences that don’t repeat the opener).

Suggested change
// FormatForPrompt returns tool descriptions for the system prompt.

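For the single summary line the reviewer asks for, "Markdown-formatted tool definitions" could look like the sketch below. The `ToolDef` fields and the two tools are hypothetical; the diff only shows that `FormatForPrompt` iterates over `Definitions` with a `strings.Builder`. Sorting parameter names keeps the prompt stable, since Go randomizes map iteration order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ToolDef is a hypothetical tool definition; the real Definitions slice in
// internal/tools may carry different fields.
type ToolDef struct {
	Name        string
	Description string
	Params      map[string]string // parameter name -> description
}

// Definitions holds two illustrative tools.
var Definitions = []ToolDef{
	{Name: "read_file", Description: "Read a file from the workspace.", Params: map[string]string{"path": "File path"}},
	{Name: "run_shell", Description: "Run a shell command.", Params: map[string]string{"command": "Command line"}},
}

// FormatForPrompt renders each tool as a Markdown heading followed by one
// bullet per parameter, with parameter names sorted for a deterministic prompt.
func FormatForPrompt() string {
	var sb strings.Builder
	for _, t := range Definitions {
		fmt.Fprintf(&sb, "### %s\n%s\n", t.Name, t.Description)
		keys := make([]string, 0, len(t.Params))
		for k := range t.Params {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Fprintf(&sb, "- `%s`: %s\n", k, t.Params[k])
		}
		sb.WriteString("\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(FormatForPrompt())
}
```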
// FormatForPrompt returns Markdown-formatted tool definitions for inclusion in a system prompt.
func FormatForPrompt() string {
var sb strings.Builder
for _, t := range Definitions {