feat: self-improving agentic workflow — agent plugin by intel352 · Pull Request #6 · GoCodeAlone/workflow-plugin-agent

intel352 · 2026-04-13T09:20:26Z

Summary

Add blackboard for structured artifact exchange between agents in multi-phase review pipelines (SQLite-backed, SSE broadcast)
Add command safety engine with shell AST analysis (mvdan.cc/sh) — detects destructive commands, pipe-to-shell, script execution, encoded commands, here-doc injection, process substitution
Add hierarchical guardrails module (agent.guardrails) with scoped rules (agent > team > model > provider), glob tool access patterns, immutable config sections, command policy
Wire guardrails into executor tool execution loop
Add self-improvement steps: step.self_improve_validate, step.self_improve_diff, step.self_improve_deploy (hot_reload / git_pr / canary)
Add step.lsp_diagnose for in-process LSP diagnostics
Add multi-agent review pipeline pattern with InputFromBlackboard injection (system_prompt_append / user_message modes)

Design

See: workflow docs/plans/2026-04-13-self-improving-agentic-workflow-design.md

Dependencies

Requires workflow v0.3.57+ (feat: self-improving agentic workflow — core MCP library workflow#402) for in-process MCP library

🤖 Generated with Claude Code

Uses mvdan.cc/sh to parse commands and detect: destructive operations, pipe-to-shell, script execution, encoded commands, chained dangerous. Supports allowlist and blocklist modes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ith step types Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- parseArtifactTime: try sub-second format first, fall back to seconds-only - BlackboardPostStep: generate UUID before Post() so art.ID is non-empty in step output Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Scoped rules (agent > team > model > provider), glob-based tool access, immutable config sections, command safety integration. GuardrailsModule implements executor.TrustEvaluator for direct wiring into the executor tool execution path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…on detection Closes spec gaps identified in review: - Here-doc injection: detect shell commands with heredoc redirects (bash << EOF) - Process substitution: detect shell commands using <(cmd) as input (bash <(curl ...)) - Variable expansion: detect ANSI-C $'\x72\x6d' hex/octal obfuscation - TestAnalyzer_BlocklistMode, TestAnalyzer_HereDocInjection, TestAnalyzer_ProcessSubstitution, TestAnalyzer_VariableExpansionTricks added Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

findGuardrailsModule() searches service registry for any GuardrailsModule. Before each tool call: Evaluate() checks tool access, EvaluateCommand() checks command safety for shell tools. Blocked calls return error result without reaching the tool registry. Tests verify: registry lookup, disallowed tool blocking, dangerous command blocking via the same code path used in the agent loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…deploy strategies - step.self_improve_validate: YAML parse + immutability constraint check + graceful MCP skip - step.self_improve_diff: text diff with blackboard posting support - step.self_improve_deploy: hot_reload / git_pr / canary strategies with mandatory pre-deploy validation gate - step.lsp_diagnose: LSPProvider interface wrapper as pipeline step - review_pipeline.go: ReviewPipelineConfig, InputFromBlackboard, InjectBlackboardInput() - step.agent_execute: input_from_blackboard config injects blackboard artifacts into agent context - step.agent_execute: guardrails executor integration (findGuardrailsModule already present) - plugin.go: registered all 4 new step factories Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ails into root executor step - Remove BlockedPatterns check from checkDestructive() — patterns already checked pre-parse, so running them again per-CallExpr caused duplicate risk entries (e.g. 2 destructive risks for 'rm -rf /') - Change isAllowed() from HasPrefix to exact match so 'golang-migrate' is not permitted when only 'go' is in the allowlist - Add 6 tests covering: NoDuplicateRisks, AllowlistNoHasPrefixBypass, HereDocInjection, ProcessSubstitution, VariableExpansionTricks, BlocklistMode - Wire TrustEvaluator into root step_agent_execute.go via interface-based service registry lookup (avoids circular import with orchestrator package) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…mand analyzer mkfs appeared in both DefaultPolicy BlockedPatterns (pre-parse raw check) and alwaysDestructive map (AST walk per-CallExpr), producing duplicate Risk entries. Fix: remove mkfs from alwaysDestructive since BlockedPatterns already covers it. fdisk and wipefs remain in alwaysDestructive (not in BlockedPatterns). Issue #2 (isAllowed HasPrefix bypass) was already using exact match — no change needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…p misleading comment - Replace string(action) == "deny" with action == executor.ActionDeny for consistency with the rest of the executor package - Remove unused guardrailsAction struct and String() method - Remove misleading "Also handle prefix/*" comment that implied unfinished work; the HasSuffix branch already handles this case Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…test assertion - postToBlackboard now uses countDiffLines() for lines_added instead of len(diff), which was counting all diff lines (added + removed) - Add content assertion to TestSelfImproveDiffStep_PostToBlackboard verifying lines_added is 1, not total diff line count - Replace empty if-body in TestSelfImproveValidateStep_MCPUnavailable with real assertion checking for lsp-unavailable warning - InjectAs rename (InjectKey→InjectAs) + system_prompt_append/user_message modes - RequireApproval field added to ReviewPipelineConfig - LSP diagnostics integrated into SelfImproveValidateStep Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a “self-improving” multi-agent orchestration workflow with shared blackboard artifacts and enforceable guardrails, plus new validation/diff/deploy/LSP diagnostic steps to support iterative config evolution.

Changes:

Introduces a SQLite-backed Blackboard (with optional SSE broadcast) and steps to post/read artifacts, plus blackboard-to-agent context injection.
Adds Guardrails + shell command static analyzer (mvdan.cc/sh) and wires guardrails into agent tool execution and executor trust evaluation.
Adds self-improvement steps: validate (YAML + immutability + optional LSP/MCP), diff generation, and deployment strategies (hot_reload / git_pr / canary), plus an LSP diagnose step.

Reviewed changes

Copilot reviewed 24 out of 26 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
step_agent_execute.go	Passes a `TrustEngine` (first `executor.TrustEvaluator` in registry) into the executor config.
safety/command_analyzer.go	Implements shell command safety analysis (AST-based + pattern checks) with configurable policy.
safety/command_analyzer_test.go	Adds unit tests covering destructive commands, pipe-to-shell, heredocs, proc-subst, allowlist mode, etc.
orchestrator/step_self_improve_validate.go	Adds validation step (YAML parse, immutability checks, optional LSP + placeholder MCP validation).
orchestrator/step_self_improve_validate_test.go	Tests validation behavior for valid/invalid YAML, immutability violations, missing inputs, warnings.
orchestrator/step_self_improve_diff.go	Adds diff step to compute a simple line-based diff and optionally post to Blackboard.
orchestrator/step_self_improve_diff_test.go	Tests diff generation, force mode, missing inputs, and blackboard posting.
orchestrator/step_self_improve_deploy.go	Adds deploy step with hot_reload / git_pr / canary strategies and a pre-deploy validation gate.
orchestrator/step_self_improve_deploy_test.go	Tests deploy step behavior including validation gate and failure handling.
orchestrator/step_lsp_diagnose.go	Adds LSP diagnose step + `LSPProvider` lookup helper.
orchestrator/step_lsp_diagnose_test.go	Tests LSP diagnose step behavior with/without provider and content.
orchestrator/step_blackboard.go	Adds steps to post/read Blackboard artifacts (including latest_only mode).
orchestrator/step_blackboard_test.go	Tests blackboard post/read steps and error paths when blackboard missing.
orchestrator/step_agent_execute.go	Injects blackboard context into prompts/messages; enforces guardrails before tool execution.
orchestrator/review_pipeline.go	Adds `InputFromBlackboard` config + helper to read artifacts and inject into agent context/prompt.
orchestrator/review_pipeline_test.go	Tests blackboard input injection modes and parsing config helper.
orchestrator/ratchetplugin_test.go	Updates plugin factory/hook expectations to include guardrails + new steps + blackboard hook.
orchestrator/plugin.go	Registers new module/steps and wires a `ratchet.blackboard` hook to init Blackboard from DB + SSE.
orchestrator/guardrails.go	Adds `agent.guardrails` module implementing `executor.TrustEvaluator` + tool policy + command safety.
orchestrator/guardrails_test.go	Tests tool allow/block behavior, scope matching, immutability matching, and trust-evaluator methods.
orchestrator/blackboard.go	Implements Blackboard persistence, subscription fanout, and optional SSE broadcasting.
orchestrator/blackboard_test.go	Tests Blackboard CRUD + latest retrieval + subscriptions.
executor/guardrails_integration_test.go	Integration-style tests asserting GuardrailsModule works as `executor.TrustEvaluator`.
go.mod	Adds `mvdan.cc/sh/v3` dependency and bumps `golang.org/x/sys`.
go.sum	Records new module sums (mvdan sh, qt, x/sys/x/term updates).
.gitignore	Ignores `.worktrees/`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-13T09:25:54Z

+	for _, sec := range guardrails.immutableSections {
+		proposedVal := extractNestedPath(proposed, sec.Path)
+		currentVal := extractNestedPath(current, sec.Path)
+		if fmt.Sprintf("%v", proposedVal) != fmt.Sprintf("%v", currentVal) {
+			violations = append(violations, ImmutabilityViolation{


Immutability diffing compares values via fmt.Sprintf("%v"), which is unreliable for maps/slices (map iteration order is random) and can produce false positives/negatives. Consider using reflect.DeepEqual on normalized YAML-decoded structures, or marshal both values to canonical JSON/YAML before comparison.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:54Z

+	}
+	if v, ok := d["require_diff_review"].(bool); ok {
+		defaults.RequireDiffReview = v
+	}
+	if v, ok := d["max_iterations_per_cycle"].(int); ok {
+		defaults.MaxIterationsPerCycle = v
+	}


Config parsing here only accepts max_iterations_per_cycle when it is exactly an int. Workflow/YAML configs often decode numbers as int64/float64, so this setting may silently not apply. Consider using a helper that handles numeric types (e.g., extractInt-style conversion) or a type switch covering int/int64/float64.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:54Z

+	}
+	if v, ok := cfg["enable_static_analysis"].(bool); ok {
+		p.EnableStaticAnalysis = v
+	}
+	if v, ok := cfg["max_command_length"].(int); ok {
+		p.MaxCommandLength = v
+	}


parseCommandPolicy only reads max_command_length when it is exactly an int; if the config loader yields float64/int64, the limit will be ignored. Use numeric coercion (type switch or shared extractInt helper) so the policy behaves as configured.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:55Z

+// --- executor.TrustEvaluator implementation ---
+
+// Evaluate implements executor.TrustEvaluator.
+// Checks whether a tool call is allowed by the default guardrails rules.
+func (g *GuardrailsModule) Evaluate(_ context.Context, toolName string, _ map[string]any) executor.Action {
+	ok, _ := g.CheckTool(toolName)
+	if ok {
+		return executor.ActionAllow
+	}
+	return executor.ActionDeny
+}


Guardrails scoped rules (agent/team/model/provider) are defined, but Evaluate() always calls CheckTool() with an empty ScopeContext, so scopes never affect actual enforcement when used as executor.TrustEvaluator. Either derive ScopeContext from the provided context (agent/model/provider identifiers) and call CheckToolScoped, or clarify/remove the scoped rule feature to avoid a false sense of enforcement.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:55Z

+// matchPattern matches a tool name against a glob-style pattern.
+// Supports * wildcard (single segment) and ** (multi-segment via path separator :).
+func matchPattern(pattern, value string) bool {
+	if pattern == "*" || pattern == "**" {
+		return true
+	}
+	if pattern == value {
+		return true
+	}
+	// Suffix wildcard: "mcp:wfctl:validate_*" matches "mcp:wfctl:validate_config"
+	if strings.HasSuffix(pattern, "*") {
+		prefix := strings.TrimSuffix(pattern, "*")
+		return strings.HasPrefix(value, prefix)
+	}
+	return false
+}


matchPattern's comment claims support for "**" and segment-aware globbing, but the implementation only supports exact match and a trailing "*" prefix match. Either implement the documented glob semantics (including mid-string wildcards / **), or update the comment to match actual behavior to prevent misconfigured allow/block patterns.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:55Z

+	artifactType := s.artifactType
+	if artifactType == "" {
+		artifactType = extractString(pc.Current, "artifact_type", "")
+	}
+
+	if s.latestOnly {
+		art, err := bb.ReadLatest(ctx, phase)
+		if err != nil {
+			return nil, fmt.Errorf("blackboard_read step %q: %w", s.name, err)
+		}


When latest_only is true, artifact_type is ignored (ReadLatest only filters by phase). This makes configs that specify both artifact_type and latest_only behave unexpectedly and can return the wrong artifact if multiple types exist in a phase. Consider adding a ReadLatestByType(phase,type) query, or document that artifact_type is ignored in latest_only mode.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:55Z

+func InjectBlackboardInput(ctx context.Context, app modular.Application, cfg InputFromBlackboard, pc *module.PipelineContext) (string, error) {
+	if cfg.Phase == "" {
+		return "", nil
+	}
+
+	var bb *Blackboard
+	if svc, ok := app.SvcRegistry()["ratchet-blackboard"]; ok {
+		bb, _ = svc.(*Blackboard)
+	}


InjectBlackboardInput assumes app is non-nil and calls app.SvcRegistry() unconditionally; if a step is constructed/executed without an application context, this will panic. Add an early if app == nil { return "", nil } guard (similar to other steps) to keep the helper safe to call.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:55Z

+		art, err := bb.ReadLatest(ctx, cfg.Phase)
+		if err != nil {
+			return "", fmt.Errorf("input_from_blackboard: read latest phase %q: %w", cfg.Phase, err)
+		}
+		if art == nil {
+			return "", nil
+		}
+		if promptMode {
+			return fmt.Sprintf("[Blackboard artifact — phase: %s, type: %s]\n%v", art.Phase, art.Type, art.Content), nil
+		}
+		pc.Current["blackboard_input"] = artifactToMap(*art)


In LatestOnly mode this always reads the latest artifact for the phase, ignoring cfg.ArtifactType. If multiple artifact types are posted to the same phase, this can inject unrelated data. Consider filtering by both phase+type for LatestOnly, or make ArtifactType required when LatestOnly is true and enforce it in the query.

Suggested change

art, err := bb.ReadLatest(ctx, cfg.Phase)

if err != nil {

return "", fmt.Errorf("input_from_blackboard: read latest phase %q: %w", cfg.Phase, err)

}

if art == nil {

return "", nil

}

if promptMode {

return fmt.Sprintf("[Blackboard artifact — phase: %s, type: %s]\n%v", art.Phase, art.Type, art.Content), nil

}

pc.Current["blackboard_input"] = artifactToMap(*art)

if cfg.ArtifactType == "" {

return "", fmt.Errorf("input_from_blackboard: artifact_type is required when latest_only is true for phase %q", cfg.Phase)

}

artifacts, err := bb.Read(ctx, cfg.Phase, cfg.ArtifactType)

if err != nil {

return "", fmt.Errorf("input_from_blackboard: read latest phase %q and type %q: %w", cfg.Phase, cfg.ArtifactType, err)

}

if len(artifacts) == 0 {

return "", nil

}

art := artifacts[len(artifacts)-1]

if promptMode {

return fmt.Sprintf("[Blackboard artifact — phase: %s, type: %s]\n%v", art.Phase, art.Type, art.Content), nil

}

pc.Current["blackboard_input"] = artifactToMap(art)

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:56Z

+			bb := NewBlackboard(db, hub)
+			if err := bb.Migrate(context.Background()); err != nil {
+				app.Logger().Warn("blackboard migrate failed", "error", err)
+			}
+
+			_ = app.RegisterService("ratchet-blackboard", bb)
+			return nil


If bb.Migrate fails, the hook still registers the Blackboard service. That can lead to runtime failures later (Post/Read queries against a missing table) while startup continues. Consider returning the migration error (or skipping registration) so the system fails fast or at least doesn't register a non-functional blackboard instance.

Addressed in 04e8de0

Copilot · 2026-04-13T09:25:56Z

+// runCommand executes a shell command and returns stdout or an error.
+func runCommand(name string, args ...string) (string, error) {
+	cmd := exec.Command(name, args...)
+	out, err := cmd.Output()
+	return string(out), err
+}


runCommand uses exec.Command(...).Output() with no context/timeout and discards stderr. For git/gh/docker this can hang indefinitely and error messages will be hard to diagnose. Consider using exec.CommandContext with a bounded timeout (or the step ctx) and CombinedOutput so failures include stderr.

Addressed in 04e8de0

- Use reflect.DeepEqual for immutability diffing (not fmt.Sprintf) - Add int64/float64 type switches for max_iterations_per_cycle and max_command_length YAML decode - Clarify Evaluate() comment: uses default rules only, not scope-aware - Update matchPattern comment to describe exact + trailing-* prefix match behavior - Document that artifact_type is ignored when latest_only=true in BlackboardReadStep - Add app == nil guard in InjectBlackboardInput before SvcRegistry() call - When latest_only=true and artifact_type is set, filter by both phase+type - Skip blackboard registration if Migrate fails (log error, return early) - Replace runCommand with exec.CommandContext (2m timeout) + CombinedOutput to capture stderr Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

intel352 and others added 12 commits April 13, 2026 02:42

chore: add .worktrees to gitignore

164f1f5

feat(orchestrator): add blackboard for structured artifact exchange w…

132aaf8

…ith step types Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 13, 2026 09:20

intel352 mentioned this pull request Apr 13, 2026

feat: self-improving agentic workflow — scenarios 85-87 GoCodeAlone/workflow-scenarios#5

Merged

Copilot started reviewing on behalf of intel352 April 13, 2026 09:21 View session

Copilot AI reviewed Apr 13, 2026

View reviewed changes

intel352 enabled auto-merge (squash) April 13, 2026 13:33

intel352 disabled auto-merge April 13, 2026 17:04

intel352 merged commit 0ea591d into master Apr 13, 2026
4 checks passed

intel352 deleted the self-improve/test-agent branch April 13, 2026 17:04

Conversation

intel352 commented Apr 13, 2026

Summary

Design

Dependencies

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants