Expand template script-first safeguards and runbooks #7
alexanderholman merged 1 commit into master
Pull request overview
Expands the template’s script-first automation safeguards by introducing registry schema validation and dry-run policy checks, enriching the script registry metadata, and adding supporting runbooks/templates/checklists.
Changes:
- Add validation tooling for `scripts/registry.yaml` schema and dry-run gating conventions, and wire them into `validate_agents.sh`.
- Extend script registry entries with `capabilities` metadata and incorporate it into resolver scoring/output.
- Add operational documentation (route-and-run guide, script template docs, operator quickstart) and an ML feasibility checklist.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| validate_agents.sh | Adds new SPEC-008 validation steps for registry schema + dry-run gating checks. |
| scripts/validate_registry.py | New validator for scripts/registry.yaml required fields, uniqueness, and top-level policy. |
| scripts/check_dry_run_conventions.py | New policy check intended to ensure executable scripts are gated via dry-run/--execute. |
| scripts/registry.yaml | Adds two new validation scripts and introduces capabilities metadata across existing scripts. |
| scripts/route_and_run.py | Updates scoring to factor in capabilities. |
| scripts/resolve.py | Updates scoring/output to include capabilities. |
| scripts/check_repeatable_script_refs.py | Expands repeatable-task reference scanning to include specs.md. |
| scripts/route-and-run.md | New runbook describing safe usage and troubleshooting. |
| scripts/templates/new_script.py | New script template with --execute dry-run-by-default convention. |
| scripts/templates/README.md | Documents required script conventions (dry-run gating, registry registration, etc.). |
| tasks/ml-feasibility-checklist.md | Adds a checklist for ML feasibility gates and fallbacks. |
| README.md | Links to new tooling/docs/checklist. |
| AGENTS.md | Adds an operator quickstart and commands for the new validations. |
Comments suppressed due to low confidence (1)
scripts/registry.yaml:78
- The `route-and-run` registry entry hardcodes `--execute` in its `command`, which causes `scripts/route_and_run.py` to run in execute mode by default (bypassing its dry-run default) and conflicts with the documented safety model (“default mode is dry-run”). Consider removing `--execute` from the registry `command` and relying on the `execute` parameter to add it only when explicitly requested (or define separate dry-run vs execute entries).
    - id: "route-and-run"
      intent:
        - "route"
        - "execute"
        - "nl-command"
        - "script-selection"
      capabilities:
        - "nl-routing"
        - "parameterized-execution"
        - "dry-run-gating"
      description: "Resolve natural language requests to scripts and execute parameterized commands"
      command: "python3 scripts/route_and_run.py --query '<request>' --execute"
      parameters:
        - "query"
        - "intent"
        - "script-id"
        - "arg"
        - "execute"
      outputs:
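
One way to read the suggestion above is sketched below: the registry `command` is treated as a template, any hardcoded `--execute` is dropped, and the flag is appended only when the caller explicitly asks for it. The helper name `build_command` and its signature are hypothetical; nothing here is taken from the actual `scripts/route_and_run.py`.

```python
from __future__ import annotations

import shlex


def build_command(template: str, query: str, execute: bool = False) -> list[str]:
    """Hypothetical helper: turn a registry command template into an argv list."""
    # Split the template the way a POSIX shell would, so quoted tokens stay whole.
    argv = shlex.split(template)
    # Ignore any hardcoded --execute so that dry-run remains the default.
    argv = [token for token in argv if token != "--execute"]
    # Substitute the '<request>' placeholder with the caller's query.
    argv = [query if token == "<request>" else token for token in argv]
    # Add --execute only when it was explicitly requested.
    if execute:
        argv.append("--execute")
    return argv
```

With this shape, a registry entry that still contains `--execute` resolves to a dry-run invocation unless `execute=True` is passed, which matches the documented safety model.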
    if "execute" not in intent:
        continue

    command = script.get("command", "")
    params = script.get("parameters", [])
The dry-run convention check only inspects scripts whose intent contains execute. This misses scripts that can execute but don’t advertise that token in intent (e.g. forge-route has an execute parameter and can trigger side effects). Consider broadening the predicate (e.g., treat scripts as executable if parameters include execute and/or command supports --execute) so the policy can’t be bypassed by omitting the execute intent token.
Suggested change:

    -    if "execute" not in intent:
    -        continue
    -    command = script.get("command", "")
    -    params = script.get("parameters", [])
    +    command = script.get("command", "")
    +    params = script.get("parameters", [])
    +    # Treat scripts as potentially executable if they advertise execute intent,
    +    # take an `execute` parameter, or support a `--execute` flag.
    +    is_potentially_executable = (
    +        "execute" in intent
    +        or "execute" in params
    +        or "--execute" in command
    +    )
    +    if not is_potentially_executable:
    +        continue

    #!/usr/bin/env python3

    import argparse
    import sys
validate_registry.py imports sys but doesn’t use it. Removing unused imports keeps these validation scripts clean and avoids confusing future maintenance.
Suggested change:

    - import sys

    data = yaml.safe_load(Path(args.registry).read_text(encoding="utf-8"))
    scripts = data.get("scripts", []) if isinstance(data, dict) else []
check_dry_run_conventions.py reads/parses the registry without any error handling. If scripts/registry.yaml is missing or malformed, this will raise and produce a traceback instead of a clear failure message and exit code. Wrap the load in a try/except (similar to validate_registry.py) and return a non-zero status with a concise error.
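
A minimal sketch of the guarded load described above follows. It is an illustration only, not the code in `validate_registry.py`: the names `load_scripts` and `main` and the `parse` argument are hypothetical, and the parser is made injectable purely so the sketch can be exercised without PyYAML installed. By default it falls back to `yaml.safe_load`, which the reviewed script already uses.

```python
import sys
from pathlib import Path


def load_scripts(registry_path, parse=None):
    """Hypothetical helper: return (scripts, error_message_or_None)."""
    try:
        text = Path(registry_path).read_text(encoding="utf-8")
    except OSError as exc:
        # Missing or unreadable file: report it instead of raising.
        return [], f"cannot read {registry_path}: {exc}"

    if parse is None:
        import yaml  # PyYAML, as used by the script under review

        parse = yaml.safe_load

    try:
        data = parse(text)
    except Exception as exc:
        # With PyYAML this is yaml.YAMLError; the catch is broad only
        # because the parser is injectable in this sketch.
        return [], f"cannot parse {registry_path}: {exc}"

    scripts = data.get("scripts", []) if isinstance(data, dict) else []
    return scripts, None


def main(registry_path="scripts/registry.yaml"):
    scripts, error = load_scripts(registry_path)
    if error:
        # Concise message on stderr plus a non-zero exit status, no traceback.
        print(f"ERROR: {error}", file=sys.stderr)
        return 1
    return 0
```

A caller would typically end the script with `sys.exit(main())`, so that a missing or malformed registry fails the validation step in `validate_agents.sh` with a one-line error.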
Summary