aregmii/agentspecgap

AgentSpecGap

AgentSpecGap is a runnable hackathon prototype for mining enforceable specifications from LLM-agent artifacts. It shows how rules scattered across system prompts, tool descriptions, tool schemas, and runtime config can be recovered into an auditable policy layer, resolved against a canonical tool registry, and validated against safe and mutated unsafe tool-call traces.

This is not a SQL regex blocker. SQL is only one fixture. The prototype uses database, object storage, authorization, web retrieval, and public-email-style tools to show a general policy mining and validation flow.

Why prompt-only rules are not enforcement

LLM agents often receive safety and reliability instructions in natural language: "call schema lookup first," "do not write to unapproved buckets," "do not leak secrets," or "return at most 100 rows." If those rules remain only in prompts, they depend on model behavior at the exact moment a tool call is proposed.

AgentSpecGap treats those instructions as candidate specifications. It extracts them with source spans, resolves tool names against a registry, compiles enforceable rules into policy JSON, and evaluates every simulated tool-call event before execution.

What the demo includes

  • SQL analytics agent
    • sql.schema_lookup
    • sql.execute
    • Rules for schema lookup ordering, destructive action denial, table allowlists, row bounds, schema contracts, and business-logic review.
  • Cloud data agent
    • authz.check_permission
    • aws.dynamodb.PutItem
    • scan_for_secrets
    • aws.s3.PutObject
    • web.fetch
    • public_email.send
    • Rules for permission ordering, secret-scan ordering, private ACLs, bucket allowlists, timeouts, schema contracts, and secret flow denial.

All data is local fixture data. No database, AWS, email, or network execution is performed.

Check categories

The prototype uses these category names consistently:

  • Interface validation: input/output/schema contract checks.
  • Authorization check: actor, resource, action, permission, allowlist, and exposure checks.
  • Workflow ordering validation: sequencing checks across tool calls.
  • Data validation: provenance and taint checks from data source to destination.
  • Runtime validation: execution-time bounds such as timeout, row count, retries, and response size.
  • Business-logic validation: semantic task correctness that usually needs review or evals.

How the policy compiler works

The deterministic extractor emits candidate rules with:

  • raw rule text
  • source artifact and source span
  • mentioned tool or action
  • grounding: exact, partial, or inferred
  • check category
  • proposed policy operator
  • confidence
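
The fields above could be modeled as a TypeScript type. This is only a sketch: the property names, the `[0, 1]` confidence range, and every category ID other than `workflow_ordering_validation` (which appears in the policy JSON in this README) are assumptions, not the repository's actual definitions.

```typescript
// Hypothetical shape for one extracted candidate rule. Names are illustrative.
type Grounding = "exact" | "partial" | "inferred";

// Only "workflow_ordering_validation" is confirmed by this README;
// the other IDs are assumed by analogy with the category names.
type CheckCategory =
  | "interface_validation"
  | "authorization_check"
  | "workflow_ordering_validation"
  | "data_validation"
  | "runtime_validation"
  | "business_logic_validation";

interface CandidateRule {
  rawText: string;          // raw rule text as written in the artifact
  sourceArtifact: string;   // e.g. "cloud-system-prompt"
  sourceSpan: string;       // e.g. "Rules bullet 1"
  mentionedTool: string;    // raw, not yet resolved against the registry
  grounding: Grounding;
  checkCategory: CheckCategory;
  proposedOperator: string; // e.g. "require_before"
  confidence: number;       // assumed to lie in [0, 1]
}

// A basic sanity check that a validator might run on extractor output.
function isValidCandidate(rule: CandidateRule): boolean {
  return (
    rule.rawText.trim().length > 0 &&
    rule.confidence >= 0 &&
    rule.confidence <= 1
  );
}

const example: CandidateRule = {
  rawText: "Check permissions before writing to DynamoDB.", // made-up text
  sourceArtifact: "cloud-system-prompt",
  sourceSpan: "Rules bullet 1",
  mentionedTool: "PutItem",
  grounding: "exact",
  checkCategory: "workflow_ordering_validation",
  proposedOperator: "require_before",
  confidence: 0.95,
};
```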

The compiler turns grounded and resolved rules into policy JSON:

{
  "id": "policy-cloud-r1",
  "title": "Permission check before DynamoDB write",
  "checkCategory": "workflow_ordering_validation",
  "operator": "require_before",
  "target": { "toolId": "aws.dynamodb.PutItem" },
  "condition": { "requiredToolId": "authz.check_permission" },
  "source": {
    "artifact": "cloud-system-prompt",
    "span": "Rules bullet 1",
    "grounding": "exact"
  },
  "enforcement": "enforce"
}

Rules with exact grounding and a resolved tool become enforce. Partially grounded rules are review_only unless their resolution confidence is high. Inferred business-logic rules are eval_required.
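
The enforcement decision can be sketched as a small function. The threshold value, the field names, and two branches are assumptions: the README does not say what a high-confidence partial rule becomes (assumed `enforce` here) or how an exact but unresolved rule is treated (assumed `review_only` here).

```typescript
type Grounding = "exact" | "partial" | "inferred";
type Enforcement = "enforce" | "review_only" | "eval_required";

// Hypothetical input shape; the real compiler may carry more fields.
interface ResolvedRule {
  grounding: Grounding;
  toolResolved: boolean;        // did the mention resolve to a registry tool?
  resolutionConfidence: number; // assumed to lie in [0, 1]
}

// Assumed cutoff for "high" resolution confidence.
const HIGH_CONFIDENCE = 0.9;

function decideEnforcement(rule: ResolvedRule): Enforcement {
  // Inferred rules always need an eval rather than middleware enforcement.
  if (rule.grounding === "inferred") return "eval_required";
  // Assumption: anything that did not resolve goes to human review.
  if (!rule.toolResolved) return "review_only";
  if (rule.grounding === "exact") return "enforce";
  // Partial grounding: review unless resolution confidence is high.
  // Assumption: a high-confidence partial rule is promoted to "enforce".
  return rule.resolutionConfidence >= HIGH_CONFIDENCE ? "enforce" : "review_only";
}
```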

How tool resolution works

Raw LLM output is not trusted as a canonical tool reference. The resolver compares each mention against the registry using:

  • exact canonical ID match
  • alias match
  • context keyword match
  • schema keyword match
  • ambiguity detection

For example, the raw mention PutItem resolves to aws.dynamodb.PutItem through the registry, while other candidates such as aws.s3.PutObject remain visible with lower scores.
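
A minimal sketch of this kind of resolution follows. The score weights (1.0, 0.9, up to 0.5), the ambiguity margin, the registry entries' aliases and keywords, and all type names are assumptions chosen for illustration. Context and schema keywords are merged into one list to keep the example short.

```typescript
interface RegistryTool {
  id: string;         // canonical tool ID
  aliases: string[];
  keywords: string[]; // context and schema keywords, merged for brevity
}

interface Candidate {
  toolId: string;
  score: number;
  reason: "exact" | "alias" | "keyword" | "none";
}

// Two illustrative entries; aliases and keywords are made up.
const registry: RegistryTool[] = [
  { id: "aws.dynamodb.PutItem", aliases: ["PutItem"], keywords: ["dynamodb", "table", "item"] },
  { id: "aws.s3.PutObject", aliases: ["PutObject"], keywords: ["s3", "bucket", "object", "acl"] },
];

function scoreTool(tool: RegistryTool, mention: string, context: string): Candidate {
  const m = mention.toLowerCase();
  if (tool.id.toLowerCase() === m) return { toolId: tool.id, score: 1, reason: "exact" };
  if (tool.aliases.some((a) => a.toLowerCase() === m)) {
    return { toolId: tool.id, score: 0.9, reason: "alias" };
  }
  // Keyword overlap with the surrounding text contributes at most 0.5.
  const words = context.toLowerCase().split(/\W+/);
  const hits = tool.keywords.filter((k) => words.indexOf(k) !== -1).length;
  const score = tool.keywords.length === 0 ? 0 : (0.5 * hits) / tool.keywords.length;
  return { toolId: tool.id, score, reason: hits > 0 ? "keyword" : "none" };
}

function resolve(mention: string, context: string, margin = 0.1) {
  const candidates = registry
    .map((t) => scoreTool(t, mention, context))
    .sort((a, b) => b.score - a.score);
  if (candidates.length === 0) return { resolved: null, ambiguous: false, candidates };
  const best = candidates[0];
  // Ambiguous when the runner-up is within `margin` of the best score.
  const ambiguous = candidates.length > 1 && best.score - candidates[1].score < margin;
  const resolved = best.score > 0 && !ambiguous ? best.toolId : null;
  return { resolved, ambiguous, candidates };
}

const result = resolve("PutItem", "put the item in the table, not the bucket");
```

Here `result.resolved` is the DynamoDB tool through the alias match, while `aws.s3.PutObject` stays in `result.candidates` with a lower keyword-only score.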

Policy operators

The engine uses generic policy operators:

  • schema_contract
  • deny_action
  • require_before
  • resource_in_allowlist
  • arg_within_bound
  • flow_denied
  • timeout_leq
  • review_required

These operators are intentionally not SQL-specific. They evaluate normalized tool-call events with canonical toolId, actor, resource, args, timestamps, and optional input/output labels.
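
To illustrate how such operators might evaluate normalized events, here is a sketch covering three of them. The `target` and `condition.requiredToolId` fields follow the policy JSON shown earlier; the condition shapes for the other two operators, the event type, and the function names are assumptions.

```typescript
// Assumed normalized event shape, based on the fields listed above.
interface ToolEvent {
  toolId: string;
  actor: string;
  resource: string;
  args: Record<string, unknown>;
  timestamp: number;
}

type Policy =
  | { operator: "require_before"; target: { toolId: string }; condition: { requiredToolId: string } }
  | { operator: "resource_in_allowlist"; target: { toolId: string }; condition: { allowlist: string[] } }
  | { operator: "arg_within_bound"; target: { toolId: string }; condition: { arg: string; max: number } };

// Returns true when `event` violates `policy`, given the events before it.
function violates(policy: Policy, event: ToolEvent, prior: ToolEvent[]): boolean {
  if (policy.target.toolId !== event.toolId) return false; // policy does not apply
  switch (policy.operator) {
    case "require_before": {
      const required = policy.condition.requiredToolId;
      return !prior.some((e) => e.toolId === required);
    }
    case "resource_in_allowlist":
      return policy.condition.allowlist.indexOf(event.resource) === -1;
    case "arg_within_bound": {
      const value = event.args[policy.condition.arg];
      // A missing or non-numeric argument is treated as a violation.
      return typeof value !== "number" || value > policy.condition.max;
    }
  }
}

// Checks each event against every policy before it would execute.
// Simplification: all earlier events in the trace count as prior events.
function evaluateTrace(policies: Policy[], events: ToolEvent[]): ("allow" | "block")[] {
  return events.map((event, i) => {
    const prior = events.slice(0, i);
    return policies.some((p) => violates(p, event, prior)) ? "block" : "allow";
  });
}
```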

Mutation evaluation

The app includes safe traces and mutated unsafe traces.

SQL mutants:

  • remove schema_lookup
  • replace SELECT with DELETE
  • change table to payments
  • change LIMIT 50 to LIMIT 500

Cloud mutants:

  • remove authz.check_permission
  • remove scan_for_secrets
  • change ACL to public-read
  • change bucket to an unapproved bucket
  • send secret-labeled data to public_email.send
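
Mutants like these can be derived from a safe trace with small pure functions. The sketch below is illustrative only: the event shape, the helper names, and the bucket name are hypothetical, and the repository's fixtures may be hand-written rather than generated.

```typescript
// Simplified, assumed event shape for this example.
interface TraceEvent {
  toolId: string;
  args: Record<string, string | number>;
}

// Mutation: drop every call to one tool (e.g. remove authz.check_permission).
function removeTool(trace: TraceEvent[], toolId: string): TraceEvent[] {
  return trace.filter((e) => e.toolId !== toolId);
}

// Mutation: overwrite one argument on every call to a tool
// (e.g. change the ACL to public-read). The input trace is not modified.
function setArg(trace: TraceEvent[], toolId: string, key: string, value: string | number): TraceEvent[] {
  return trace.map((e) =>
    e.toolId === toolId ? { toolId: e.toolId, args: { ...e.args, [key]: value } } : e
  );
}

// A made-up safe cloud trace; "approved-bucket" is a placeholder name.
const safeTrace: TraceEvent[] = [
  { toolId: "authz.check_permission", args: {} },
  { toolId: "scan_for_secrets", args: {} },
  { toolId: "aws.s3.PutObject", args: { bucket: "approved-bucket", acl: "private" } },
];

const noPermissionCheck = removeTool(safeTrace, "authz.check_permission");
const publicAcl = setArg(safeTrace, "aws.s3.PutObject", "acl", "public-read");
```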

The evaluator computes total traces, safe traces allowed, unsafe mutants blocked, false allows, false blocks, and coverage by check category.
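
These metrics amount to a confusion-matrix count over trace results. A sketch, assuming a hypothetical `TraceResult` shape in which each trace is labeled as expected-safe or not and each unsafe mutant carries the check category it targets:

```typescript
interface TraceResult {
  traceId: string;
  expectedSafe: boolean; // true for safe traces, false for unsafe mutants
  blocked: boolean;      // did the policy engine block any event in the trace?
  category?: string;     // check category an unsafe mutant is meant to exercise
}

interface Summary {
  total: number;
  safeAllowed: number;
  unsafeBlocked: number;
  falseAllows: number; // unsafe mutants that slipped through
  falseBlocks: number; // safe traces that were wrongly blocked
  coverage: Record<string, { blocked: number; total: number }>;
}

function summarize(results: TraceResult[]): Summary {
  const s: Summary = {
    total: results.length,
    safeAllowed: 0,
    unsafeBlocked: 0,
    falseAllows: 0,
    falseBlocks: 0,
    coverage: {},
  };
  for (const r of results) {
    if (r.expectedSafe) {
      if (r.blocked) s.falseBlocks++;
      else s.safeAllowed++;
      continue;
    }
    if (r.blocked) s.unsafeBlocked++;
    else s.falseAllows++;
    // Coverage is counted over unsafe mutants only, per targeted category.
    const key = r.category ?? "uncategorized";
    const entry = s.coverage[key] ?? { blocked: 0, total: 0 };
    entry.total++;
    if (r.blocked) entry.blocked++;
    s.coverage[key] = entry;
  }
  return s;
}
```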

Run locally

npm install
npm run dev

This starts three processes:

  1. Policy watcher — rebuilds policy/policy.json when src/data/artifacts.ts, src/data/toolRegistry.ts, or src/engine/ruleExtractor.ts changes.
  2. Validation server (port 8787) — reloads policy/policy.json and enforces rules on tool-call traces.
  3. UI — reads policy and evaluation from the server; Trace Runner calls POST /api/validate/trace.

Open the Vite URL printed in the terminal. Edit an artifact file (e.g. add a rule bullet in artifacts.ts) and watch the header timestamp update after the watcher rebuilds the bundle.

Local LLM extraction (Ollama + Qwen)

Offline policy builds call Ollama (qwen3-8b-16k:latest by default) to read agent artifacts and emit structured rules. Output is validated (JSON shape, enums, minimum rule counts) and falls back to fixtures if Ollama is down or returns invalid JSON.

# ensure Ollama is running and the model is pulled
ollama list
npm run verify:llm    # health + ping + full extraction smoke test

Environment (see .env.example):

| Variable | Default | Meaning |
| --- | --- | --- |
| POLICY_EXTRACTOR | auto | llm (require Ollama), fixture, or auto (LLM with fixture fallback) |
| OLLAMA_BASE_URL | http://127.0.0.1:11434 | Ollama API |
| OLLAMA_MODEL | qwen3-8b-16k:latest | Model tag from ollama list |
| OLLAMA_TIMEOUT_MS | 300000 | Per-agent extraction timeout (cold start) |

policy.json includes an extraction block recording whether each agent used llm or fixture.
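
The validate-then-fall-back behavior described in this section could look roughly like the sketch below. It takes the raw model output as a string, so no Ollama call is made here. The rule fields, the minimum rule count of 1, and the function names are assumptions; the operator list is the one given under "Policy operators".

```typescript
interface ExtractedRule {
  rawText: string;
  grounding: string;
  operator: string;
}

const GROUNDINGS = ["exact", "partial", "inferred"];
const OPERATORS = [
  "schema_contract", "deny_action", "require_before", "resource_in_allowlist",
  "arg_within_bound", "flow_denied", "timeout_leq", "review_required",
];
const MIN_RULES = 1; // assumed minimum rule count

// Returns the parsed rules, or null if the output fails any check.
function parseRules(raw: string): ExtractedRule[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (!Array.isArray(data) || data.length < MIN_RULES) return null;
  for (const item of data) {
    if (typeof item !== "object" || item === null) return null;
    const r = item as Record<string, unknown>;
    if (typeof r.rawText !== "string") return null;
    if (typeof r.grounding !== "string" || GROUNDINGS.indexOf(r.grounding) === -1) return null;
    if (typeof r.operator !== "string" || OPERATORS.indexOf(r.operator) === -1) return null;
  }
  return data as ExtractedRule[];
}

// `raw` is null when the model could not be reached.
function extractWithFallback(
  raw: string | null,
  fixtures: ExtractedRule[]
): { source: "llm" | "fixture"; rules: ExtractedRule[] } {
  const rules = raw === null ? null : parseRules(raw);
  return rules !== null ? { source: "llm", rules } : { source: "fixture", rules: fixtures };
}
```

The returned `source` value corresponds to the llm or fixture marker recorded per agent in the extraction block.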

Scripts

| Script | Purpose |
| --- | --- |
| npm run build:policy | One-shot offline LLM extract + compile to policy/policy.json |
| npm run verify:llm | Validate Ollama connectivity and extraction |
| npm run policy:watch | File-change listener that rebuilds policy.json |
| npm run server | Runtime validation API |
| npm run dev:ui | UI only (requires the server to be running and policy.json to exist) |

Validation API

  • GET /api/policy: latest compiled bundle (policies + extracted rules)
  • GET /api/policy/version: generatedAt for reload detection
  • GET /api/evaluate: mutation evaluation against all fixture traces
  • POST /api/validate/trace: { agentId, events[] } → per-event allow/block decisions
  • POST /api/validate/event: { agentId, priorEvents[], event }, single live tool-call check
  • GET /api/llm/health: Ollama reachability + model + last bundle extraction metadata
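
As an illustration of calling the trace endpoint, the helper below builds the request for POST /api/validate/trace without sending it. The body keys `agentId` and `events` come from the list above and port 8787 from the run instructions; the event fields, the agent ID value, and the helper name are assumptions, and the response shape is not documented here.

```typescript
// Assumed minimal event shape; the server may expect more fields.
interface TraceEvent {
  toolId: string;
  args: Record<string, unknown>;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a trace validation request.
function buildTraceRequest(baseUrl: string, agentId: string, events: TraceEvent[]): HttpRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/validate/trace", // strip trailing slashes
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ agentId, events }),
  };
}

// "cloud-data-agent" is a placeholder agent ID, not taken from the repo.
const request = buildTraceRequest("http://localhost:8787/", "cloud-data-agent", [
  { toolId: "authz.check_permission", args: {} },
  { toolId: "aws.dynamodb.PutItem", args: { table: "example-table" } },
]);
```

With the validation server running, the `method`, `headers`, and `body` fields can be passed to `fetch(request.url, ...)` in Node 18+ or a browser.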

SQL local integration (SQLite)

Fixture traces in traces.ts are scripted tool-call logs; no real agent produces them. The SQL integration adds a real SQLite warehouse at data/sql/analytics.db with customers, orders, and products tables, plus an unapproved payments table for negative tests.

Flow per trace: policy check → if allowed, run SQL.
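
That check-then-run flow can be sketched as a guard around an injected executor. No SQLite library is used here, so the `run` callback stands in for real execution. The event shape with a pre-extracted `table` field, the function names, and the decision type are assumptions; the approved table names are the ones listed above.

```typescript
type Decision = { allowed: true } | { allowed: false; reason: string };

// Assumed normalized SQL event; `table` is taken to be extracted upstream.
interface SqlEvent {
  toolId: string;
  table: string;
  sql: string;
}

const APPROVED_TABLES = ["customers", "orders", "products"];

// Stand-in policy check: only the table allowlist is evaluated here.
function checkPolicy(event: SqlEvent): Decision {
  if (APPROVED_TABLES.indexOf(event.table) === -1) {
    return { allowed: false, reason: "table not in allowlist: " + event.table };
  }
  return { allowed: true };
}

// Runs `run` only if the policy check allows the event.
function guardedExecute<T>(
  event: SqlEvent,
  check: (e: SqlEvent) => Decision,
  run: (sql: string) => T
): { decision: Decision; result?: T } {
  const decision = check(event);
  if (!decision.allowed) return { decision }; // blocked: SQL never executes
  return { decision, result: run(event.sql) };
}

// Usage with a fake executor that records what it was asked to run.
const executed: string[] = [];
const fakeRun = (sql: string): number => executed.push(sql);

const allowedStep = guardedExecute(
  { toolId: "sql.execute", table: "orders", sql: "SELECT * FROM orders LIMIT 50" },
  checkPolicy,
  fakeRun
);
const blockedStep = guardedExecute(
  { toolId: "sql.execute", table: "payments", sql: "SELECT * FROM payments LIMIT 50" },
  checkPolicy,
  fakeRun
);
```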

| Command | Purpose |
| --- | --- |
| npm run seed:sql | Create/refresh seed database |
| npm run test:sql | Run all SQL fixture traces against policy + DB (CI-friendly) |
| GET /api/sql/integration-test | Same matrix over HTTP |
| POST /api/sql/run-trace/:id | Run one trace (e.g. sql-safe) |
| POST /api/sql/execute | Middleware-style single step; body is { priorEvents?, event } |

See TESTING.md for the full testing strategy.

What the demo proves

AgentSpecGap demonstrates the core thesis: prompt-only agent safety is not enforcement. A registry-resolved policy layer can recover scattered rules, classify them by validation category, compile them into auditable policy JSON, and catch unsafe mutated tool-call traces before simulated execution.

The key contribution is normalizing heterogeneous LLM-agent tool calls into registry-resolved events, classifying extracted rules into consistent check categories, compiling enforceable rules into a compact policy IR, and validating the policy through mutation-based trace testing.

Limitations

  • Rule extraction is deterministic and fixture-based; it simulates LLM extraction.
  • The policy engine implements a compact subset of generic operators.
  • Business-logic validation is marked eval-required rather than enforced in middleware.
  • There is no real orchestrator integration yet.
  • Taint labels are fixture labels rather than propagated through a complete dataflow graph.

Future work

  • Replace fixture extraction with an LLM extractor that emits source spans and uncertainty.
  • Add richer registry metadata for actors, resource hierarchies, and environment-specific permissions.
  • Add taint propagation across tool outputs, intermediate memory, retrieval, and final answers.
  • Export policy JSON and trace results for CI.
  • Add adapters for actual agent orchestrators and MCP-style tool registries.

About

Mining enforceable specifications from LLM-agent artifacts. SPS Hackathon 2026 prototype.
