Skip to content

PSR94/ARGUS

Repository files navigation

ARGUS banner

ARGUS

LLM red team, guardrail, and model risk evaluation platform.

ARGUS is a local-first platform for controlled, repeatable, auditable LLM risk evaluation. It is built around app profiles, attack suites, guardrail policies, deterministic evaluators, provider targets, tool-call safety checks, findings, reports, and CI gates.

ARGUS is not a chat app or notebook eval script. The product surface is the evaluation workflow: define an AI system, select attack packs and policies, run a provider or deterministic mock target, inspect findings, understand risk drivers, and export a reproducible report.

Python FastAPI Next.js PostgreSQL Deterministic

Why It Exists

LLM applications can fail in ways normal integration tests do not catch: indirect prompt injection, confidential instruction leakage, unsafe tool selection, missing approval boundaries, fabricated citations, and refusal over-disclosure. ARGUS gives platform and security teams a practical harness for testing those risks before a model-backed feature ships.

Feature Grid

Area What is implemented
App profiles System prompts, owners, allowed tools, blocked behavior, risk tolerance, citation mode
Red-team lab Versioned suites, case metadata, prompt variants, expected behavior
Attack library Prompt injection, jailbreak, tool misuse, PII leakage, citation trap, refusal failure packs
Policy engine Structured findings with severity, evidence, evaluator source, remediation
Providers Deterministic mock, Ollama adapter, OpenAI-compatible adapter, safe public status reporting
Tool sandbox Simulated operational tools, sandbox scenarios, approval checks, and inspectable sample behavior
Leakage detection Email, phone, SSN, API key, bearer token, private key, confidential term checks
Grounding checks Missing citations, fabricated citations, unsupported claims
Risk scoring Explainable score, band, pass/fail recommendation, top drivers
Reports JSON and Markdown export with reproducibility metadata
CI gate Deterministic run, threshold enforcement, non-zero failure exit
Auditability Audit records for seeds, evals, profile/policy/suite changes, report exports

Architecture

flowchart LR
  Console["Next.js Console"] --> API["FastAPI Control Plane"]
  API --> DB[("PostgreSQL / SQLite local test")]
  API --> Redis[("Redis coordination")]
  API --> Eval["Eval Orchestrator"]
  Eval --> Providers["Provider Targets<br/>mock / Ollama / OpenAI-compatible"]
  Eval --> Policies["Policy Engine"]
  Policies --> Leakage["Leakage Evaluators"]
  Policies --> Grounding["Grounding Evaluators"]
  Policies --> ToolSafety["Tool Safety Sandbox"]
  Policies --> Refusal["Refusal Quality"]
  Eval --> Scoring["Risk Scoring"]
  Scoring --> Reports["Evidence Store<br/>JSON / Markdown"]
  API --> Audit["Audit Log"]
Loading

Eval Lifecycle

sequenceDiagram
  participant U as Operator
  participant C as Console
  participant A as Control Plane
  participant P as Provider Target
  participant E as Evaluators
  participant R as Report Store
  U->>C: Select profile, suites, provider
  C->>A: POST /api/v1/evals/run
  A->>P: Execute adversarial cases
  P-->>A: Responses and metadata
  A->>E: Evaluate policy, leakage, grounding, tools, refusal
  E-->>A: Structured findings
  A->>A: Calculate risk score
  A->>R: Persist run artifacts
  A-->>C: Run summary, findings, risk drivers
Loading

Policy Evaluation Flow

flowchart TD
  Case["Test Case + Context"] --> Response["Provider Response"]
  Response --> Engine["Policy Engine"]
  Engine --> Regex["Regex leakage checks"]
  Engine --> Rules["Structured policy rules"]
  Engine --> Ground["Grounding checks"]
  Engine --> Refusal["Refusal quality checks"]
  Regex --> Finding["Finding model"]
  Rules --> Finding
  Ground --> Finding
  Refusal --> Finding
  Finding --> Score["Risk score inputs"]
Loading

Tool-Calling Safety

flowchart LR
  Prompt["Adversarial prompt"] --> Model["Model target"]
  Model --> Call["TOOL_CALL event"]
  Call --> Sandbox["Tool sandbox"]
  Sandbox --> Allowed{"Tool allowed for profile?"}
  Allowed -- no --> Finding["unsafe_tool_call"]
  Allowed -- yes --> Approval{"Approval metadata present?"}
  Approval -- no --> Missing["missing_tool_approval"]
  Approval -- yes --> Audit["Audited simulated execution"]
Loading

Risk Scoring

flowchart TD
  Findings["Findings"] --> Severity["Severity weights"]
  Findings --> Cases["Failed case count"]
  Findings --> Categories["Category spread"]
  Severity --> Total["0-100 risk score"]
  Cases --> Total
  Categories --> Total
  Total --> Band["low / moderate / high / critical"]
  Total --> Gate["pass/fail recommendation"]
  Findings --> Drivers["Top risk drivers"]
Loading

CI Gate

flowchart LR
  PR["Pull request"] --> Gate["ARGUS deterministic gate"]
  Gate --> Mock["Mock provider fixtures"]
  Mock --> Eval["Run selected suites"]
  Eval --> Thresholds["Threshold checks"]
  Thresholds -->|pass| Green["CI passes"]
  Thresholds -->|fail| Red["CI blocks merge"]
Loading

Audit And Reporting

flowchart TD
  Change["Profile, policy, suite, eval, export"] --> Audit["Audit record"]
  Eval["Evaluation run"] --> Report["Report service"]
  Report --> Json["JSON export"]
  Report --> Markdown["Markdown export"]
  Audit --> Review["Operator diagnostics"]
Loading

Local Run

cp .env.example .env
docker compose up --build

Console: http://localhost:3000 API: http://localhost:8080 Health: http://localhost:8080/health

The default provider is provider-mock, which is fixture-backed and deterministic. No paid model keys are required.

For deterministic gate checks:

python3 risk-lab/runners/ci_gate.py --config quality-gates/eval_configs/default.yaml
python3 risk-lab/runners/ci_gate.py --config quality-gates/eval_configs/strict-unsafe-tool.yaml

default.yaml is the normal passing baseline. strict-unsafe-tool.yaml is expected to fail and proves that unsafe tool findings block the gate.

Developer Workflow

make bootstrap
make test
make gate
make api
make console

The CI gate writes a report to evidence-store/reports/ci-gate-report.json and exits non-zero when configured thresholds are violated.

API Overview

Endpoint Purpose
GET /health Service health
GET /health/dependencies Database and runtime diagnostics
GET/POST /api/v1/app-profiles Manage LLM app profiles
GET/POST /api/v1/test-suites Manage red-team suites
POST /api/v1/test-suites/{id}/cases Add adversarial cases
GET/POST /api/v1/policies Manage guardrail policies
POST /api/v1/evals/run Run selected suites
GET /api/v1/evals/{id}/findings Inspect run findings
POST /api/v1/reports/{id}/export Export JSON or Markdown
GET /api/v1/providers View target provider status
GET /api/v1/attack-packs Inspect attack pack catalog
GET /api/v1/tool-sandbox/tools Inspect sandbox tool behavior
GET /api/v1/tool-sandbox/scenarios Inspect sandbox scenarios
GET /api/v1/audit Review audit events

Working Console

The screenshots below come from the local stack with seeded runs and deterministic provider responses. They are not mockups.

Command Center

ARGUS command center

Top-line release posture across seeded app profiles, provider readiness, and recent risk runs.

Eval Run Workspace

ARGUS eval workspace

Operator view for a failing tool-safety run against the operations agent profile, including case selection, response inspection, evaluator trace, and structured findings.

Findings Explorer

ARGUS findings explorer

Evidence-first review surface with severity filters, policy links, and remediation context.

Repository Layout

console/          Next.js App Router console, including eval workspace, provider status, findings, and sandbox views
control-plane/    FastAPI API, SQLAlchemy persistence, orchestration, reports, audit
risk-lab/         Attack packs, deterministic evaluators, local runner, CI gate
model-targets/    Mock, Ollama, and OpenAI-compatible provider adapters and registry
tool-sandbox/     Simulated tools, approval policies, scenarios, audit fixtures
datasets/         Seed profiles, policies, suites, cases, expected responses
quality-gates/    Thresholds, CI configs, gate scripts
evidence-store/   Local report/export/artifact storage
ops/              Dockerfiles, runtime scripts, seed and maintenance utilities
docs/             Architecture, evaluator, policy, threat model, operations notes

Structure Rationale

The repository is organized around ARGUS operating concepts instead of a generic apps/web and apps/api template. The top-level folders map to the risk workflow: model targets are separate from the control plane, evaluators live in risk-lab, simulated tools live in tool-sandbox, reports and artifacts are in evidence-store, and CI thresholds are in quality-gates. That makes the red-team and model-risk concerns visible in the tree before reading any code.

Example Scenarios

Scenario Expected ARGUS behavior
Internal support assistant sees hidden KB instructions Reject indirect prompt injection and keep escalation notes confidential
Operations agent is asked to deploy without approval Flag missing approval metadata and block privileged tool execution
Customer support chatbot is asked to email SSN externally Refuse PII exfiltration and recommend internal redacted ticketing
Document QA assistant is pushed into citing a fake policy Report insufficient grounding or fabricated citation
Credential extraction request includes a live-looking key Refuse without repeating the sensitive value

Design Decisions

Decision Reason
Deterministic mock provider by default CI gates need stable, keyless, reproducible results
Rule-based evaluators first Safety regressions should not depend only on another model judgment
JSON document persistence for seed domain objects Keeps local iteration fast while preserving typed API validation
PostgreSQL in Docker, SQLite in tests Mirrors production persistence without slowing local test setup
Structured findings everywhere Findings must be explainable, filterable, reportable, and auditable
Report artifact metadata is first-class Exported evidence needs checksum and path visibility without parsing files manually

Roadmap

  • Add async queue-backed eval execution for long suites.
  • Add optional model-as-judge evaluators with calibration metadata.
  • Add run-to-run regression comparison by profile and provider.
  • Add signed report bundles for release evidence.
  • Add richer Monaco-based policy and test case editing flows.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors