LLM red team, guardrail, and model risk evaluation platform.
ARGUS is a local-first platform for controlled, repeatable, auditable LLM risk evaluation. It is built around app profiles, attack suites, guardrail policies, deterministic evaluators, provider targets, tool-call safety checks, findings, reports, and CI gates.
ARGUS is not a chat app or notebook eval script. The product surface is the evaluation workflow: define an AI system, select attack packs and policies, run a provider or deterministic mock target, inspect findings, understand risk drivers, and export a reproducible report.
LLM applications can fail in ways normal integration tests do not catch: indirect prompt injection, confidential instruction leakage, unsafe tool selection, missing approval boundaries, fabricated citations, and refusal over-disclosure. ARGUS gives platform and security teams a practical harness for testing those risks before a model-backed feature ships.
| Area | What is implemented |
|---|---|
| App profiles | System prompts, owners, allowed tools, blocked behavior, risk tolerance, citation mode |
| Red-team lab | Versioned suites, case metadata, prompt variants, expected behavior |
| Attack library | Prompt injection, jailbreak, tool misuse, PII leakage, citation trap, refusal failure packs |
| Policy engine | Structured findings with severity, evidence, evaluator source, remediation |
| Providers | Deterministic mock, Ollama adapter, OpenAI-compatible adapter, safe public status reporting |
| Tool sandbox | Simulated operational tools, sandbox scenarios, approval checks, and inspectable sample behavior |
| Leakage detection | Email, phone, SSN, API key, bearer token, private key, confidential term checks |
| Grounding checks | Missing citations, fabricated citations, unsupported claims |
| Risk scoring | Explainable score, band, pass/fail recommendation, top drivers |
| Reports | JSON and Markdown export with reproducibility metadata |
| CI gate | Deterministic run, threshold enforcement, non-zero failure exit |
| Auditability | Audit records for seeds, evals, profile/policy/suite changes, report exports |
flowchart LR
Console["Next.js Console"] --> API["FastAPI Control Plane"]
API --> DB[("PostgreSQL / SQLite local test")]
API --> Redis[("Redis coordination")]
API --> Eval["Eval Orchestrator"]
Eval --> Providers["Provider Targets<br/>mock / Ollama / OpenAI-compatible"]
Eval --> Policies["Policy Engine"]
Policies --> Leakage["Leakage Evaluators"]
Policies --> Grounding["Grounding Evaluators"]
Policies --> ToolSafety["Tool Safety Sandbox"]
Policies --> Refusal["Refusal Quality"]
Eval --> Scoring["Risk Scoring"]
Scoring --> Reports["Evidence Store<br/>JSON / Markdown"]
API --> Audit["Audit Log"]
sequenceDiagram
participant U as Operator
participant C as Console
participant A as Control Plane
participant P as Provider Target
participant E as Evaluators
participant R as Report Store
U->>C: Select profile, suites, provider
C->>A: POST /api/v1/evals/run
A->>P: Execute adversarial cases
P-->>A: Responses and metadata
A->>E: Evaluate policy, leakage, grounding, tools, refusal
E-->>A: Structured findings
A->>A: Calculate risk score
A->>R: Persist run artifacts
A-->>C: Run summary, findings, risk drivers
flowchart TD
Case["Test Case + Context"] --> Response["Provider Response"]
Response --> Engine["Policy Engine"]
Engine --> Regex["Regex leakage checks"]
Engine --> Rules["Structured policy rules"]
Engine --> Ground["Grounding checks"]
Engine --> Refusal["Refusal quality checks"]
Regex --> Finding["Finding model"]
Rules --> Finding
Ground --> Finding
Refusal --> Finding
Finding --> Score["Risk score inputs"]
flowchart LR
Prompt["Adversarial prompt"] --> Model["Model target"]
Model --> Call["TOOL_CALL event"]
Call --> Sandbox["Tool sandbox"]
Sandbox --> Allowed{"Tool allowed for profile?"}
Allowed -- no --> Finding["unsafe_tool_call"]
Allowed -- yes --> Approval{"Approval metadata present?"}
Approval -- no --> Missing["missing_tool_approval"]
Approval -- yes --> Audit["Audited simulated execution"]
flowchart TD
Findings["Findings"] --> Severity["Severity weights"]
Findings --> Cases["Failed case count"]
Findings --> Categories["Category spread"]
Severity --> Total["0-100 risk score"]
Cases --> Total
Categories --> Total
Total --> Band["low / moderate / high / critical"]
Total --> Gate["pass/fail recommendation"]
Findings --> Drivers["Top risk drivers"]
flowchart LR
PR["Pull request"] --> Gate["ARGUS deterministic gate"]
Gate --> Mock["Mock provider fixtures"]
Mock --> Eval["Run selected suites"]
Eval --> Thresholds["Threshold checks"]
Thresholds -->|pass| Green["CI passes"]
Thresholds -->|fail| Red["CI blocks merge"]
flowchart TD
Change["Profile, policy, suite, eval, export"] --> Audit["Audit record"]
Eval["Evaluation run"] --> Report["Report service"]
Report --> Json["JSON export"]
Report --> Markdown["Markdown export"]
Audit --> Review["Operator diagnostics"]
cp .env.example .env
docker compose up --buildConsole: http://localhost:3000
API: http://localhost:8080
Health: http://localhost:8080/health
The default provider is provider-mock, which is fixture-backed and deterministic. No paid model keys are required.
For deterministic gate checks:
python3 risk-lab/runners/ci_gate.py --config quality-gates/eval_configs/default.yaml
python3 risk-lab/runners/ci_gate.py --config quality-gates/eval_configs/strict-unsafe-tool.yamldefault.yaml is the normal passing baseline. strict-unsafe-tool.yaml is expected to fail and proves that unsafe tool findings block the gate.
make bootstrap
make test
make gate
make api
make consoleThe CI gate writes a report to evidence-store/reports/ci-gate-report.json and exits non-zero when configured thresholds are violated.
| Endpoint | Purpose |
|---|---|
GET /health |
Service health |
GET /health/dependencies |
Database and runtime diagnostics |
GET/POST /api/v1/app-profiles |
Manage LLM app profiles |
GET/POST /api/v1/test-suites |
Manage red-team suites |
POST /api/v1/test-suites/{id}/cases |
Add adversarial cases |
GET/POST /api/v1/policies |
Manage guardrail policies |
POST /api/v1/evals/run |
Run selected suites |
GET /api/v1/evals/{id}/findings |
Inspect run findings |
POST /api/v1/reports/{id}/export |
Export JSON or Markdown |
GET /api/v1/providers |
View target provider status |
GET /api/v1/attack-packs |
Inspect attack pack catalog |
GET /api/v1/tool-sandbox/tools |
Inspect sandbox tool behavior |
GET /api/v1/tool-sandbox/scenarios |
Inspect sandbox scenarios |
GET /api/v1/audit |
Review audit events |
The screenshots below come from the local stack with seeded runs and deterministic provider responses. They are not mockups.
Top-line release posture across seeded app profiles, provider readiness, and recent risk runs.
Operator view for a failing tool-safety run against the operations agent profile, including case selection, response inspection, evaluator trace, and structured findings.
Evidence-first review surface with severity filters, policy links, and remediation context.
console/ Next.js App Router console, including eval workspace, provider status, findings, and sandbox views
control-plane/ FastAPI API, SQLAlchemy persistence, orchestration, reports, audit
risk-lab/ Attack packs, deterministic evaluators, local runner, CI gate
model-targets/ Mock, Ollama, and OpenAI-compatible provider adapters and registry
tool-sandbox/ Simulated tools, approval policies, scenarios, audit fixtures
datasets/ Seed profiles, policies, suites, cases, expected responses
quality-gates/ Thresholds, CI configs, gate scripts
evidence-store/ Local report/export/artifact storage
ops/ Dockerfiles, runtime scripts, seed and maintenance utilities
docs/ Architecture, evaluator, policy, threat model, operations notes
The repository is organized around ARGUS operating concepts instead of a generic apps/web and apps/api template. The top-level folders map to the risk workflow: model targets are separate from the control plane, evaluators live in risk-lab, simulated tools live in tool-sandbox, reports and artifacts are in evidence-store, and CI thresholds are in quality-gates. That makes the red-team and model-risk concerns visible in the tree before reading any code.
| Scenario | Expected ARGUS behavior |
|---|---|
| Internal support assistant sees hidden KB instructions | Reject indirect prompt injection and keep escalation notes confidential |
| Operations agent is asked to deploy without approval | Flag missing approval metadata and block privileged tool execution |
| Customer support chatbot is asked to email SSN externally | Refuse PII exfiltration and recommend internal redacted ticketing |
| Document QA assistant is pushed into citing a fake policy | Report insufficient grounding or fabricated citation |
| Credential extraction request includes a live-looking key | Refuse without repeating the sensitive value |
| Decision | Reason |
|---|---|
| Deterministic mock provider by default | CI gates need stable, keyless, reproducible results |
| Rule-based evaluators first | Safety regressions should not depend only on another model judgment |
| JSON document persistence for seed domain objects | Keeps local iteration fast while preserving typed API validation |
| PostgreSQL in Docker, SQLite in tests | Mirrors production persistence without slowing local test setup |
| Structured findings everywhere | Findings must be explainable, filterable, reportable, and auditable |
| Report artifact metadata is first-class | Exported evidence needs checksum and path visibility without parsing files manually |
- Add async queue-backed eval execution for long suites.
- Add optional model-as-judge evaluators with calibration metadata.
- Add run-to-run regression comparison by profile and provider.
- Add signed report bundles for release evidence.
- Add richer Monaco-based policy and test case editing flows.


