First public release of autoguardrails, an LLM / AI-safety guardrail research library and evaluation harness (autoresearch-style) by Santander AI Lab.
Highlights
- Single mutable surface (policy.md) searched against a frozen evaluation suite (eval_suite.jsonl) and judge prompt, minimizing attack success rate (ASR) with a benign-pass floor.
- Stdlib-only, offline-by-default Python harness; pluggable OpenAI-compatible target/judge endpoints via AUTOGUARDRAILS_* env vars.
- Typed exceptions (SurfaceDriftError, BaselineRequiredError, CandidateUnchangedError), append-only results.tsv run log, and a run_autoguardrails.sh wrapper.
- 27-test suite at 91% branch coverage; CI runs ruff + black + mypy across Python 3.10/3.11/3.12.
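The objective in the first highlight (minimize ASR subject to a benign-pass floor) can be illustrated with a minimal sketch. This is not the library's API: `CandidateScore`, `pick_best`, and the `benign_floor` parameter are hypothetical names chosen for illustration, and the counts in the example are made up. The sketch assumes each candidate policy was scored on at least one attack case and one benign case.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CandidateScore:
    """Hypothetical per-candidate metrics; not the library's actual type."""
    name: str
    attacks_total: int
    attacks_succeeded: int
    benign_total: int
    benign_passed: int

    @property
    def asr(self) -> float:
        # Attack success rate: fraction of attack cases that got through.
        return self.attacks_succeeded / self.attacks_total

    @property
    def benign_pass_rate(self) -> float:
        # Fraction of benign cases that were correctly allowed.
        return self.benign_passed / self.benign_total


def pick_best(
    candidates: Iterable[CandidateScore], benign_floor: float
) -> Optional[CandidateScore]:
    """Return the lowest-ASR candidate whose benign pass rate meets the floor."""
    eligible = [c for c in candidates if c.benign_pass_rate >= benign_floor]
    if not eligible:
        return None  # No candidate satisfies the benign-pass constraint.
    return min(eligible, key=lambda c: c.asr)


# Illustrative, made-up scores for three candidate policies.
scores = [
    CandidateScore("baseline", 40, 12, 60, 59),
    CandidateScore("strict", 40, 2, 60, 48),
    CandidateScore("balanced", 40, 5, 60, 57),
]
best = pick_best(scores, benign_floor=0.9)
print(best.name if best else "no eligible candidate")
```

Here the "strict" candidate has the lowest ASR but blocks too many benign prompts, so the floor excludes it. That is the reason for treating benign pass rate as a constraint rather than folding it into a single score.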
Quality & supply chain
- All GitHub Actions pinned to commit SHAs; CodeQL, OpenSSF Scorecard, pip-audit, license + internal-pattern checks, CLA Assistant and stale automation.
- Apache-2.0 licensed; complete community files (CONTRIBUTING, CODE_OF_CONDUCT, SECURITY, CODEOWNERS, issue/PR templates).
See CHANGELOG.md for details.