Release autoguardrails v0.1.0 · SantanderAI/autoguardrails

First public release of autoguardrails, an LLM / AI-safety guardrail research library and evaluation harness (autoresearch-style) by Santander AI Lab.

Highlights

A single mutable surface (policy.md) is searched against a frozen evaluation suite (eval_suite.jsonl) and a frozen judge prompt, minimizing attack success rate (ASR) subject to a benign-pass floor.
Stdlib-only, offline-by-default Python harness; pluggable OpenAI-compatible target/judge endpoints via AUTOGUARDRAILS_* env vars.
Typed exceptions (SurfaceDriftError, BaselineRequiredError, CandidateUnchangedError), append-only results.tsv run log, and a run_autoguardrails.sh wrapper.
27-test suite at 91% branch coverage; CI runs ruff + black + mypy across Python 3.10/3.11/3.12.
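
The search objective described in the first highlight (lower ASR, subject to a benign-pass floor) can be pictured as a simple acceptance rule. The following is a minimal sketch only, not the library's actual API: the Score class, its field names, the accept function, and the 0.9 floor used in the usage note are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Hypothetical summary of one evaluation run (not the library's type)."""

    attack_successes: int  # adversarial cases the judge marked as successful
    attack_total: int      # adversarial cases in the frozen suite
    benign_passes: int     # benign cases that were answered, not refused
    benign_total: int      # benign cases in the frozen suite

    @property
    def asr(self) -> float:
        # Attack success rate: fraction of adversarial cases that got through.
        return self.attack_successes / self.attack_total

    @property
    def benign_pass_rate(self) -> float:
        # Fraction of benign cases that still pass under the policy.
        return self.benign_passes / self.benign_total


def accept(candidate: Score, baseline: Score, benign_floor: float) -> bool:
    """Keep a candidate policy only if it respects the benign floor
    and strictly lowers ASR relative to the baseline."""
    if candidate.benign_pass_rate < benign_floor:
        return False
    return candidate.asr < baseline.asr
```

For example, with a baseline of Score(30, 100, 95, 100) and an assumed floor of 0.9, a candidate Score(20, 100, 92, 100) would be accepted, while Score(5, 100, 80, 100) would be rejected despite its lower ASR, because it over-refuses benign traffic.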

Quality & supply chain

All GitHub Actions pinned to commit SHAs; CodeQL, OpenSSF Scorecard, pip-audit, license + internal-pattern checks, CLA Assistant and stale automation.
Apache-2.0 licensed; complete community files (CONTRIBUTING, CODE_OF_CONDUCT, SECURITY, CODEOWNERS, issue/PR templates).
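
As an illustration of what pinning Actions to commit SHAs means in practice, the sketch below scans workflow text for uses: references that do not end in a full 40-character commit SHA. It is a stdlib-only approximation written for this note, not the check the repository runs; the function name unpinned_actions and the line-based parsing (rather than a real YAML parser) are assumptions.

```python
import re

# Matches "uses: owner/repo@ref" lines, with or without a leading list dash.
USES = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)")
# A pinned reference ends in "@" followed by a 40-character hex commit SHA.
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions and Docker images have no commit SHA to pin.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not FULL_SHA.search(ref):
            found.append(ref)
    return found
```

A tag reference such as actions/checkout@v4 would be reported, while the same action followed by @ and a 40-character hex SHA would not. A trailing comment after the SHA (a common way to record the human-readable version) is ignored because only the first whitespace-delimited token is inspected.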

See CHANGELOG.md for details.
