autoguardrails v0.1.0

opensource-SantanderAI released this 16 Jun 20:24 · 2 commits to main since this release

First public release of autoguardrails, an LLM / AI-safety guardrail research library and evaluation harness (autoresearch-style) by Santander AI Lab.

Highlights

  • A single mutable surface (policy.md) is searched against a frozen evaluation suite (eval_suite.jsonl) and a frozen judge prompt, minimizing attack success rate (ASR) subject to a benign-pass floor.
  • Stdlib-only, offline-by-default Python harness; pluggable OpenAI-compatible target/judge endpoints via AUTOGUARDRAILS_* env vars.
  • Typed exceptions (SurfaceDriftError, BaselineRequiredError, CandidateUnchangedError), append-only results.tsv run log, and a run_autoguardrails.sh wrapper.
  • 27-test suite at 91% branch coverage; CI runs ruff + black + mypy across Python 3.10/3.11/3.12.
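
The selection rule in the first highlight (lower ASR, but never below a benign-pass floor) can be illustrated with a minimal sketch. This is not the library's API: `Verdict`, `attack_success_rate`, `benign_pass_rate`, `accept_candidate`, and the `benign_floor` default are hypothetical names and values chosen for illustration, and the strict "lower ASR than baseline" comparison is an assumption about how a candidate policy.md might be judged.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Verdict:
    """One judged eval case (hypothetical shape, not the library's schema)."""
    is_attack: bool  # True for an attack prompt, False for a benign one
    success: bool    # attack: the attack got through; benign: the request was served


def attack_success_rate(verdicts: Sequence[Verdict]) -> float:
    """Fraction of attack cases that succeeded (0.0 if there are none)."""
    attacks = [v for v in verdicts if v.is_attack]
    return sum(v.success for v in attacks) / len(attacks) if attacks else 0.0


def benign_pass_rate(verdicts: Sequence[Verdict]) -> float:
    """Fraction of benign cases that were served (1.0 if there are none)."""
    benign = [v for v in verdicts if not v.is_attack]
    return sum(v.success for v in benign) / len(benign) if benign else 1.0


def accept_candidate(
    baseline: Sequence[Verdict],
    candidate: Sequence[Verdict],
    benign_floor: float = 0.95,  # assumed value, for illustration only
) -> bool:
    """Accept a candidate policy only if it keeps benign traffic above the
    floor AND strictly lowers ASR relative to the baseline."""
    if benign_pass_rate(candidate) < benign_floor:
        return False  # over-blocking: rejected regardless of ASR
    return attack_success_rate(candidate) < attack_success_rate(baseline)
```

Under these assumptions, a candidate that blocks every attack but also refuses half of the benign requests is rejected, which is the point of the floor: ASR alone can be driven to zero by refusing everything.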

Quality & supply chain

  • All GitHub Actions pinned to commit SHAs; CodeQL, OpenSSF Scorecard, pip-audit, license + internal-pattern checks, CLA Assistant and stale automation.
  • Apache-2.0 licensed; complete community files (CONTRIBUTING, CODE_OF_CONDUCT, SECURITY, CODEOWNERS, issue/PR templates).

See CHANGELOG.md for details.