SecureVector Guardian 1.3.0
Adds encoded-payload and agent-era injection coverage, and hardens the model's evaluation and data-provenance guarantees. All training data remains 100% SecureVector-original — now enforced by automated tests.
Added
- URL / percent-encoding decode-and-rescan: percent-decodes inline `%xx` payloads and rescans the plaintext (e.g. `ignore%20all%20previous%20instructions`). The pass is gated so it activates only when `%xx` is present and decoding changes the text, so benign prose and benign encoded URLs produce no false positives.
- Broadened agent-era injection coverage via original training templates: tool/plugin misuse, RAG / retrieved-document indirect injection, and memory/conversation poisoning. (Concepts from OWASP LLM06/LLM08 and MITRE ATLAS; all example text authored by SecureVector.)
- Honest, leak-proof evaluation: a held-out test set frozen by content hash, a train/test near-duplicate (paraphrase) leak guard, a recall-at-FPR frontier, 95% bootstrap confidence intervals (CIs), and per-category support flags.
- Adversarial red-team regression eval over a frozen 1,955-example corpus (held out of training, verified by the leak guard).
- Provenance enforcement: internal-source and no-public-dataset-marker checks run during training, and a static guard against public-dataset imports runs in CI.
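The gated decode-and-rescan step can be sketched as follows. This is a minimal illustration, not SecureVector's code. `looks_malicious` and its single regex are hypothetical stand-ins for the real detector, which these notes do not describe. The decoding itself uses the standard-library `urllib.parse.unquote`.

```python
import re
from urllib.parse import unquote

# A single %xx escape: a percent sign followed by two hex digits.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

# Hypothetical stand-in for the real detector (assumption, for illustration only).
INJECTION_PATTERN = re.compile(r"ignore\s+all\s+previous\s+instructions", re.IGNORECASE)


def looks_malicious(text: str) -> bool:
    """Placeholder scorer; the actual model is not shown in these notes."""
    return INJECTION_PATTERN.search(text) is not None


def scan_with_percent_decode(text: str) -> bool:
    """Scan the raw text, then rescan a percent-decoded copy if both gates pass."""
    if looks_malicious(text):
        return True
    # Gate 1: decode only when at least one %xx escape is present.
    if PERCENT_ESCAPE.search(text) is None:
        return False
    decoded = unquote(text)
    # Gate 2: skip the rescan when decoding does not change the text.
    if decoded == text:
        return False
    return looks_malicious(decoded)
```

A benign encoded URL such as `https://example.com/search?q=hello%20world` passes gate 1 and decodes. The rescan sees only `hello world`, so no alert fires. Prose like `100% sure` never opens gate 1, because `% s` is not a hex escape.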
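A recall-at-FPR point and its bootstrap CI could be computed roughly as below. This is a sketch under stated assumptions. Label 1 marks an injection, and an example is flagged when its score is at or above the threshold. The CI is a stratified percentile bootstrap, which keeps class sizes fixed in every replicate. The notes do not say which bootstrap variant or resampling scheme SecureVector actually uses, so these are illustrative choices. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def _recall_at_fpr(pos: List[float], neg: List[float], max_fpr: float) -> float:
    # Number of benign examples we may flag; the epsilon guards float rounding.
    allowed_fp = int(max_fpr * len(neg) + 1e-9)
    if allowed_fp >= len(neg):
        return 1.0  # the FPR budget allows flagging everything
    # Any threshold strictly above the (allowed_fp + 1)-th highest benign score
    # flags at most allowed_fp benign examples; recall is best just above it.
    cutoff = sorted(neg, reverse=True)[allowed_fp]
    return sum(1 for s in pos if s > cutoff) / len(pos)


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Best recall over thresholds whose false-positive rate is <= max_fpr."""
    pos, neg = _split(scores, labels)
    return _recall_at_fpr(pos, neg, max_fpr)


def bootstrap_ci(scores: Sequence[float], labels: Sequence[int], max_fpr: float,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for recall_at_fpr, resampling each class separately."""
    pos, neg = _split(scores, labels)
    rng = random.Random(seed)
    stats = sorted(
        _recall_at_fpr(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)), max_fpr)
        for _ in range(n_boot)
    )
    k = int((alpha / 2) * n_boot)  # index of the lower percentile
    return stats[k], stats[n_boot - 1 - k]
```

Evaluating `recall_at_fpr` over a sweep of budgets, e.g. `(0.01, 0.02, 0.05)`, traces points on the recall-at-FPR frontier. Calling `bootstrap_ci` at each budget attaches a 95% interval to each point.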
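A held-out set frozen by content hash, plus a train/test near-duplicate guard, might look like the sketch below. Several choices here are assumptions, since the notes specify neither the hash nor the similarity measure:

- SHA-256 over sorted canonical JSON, which makes the hash independent of example order.
- Word-trigram Jaccard similarity with a hypothetical 0.8 threshold.

Shingle overlap catches lightly edited copies, such as changes in case or punctuation. It does not catch rewordings that use different vocabulary, so the real paraphrase guard may well work differently.

```python
import hashlib
import json
import re
from typing import Dict, List, Set, Tuple


def content_hash(examples: List[Dict]) -> str:
    """SHA-256 over canonical JSON lines; sorting makes it independent of example order."""
    lines = sorted(json.dumps(ex, sort_keys=True, ensure_ascii=False) for ex in examples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def assert_frozen(examples: List[Dict], pinned_hash: str) -> None:
    """Fail loudly if the held-out set no longer matches its pinned hash."""
    actual = content_hash(examples)
    if actual != pinned_hash:
        raise RuntimeError(f"held-out test set changed: {actual} != {pinned_hash}")


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Lowercased word n-grams; texts shorter than n words become one shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    if not a and not b:
        return 1.0  # two texts with no words are treated as identical
    return len(a & b) / len(a | b)


def find_leaks(train_texts: List[str], test_texts: List[str],
               threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Return (train_idx, test_idx, similarity) for every pair at or above threshold.

    Brute force over all pairs for clarity; large corpora would need an index
    over shingles (e.g. MinHash/LSH) instead.
    """
    test_shingles = [shingles(t) for t in test_texts]
    leaks = []
    for i, text in enumerate(train_texts):
        a = shingles(text)
        for j, b in enumerate(test_shingles):
            sim = jaccard(a, b)
            if sim >= threshold:
                leaks.append((i, j, sim))
    return leaks
```

Pinning the hash in version control means any edit to the held-out set fails `assert_frozen`. An edit of that kind could be a relabel, an addition, or a deletion, and it would silently move the reported metrics if nothing caught it.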
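A static no-public-dataset-import guard could be built on the standard-library `ast` module, so it never executes the code it checks. The sketch below rests on these assumptions:

- `FORBIDDEN_MODULES` is a hypothetical deny-list; the real guard's list is not published in these notes.
- Relative imports are skipped on the assumption that they refer to project-local modules.

```python
import ast
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical deny-list of public-dataset loaders (assumption, not the real list).
FORBIDDEN_MODULES = ("datasets", "tensorflow_datasets", "sklearn.datasets", "torchvision.datasets")


def _is_forbidden(module: str) -> bool:
    # Match the module itself or any submodule, but not look-alikes like "mypkg.datasets".
    return any(module == m or module.startswith(m + ".") for m in FORBIDDEN_MODULES)


def forbidden_imports(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return (line number, module) for every forbidden import in the source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if _is_forbidden(alias.name):
                    hits.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if _is_forbidden(node.module):
                hits.append((node.lineno, node.module))
                continue
            # `from sklearn import datasets` pulls in a forbidden submodule by name.
            for alias in node.names:
                full = f"{node.module}.{alias.name}"
                if _is_forbidden(full):
                    hits.append((node.lineno, full))
    return hits


def main(roots: Iterable[str]) -> int:
    """Scan every .py file under the given roots; return a CI-style exit code."""
    failed = False
    for root in roots:
        for path in Path(root).rglob("*.py"):
            for lineno, module in forbidden_imports(path.read_text(encoding="utf-8"), str(path)):
                print(f"{path}:{lineno}: forbidden import {module}")
                failed = True
    return 1 if failed else 0
```

A CI step would call something like `sys.exit(main(["src"]))` and fail the build on a non-zero exit code. The `src` path is a placeholder for wherever the training code lives.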
Changed
- Retrained on the original (SecureVector-authored) corpus. Precision held (held-out FPR ≈ 0.02; FPR on long benign documents 0.0), and robustness to obfuscation, injections buried in long documents, and base64/hex encoding is maintained.
- `canonicalize()` is now idempotent; malformed rule files now emit a warning instead of being silently skipped.
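Idempotence means `canonicalize(canonicalize(x)) == canonicalize(x)` for every input `x`. The notes do not list the actual normalization steps. The per-pass steps below are therefore assumptions for illustration:

- NFKC normalization
- zero-width character removal
- case folding
- whitespace collapsing

One way to guarantee the property regardless of how the steps interact is to repeat the pass until the output stops changing. Any result returned is then a fixed point.

```python
import re
import unicodedata

# Hypothetical set of invisible characters to strip (assumption).
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def _one_pass(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # fold full-width and compatibility forms
    text = text.translate(_ZERO_WIDTH)          # delete zero-width characters
    text = text.casefold()                      # aggressive, Unicode-aware lowercasing
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


def canonicalize(text: str, max_passes: int = 8) -> str:
    """Apply _one_pass until a fixed point, so a second call is always a no-op."""
    for _ in range(max_passes):
        nxt = _one_pass(text)
        if nxt == text:
            return text
        text = nxt
    raise ValueError("canonicalize did not reach a fixed point")
```

The fixed-point loop trades a little speed for a property that holds by construction. If a result is returned, `_one_pass` leaves it unchanged, so calling `canonicalize` on it again returns immediately.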
Data & legal posture (unchanged, now enforced)
- 100% original training data: no third-party datasets, prompts, rules, code, or model weights, and no pretrained checkpoints.
- Public benchmarks are used for evaluation only.
- Permissive OSS dependencies only: scikit-learn, NumPy, and SciPy (BSD); PyYAML and joblib (MIT).
- Ships a zero-dependency, pure-Python runtime that is byte-exact to the trained model (parity Δ = 0).
Full notes: see CHANGELOG.md.