Releases: aaravanmay/faultline

v0.4.2 — zero-config scan, 5 framework adapters, hardening

11 Jun 14:50

Zero-config testing, framework integrations, and a hardening pass. Everything additive.

Highlights

  • faultline scan my_agent.py:agent — zero-config: discovers your wrapped tools, breaks them, gates CI. Plus faultline doctor (preflight) and faultline init (scaffold a suite + CI workflow).
  • One-line framework integration: fl.instrument(...) for LangGraph, LangChain, LlamaIndex, pydantic-ai, crewAI — each verified against the real installed library.
  • fl.tools_really_called([...]) — catches an agent answering from a tool it never actually called.
  • replay(transform=...) — catches silent behavioral drift after a context change.
  • fl.assert_resilient(agent, task) — drop into any pytest/unittest suite.
  • scan --explain shows exactly what was corrupted and why each FAIL was reported.
  • Hardening from 3 adversarial passes (intermittent silent failures now gate; $6,000-style figures now detected; Decimal money values now corruptible).
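The core idea behind these checks (break a tool's output, then see whether the agent notices) can be sketched in plain Python. This is only an illustration of the technique, not faultline's implementation: `corrupt`, `break_tool`, `is_silent_failure`, and both toy agents are hypothetical names invented for this example, and the corruption rules are assumptions.

```python
from decimal import Decimal


def corrupt(value):
    """Return a deliberately wrong value of the same type (assumed, simplistic rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return not value
    if isinstance(value, (int, float, Decimal)):
        return value * 10 + 1  # also works for Decimal money values
    if isinstance(value, str):
        return value[::-1] or "?"
    return None  # anything else is replaced by None


def break_tool(fn):
    """Wrap a tool so every result it returns is corrupted."""
    def broken(*args, **kwargs):
        return corrupt(fn(*args, **kwargs))
    return broken


def is_silent_failure(agent, task, tools):
    """True if the agent answers without complaint while fed corrupted tool output."""
    clean_answer = agent(task, tools)
    broken_tools = {name: break_tool(fn) for name, fn in tools.items()}
    try:
        broken_answer = agent(task, broken_tools)
    except Exception:
        return False  # the agent failed loudly, which is the desired behaviour
    # No error, but the answer changed: corrupted data flowed through unnoticed.
    return broken_answer != clean_answer


# Hypothetical tool and agents, for illustration only.
def get_price(sku):
    return Decimal("19.99")


def naive_agent(task, tools):
    return f"The price is {tools['get_price']('A1')}"


def guarded_agent(task, tools):
    price = tools["get_price"]("A1")
    if not Decimal("0") < price < Decimal("100"):
        raise ValueError(f"implausible price: {price}")
    return f"The price is {price}"
```

With these toy agents, `is_silent_failure(naive_agent, ...)` reports a silent failure because the corrupted price is repeated as fact, while `guarded_agent` raises and is therefore counted as a loud failure. The check is deterministic because it compares outputs directly rather than asking a model to judge them.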

Full notes: CHANGELOG.md · honest scope: CAPABILITIES.md

pip install -U faultline

faultline v0.4.1

09 Jun 23:17

Runtime guard (shadow/enforce seatbelt) + tamper-evident attestation report (attest / verify). 11 test suites green. Benchmark: 97.5% recall, 2.2% false-alarm, 100% deterministic (see benchmark/MEASUREMENT.md).

pip install faultline
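For readers unfamiliar with tamper-evident reports, here is a minimal sketch of one common way to build an attest / verify pair: sign a canonical serialization of the report with an HMAC key. These notes do not describe faultline's actual scheme, so the function names, report fields, and the choice of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def _canonical(report: dict) -> bytes:
    # Sorted keys and fixed separators give the same bytes for the same content.
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attest(report: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a report (hypothetical format)."""
    signature = hmac.new(key, _canonical(report), hashlib.sha256).hexdigest()
    return {"report": report, "signature": signature}


def verify(attested: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, _canonical(attested["report"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attested["signature"])
```

Any edit to the report after signing, such as changing a failure count, makes `verify` return `False` unless the editor also holds the key.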

faultline v0.4.0

09 Jun 23:01

Deterministic silent-failure testing for AI agents — no LLM judge.

Highlights

  • Token-only hosted push (FAULTLINE_TOKEN is all you need)
  • Loud, false-green-proof results (.ok / .failed / .assert_ok())
  • Six CLI modes that all gate CI: run, probe, fuzz, scenarios, replay, mine
  • Marketplace-grade GitHub Action (mode + fail-on-silent inputs, verdict outputs)
  • Detector upgrades: display-arg FP fix, derived-value FN layer, pandas + dict support

Measured (85-case adversarial benchmark, independently audited, reproducible — see benchmark/MEASUREMENT.md): 97.5% recall · 2.2% false-alarm · 100% deterministic.

pip install faultline