FieldAgent — Contract Red-Flag Finder

Portfolio artifact #3 (sibling to Quorum agent infra and Aegis agent-safety). It drops an agent into a messy real-world vertical, commercial-contract review, and measures whether the agentic design beats a naive single-shot prompt, graded against a public gold dataset (CUAD).

Live demo: https://fieldagent.thomaspeng.ca · Dataset: CUAD v1 (CC BY 4.0)

Headline numbers

(make eval-dry reproduces these offline from committed fixtures, zero cost — a test asserts it)

| Claim | Result |
| --- | --- |
| Detection | F1 = 0.548 (P = 0.74 / R = 0.44), 95% CI [0.46, 0.64], on 20 held-out CUAD contracts, 15 risk clause types, 191 gold spans, span-IoU ≥ 0.5. Detection recall 0.59 (right clause type, any overlap): most of the recall gap is clauses found but quoted too tightly to clear IoU 0.5, not true misses. |
| Agentic lift | Full pipeline = +0.21 F1 over a keyword/regex floor (0.337 → 0.548); this is the clean, baseline-independent comparison. The skeptic verifier shifts precision 0.72 → 0.74 but its F1 effect (−0.014) is not distinguishable from zero at n=20 (overlapping CIs). Caveat on the single-shot LLM baseline: it scores far lower (0.10 F1), but that number is output-budget-confounded. deepseek-v4-pro's reasoning truncates the one-pass response on 17/20 contracts (the committed fixtures use a 4k-token cap); an 8k-token spot-check recovered 5–7 clauses/contract. The single-shot gap is therefore an upper bound; a full fair-baseline re-run is pending DeepSeek credit (make eval-live). See writeup §4. |
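
To make the span-IoU ≥ 0.5 criterion concrete, here is a minimal sketch of how a grader of this kind could count hits. It is an illustration, not the repo's grader: the function names, the half-open character-offset spans, and the greedy one-to-one matching policy are all assumptions. The example shows a prediction that overlaps the gold clause (so it would count under "any overlap") but is quoted too tightly to clear IoU 0.5.

```python
def span_iou(pred, gold):
    """IoU of two half-open character spans given as (start, end)."""
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def grade(preds, golds, threshold=0.5):
    """Count (tp, fp, fn). Items are (clause_type, start, end).

    Assumed policy: each gold span can be matched at most once, and a
    prediction takes the best remaining same-type gold span at or above
    the threshold.
    """
    unmatched = list(golds)
    tp = 0
    for ctype, start, end in preds:
        best, best_iou = None, threshold
        for g in unmatched:
            if g[0] != ctype:
                continue
            iou = span_iou((start, end), (g[1], g[2]))
            if iou >= best_iou:
                best, best_iou = g, iou
        if best is not None:
            unmatched.remove(best)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical data: a 200-character gold clause and an 80-character
# quote taken from inside it.
golds = [("Non-Compete", 100, 300), ("Audit Rights", 500, 600)]
preds = [("Non-Compete", 150, 230), ("Audit Rights", 500, 590)]

print(span_iou((150, 230), (100, 300)))  # 0.4: overlaps, but below 0.5
print(grade(preds, golds))               # (1, 1, 1)
```

Under this policy the tight Non-Compete quote is charged twice, once as a false positive and once as a false negative, which is why tight quoting depresses both precision and recall at a fixed threshold.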

What it does

Reads a real contract and flags risk-bearing clauses — exact offending span, a severity, and a plain-English "why this is risky" — for 15 CUAD clause types (Uncapped Liability, Cap On Liability, Liquidated Damages, Renewal Term, Non-Compete, Exclusivity, No-Solicit Of Employees, Most Favored Nation, IP Ownership Assignment, License Grant, Termination For Convenience, Anti-Assignment, Change Of Control, Source Code Escrow, Audit Rights).
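
As an illustration of what one structured finding might carry, here is a small sketch. The class name, field names, and the three-level severity scale are hypothetical (the README only says each finding has a span, a severity, and a rationale); the clause-type list is the one above.

```python
from dataclasses import dataclass

# The 15 CUAD clause types covered by this project.
CLAUSE_TYPES = frozenset({
    "Uncapped Liability", "Cap On Liability", "Liquidated Damages",
    "Renewal Term", "Non-Compete", "Exclusivity",
    "No-Solicit Of Employees", "Most Favored Nation",
    "IP Ownership Assignment", "License Grant",
    "Termination For Convenience", "Anti-Assignment",
    "Change Of Control", "Source Code Escrow", "Audit Rights",
})

# Hypothetical scale: the actual severity levels are not stated here.
SEVERITIES = ("low", "medium", "high")


@dataclass(frozen=True)
class Finding:
    clause_type: str   # one of CLAUSE_TYPES
    start: int         # character offset, inclusive
    end: int           # character offset, exclusive
    severity: str      # one of SEVERITIES
    rationale: str     # plain-English "why this is risky"

    def __post_init__(self):
        if self.clause_type not in CLAUSE_TYPES:
            raise ValueError(f"unknown clause type: {self.clause_type!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must satisfy 0 <= start < end")

    def quote(self, contract_text: str) -> str:
        """Return the exact offending span from the contract text."""
        return contract_text[self.start:self.end]
```

Storing offsets rather than a copied quote keeps the finding verifiable against the source text, which matters when grading is done on spans.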

Architecture

chunk → focused taxonomy extraction (fan-out) → skeptic verification → dedupe/merge → structured findings, fully traced. Vendors Quorum's kernel (core/): the ModelClient seam (Fake/Recorded/DeepSeek/Anthropic), SQLite tracing, the concurrent orchestrator, and per-model pricing. Grading is span-IoU (no LLM judge in the success path). See docs/writeup.md for the full methodology + ablations + threats.
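
The stage order above can be sketched end to end in a few dozen lines. This is a toy under stated assumptions, not the vendored orchestrator: a regex stands in for the per-clause-type model call, a length check stands in for the skeptic verifier, and every name here (chunk, extract, verify, merge, run, PATTERNS) is hypothetical. What it does show faithfully is the shape: overlapping chunks, one extraction task per (chunk, clause type), and a merge step that collapses the duplicates the overlap produces.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Stand-in for focused per-type prompts: one pattern per clause type.
PATTERNS = {
    "Non-Compete": r"shall not compete[^.]*\.",
    "Audit Rights": r"right to audit[^.]*\.",
}


def chunk(text, size=200, overlap=50):
    """Split into overlapping windows, returned as (offset, chunk_text)."""
    step = size - overlap
    stop = max(len(text) - overlap, 1)
    return [(i, text[i:i + size]) for i in range(0, stop, step)]


def extract(clause_type, offset, chunk_text):
    """One fan-out task: find spans of a single clause type in one chunk."""
    return [
        (clause_type, offset + m.start(), offset + m.end())
        for m in re.finditer(PATTERNS[clause_type], chunk_text)
    ]


def verify(finding):
    """Stand-in skeptic: reject spans too short to be a real clause."""
    _, start, end = finding
    return end - start >= 20


def merge(findings):
    """Dedupe by merging overlapping spans of the same clause type."""
    merged = []
    for ctype, start, end in sorted(findings):
        if merged and merged[-1][0] == ctype and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (ctype, prev[1], max(prev[2], end))
        else:
            merged.append((ctype, start, end))
    return merged


def run(text):
    tasks = [(ct, off, ch) for off, ch in chunk(text) for ct in PATTERNS]
    with ThreadPoolExecutor(max_workers=4) as pool:
        raw = [f for found in pool.map(lambda t: extract(*t), tasks)
               for f in found]
    return merge([f for f in raw if verify(f)])


# The clause sits in the overlap of the first two chunks, so it is
# extracted twice and merged back into one finding.
text = "x" * 160 + "Vendor shall not compete with Buyer." + "y" * 300
print(run(text))
```

Translating chunk-local offsets back to document offsets before merging is the detail that makes deduplication possible; without it, the same clause found in two chunks would look like two findings.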

Run it

```bash
python -m venv .venv && .venv/bin/pip install -e ".[dev]"
make test          # unit suite, no network, no paid calls
make fetch-cuad    # sha256-pinned CUAD_v1.json (raw text stays local, gitignored)
make eval-dry      # reproduce the headline tables from committed fixtures (zero cost)

# Live (needs a DeepSeek key + opt-in; prints a cost estimate and refuses runs over $1):
export OSSLLM_API_KEY=...   # or source your DeepSeek env file
FIELDAGENT_LIVE=1 make eval-live

# Analyze your own contract:
FIELDAGENT_LIVE=1 .venv/bin/python cli/analyze.py path/to/contract.txt
```
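
For readers curious how an opt-in plus cost-cap guard like the one described for live runs might look, here is a minimal sketch. Only the FIELDAGENT_LIVE variable and the $1 cap come from this README; the function names, the exception type, and the per-million-token prices in the usage line are hypothetical, and the real check may be structured differently.

```python
import os


class BudgetExceeded(RuntimeError):
    """Raised when an estimated run cost is above the cap."""


def estimate_cost_usd(input_tokens, output_tokens,
                      usd_per_m_in, usd_per_m_out):
    """Estimate cost from token counts and per-million-token prices."""
    return (input_tokens * usd_per_m_in
            + output_tokens * usd_per_m_out) / 1_000_000


def guard_live_run(estimate_usd, cap_usd=1.00, env=None):
    """Require explicit opt-in, print the estimate, refuse over-cap runs."""
    env = os.environ if env is None else env
    if env.get("FIELDAGENT_LIVE") != "1":
        raise PermissionError("live runs are opt-in: set FIELDAGENT_LIVE=1")
    print(f"estimated cost: ${estimate_usd:.4f} (cap ${cap_usd:.2f})")
    if estimate_usd > cap_usd:
        raise BudgetExceeded(
            f"estimate ${estimate_usd:.4f} exceeds cap ${cap_usd:.2f}"
        )
    return estimate_usd


# Hypothetical prices, for illustration only.
cost = estimate_cost_usd(400_000, 50_000, usd_per_m_in=0.5, usd_per_m_out=2.0)
guard_live_run(cost, env={"FIELDAGENT_LIVE": "1"})
```

Checking the opt-in flag before the budget means a missing flag fails the same way regardless of run size, so an accidental paid call needs two independent mistakes.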

Attribution

This project evaluates against CUAD v1 — The Contract Understanding Atticus Dataset, The Atticus Project, https://www.atticusprojectai.org/cuad — licensed CC BY 4.0. Raw contract text is not redistributed here; evals/benchmark/fetch_cuad.py reconstitutes it locally (sha256-pinned).
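
"sha256-pinned" here means the downloaded file is only accepted if its digest matches a value committed to the repo. A minimal sketch of that check using the standard library follows; the function names are hypothetical and this is not the code in evals/benchmark/fetch_cuad.py.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_pin(path, expected_hex):
    """Return the path if its digest matches the pin, else raise."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(
            f"sha256 mismatch for {path}: expected {expected_hex}, got {actual}"
        )
    return Path(path)
```

Pinning the digest means the evaluation fails loudly if the upstream file ever changes, rather than silently grading against different gold spans.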
