assay

Assay every AI agent decision before money moves. A safety and validation library for AI agentic workflows in finance, contributed by VenturFlow to the open-source community.

About this project

This library is an open-source goodwill release from VenturFlow, a separate, commercial Agentic AI platform built for venture-capital firms. We hit the same set of agent-safety problems again and again while building VenturFlow's platform and decided that every team in the field shouldn't have to solve them from scratch. So we pulled out the layer we think is generally useful and ship it here under Apache 2.0.

This repository is not the VenturFlow platform. It's an ad-hoc, focused library: just the validation and safety layer that sits between an AI agent and a downstream action — a trade, a wire, a filing, a report.

Use this if you're shipping AI agents that act on financial workflows and you want guardrails you can read, version, and audit.

Visit VenturFlow if you want the full agentic stack purpose-built for VC firms (research, sourcing, diligence, portfolio, LP reporting, …). The two interoperate but neither is required for the other.

What it does

Assay sits between an AI agent and the action it's about to take, at four boundaries (with more to come):

                ┌─────────────────────────────────────────┐
                │            YOUR AI AGENT                │
                │   (Claude / GPT / Gemini / any LLM)     │
                └────────────────┬────────────────────────┘
                                 │
   ┌─────────────────────────────┼─────────────────────────────┐
   │ 1. OUTPUT VALIDATION        │  before the agent's          │
   │    schema dispatch +        │  recommendation is acted on  │
   │    regulatory rule packs    │                              │
   ├─────────────────────────────┼─────────────────────────────┤
   │ 2. TOOL-CALL GATE           │  before any side-effectful   │
   │    typed args + policies    │  tool runs                   │
   │    + approval thresholds    │                              │
   ├─────────────────────────────┼─────────────────────────────┤
   │ 3. TRAJECTORY VALIDATOR     │  during / after a whole      │
   │    predecessors + budget    │  agent session               │
   │    + loop detection         │                              │
   ├─────────────────────────────┼─────────────────────────────┤
   │ 4. ENTITY RESOLVER          │  any time a ticker/CUSIP/    │
   │    ground every named       │  CIK/ISIN/counterparty is    │
   │    identifier               │  mentioned                   │
   └─────────────────────────────┼─────────────────────────────┘
                                 │
                          Downstream action
                  (order / wire / filing / report)

Every check writes to an append-only audit log. Every regulatory rule pack is stamped with its citation and effective date.

Status & disclaimers

Early. API is stabilising; we'll keep the slash-command and validate_tool_call / AssayValidator surfaces stable as v2.0 lands. Internals will keep moving.
Scaffolding, not legal advice. The shipped regulatory packs (SEC 15c3-1, Reg T, Volcker, FINRA 4210, MiFID II suitability, OFAC sanctions) are machine-checkable subsets of named requirements with citations. They are not a substitute for compliance counsel and they do not auto-track regulatory amendments.
BYOD. Your firm's data — restricted securities, sanctions lists, sector allowlists, portfolio state — stays local. Nothing is sent to VenturFlow or any third party unless you configure a remote LLM provider for the optional semantic-consistency check.

Install

Requirements

Python 3.11+
pip install pydantic pyyaml structlog requests
Optional, only for the semantic-consistency layer:
- pip install anthropic (default provider) or pip install openai
- ANTHROPIC_API_KEY or OPENAI_API_KEY

From source

git clone https://github.com/VenturFlow/Assay.git
cd Assay
pip install -e ".[dev]"

Configure your environment

cp .env.example .env
# edit .env — at minimum set FIRM_DATA_PATH to your BYOD JSON

Quick start — one example per layer

1. Validate an agent output

from assay.validator import AssayValidator

v = AssayValidator(
    firm_data_path="data/my_firm.json",
    enable_semantic_check=False,
)

result = v.validate({
    "ticker": "MSFT",
    "asset_class": "equity",
    "action": "BUY",
    "confidence": 0.82,
    "position_size_pct": 0.03,
    "reasoning": "Liquid name, stable bid-ask, clean fit for the rotation.",
    "risk_score": 4.0,
    "time_horizon": "medium",
    "flags": ["liquidity_checked"],
})

print(result["passed"], result["action"])         # True passed
print(result["workflow_type"])                    # trade_recommendation (auto-detected)
print(result["active_packs"])                     # [{"name": "firm_base", "version": "...", ...}]

2. Gate a tool call before it runs

from assay.tools import ToolGate

decision = ToolGate().validate_tool_call(
    "wire_transfer",
    {
        "source_account_id": "A1",
        "beneficiary_name": "OK Counterparty",
        "beneficiary_account": "US-12345",
        "beneficiary_country": "US",
        "amount_usd": 2_000_000,
        "purpose": "vendor payment",
    },
)

print(decision.action)        # require_approval
print(decision.reason)        # amount_usd=2000000.0 crosses approval threshold 1000000

3. Validate a whole agent trajectory

from assay.trajectory import AgentSession, TrajectoryValidator

session = AgentSession(agent_id="trade-bot-1", goal="rotate into MSFT")
session.record_plan("read state, check market data, place a small order")
session.record_tool_call("read_position", {"account_id": "A1"})
session.record_tool_call("read_market_data", {"ticker": "MSFT"})
session.record_tool_call("place_order", {
    "account_id": "A1", "ticker": "MSFT", "side": "buy",
    "quantity": 100, "order_type": "limit", "limit_price": 412.20,
})

result = TrajectoryValidator().check(session)
print(result.passed)                  # True
print(result.budget_consumed)         # {"tool_calls": 3, "cost_usd": 0.0, ...}

4. Resolve a named entity

from assay.entities import MockResolver

print(MockResolver().resolve("AAPL", "ticker").found)            # True
print(MockResolver().resolve("FAKETICKER123", "ticker").found)   # False

The four layers in detail

1. Output validation (`AssayValidator`)

Dispatches to one of 10 typed workflow schemas (auto-detected if workflow_type is absent), then runs up to six sub-layers, the last two of which are optional:

Schema — Pydantic model validation per workflow.
Business rules — the active rule packs filtered by workflow type.
Risk guardrails — position-size / concentration / confidence checks for trade-like workflows.
Firm BYOD — restricted securities, watchlists, approved sectors.
Semantic consistency (optional, AI-powered) — does the agent's reasoning actually support its conclusion?
Entity check (optional) — ticker/CUSIP/CIK/ISIN/counterparty grounding.
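To make the layering concrete, here is a minimal sketch of how sub-layers like these could be chained so that each one contributes violations and warnings to a single result. It is not Assay's implementation: the function names, the `max_position_size_pct` override key, the 0.05 default, and the choice to treat a watchlist hit as a warning are all assumptions made for illustration.

```python
from typing import Callable

# A sub-layer takes (agent output, firm data) and returns (violations, warnings).
# This signature is hypothetical, not Assay's internal interface.
Layer = Callable[[dict, dict], tuple[list[str], list[str]]]


def risk_guardrails(output: dict, firm: dict) -> tuple[list[str], list[str]]:
    # Hypothetical override key and default threshold, for illustration only.
    limit = firm.get("risk_overrides", {}).get("max_position_size_pct", 0.05)
    if output.get("position_size_pct", 0.0) > limit:
        return [f"position_size_pct exceeds {limit}"], []
    return [], []


def firm_byod(output: dict, firm: dict) -> tuple[list[str], list[str]]:
    ticker = output.get("ticker")
    if ticker in firm.get("restricted_securities", []):
        return [f"{ticker} is a restricted security"], []
    if ticker in firm.get("watchlist", []):
        # Assumption: a watchlist hit is surfaced as a warning, not a hard failure.
        return [], [f"{ticker} is on the watchlist; human sign-off required"]
    return [], []


def run_layers(output: dict, firm: dict, layers: list[Layer]) -> dict:
    all_violations: list[str] = []
    all_warnings: list[str] = []
    for layer in layers:
        violations, warnings = layer(output, firm)
        all_violations += violations
        all_warnings += warnings
    # The output passes only if no layer reported a violation.
    return {
        "passed": not all_violations,
        "all_violations": all_violations,
        "all_warnings": all_warnings,
    }


firm = {"restricted_securities": ["XYZ"], "watchlist": ["MSFT"]}
result = run_layers(
    {"ticker": "MSFT", "position_size_pct": 0.03},
    firm,
    [risk_guardrails, firm_byod],
)
print(result["passed"], result["all_warnings"])
```

Running every layer and collecting all findings, rather than stopping at the first failure, keeps the audit entry complete for whoever reviews it later.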

Every call appends a structured entry to logs/audit.jsonl. Outputs that violate hard rules trigger escalation through the configured channel (Slack today; PagerDuty/email are pluggable).

Workflow schemas

`workflow_type`	Pydantic model	Typical agent
`trade_recommendation`	`TradeRecommendation`	Single-name trade or position-sizing bot
`portfolio_summary`	`PortfolioSummary`	Portfolio analyst
`portfolio_rebalance`	`PortfolioRebalance`	Rebalancer / tax-loss harvester
`nav_reconciliation`	`NavReconciliation`	Fund-ops agent
`due_diligence_report`	`DueDiligenceReport`	DD bot / IC pre-read
`kyc_flag`	`KycFlag`	AML / sanctions screening
`margin_call`	`MarginCall`	Margin monitor
`credit_memo`	`CreditMemo`	Credit underwriting agent
`options_strategy`	`OptionsStrategy`	Options structurer
`ocio_recommendation`	`OcioRecommendation`	OCIO / IPS-aligned allocator

Shipped rule packs (limited testing for now)

Pack	Citation	Applies to
`firm_base`	(internal default)	trade, rebalance, OCIO
`sec_15c3_1`	17 CFR 240.15c3-1	trade, rebalance, margin call
`reg_t_margin`	12 CFR 220	trade, options, margin call
`volcker`	12 USC 1851 / Reg VV	trade, rebalance
`finra_4210`	FINRA Rule 4210	margin, options, trade
`mifid_ii_suitability`	Dir 2014/65/EU Art. 25	trade, OCIO, rebalance
`ofac_sanctions`	31 CFR Ch. V; SDN List	trade, KYC, credit memo, rebalance

Each pack carries version, effective_date, regulator, and citation fields. Compose them in your firm BYOD file:

{
  "firm_name": "Acme Capital",
  "rule_packs": ["firm_base", "sec_15c3_1", "ofac_sanctions"]
}

Add a firm-specific pack by writing one more YAML under assay/rules/packs/ and referencing it. See assay/rules/packs/_README.md for the rule grammar.

2. Tool-call gate (`ToolGate`)

Pre-execution validation for any agent tool call. Returns allow, deny, or require_approval.

Registered tools out of the box: place_order, cancel_order, wire_transfer, submit_filing, request_quote, read_position, read_market_data. Read-only tools auto-allow once typed args pass. Side-effectful tools go through:

Typed-args validation (Pydantic per tool).
Tool-policy packs in assay/tools/policies/.
Firm rule packs with applies_to: ["tool:<name>"] (e.g. OFAC auto-applies to tool:wire_transfer).
Approval thresholds — defaults: wire_transfer.amount_usd >= $1M and place_order.estimated_notional_usd >= $5M → require_approval.
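The decision flow above can be sketched in a few lines. This is an illustration under stated assumptions, not the `ToolGate` source: the `Decision` dataclass, the `gate` function, the set of read-only tools, and the `"XX"` jurisdiction code are hypothetical, and typed-args validation and policy packs are left out. Only the two default thresholds and the three outcomes come from the description above.

```python
from dataclasses import dataclass

# Default thresholds as described above; the dict layout is an assumption.
APPROVAL_THRESHOLDS = {
    "wire_transfer": ("amount_usd", 1_000_000),
    "place_order": ("estimated_notional_usd", 5_000_000),
}

# Assumption: these two registered tools are the read-only ones.
READ_ONLY_TOOLS = {"read_position", "read_market_data"}


@dataclass
class Decision:
    action: str  # "allow", "deny", or "require_approval"
    reason: str


def gate(tool: str, args: dict,
         sanctioned_jurisdictions: frozenset = frozenset()) -> Decision:
    # Read-only tools are allowed without further policy checks.
    if tool in READ_ONLY_TOOLS:
        return Decision("allow", "read-only tool")
    # A sanctions hit is a hard deny, checked before any threshold.
    country = args.get("beneficiary_country")
    if country in sanctioned_jurisdictions:
        return Decision("deny", f"beneficiary_country={country} is sanctioned")
    # Amounts at or above the threshold need a human approval.
    if tool in APPROVAL_THRESHOLDS:
        field, limit = APPROVAL_THRESHOLDS[tool]
        value = float(args.get(field, 0))
        if value >= limit:
            return Decision(
                "require_approval",
                f"{field}={value} crosses approval threshold {limit}",
            )
    return Decision("allow", "no policy triggered")


decision = gate("wire_transfer",
                {"amount_usd": 2_000_000, "beneficiary_country": "US"})
print(decision.action)  # require_approval
print(decision.reason)  # amount_usd=2000000.0 crosses approval threshold 1000000
```

Checking deny conditions before approval thresholds means a sanctioned transfer is never routed to an approver who could wave it through.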

Add a new tool by:

Defining a Pydantic args model in assay/tools/schemas.py.
Registering it in assay/tools/registry.py.
(Optional) adding policy YAML in assay/tools/policies/.

3. Trajectory validator (`TrajectoryValidator`)

Catches failures that only show up at the session level — not a single bad output, but a bad sequence.

Built-in policy types:

RequiredPredecessor — place_order must come after read_position and read_market_data.
BudgetCap — caps on tool_calls, cost_usd, retries_per_tool, elapsed_seconds.
LoopDetection — flags N consecutive identical (tool, args) calls.
ToolFrequencyCap — e.g. request_quote ≤ 50/session.

Default policies are tuned for trade-execution agents; override per-session via the policies argument.
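As a rough illustration of what two of these policy types check, the sketch below implements a required-predecessor test and a consecutive-repeat test over a plain list of `(tool, args)` pairs. The function names and the default of three repeats are assumptions; the real policies operate on `AgentSession` objects and may differ in detail.

```python
def missing_predecessors(calls: list, target: str, required: set) -> list:
    """Return required tools not called before the first `target` call."""
    seen = set()
    for tool, _args in calls:
        if tool == target:
            return sorted(required - seen)
        seen.add(tool)
    return []  # target was never called, so there is nothing to enforce


def has_loop(calls: list, n: int = 3) -> bool:
    """True if n (>= 2) consecutive calls share the same (tool, args)."""
    run = 1
    for prev, cur in zip(calls, calls[1:]):
        run = run + 1 if prev == cur else 1
        if run >= n:
            return True
    return False


calls = [
    ("read_position", {"account_id": "A1"}),
    ("read_market_data", {"ticker": "MSFT"}),
    ("place_order", {"ticker": "MSFT", "quantity": 100}),
]
required = {"read_position", "read_market_data"}

print(missing_predecessors(calls, "place_order", required))      # []
print(missing_predecessors(calls[2:], "place_order", required))  # ['read_market_data', 'read_position']
print(has_loop(calls))                                           # False
print(has_loop([("request_quote", {"ticker": "MSFT"})] * 3))     # True
```

Both checks are linear in the number of recorded calls, so they are cheap enough to run after every step rather than only at the end of a session.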

4. Entity resolver (`EntityChecker`)

Catches the single most common agent failure in finance: invented tickers, CUSIPs, CIKs, ISINs, fund/counterparty names.

Pluggable resolvers:

MockResolver — small built-in map, for tests and offline runs.
LocalRegistryResolver — reads CSVs from a directory you control (BYOD). Validates symbol shape (CUSIP-9, ISIN-12, etc.).
OpenFigiResolver — stub for the OpenFIGI API (drop in an API key + a small POST).
ChainedResolver — try registries in order; return the first hit.

default_resolver() builds a chain from env + firm data: LocalRegistry (if configured) → OpenFIGI (if key set) → Mock.
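The chaining idea can be sketched independently of the library. In the example below, `DictResolver` and `Chain` are hypothetical stand-ins rather than Assay classes, and they return a name or `None` instead of a result object with a `found` attribute. The patterns check identifier shape only (length and character classes) and do not verify check digits.

```python
import re
from typing import Optional

# Shape-only patterns; check digits are not verified here.
SHAPES = {
    "cusip": re.compile(r"[0-9A-Z*@#]{8}[0-9]"),      # 9 characters
    "isin": re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]"),  # 12 characters
    "cik": re.compile(r"[0-9]{1,10}"),
}


class DictResolver:
    """Stand-in for a registry: maps (kind, value) to a canonical name."""

    def __init__(self, table: dict):
        self.table = table

    def resolve(self, value: str, kind: str) -> Optional[str]:
        return self.table.get((kind, value))


class Chain:
    """Try each resolver in order and return the first hit."""

    def __init__(self, resolvers: list):
        self.resolvers = resolvers

    def resolve(self, value: str, kind: str) -> Optional[str]:
        # Reject malformed identifiers before consulting any registry.
        shape = SHAPES.get(kind)
        if shape is not None and not shape.fullmatch(value):
            return None
        for resolver in self.resolvers:
            hit = resolver.resolve(value, kind)
            if hit is not None:
                return hit
        return None


local = DictResolver({("cusip", "037833100"): "Apple Inc."})
fallback = DictResolver({("ticker", "AAPL"): "Apple Inc."})
chain = Chain([local, fallback])

print(chain.resolve("AAPL", "ticker"))           # Apple Inc.
print(chain.resolve("037833100", "cusip"))       # Apple Inc.
print(chain.resolve("03783310", "cusip"))        # None (wrong length)
print(chain.resolve("FAKETICKER123", "ticker"))  # None
```

Ordering the chain from the most trusted local source to the broadest fallback means firm-controlled data always wins when registries disagree.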

Bring your own data (BYOD)

Your firm's data never leaves the local environment. The example file data/my_firm_example.json shows the supported keys:

Key	Used by	Purpose
`rule_packs`	rules engine	Which packs are active
`restricted_securities`	firm-data validator + tool gate	Tickers that can never be recommended or ordered
`watchlist`	firm-data validator	Tickers that require human sign-off
`approved_sectors`	firm-data validator	Sector allowlist
`sanctioned_entities`	OFAC pack + tool gate	Names blocked everywhere (load your SDN list here)
`sanctioned_jurisdictions`	OFAC pack + tool gate	Country blocklist
`portfolio_positions`	risk guardrails	Current weights for concentration checks
`sector_weights`	risk guardrails	Current sector exposure
`custom_rules`	rules engine	Extra YAML-style rules merged at runtime
`risk_overrides`	risk guardrails	Per-firm threshold overrides
`entity_registry_dir`	entity resolver	Directory of CSVs for the LocalRegistry resolver

Get started:

cp data/my_firm_example.json data/my_firm.json
echo "FIRM_DATA_PATH=./data/my_firm.json" >> .env
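Before pointing FIRM_DATA_PATH at a new file, it can help to catch misspelled keys early. The helper below is a hypothetical pre-flight check, not part of Assay: it compares a firm file against the keys in the table above (plus `firm_name`, which appears in the earlier example) and confirms that `rule_packs` is a list of strings.

```python
import json

# Keys from the table above, plus firm_name from the earlier example file.
SUPPORTED_KEYS = {
    "firm_name", "rule_packs", "restricted_securities", "watchlist",
    "approved_sectors", "sanctioned_entities", "sanctioned_jurisdictions",
    "portfolio_positions", "sector_weights", "custom_rules",
    "risk_overrides", "entity_registry_dir",
}


def check_firm_data(raw: str) -> list:
    """Return a list of human-readable problems; empty means the file looks sane."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    problems = []
    unknown = sorted(set(data) - SUPPORTED_KEYS)
    if unknown:
        problems.append(f"unknown keys: {', '.join(unknown)}")
    packs = data.get("rule_packs", [])
    if not isinstance(packs, list) or not all(isinstance(p, str) for p in packs):
        problems.append("rule_packs must be a list of pack names")
    return problems


good = '{"firm_name": "Acme Capital", "rule_packs": ["firm_base", "ofac_sanctions"]}'
typo = '{"rule_pack": ["firm_base"]}'

print(check_firm_data(good))  # []
print(check_firm_data(typo))  # ['unknown keys: rule_pack']
```

A misspelled key such as `rule_pack` would otherwise be easy to miss, since a loader that ignores unknown keys would simply run with no packs active.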

Audit log

Every output validation appends one JSON line to logs/audit.jsonl (pretty-printed here for readability):

{
  "timestamp": "2025-01-15T18:11:14Z",
  "agent_id": "trade-bot-1",
  "firm": "Acme Capital",
  "output": { ...validated output... },
  "validation": {
    "passed": true,
    "action": "passed",
    "workflow_type": "trade_recommendation",
    "active_packs": [...],
    "all_violations": [],
    "all_warnings": []
  },
  "action_taken": "passed"
}

Drop it into Splunk, Datadog, or any append-only log pipeline. For SOC 2 / regulated firms we recommend adding hash-chained or signed entries (see roadmap below).
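Hash chaining is not shipped yet, so the following is only a sketch of the general technique using the standard library. Each entry stores the hash of the previous one, so editing or deleting an earlier line invalidates every later hash. The `prev_hash` and `hash` field names, the all-zero genesis value, and the helper names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry (an assumption)


def chain_entry(entry: dict, prev_hash: str) -> dict:
    """Return a copy of entry stamped with prev_hash and its own SHA-256 hash."""
    # Canonical serialisation so the same entry always hashes the same way.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    return {**entry, "prev_hash": prev_hash, "hash": digest}


def verify(entries: list) -> bool:
    """Recompute every hash in order; any edit, drop, or reorder fails."""
    prev = GENESIS
    for entry in entries:
        body = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        if entry.get("prev_hash") != prev:
            return False
        if chain_entry(body, prev)["hash"] != entry.get("hash"):
            return False
        prev = entry["hash"]
    return True


records = [
    {"timestamp": "2025-01-15T18:11:14Z", "agent_id": "trade-bot-1", "action_taken": "passed"},
    {"timestamp": "2025-01-15T18:12:02Z", "agent_id": "trade-bot-1", "action_taken": "passed"},
]

log = []
prev = GENESIS
for record in records:
    stamped = chain_entry(record, prev)
    log.append(stamped)
    prev = stamped["hash"]

print(verify(log))  # True
```

A chain like this detects tampering but does not prevent it; anchoring the latest hash somewhere outside the log host, or signing entries, is what makes it hard to rewrite the whole file.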

Claude Code plugin

Install:

claude plugin add ./claude-code-plugin

Slash commands you get:

Command	Boundary
`/vf-validate <json-or-path>`	output validation
`/vf-tool-check <tool> <args>`	pre-execution tool gate
`/vf-trajectory <session-or-path>`	whole-session check
`/vf-entity <kind> <values...>`	entity grounding
`/vf-rules`, `/vf-packs`, `/vf-audit`, `/vf-firm-init`	inspection / setup

Plus an assay-validation skill (auto-suggests the right layer) and an assay-validator subagent (runs the check, refuses workarounds). Details in claude-code-plugin/README.md.

Cowork plugin

Install:

Claude → Settings → Integrations → Cowork Plugins → Add local plugin
→ point at: cowork-plugin/manifest.json

Exposes six commands over the Cowork stdin/stdout JSON protocol: validate, validate_tool_call, validate_trajectory, resolve_entity, rules, audit. Details in cowork-plugin/README.md.

Provider configuration (semantic check)

The optional semantic-consistency layer runs an independent LLM call to ask "does this reasoning actually support this conclusion?" It's model-agnostic via a provider abstraction in assay/providers/.

Claude (Anthropic) — default

export ASSAY_PROVIDER=claude
export ANTHROPIC_API_KEY=sk-ant-...

OpenAI-compatible

Works with OpenAI, Azure OpenAI, Together AI, Groq, Mistral, and local LM Studio:

export ASSAY_PROVIDER=openai        # or: azure | together | groq | mistral | local
export OPENAI_API_KEY=sk-...

Project structure

assay/
├── assay/
│   ├── validator.py             # main orchestrator
│   ├── schema/                  # 10 Pydantic workflow models + registry
│   ├── rules/
│   │   ├── engine.py            # pack-aware rule evaluator
│   │   └── packs/               # YAML rule packs (firm_base + 6 regulatory)
│   ├── risk/                    # numeric guardrails
│   ├── byod/                    # firm-data loader + validator
│   ├── entities/                # resolver ABC + Mock/Local/OpenFIGI/Chained
│   ├── tools/                   # tool gate + typed schemas + policies
│   ├── trajectory/              # AgentSession + TrajectoryValidator + policies
│   ├── semantic/                # AI consistency checker
│   ├── audit/                   # append-only logger
│   ├── escalation/              # Slack / log-only / email
│   ├── providers/               # Claude + OpenAI-compatible
│   └── cowork/                  # legacy Cowork plugin shim
├── claude-code-plugin/          # Claude Code v2 plugin (commands/skill/agent/scripts)
├── cowork-plugin/               # Cowork v2 plugin (manifest + entry script)
├── data/
│   └── my_firm_example.json
├── examples/
│   ├── run_demo_agent.py        # output-validation demo
│   └── run_agentic_loop.py      # tool gate + trajectory + entity demo
├── tests/                       # 40 tests covering all four layers
└── pyproject.toml

Testing

pip install -e ".[dev]"
PYTHONPATH=. pytest tests/ -v

Current state: 40 tests passing across schema dispatch, rule packs, tool gate, trajectory, entity resolver.

For UI/agent integration, run the demos:

PYTHONPATH=. FIRM_DATA_PATH=data/my_firm_example.json python examples/run_demo_agent.py
PYTHONPATH=. FIRM_DATA_PATH=data/my_firm_example.json python examples/run_agentic_loop.py

Roadmap

These are things we have working internally at VenturFlow and want to upstream once they stabilise:

Adversarial eval harness — red-team trajectory suite (prompt-injected term sheets, restricted-ticker traps, sycophancy probes). Wires into eval frameworks like inspect-ai.
Tamper-evident audit log — hash-chained or signed entries with provenance fields (prompt template version, model snapshot, retrieved-doc hashes, firm-data revision).
Citation grounding — when an agent claims "per Q3 2025 10-K, revenue grew 12%," verify the claim against the cited document.
Determinism + replay — backtest historical agent outputs against today's rule packs to find what would have been blocked.
Mid-trajectory HITL — pause-and-approve gates with timeouts, not just post-hoc escalation.
Confidence calibration tracking — log claimed confidence vs. realised outcome; auto-tune per (agent, asset class).
Policy-as-code — optional OPA/Rego/Cedar/CEL backend for complex multi-field conditions.
More regulatory packs — Form PF, SEC Marketing Rule, EU AI Act high-risk classification, GDPR-for-financial-data, Basel III concentration limits.

We're not in a hurry — better to ship one correct pack than ten approximate ones.

Contributing

This is an early, ad-hoc project. The bar for contributions:

Tests. Anything you add should have a test that fails without your change.
Citations. New regulatory packs must reference an authority and stamp version + effective_date. No "vibes-based compliance."
No silent rule weakening. Don't make a recommendation pass by editing a pack. Either fix the recommendation or escalate.
Backwards-compat for slash commands and the AssayValidator / ToolGate public API. Internals are fair game.

Open issues / PRs at github.com/VenturFlow/Assay.

Who is VenturFlow?

VenturFlow is an Agentic AI platform purpose-built for venture-capital firms — sourcing, diligence, portfolio support, LP reporting. We're a commercial product. This validation library is something we hit the need for repeatedly while building VenturFlow's agentic stack, and we believe other teams shipping AI agents into financial workflows shouldn't have to re-solve it.

If this library is useful to you, great — we're glad. If you want the full agentic platform on top, come talk to us.

License

Apache License 2.0 — see LICENSE.

Copyright (c) 2026 VenturFlow
Licensed under the Apache License, Version 2.0.

Disclaimer

This library and the regulatory rule packs it ships are not legal, compliance, tax, or investment advice. The packs encode named requirements as machine-checkable rules; they do not substitute for review by qualified counsel and they do not track regulatory amendments automatically. You are responsible for the rules you choose to apply to your agents.

An open-source contribution from VenturFlow.
