
AgentFT

Agent Flow Test: pytest for AI agents

PyPI · Python 3.11+ · MIT License

AgentFT (Agent Flow Test) is a pytest-style evaluation framework for AI agents. It provides composable primitives for tasks, scenarios, agents, and judges, with production-ready infrastructure including async execution, retries, rate limiting, and comprehensive reporting.

Features

  • Pytest-like simplicity: Minimal boilerplate, clear abstractions
  • Production-ready: Async execution with retries and rate limiting
  • Full observability: Traces, costs, metadata, HTML reports
  • Composable judges: Layer multiple evaluation strategies (sequential, voting, weighted); see the sketch after this list
  • Environment support: Not just Q&A; full stateful agent evaluation
  • CLI tools: Run evals, view summaries, compare runs
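
The judge-composition model, in a minimal sketch: CompositeJudge is a real preset (it appears in the Testing section below), but its import path, constructor arguments, and the strategy keyword are assumptions here, not documented API.

from agentft.presets import ExactMatchJudge
from agentft.judges import CompositeJudge  # import path is an assumption

# Majority vote across member judges; in practice you would mix different
# judge types rather than repeat one.
judge = CompositeJudge(
    judges=[ExactMatchJudge(), ExactMatchJudge(), ExactMatchJudge()],
    strategy="voting",  # assumed keyword; the strategies above are sequential, voting, weighted
)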

Quick Start

Install from PyPI:

pip install agentft

Run an example:

git clone https://github.com/Geddydukes/agentflowtest
cd agentflowtest
aft run --config examples/config_example.py

Or use in code:

from agentft import Task, RunConfig, run
from agentft.presets import build_math_basic_scenario, ExactMatchJudge

# Define your agent
class MyAgent:
    name = "my_agent"
    version = "0.1.0"
    provider_key = None
    
    # Lifecycle hooks; no-ops for this minimal agent
    async def setup(self): pass
    async def reset(self): pass
    async def teardown(self): pass
    
    async def run_task(self, task: Task, context=None):
        # Your agent logic here
        return {"response": "42"}

# Create evaluation config
config = RunConfig(
    name="quick_test",
    agents=[MyAgent()],
    scenarios=[build_math_basic_scenario()],
    judges=[ExactMatchJudge()],
)

# Run evaluation
results = run(config)
print(f"Passed: {sum(r.passed for r in results)}/{len(results)}")

View results:

  • Console: Summary printed automatically
  • HTML: Open runs/<run_id>/report.html in a browser
  • Raw data: runs/<run_id>/results.jsonl

Why AgentFT?

Existing eval frameworks are either too simple or too complex:

  • Simple tools: Just run prompts, no structure or observability
  • Academic benchmarks: Steep learning curve, not designed for iteration
  • Vendor platforms: Lock-in, limited customization

AgentFT hits the sweet spot:

  • Pytest-like API (minimal boilerplate)
  • Production-ready features (async, retries, rate limiting)
  • Full transparency (traces, costs, metadata)
  • Scales from simple Q&A to complex multi-step agents

CLI Commands

aft run

Run an evaluation from a Python config file.

aft run --config path/to/config.py

The config file must define a module-level RunConfig object named config.
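
A minimal config file, reusing the Quick Start pieces (my_agents is a hypothetical module holding the MyAgent class defined above):

# config.py
from agentft import RunConfig
from agentft.presets import build_math_basic_scenario, ExactMatchJudge
from my_agents import MyAgent  # hypothetical module with the Quick Start agent

config = RunConfig(
    name="cli_run",
    agents=[MyAgent()],
    scenarios=[build_math_basic_scenario()],
    judges=[ExactMatchJudge()],
)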

aft summary

Show a summary of a completed run.

aft summary --run-dir runs/my_run-abc123/

aft compare

Compare two runs to identify regressions and improvements.

aft compare --run-a runs/run1/ --run-b runs/run2/

Run Artifacts

Running an evaluation creates a directory under runs/<run_id>/ with:

  • results.jsonl: All evaluation results in JSONL format
  • traces.jsonl: Event traces for debugging and analysis
  • run_metadata.json: Run configuration, environment, and version information
  • report.html: Interactive HTML report with statistics, per-agent performance, and failing tasks

Open report.html in your browser to view the detailed evaluation report.
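
Because the artifacts are plain JSONL, they are easy to post-process with the standard library. A minimal sketch, assuming each record in results.jsonl carries a passed field (mirroring EvaluationResult.passed from the Quick Start):

import json
from pathlib import Path

results_path = Path("runs/my_run-abc123/results.jsonl")  # run dir from the aft summary example
records = [json.loads(line) for line in results_path.read_text().splitlines() if line.strip()]

# The "passed" field name is an assumption; adjust to the actual schema.
passed = sum(1 for r in records if r.get("passed"))
print(f"Passed: {passed}/{len(records)}")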

Testing

AgentFT includes a comprehensive test suite with 43+ tests covering all major components:

  • Core types (Task, Cost, EvaluationResult, Trace)
  • Scenarios and presets
  • Judges (ExactMatchJudge, CompositeJudge)
  • Runner and lifecycle hooks
  • Storage and reporting functions
  • Integration tests

Run tests:

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Generate HTML test report
pytest --html=test-results/report.html --self-contained-html

# Generate coverage report
pytest --cov=src/agentft --cov-report=html:htmlcov

Test reports are generated in test-results/ and coverage reports in htmlcov/.

Project Status

Agent Flow Test (AgentFT) is in active development. The core framework is functional and ready to use for evaluations.

Current version: 0.1.0
