ReputAgent Data
A structured, open dataset for understanding AI agent systems — how they fail, how to evaluate them, where they're deployed, and what the key concepts are.
Maintained by ReputAgent — reputation for AI agents, earned through evaluation.
| Category | Count | Description | Browse |
|---|---|---|---|
| Glossary | 112 | Terms spanning agents, evaluation, trust, governance, and failures | Search terms |
| Research Index | 97 | Curated arXiv papers on multi-agent systems, evaluation, and agent coordination | Read summaries |
| Ecosystem Tools | 70 | Curated agent frameworks and tools with classification, metrics, and protocol support | Compare tools |
| Use Cases | 47 | Domain-specific agent challenges in finance, healthcare, legal, cybersecurity, and 26 more domains | Explore by domain |
| Failure Modes | 35 | Documented failure modes with severity ratings, symptoms, root causes, and mitigations | View failure library |
| Evaluation Patterns | 34 | Patterns for LLM-as-Judge, Human-in-the-Loop, Red Teaming, orchestration, and more | Browse patterns |
| Protocols | 9 | Agent communication protocols: MCP, A2A, ANP, AG-UI, and others | Compare protocols |
Every JSON entry includes a canonical_url linking to its full page on reputagent.com.
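For illustration, a failure-mode entry looks roughly like this (values abbreviated and the URL truncated; the symptom/cause/mitigation field names follow the schema table below and may differ slightly):

```json
{
  "title": "Prompt Injection Propagation",
  "category": "protocol",
  "severity": "critical",
  "description": "...",
  "symptoms": ["..."],
  "causes": ["..."],
  "mitigations": ["..."],
  "canonical_url": "https://reputagent.com/..."
}
```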
Over 40% of agentic AI projects will be canceled by 2027 due to escalating costs and inadequate risk controls (Gartner). About 90% of high-value AI use cases remain stuck in pilot (McKinsey).
The gap isn't capability — it's trust. Teams can't answer: "Can I trust this agent?"
ReputAgent exists to close that gap. This dataset is the structured foundation: documented failure modes so teams learn from others' mistakes, evaluation patterns so they can test systematically, and a shared vocabulary so the field can communicate clearly.
```python
import json
from pathlib import Path
from collections import Counter

# Load all seven category files from the data/ directory
data = Path("data")
glossary = json.loads((data / "glossary.json").read_text())
ecosystem = json.loads((data / "ecosystem.json").read_text())
papers = json.loads((data / "papers.json").read_text())
usecases = json.loads((data / "usecases.json").read_text())
failures = json.loads((data / "failures.json").read_text())
patterns = json.loads((data / "patterns.json").read_text())
protocols = json.loads((data / "protocols.json").read_text())

total = sum(len(d) for d in (glossary, ecosystem, papers, usecases,
                             failures, patterns, protocols))
print(f"{total} entries across 7 categories")

# Critical failure modes
critical = [f for f in failures if f["severity"] == "critical"]
print(f"{len(critical)} critical failure modes")

# Glossary terms by category
for cat, count in Counter(t["category"] for t in glossary).most_common():
    print(f"  {cat}: {count} terms")

# Top ecosystem tools by stars (sorted explicitly rather than
# relying on the file's ordering)
for tool in sorted(ecosystem, key=lambda t: t["stars"], reverse=True)[:10]:
    print(f"  {tool['stars']:>6} stars  {tool['fullName']}")

# Every entry links back to its canonical page
print(f"\nExample: {failures[0]['canonical_url']}")
```

```javascript
import { readFileSync } from "fs";
const load = (file) => JSON.parse(readFileSync(`data/${file}`, "utf-8"));
const glossary = load("glossary.json");
const ecosystem = load("ecosystem.json");
const failures = load("failures.json");
const patterns = load("patterns.json");
console.log(`${glossary.length} glossary terms`);
console.log(`${ecosystem.length} ecosystem tools`);
console.log(`${failures.length} failure modes`);
console.log(`${patterns.length} evaluation patterns`);
// Every entry has a canonical_url to its full page
failures.forEach(f => console.log(`  ${f.title}: ${f.canonical_url}`));
```

| File | Entries | Description |
|---|---|---|
| `glossary.json` | 112 | Term, category, definition, related terms |
| `papers.json` | 97 | Title, arXiv ID, tags — read full summaries on the site |
| `ecosystem.json` | 70 | Tool name, stars, language, layer, maturity, protocols, use cases |
| `usecases.json` | 47 | Title, domain, description, challenges, related patterns |
| `failures.json` | 35 | Title, category, severity, description, symptoms, causes, mitigations |
| `patterns.json` | 34 | Title, category, complexity, problem, solution, when to use, trade-offs |
| `protocols.json` | 9 | Title, description, maturity, spec URL |
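Because every entry is expected to carry a canonical_url, a quick integrity check across all seven files takes a few lines (a minimal sketch; it assumes only the file names above and the canonical_url field):

```python
import json
from pathlib import Path

FILES = ["glossary.json", "papers.json", "ecosystem.json", "usecases.json",
         "failures.json", "patterns.json", "protocols.json"]

for name in FILES:
    entries = json.loads((Path("data") / name).read_text())
    # Flag any entry missing its canonical_url back-link
    missing = [e for e in entries if not e.get("canonical_url")]
    print(f"{name}: {len(entries)} entries, {len(missing)} missing canonical_url")
```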
This dataset provides structured metadata and summaries — enough to be useful for research, filtering, and integration. Full detailed analysis, editorial content, and interactive features live on reputagent.com:
- Here: Failure mode title, severity, description, symptoms, causes, mitigations
- On the site: Full markdown analysis, impact scoring visualizations, cross-referenced links, interactive search
- Here: Tool name, stars, layer classification, one-liner description
- On the site: AI-synthesized editorial summaries, trend charts, comparison views
- Here: Paper title, arXiv ID, tags
- On the site: Full AI-synthesized summaries with key takeaways and related patterns
| Category | Count (severity range) | Examples |
|---|---|---|
| Protocol | 12 (critical–high) | Prompt Injection Propagation, Agent Impersonation, Permission Escalation |
| Coordination | 8 (medium–high) | Coordination Deadlock, Sycophancy Amplification, Goal Drift |
| Cascading | 7 (critical–high) | Hallucination Propagation, Cascading Reliability Failures |
| Systemic | 4 (medium–high) | Accountability Diffusion, Agent Washing |
| Communication | 2 (high) | Infinite Handoff Loop, Inter-Agent Miscommunication |
| Reliability | 1 (high) | Infinite Loop / Stuck Agent |
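The breakdown above can be recomputed from `failures.json` (a sketch relying only on the category and severity fields listed in the schema table):

```python
import json
from collections import defaultdict
from pathlib import Path

failures = json.loads((Path("data") / "failures.json").read_text())

# Group failure modes by category and collect the severities seen in each
by_category = defaultdict(list)
for f in failures:
    by_category[f["category"]].append(f["severity"])

for cat, sevs in sorted(by_category.items(), key=lambda kv: -len(kv[1])):
    print(f"{cat}: {len(sevs)} failure modes ({', '.join(sorted(set(sevs)))})")
```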
| Category | Patterns | Examples |
|---|---|---|
| Orchestration | 11 | Supervisor Pattern, ReAct Pattern, Agentic RAG |
| Coordination | 11 | A2A Protocol Pattern, MCP Pattern, Handoff Pattern |
| Evaluation | 6 | LLM-as-Judge, Human-in-the-Loop, Red Teaming |
| Discovery | 5 | Capability Discovery, Agent Registry |
| Safety | 3 | Defense in Depth, Guardrails, Mutual Verification |
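Pulling out one pattern category is a one-line filter (field names per the schema table; the exact casing of category values is an assumption):

```python
import json
from pathlib import Path

patterns = json.loads((Path("data") / "patterns.json").read_text())

# Select only the evaluation patterns (LLM-as-Judge, Red Teaming, ...)
evaluation = [p for p in patterns if p["category"].lower() == "evaluation"]
for p in evaluation:
    print(f"{p['title']} (complexity: {p['complexity']})")
```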
47 documented use cases spanning: financial trading, fraud investigation, clinical diagnosis, contract review, security operations, software development, research synthesis, supply chain management, customer support, and 38 more.
70 curated agent frameworks, SDKs, and evaluation tools including:
| Stars | Tool | Layer |
|---|---|---|
| 126k | LangChain | Tools |
| 73k | RAGFlow | Tools |
| 64k | MetaGPT | Tools |
| 54k | AutoGen | Tools |
| 22k | A2A | Protocols |
| 22k | Langfuse | Operations |
View all 70 tools with comparisons and trend data
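To slice the ecosystem data yourself, for example by layer (`stars` and `fullName` match the quick-start code above; the `layer` field name comes from the schema table):

```python
import json
from collections import defaultdict
from pathlib import Path

ecosystem = json.loads((Path("data") / "ecosystem.json").read_text())

# Bucket tools by layer, keeping each bucket sorted by GitHub stars
layers = defaultdict(list)
for tool in sorted(ecosystem, key=lambda t: t["stars"], reverse=True):
    layers[tool["layer"]].append(tool)

for layer, tools in layers.items():
    top = ", ".join(t["fullName"] for t in tools[:3])
    print(f"{layer}: {len(tools)} tools (top: {top})")
```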
Informed by published research and industry analysis:
- MAST — Multi-Agent System Failure Taxonomy
- Microsoft AI Agent Failure Taxonomy
- OWASP ASI08 Cascading Failures
- Agent Hallucinations Survey
- 97 additional papers indexed in `papers.json` — read synthesized summaries
We welcome new entries, corrections, and real-world examples from practitioners.
- Submit on the web: reputagent.com/contribute
- Submit via GitHub: Open a pull request following the schemas in `schemas/`
- Report an issue: Contact us
See CONTRIBUTING.md for detailed guidelines.
If you use this dataset in research, please cite:
```bibtex
@dataset{reputagent_data_2026,
  title      = {ReputAgent Data: AI Agent Failure Modes, Evaluation Patterns, Use Cases, Glossary, Ecosystem, Protocols, and Research Index},
  author     = {ReputAgent},
  year       = {2026},
  url        = {https://reputagent.com},
  repository = {https://github.com/ReputAgent/reputagent-data}
}
```

See CITATION.cff for machine-readable citation metadata.
- RepKit SDK — Log agent evaluations, compute reputation scores, expose trust signals for downstream systems (GitHub)
- Agent Playground — Pre-production testing where agents build track record through structured multi-agent scenarios
- Research Papers — AI-synthesized summaries of the latest multi-agent systems research
- Blog — Essays on agent evaluation, trust, and reputation
- Consulting — Custom evaluation frameworks and RepKit integration
Data: CC-BY-4.0 — use freely with attribution. Code examples: MIT.