agents.txt

A proposed web standard that lets documentation sites serve a machine-readable index for AI agents — like robots.txt for crawlers and sitemap.xml for search engines, but designed for LLMs.

Reference implementation: https://agentnav.baekenough.com

The Problem

Vercel discovered something alarming when they studied AI agents using Playwright MCP to browse the web: a single button click returns 12,891 characters of accessibility tree data. Their solution (agent-browser) reduced token usage by 93% by creating a purpose-built AI interface instead of forcing AI through a human-designed one.

The same problem exists for documentation sites — and it is worse.

When an AI agent needs to find documentation, it navigates the way a human would: fetching pages, reading navigation menus, following links. But documentation sites are designed for humans, not AI agents. Every HTML page a bot fetches is padded with navigation chrome, sidebars, headers, and footers — most of which is irrelevant noise. For a 651-page documentation site, a naive crawl could consume millions of tokens. Even finding the right page to fetch requires multiple round-trips.

The existing web standards do not solve this:

Standard	Designed For	What It Gives AI Agents
`robots.txt`	Web crawlers	Crawl exclusion rules — not navigation help
`sitemap.xml`	Search engines	A flat URL list — no semantic structure
`llms.txt`	LLMs reading site summaries	Free-text description — not navigable
`agents.txt`	AI agents navigating docs	Structured index with types, sections, SDK patterns

The Solution

agents.txt is a proposed web standard that lets any documentation site publish a structured, machine-readable index of its content — served at a well-known URL, in the format AI agents actually need.

https://your-docs.com/.well-known/agents.txt

An AI agent fetches this single file (~3,200 tokens for a 651-page site) and can immediately answer:

"Which page documents prompt caching?" — without fetching a single content page
"Where is the Python SDK quickstart?" — from the index, not a crawl
"What API endpoints exist for batch processing?" — from the type annotations

The same idea that reduced Vercel's token usage by 93% — build the interface for AI, not for humans — applied at the web standard level.

How It Works

Site Owner                         AI Agent
──────────                         ────────
Create agents.txt                  GET /.well-known/agents.txt
Serve at /.well-known/agents.*     Parse sections, pages, SDK patterns
Annotate pages with 12 types       Answer navigation queries directly
Use SDK Pattern Compression        Fetch only the relevant page

A site that serves agents.txt gives every AI agent a navigation map on first contact — no crawling required.

The Analogy

Protocol	Who Reads It	What It Enables
`robots.txt`	Search engine crawlers	Know which pages to skip
`sitemap.xml`	Search engine indexers	Know which pages to prioritize
`agents.txt`	AI agents	Navigate to the right page without crawling

Format Options

agents.txt is available in four serialization variants. Sites should serve all four:

https://your-docs.com/.well-known/agents.txt    # Plain text  (~3,200 tokens for 651 pages)
https://your-docs.com/.well-known/agents.md     # Markdown    (~5,400 tokens)
https://your-docs.com/.well-known/agents.json   # JSON        (~9,600 tokens)
https://your-docs.com/.well-known/agents.xml    # XML         (~7,000 tokens)

Format	Token Cost	Best For
TXT	Lowest	Token-constrained agents, quick lookups
MD	Low	LLMs processing the index as natural language
JSON	Higher	Programmatic parsing, building navigation indexes
XML	Moderate	Enterprise toolchains, existing XML pipelines
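
As a rough illustration of how an agent might choose among the four variants, the sketch below picks a format from a token budget. The token figures are the approximate ones quoted above for the 651-page index. The selection policy and the names `TOKEN_COST` and `pick_format` are assumptions made for this example and are not part of the specification.

```python
from typing import Optional

# Approximate token costs quoted above for the 651-page reference index.
TOKEN_COST = {"txt": 3200, "md": 5400, "xml": 7000, "json": 9600}


def pick_format(token_budget: int, prefer_structured: bool = True) -> Optional[str]:
    """Return the file extension to request, or None if no variant fits the budget."""
    affordable = [fmt for fmt, cost in TOKEN_COST.items() if cost <= token_budget]
    if not affordable:
        return None
    # Hypothetical policy: take JSON when it fits and structured parsing is wanted.
    if prefer_structured and "json" in affordable:
        return "json"
    # Otherwise fall back to the cheapest variant that fits.
    return min(affordable, key=TOKEN_COST.__getitem__)


print(pick_format(10_000))  # json
print(pick_format(6_000))   # txt
```

An agent with its own tokenizer would measure actual cost rather than rely on these published estimates.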

For Site Owners: Adding agents.txt to Your Site

This is the primary goal of this project. If you maintain a documentation site, here is how to add agents.txt support in four steps.

Step 1: Create your documentation index

Use JSON as your canonical source; the other three formats can be generated from it (this repository does so with scripts/generate_formats.py). At minimum, you need:

{
  "agents_txt_version": "0.2",
  "site": {
    "name": "Your Docs",
    "url": "https://docs.yoursite.com",
    "total_pages": 5,
    "last_updated": "2026-03-07"
  },
  "sections": [
    {
      "name": "Getting Started",
      "path_prefix": "/docs/getting-started",
      "page_count": 2,
      "pages": [
        { "path": "/docs/getting-started/quickstart", "title": "Quickstart", "type": "tutorial" },
        { "path": "/docs/getting-started/overview",   "title": "Overview",   "type": "overview" }
      ]
    },
    {
      "name": "API Reference",
      "path_prefix": "/docs/api",
      "page_count": 3,
      "pages": [
        { "path": "/docs/api/messages",        "title": "Messages",       "type": "api-hub"       },
        { "path": "/docs/api/messages/create", "title": "Create Message", "type": "api-endpoint", "method": "POST" },
        { "path": "/docs/api/rate-limits",     "title": "Rate Limits",    "type": "api-reference" }
      ]
    }
  ]
}
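
Before publishing, it may help to sanity-check the index. The following is a minimal sketch, not the project's validator. It assumes the required fields are the ones shown in the example above and that every section enumerates its pages explicitly (a section that uses `sdk_pattern` would need different handling). The name `check_index` is hypothetical.

```python
from typing import List

# Assumed from the example above; the formal spec may require more fields.
REQUIRED_PAGE_FIELDS = ("path", "title", "type")


def check_index(index: dict) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    for key in ("agents_txt_version", "site", "sections"):
        if key not in index:
            problems.append(f"missing top-level field: {key}")
    for section in index.get("sections", []):
        name = section.get("name", "<unnamed>")
        pages = section.get("pages", [])
        # page_count should agree with the number of pages actually listed.
        if section.get("page_count") != len(pages):
            problems.append(f"{name}: page_count does not match the number of listed pages")
        prefix = section.get("path_prefix", "")
        for page in pages:
            missing = [f for f in REQUIRED_PAGE_FIELDS if f not in page]
            if missing:
                problems.append(f"{name}: page missing {', '.join(missing)}")
            elif not page["path"].startswith(prefix):
                problems.append(f"{name}: {page['path']} is outside {prefix}")
    return problems
```

Loading the file with `json.load` and passing the result to `check_index` gives a quick list of structural issues before running the full NAV-AGENT verification.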

Step 2: Annotate each page with a content type

Every page must carry a type from the 12-type taxonomy (see Specification section below). The type tells AI agents what kind of content to expect — a tutorial reads differently from an API endpoint reference.

Step 3: Serve at the well-known URL

Add routing rules to serve your index at /.well-known/agents.*:

# nginx example
location /.well-known/agents.json { alias /path/to/agents.json; }
location /.well-known/agents.md   { alias /path/to/agents.md;   }
location /.well-known/agents.txt  { alias /path/to/agents.txt;  }
location /.well-known/agents.xml  { alias /path/to/agents.xml;  }

Set Content-Type headers appropriately and add Access-Control-Allow-Origin: * for cross-origin agent access.
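
The source does not say which Content-Type value each format should use. The sketch below shows one plausible mapping using registered media types, together with the CORS header mentioned above. The mapping and the helper name `headers_for` are assumptions for illustration only.

```python
import posixpath

# Assumed media types per extension; adjust to whatever your server policy requires.
CONTENT_TYPES = {
    ".txt": "text/plain; charset=utf-8",
    ".md": "text/markdown; charset=utf-8",
    ".json": "application/json",
    ".xml": "application/xml",
}


def headers_for(path: str) -> dict:
    """Return the response headers for one of the four agents.* files."""
    ext = posixpath.splitext(path)[1]
    if ext not in CONTENT_TYPES:
        raise ValueError(f"not an agents.* format: {path}")
    return {
        "Content-Type": CONTENT_TYPES[ext],
        # Allows browser-based and cross-origin agents to read the index.
        "Access-Control-Allow-Origin": "*",
    }


print(headers_for("/.well-known/agents.json")["Content-Type"])  # application/json
```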

Step 4: Verify your implementation

Use the NAV-AGENT verification agent in this repository to test that your agents.txt achieves Grade B (70%+) or higher before publishing.

If You Have Multiple SDKs: Use Pattern Compression

If your documentation has multiple SDKs with identical endpoint structures (Python, TypeScript, Java, etc.), do not enumerate every page. Use the sdk_pattern field instead:

"sdk_pattern": {
  "sdks": ["python", "typescript", "java", "go"],
  "pages_per_sdk": 30,
  "url_template": "/docs/api/{sdk}/{endpoint}",
  "endpoint_paths": ["overview", "client", "messages", "messages/create", "models", "models/list"]
}

For 4 SDKs × 30 pages = 120 pages, this single block replaces 120 explicit entries — roughly 90% token reduction for the SDK section.
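
To make the compression concrete, here is a minimal sketch of how a consumer could expand an `sdk_pattern` block back into page URLs. It assumes the template placeholders are exactly `{sdk}` and `{endpoint}`, as in the example above. The function name `expand_sdk_pattern` is hypothetical.

```python
from typing import List


def expand_sdk_pattern(pattern: dict) -> List[str]:
    """Expand an sdk_pattern block into one path per (sdk, endpoint) pair."""
    return [
        pattern["url_template"].format(sdk=sdk, endpoint=endpoint)
        for sdk in pattern["sdks"]
        for endpoint in pattern["endpoint_paths"]
    ]


pattern = {
    "sdks": ["python", "typescript", "java", "go"],
    "pages_per_sdk": 30,
    "url_template": "/docs/api/{sdk}/{endpoint}",
    "endpoint_paths": ["overview", "client", "messages", "messages/create", "models", "models/list"],
}

urls = expand_sdk_pattern(pattern)
print(len(urls))  # 24
print(urls[0])    # /docs/api/python/overview
```

The example block lists only six endpoint paths, so expanding it as written yields 24 URLs. A block that listed all 30 paths per SDK would expand to the 120 pages described above.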

Specification: agents.txt v0.2

Full specification: docs/spec/agents-txt-v0.2.md

Well-Known URL Convention (RFC 8615)

agents.txt follows RFC 8615, which reserves the /.well-known/ path prefix as the standard location for site-wide machine-readable metadata. The same convention is used by security.txt (RFC 9116) for security policy files.

Content Type Taxonomy (12 types)

Type	Description	Example URLs
`tutorial`	Getting started guides and quickstarts	`quickstart`, `get-started`
`reference`	Static reference information	`glossary`, `pricing`, `deprecations`
`guide`	Feature-specific how-to guides	`prompt-caching`, `vision`, `streaming`
`overview`	Section entry points and landing pages	`models/overview`, `features/overview`
`use-case`	Use-case-specific implementation guides	`content-moderation`, `ticket-routing`
`tool-reference`	Documentation for a specific tool	`bash-tool`, `web-search-tool`
`sdk-guide`	SDK-specific integration guides	`agent-sdk/python`, `agent-sdk/typescript`
`api-reference`	API infrastructure documentation	`rate-limits`, `errors`, `versioning`
`api-endpoint`	Documentation for a single API endpoint	`messages/create`, `models/list`
`api-hub`	API section hub aggregating endpoints	`messages`, `admin`, `beta`
`best-practices`	Best practice and guardrail guidance	`reduce-hallucinations`, `reduce-latency`
`changelog`	Change log and release notes	`release-notes/overview`

Classification rule: choose the most specific type. api-endpoint takes precedence over api-reference; tool-reference takes precedence over guide.
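
The classification rule can be sketched as a small resolver. Only the two precedence pairs stated above are encoded here; any further ordering would be an assumption. The names `TAXONOMY`, `PRECEDENCE`, and `resolve_type` are hypothetical.

```python
from typing import Iterable

# The 12 content types from the taxonomy table above.
TAXONOMY = {
    "tutorial", "reference", "guide", "overview", "use-case", "tool-reference",
    "sdk-guide", "api-reference", "api-endpoint", "api-hub", "best-practices", "changelog",
}

# (more specific, more general) pairs stated in the classification rule.
PRECEDENCE = [("api-endpoint", "api-reference"), ("tool-reference", "guide")]


def resolve_type(candidates: Iterable[str]) -> str:
    """Pick the most specific type among the candidates that fit a page."""
    remaining = set(candidates)
    unknown = remaining - TAXONOMY
    if unknown:
        raise ValueError(f"not in the taxonomy: {sorted(unknown)}")
    for specific, general in PRECEDENCE:
        if specific in remaining:
            remaining.discard(general)
    if len(remaining) != 1:
        # The stated rule does not rank these candidates; a human must decide.
        raise ValueError(f"ambiguous classification: {sorted(remaining)}")
    return remaining.pop()


print(resolve_type(["api-reference", "api-endpoint"]))  # api-endpoint
```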

SDK Pattern Compression

Repeating SDK structures are expressed once using a template, not enumerated per-page. For the Claude platform documentation (10 SDKs × 45 endpoints = 450 pages), the sdk_pattern block replaces 450 explicit page entries with a single template — approximately 90% token reduction for the SDK section.

Changes from v0.1

Area	v0.1	v0.2
Content types	Informal, ad hoc	12-type controlled vocabulary
URL convention	None	Well-known URL (RFC 8615)
Multi-provider	Not addressed	Provider subdirectory structure
SDK compression	Not defined	`sdk_pattern` schema
Navigation hints	Not defined	Optional `navigation` field
Verification	Not defined	NAV-AGENT 5-metric framework

Reference Implementation

AgentNav at agentnav.baekenough.com demonstrates the standard with two real documentation sites:

Documentation Site	Pages	Sections	Prefix
Claude Code (Anthropic)	651	9	`/claude-code/`
GPT Codex (OpenAI)	68	14	`/gpt-codex/`

Each site's agents.txt is available in all four formats at the well-known paths:

# Claude Code documentation index — plain text (lowest token cost)
curl https://agentnav.baekenough.com/.well-known/agents.txt

# JSON for programmatic parsing
curl https://agentnav.baekenough.com/.well-known/agents.json

# Provider-scoped access
curl https://agentnav.baekenough.com/claude-code/.well-known/agents.md
curl https://agentnav.baekenough.com/gpt-codex/.well-known/agents.json

Self-Hosting

git clone <repo>
cd AgentNav
docker build -t agentnav .
docker run -p 8080:80 agentnav

The container is a static nginx:alpine server with no backend or database. The only runtime dependency is nginx.

Agent Specifications

NAVIGATOR.md — Consumer Agent

Any LLM can implement this spec to parse agents.txt and answer navigation queries without fetching content pages.

Workflow:

1. Discover: try /.well-known/agents.json, .md, .xml, .txt in order; accept the first HTTP 200
2. Parse: extract sections, pages, SDK patterns, and navigation hints
3. Build map: construct an internal representation of the site structure
4. Answer queries: match intent against titles, paths, types; return the full URL

The spec is LLM-agnostic: it works with Claude, GPT, Gemini, or any agent with HTTP fetch capability.
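
The workflow above can be sketched in a few lines. This is an illustration under stated assumptions, not the NAVIGATOR.md algorithm: the HTTP call is injected as a `fetch` callable returning `(status, body)` so the sketch does not depend on any particular client, and the query matching is naive substring matching on title, path, and type. The names `discover` and `find_pages` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Discovery order given in the workflow above.
DISCOVERY_ORDER = ("json", "md", "xml", "txt")


def discover(base_url: str, fetch: Callable[[str], Tuple[int, str]]) -> Optional[Tuple[str, str]]:
    """Try each format in order and return (extension, body) for the first HTTP 200."""
    for ext in DISCOVERY_ORDER:
        url = f"{base_url.rstrip('/')}/.well-known/agents.{ext}"
        status, body = fetch(url)
        if status == 200:
            return ext, body
    return None


def find_pages(index: dict, query: str) -> List[str]:
    """Return full URLs of pages whose title, path, or type contains every query word."""
    words = query.lower().split()
    base = index["site"]["url"].rstrip("/")
    hits = []
    for section in index["sections"]:
        for page in section.get("pages", []):
            haystack = f'{page["title"]} {page["path"]} {page["type"]}'.lower()
            if all(word in haystack for word in words):
                hits.append(base + page["path"])
    return hits
```

With the JSON index from Step 1 loaded into `index`, `find_pages(index, "rate limits")` would return the full URL of the Rate Limits page without fetching any content page. A real agent would replace the substring match with its own intent matching.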

NAV-AGENT.md — Verification Agent

Tests how effectively each format conveys documentation structure. Uses embedded ground truth for the Claude platform docs (651 pages) as the scoring baseline.

5 weighted metrics:

Metric	Weight	Question
Section Discovery	20%	How many top-level sections can be identified?
Page Coverage	25%	How many representative pages can be located?
Navigation Accuracy	30%	Can intent-based queries reach the correct page?
Pattern Recognition	15%	Are repeating SDK structures recognized and usable?
Content Classification	10%	Are page types correctly annotated?

Grading: A (90%+), B (70-89%), C (50-69%), F (<50%). A compliant implementation should score Grade B or higher.
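
The weighting and grade bands above reduce to a short calculation. The sketch below assumes each metric is reported as a percentage from 0 to 100. The dictionary keys and function names are made up for this example, and the sample metric values are illustrative rather than measured results.

```python
# Metric weights in percent, taken from the table above.
WEIGHTS = {
    "section_discovery": 20,
    "page_coverage": 25,
    "navigation_accuracy": 30,
    "pattern_recognition": 15,
    "content_classification": 10,
}


def overall_score(metrics: dict) -> float:
    """Weighted average of the five metric percentages."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS) / 100


def grade(score: float) -> str:
    """Map an overall percentage to the A/B/C/F bands."""
    if score >= 90:
        return "A"
    if score >= 70:
        return "B"
    if score >= 50:
        return "C"
    return "F"


# Illustrative values only.
sample = {
    "section_discovery": 100,
    "page_coverage": 80,
    "navigation_accuracy": 70,
    "pattern_recognition": 60,
    "content_classification": 50,
}
print(overall_score(sample), grade(overall_score(sample)))  # 75.0 B
```

Because Navigation Accuracy carries the largest weight, a format that loses intent-based lookups drops a grade faster than one that misclassifies page types.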

Verification Results

NAV-AGENT (Claude)

All four format variants of the Claude Code documentation index were tested:

All formats: Grade A (100%)
Content Classification: 100% after v0.2 taxonomy normalization

Cross-LLM Validation (GPT via Codex CLI)

NAVIGATOR.md was validated against a different LLM to confirm the specification is portable across models.

Total queries: 100 across 10 categories
Overall score: 97.7/100 (Grade A)
Date: 2026-03-07

Category	Score
Structure Discovery	10/10
Direct Page Lookup	10/10
API Navigation	10/10
SDK Endpoint Lookup	10/10
Natural Language Navigation	10/10
Section Enumeration	9.7/10
Page Type Classification	10/10
Full URL Construction	10/10
Cross-Section Navigation	9/10
Edge Cases	9/10

Full results: tests/navigator-codex-report.md

Project Structure

AgentNav/
├── README.md                    # This file
├── README_ko.md                 # Korean version
├── Dockerfile                   # nginx:alpine container
├── nginx.conf                   # URL rewrite rules for .well-known
├── NAVIGATOR.md                 # Consumer agent spec (parse & navigate)
├── NAV-AGENT.md                 # Verification agent spec (test & grade)
├── .github/
│   └── workflows/               # CI/CD pipelines
├── dags/
│   └── agentnav_docs_drift.py   # Airflow DAG for drift detection
├── docs/
│   ├── spec/
│   │   └── agents-txt-v0.2.md  # Formal specification
│   └── plan/                   # Design notes and analysis
├── public/
│   ├── index.html               # Landing page
│   ├── claude-code/
│   │   ├── agents.json          # JSON format (~9,600 tokens)
│   │   ├── agents.md            # Markdown format (~5,400 tokens)
│   │   ├── agents.xml           # XML format (~7,000 tokens)
│   │   └── agents.txt           # Plain text format (~3,200 tokens)
│   └── gpt-codex/
│       ├── agents.json
│       ├── agents.md
│       ├── agents.xml
│       └── agents.txt
├── scripts/
│   └── generate_formats.py      # Generate MD/XML/TXT from JSON
└── tests/
    ├── navigator-codex-test.py  # Cross-LLM verification (100 queries)
    └── navigator-codex-report.md

Architecture

The reference implementation is intentionally minimal:

Runtime: nginx:alpine Docker container
Content: Static files only — no backend, no database
HTTPS: Cloudflare Tunnel
CORS: Access-Control-Allow-Origin: * for public read access
Routing: .well-known/agents.* rewrites to provider-scoped paths

Contributing

Adding a New Documentation Set

1. Create a provider directory under public/ using {vendor}-{product} naming:
   ```
   public/your-product/
   ```
2. Create the four format files following the v0.2 specification
3. Run NAV-AGENT verification and confirm Grade B (70%+) or higher
4. Add your provider card to public/index.html

Spec Changes

Before proposing changes to the specification:

Use NAV-AGENT to verify any format changes maintain Grade B or higher
Test with NAVIGATOR.md to confirm a generic LLM agent can parse the result
Do not introduce new content types without updating the 12-type taxonomy in docs/spec/agents-txt-v0.2.md

License

MIT

Links

Resource	Location
Live reference implementation	https://agentnav.baekenough.com
Formal specification	`docs/spec/agents-txt-v0.2.md`
Consumer agent spec	`NAVIGATOR.md`
Verification agent spec	`NAV-AGENT.md`
Test results	`tests/navigator-codex-report.md`
Inspiration	Vercel agent-browser article
