228 changes: 228 additions & 0 deletions .agents/skills/agent-readiness-audit/SKILL.md
---
name: agent-readiness-audit
description: >
Audit a documentation site for agent-friendliness: discovery, markdown
delivery, crawlability, semantic structure, machine-readable surfaces,
and content legibility. Use when asked to assess docs.docker.com or any
docs site for AI/agent readiness, produce a scored report, compare with
external scanners, or generate a remediation list. Triggers on:
"audit docs for agent readiness", "how agent-friendly is docs.docker.com",
"score our docs for AI agents", "review llms.txt / markdown / crawlability",
"create an agent-readiness remediation plan".
argument-hint: "<base-url>"
---

# Agent Readiness Audit

Audit the live site, not the source tree alone. Prefer the same fetch path
an external agent would use in the wild: direct HTTP requests, sitemap
sampling, and page-level inspection.

Do not reduce the result to a homepage-only scan or a binary checklist.

## 1. Set scope

Use `$ARGUMENTS` as the base URL when provided. Otherwise infer the base
URL from context and state the assumption.

Decide whether the host being audited is:

- a docs-only host
- an app/tool host
- a mixed host

This matters for optional checks such as MCP, plugin manifests, or other
tool discovery files. Do not penalize a docs-only host for missing
tooling manifests that belong on a separate service.

For `docs.docker.com`, treat the public docs host as docs-only. Docker's
MCP server is published separately, so missing MCP files on the docs host
should be reported as `N/A`, not as a failure.

## 2. Gather sitewide signals

Always check these resources first:

- `/llms.txt`
- `/llms-full.txt`
- `/robots.txt`
- `/sitemap.xml`

Only check host-level tool manifests when the host is an app/tool host,
mixed host, or explicitly advertises them:

- `/.well-known/ai-plugin.json`
- `/.well-known/agent.json`
- `/.well-known/agents.json`

Use the bundled script for a baseline:

```bash
bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
"$ARGUMENTS"
```

The script produces baseline evidence only. You still need to interpret
what matters for a docs property and score it with the rubric.

For docs-only hosts, you may skip tool-manifest probes to reduce noise:

```bash
CHECK_TOOL_MANIFESTS=0 \
  bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"
```
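
If the bundled script is unavailable, a rough manual fallback is enough to gather
the same evidence. This is a minimal sketch assuming only `curl`; it probes the
discovery paths listed above and reports status code and content type only:

```bash
# Minimal manual fallback for the sitewide probes (evidence gathering only).
BASE="${1:?usage: probe.sh <base-url>}"

for path in /llms.txt /llms-full.txt /robots.txt /sitemap.xml; do
  # HEAD via -I keeps output small; some hosts answer HEAD and GET differently,
  # so re-check any surprising result with a plain GET.
  curl -sIL -o /dev/null \
    -w "${path}\t%{http_code}\t%{content_type}\n" \
    "${BASE}${path}"
done
```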

## 3. Sample representative pages

Use the sitemap when available. Do not rely on the homepage alone.

If `llms.txt` exists, sample some URLs from it as well. This helps catch
stale or misleading discovery surfaces that a sitemap-only sample would miss.

Sample at least 12 pages when the site is large enough, and cover multiple
page types:

- homepage or docs landing page
- section landing pages
- task guides
- product manuals
- reference or API pages
- tutorial or learning pages

If the sitemap is missing or unusable, discover pages through internal
links and note the lower confidence.

If the site has distinct delivery patterns, sample each one. For example:

- normal content pages
- generated reference pages
- versioned docs
- localized docs
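
A minimal sampling sketch, assuming a flat `sitemap.xml` rather than a sitemap
index and assuming `shuf` is available; stratify the result by path prefix
afterwards so every page type above is represented:

```bash
# Draw a random sample of URLs from the sitemap (flat sitemap assumed).
BASE="${1:?usage: sample.sh <base-url>}"

curl -s "${BASE}/sitemap.xml" \
  | grep -oE '<loc>[^<]+</loc>' \
  | sed -e 's|<loc>||' -e 's|</loc>||' \
  | shuf -n 12
```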

## 4. Run fetch-path checks on each sample

For each sampled page, verify:

- HTML fetch status, content type, and final URL
- `Accept: text/markdown` behavior
- direct markdown route behavior such as `<page>.md` or another stable path
- page-level markdown alternate links and whether they actually resolve
- whether page actions such as "Open Markdown" agree with the working route
- whether the HTML title or H1 matches the markdown H1 closely enough for
retrieval parity
- whether main content is present in the initial HTML
- redirect chain length and canonical URL consistency
- obvious chrome/noise in the markdown response

Do not assume a `.md` mirror exists just because another site uses one.
Verify the actual markdown path the site exposes.
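
A per-page sketch of these checks, assuming `curl` and assuming `<page>.md` as
the candidate direct route; substitute whatever markdown path the site actually
exposes:

```bash
# Fetch-path probes for one sampled page (illustrative, not exhaustive).
URL="${1:?usage: fetch-checks.sh <page-url>}"

# HTML fetch: status, content type, and final URL after redirects.
curl -sIL -o /dev/null \
  -w "html\t%{http_code}\t%{content_type}\t%{url_effective}\n" "$URL"

# Content negotiation: does Accept: text/markdown change the response?
curl -sIL -o /dev/null -H 'Accept: text/markdown' \
  -w "negotiated\t%{http_code}\t%{content_type}\n" "$URL"

# Direct markdown route (assumed pattern; verify against the site).
curl -sIL -o /dev/null \
  -w "direct-md\t%{http_code}\t%{content_type}\n" "${URL%/}.md"

# Advertised markdown alternates (coarse pattern match on the raw HTML).
curl -s "$URL" | grep -oiE '<link[^>]*alternate[^>]*>' | grep -i markdown
```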

Treat these as separate signals:

- negotiated markdown works
- a stable direct markdown URL works
- the page advertises the correct markdown URL

If the page advertises dead markdown alternates but a working markdown route
exists, do not fail markdown delivery outright. Score it as a discoverability
and consistency problem instead.

For API or generated reference pages, also verify whether a machine-readable
asset such as OpenAPI YAML is directly linked and fetchable.
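
For a generated reference page, a quick follow-up probe can confirm that any
advertised OpenAPI or Swagger asset actually resolves. The link-extraction
pattern below is an assumption about the markup and only handles absolute URLs;
relative links would need to be resolved against the page URL first:

```bash
# Probe OpenAPI/Swagger assets linked from a reference page (absolute URLs only).
URL="${1:?usage: openapi-check.sh <reference-page-url>}"

curl -s "$URL" \
  | grep -oE 'href="https?://[^"]*(openapi|swagger)[^"]*\.(ya?ml|json)"' \
  | sed -e 's/^href="//' -e 's/"$//' \
  | sort -u \
  | while read -r asset; do
      curl -sIL -o /dev/null \
        -w "${asset}\t%{http_code}\t%{content_type}\n" "$asset"
    done
```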

## 5. Judge structure and legibility

Measure structural signals:

- exactly one `h1`
- sane heading hierarchy
- `main` and `article` presence where appropriate
- canonical tags
- JSON-LD or breadcrumb structured data
- stable anchors and deep-linkable headings

Also make a qualitative judgment about agent legibility:

- markdown strips site chrome cleanly
- headings are specific and task-oriented
- code blocks stay intelligible without client-side JS
- the page is not dominated by banners, injected chat, or nav noise

Measure code block labeling explicitly when code samples are common. A page
type with many untagged fenced blocks should lose points even if the prose is
otherwise clean.
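
A coarse structural probe is enough for a first pass. It is a sketch built on
pattern matching rather than parsing, so treat the counts as hints and switch to
a real HTML parser when the numbers look suspicious:

```bash
# Coarse structural and legibility signals for one page (pattern matching, not parsing).
URL="${1:?usage: structure-check.sh <page-url>}"
html="$(curl -sL "$URL")"
md="$(curl -sL -H 'Accept: text/markdown' "$URL")"

printf 'h1 tags:          %s\n' "$(printf '%s\n' "$html" | grep -o '<h1' | wc -l)"
printf 'canonical links:  %s\n' "$(printf '%s\n' "$html" | grep -o 'rel="canonical"' | wc -l)"
printf 'json-ld blocks:   %s\n' "$(printf '%s\n' "$html" | grep -o 'application/ld+json' | wc -l)"
printf 'main elements:    %s\n' "$(printf '%s\n' "$html" | grep -o '<main' | wc -l)"

# Fence labeling: each block has an opener and a closer, so roughly
# untagged blocks = (total fence lines / 2) - tagged openers.
printf 'fence lines:      %s\n' "$(printf '%s\n' "$md" | grep -cE '^ {0,3}```')"
printf 'tagged openers:   %s\n' "$(printf '%s\n' "$md" | grep -cE '^ {0,3}```[[:alnum:]]')"
```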

For page types that intentionally render interactive UIs with JavaScript,
judge them separately from normal docs pages. If the HTML shell is thin,
check whether the page still provides:

- a fetchable markdown summary
- a directly linked machine-readable asset
- a usable non-JS fallback

## 6. Score with the rubric

Use [references/rubric.md](references/rubric.md).

Rules:

- score only what you verified
- mark non-applicable checks as `N/A`
- normalize the final score against applicable points only
- do not let optional manifest checks dominate the grade

Apply the foundational caps from the rubric. A site with broken discovery
or broken markdown delivery should not earn a high grade because it has
clean metadata.
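
A hypothetical normalization example, assuming 88 of the 100 rubric points apply
to the host and 70 of those were earned:

```bash
# Normalize earned points against applicable points only (illustrative numbers).
earned=70
applicable=88
awk -v e="$earned" -v a="$applicable" \
  'BEGIN { printf "normalized score: %.0f/100\n", (e / a) * 100 }'
# Prints "normalized score: 80/100" for the numbers above.
```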

Do not average away a weak page type. If one major page type, such as API
reference, is materially worse than the rest of the corpus, call it out as
the weakest segment and reflect it in the category notes.

## 7. Compare with external scanners when useful

If external scanner results are available, compare them to your live
findings. Treat them as secondary evidence.

If a scanner and the live fetch disagree:

- trust the live fetch
- report the mismatch explicitly
- explain whether the scanner is testing a different assumption

## 8. Produce a remediation list

Turn findings into a short backlog:

- `P0`: fetchability or discovery blockers
- `P1`: recurring structural or parity issues
- `P2`: polish, optional manifests, or low-impact enhancements

For each remediation, include:

- the failing signal
- why it matters to agents
- a concrete fix
- whether it is sitewide or page-type-specific
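
A hypothetical entry, for shape only: `P1` — guide pages advertise a markdown
alternate that returns 404 even though the negotiated route works; agents that
follow the advertised link get nothing; fix the alternate's target or remove the
link; page-type-specific (guides).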

## 9. Report in a stable format

Use [references/report-template.md](references/report-template.md).

Always include:

- overall score and grade
- confidence level
- sampled URLs or sample strategy
- category scores
- highest-priority findings
- remediation backlog

## Notes

- Favor docs-delivery checks over marketing-site heuristics.
- Do not fail a docs host for lacking MCP or plugin manifests unless the
host itself is meant to expose tools.
- Treat raw byte size as supporting evidence, not as a primary scoring input.
- Prefer short evidence excerpts and commands over long copied page text.
63 changes: 63 additions & 0 deletions .agents/skills/agent-readiness-audit/references/report-template.md
# Agent Readiness Report Template

Use this structure for final audit output.

```markdown
## Agent Readiness Audit

**Site:** <base-url>
**Date:** <YYYY-MM-DD>
**Overall score:** <score>/100
**Grade:** <A-F>
**Confidence:** <High|Medium|Low>

### Summary

<2-4 sentence verdict focused on what an external agent can actually
discover, fetch, and interpret on this site.>

### Category Scores

| Category | Score | Notes |
| --- | ---: | --- |
| Discovery and policy | <x>/<y> | <short note> |
| Retrieval and markdown delivery | <x>/<y> | <short note> |
| Structure and semantics | <x>/<y> | <short note> |
| Crawlability and delivery behavior | <x>/<y> | <short note> |
| Machine-readable surfaces | <x>/<y> | <short note or N/A> |
| Content legibility | <x>/<y> | <short note> |

### Sample

- Sample strategy: <sitemap / internal links / explicit URLs>
- Sampled pages: <count>
- Page types covered: <landing, guide, manual, reference, ...>
- Weakest page type: <if any>

### Findings

- `P0`: <highest-priority blocker with evidence>
- `P1`: <important recurring issue with evidence>
- `P2`: <lower-priority or optional improvement>

### Remediation

- `P0`: <fix>, because <why it matters to agents>
- `P1`: <fix>, because <why it matters to agents>
- `P2`: <fix>, because <why it matters to agents>

### Evidence

- Sitewide checks: <llms.txt, robots.txt, sitemap.xml, manifests>
- Fetch-path checks: <markdown negotiation, direct markdown routes,
advertised alternates, parity>
- Structural checks: <h1/main/article/canonical/json-ld/title-h1 parity>
- Code block checks: <fence count, language-tag coverage>
- Scanner comparison: <optional>
```

## Notes

- Keep the summary short and outcome-oriented.
- Findings should refer to concrete URLs or page types.
- If a criterion is `N/A`, say why instead of leaving it blank.