## Summary

When the `single-page-sample` diagnostic fires, the overall score can still land in the B range despite no meaningful page-level signal. The diagnostic is `severity: warning` and only marks page-level checks as `notApplicable` (excluding them from both the numerator and the denominator). This leaves the overall score driven by a tiny subset of checks, none of which reflect site-wide agent-friendliness.
## Concrete case
Scoring `https://docs.readthedocs.com/platform/stable/` produced:
- Overall: 81 (B)
- Diagnostics: `single-page-sample` (warning)
- Cap: none
The 81 was computed from only 3 checks out of 19:
| Check | Earned/Max |
| --- | --- |
| `llms-txt-exists` | 10/10 |
| `llms-txt-valid` | 0/4 |
| `llms-txt-size` | 7/7 |
| **Total** | **17/21 = 81%** |
Every other check (markdown availability, page size, content structure, URL stability, observability, auth) was excluded as `notApplicable`. The 81 effectively scores the structural validity of the llms.txt file itself, not the documentation site.
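A minimal sketch of the scoring behavior described above. The type names, statuses, and rounding are assumptions for illustration, not afdocs' actual implementation:

```typescript
// Hypothetical shape of a check result; notApplicable checks are excluded
// from both the numerator and the denominator of the overall score.
type CheckResult = {
  id: string;
  earned: number;
  max: number;
  status: "scored" | "notApplicable";
};

function overallScore(results: CheckResult[]): number {
  const scored = results.filter((r) => r.status === "scored");
  const earned = scored.reduce((sum, r) => sum + r.earned, 0);
  const max = scored.reduce((sum, r) => sum + r.max, 0);
  return max === 0 ? 0 : Math.round((earned / max) * 100);
}

// The three surviving llms.txt checks from the table: 17/21 rounds to 81,
// no matter how many other checks were marked notApplicable.
const score = overallScore([
  { id: "llms-txt-exists", earned: 10, max: 10, status: "scored" },
  { id: "llms-txt-valid", earned: 0, max: 4, status: "scored" },
  { id: "llms-txt-size", earned: 7, max: 7, status: "scored" },
  { id: "markdown-availability", earned: 0, max: 10, status: "notApplicable" },
]);
console.log(score); // 81
```

This is why shrinking the denominator is the root issue: dropping 16 of 19 checks makes three llms.txt checks stand in for the whole site.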
## Why this site falls through the existing safety nets
The site has a real llms.txt at `/llms.txt` that's the right size but uses non-spec-compliant link syntax (`- Name: url` instead of `- [Name](url): description`). So `llms-txt-exists` and `llms-txt-size` pass, leaving only the low-weight `llms-txt-valid` (weight 4) to fail. The existing `llms-txt-exists` critical-check cap doesn't fire because the file exists. `llms-txt-valid` isn't a critical check, so its failure doesn't trigger a cap. And `single-page-sample` is `severity: warning`, so it doesn't cap either. Net result: a site where afdocs can reach exactly one page lands at 81 (B).
By contrast, sites that fire `single-page-sample` and lack any llms.txt at all are rescued: `llms-txt-exists` failing either zeroes the numerator entirely or triggers its critical cap. Sites that have a structurally invalid llms.txt aren't covered.
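For illustration, the two link syntaxes can be distinguished with a simple pattern. This regex is an assumption for this sketch, not the tool's actual `llms-txt-valid` parser:

```typescript
// Spec-style llms.txt link line: "- [Name](url)" with an optional description.
// Assumed pattern; the real validator may be stricter or more lenient.
const specLink = /^- \[[^\]]+\]\([^)]+\)(: .+)?$/;

const compliant = "- [Getting started](https://example.com/start): Intro guide";
const nonCompliant = "- Getting started: https://example.com/start";

console.log(specLink.test(compliant));    // true
console.log(specLink.test(nonCompliant)); // false
```

The non-compliant form still produces a plausibly sized file, which is exactly why `llms-txt-exists` and `llms-txt-size` pass while only the low-weight validity check fails.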
## Proposed fix
Have `single-page-sample` apply a score cap when it fires. A few options:

1. Cap at the F threshold (~59): matches the existing `llms-txt-exists` cap pattern and conveys "this score is unreliable" via the letter grade.
2. Cap at a lower value (e.g., 40): a stronger signal that the result shouldn't be trusted.
3. Elevate severity to `critical`: requires adding cap-via-diagnostic plumbing if it's not already present.

Option 1 feels most consistent with how the codebase already handles other "we can't actually evaluate this site" failures.
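Option 1 could be sketched as below. The `F_THRESHOLD` value, `Diagnostic` shape, and `applyCaps` function are assumptions mirroring the described `llms-txt-exists` cap pattern, not existing code:

```typescript
// Assumed cap value placing the result in the F range.
const F_THRESHOLD = 59;

interface Diagnostic {
  id: string;
  severity: "warning" | "critical";
}

function applyCaps(score: number, diagnostics: Diagnostic[]): number {
  // A single-page sample means the score reflects almost no site-wide
  // signal, so cap it regardless of what the llms.txt checks earned.
  if (diagnostics.some((d) => d.id === "single-page-sample")) {
    return Math.min(score, F_THRESHOLD);
  }
  return score;
}

// The concrete case above: 81 would be capped to 59 (F).
console.log(applyCaps(81, [{ id: "single-page-sample", severity: "warning" }])); // 59
```

Keeping the cap independent of severity means option 1 works without the cap-via-diagnostic plumbing that option 3 would require.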