## Summary

When the `single-page-sample` diagnostic fires, the overall score can still land in the B range despite no meaningful page-level signal. The diagnostic is `severity: warning` and only marks page-level checks as `notApplicable` (excluding them from both the numerator and the denominator). This leaves the overall score driven by a tiny subset of checks, none of which reflect site-wide agent-friendliness.
## Concrete case
Scoring `https://docs.readthedocs.com/platform/stable/` produced:
- Overall: 81 (B)
- Diagnostics: `single-page-sample` (warning)
- Cap: none
The 81 was computed from only 3 checks out of 19:
| Check | Earned/Max |
| --- | --- |
| `llms-txt-exists` | 10/10 |
| `llms-txt-valid` | 0/4 |
| `llms-txt-size` | 7/7 |
| **Total** | **17/21 = 81%** |
Every other check (markdown availability, page size, content structure, URL stability, observability, auth) was excluded as `notApplicable`. The 81 effectively scores the structural validity of the llms.txt file itself, not the documentation site.
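A minimal sketch of the scoring behavior described above. The type names, statuses, and rounding are assumptions for illustration, not afdocs' actual implementation:

```typescript
// Hypothetical shape of a check result; notApplicable checks are excluded
// from both the numerator and the denominator of the overall score.
type CheckResult = {
  id: string;
  earned: number;
  max: number;
  status: "scored" | "notApplicable";
};

function overallScore(results: CheckResult[]): number {
  const scored = results.filter((r) => r.status === "scored");
  const earned = scored.reduce((sum, r) => sum + r.earned, 0);
  const max = scored.reduce((sum, r) => sum + r.max, 0);
  return max === 0 ? 0 : Math.round((earned / max) * 100);
}

// The three surviving llms.txt checks from the table: 17/21 rounds to 81,
// no matter how many other checks were marked notApplicable.
const score = overallScore([
  { id: "llms-txt-exists", earned: 10, max: 10, status: "scored" },
  { id: "llms-txt-valid", earned: 0, max: 4, status: "scored" },
  { id: "llms-txt-size", earned: 7, max: 7, status: "scored" },
  { id: "markdown-availability", earned: 0, max: 10, status: "notApplicable" },
]);
console.log(score); // 81
```

This is why shrinking the denominator is the root issue: dropping 16 of 19 checks makes three llms.txt checks stand in for the whole site.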
## Why this site falls through the existing safety nets
The site has a real llms.txt at `/llms.txt` that's the right size but uses non-spec-compliant link syntax (`- Name: url` instead of `- [Name](url): description`). So `llms-txt-exists` and `llms-txt-size` pass, leaving only the low-weight `llms-txt-valid` (weight 4) to fail. The existing `llms-txt-exists` critical-check cap doesn't fire because the file exists. `llms-txt-valid` isn't a critical check, so its failure doesn't trigger a cap. And `single-page-sample` is `severity: warning`, so it doesn't cap either. Net result: a site where afdocs can reach exactly one page lands at 81 (B).
By contrast, sites that fire `single-page-sample` and lack any llms.txt at all are rescued: `llms-txt-exists` failing either zeroes the numerator entirely or triggers its critical cap. Sites that have a structurally invalid llms.txt aren't covered.
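For illustration, the two link syntaxes can be distinguished with a simple pattern. This regex is an assumption for this sketch, not the tool's actual `llms-txt-valid` parser:

```typescript
// Spec-style llms.txt link line: "- [Name](url)" with an optional description.
// Assumed pattern; the real validator may be stricter or more lenient.
const specLink = /^- \[[^\]]+\]\([^)]+\)(: .+)?$/;

const compliant = "- [Getting started](https://example.com/start): Intro guide";
const nonCompliant = "- Getting started: https://example.com/start";

console.log(specLink.test(compliant));    // true
console.log(specLink.test(nonCompliant)); // false
```

The non-compliant form still produces a plausibly sized file, which is exactly why `llms-txt-exists` and `llms-txt-size` pass while only the low-weight validity check fails.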
## Proposed fix
Have `single-page-sample` apply a score cap when it fires. A few options:

1. Cap at the F threshold (~59): matches the existing `llms-txt-exists` cap pattern and conveys "this score is unreliable" via the letter grade.
2. Cap at a lower value (e.g., 40): a stronger signal that the result shouldn't be trusted.
3. Elevate severity to `critical`: requires adding cap-via-diagnostic plumbing if it's not already present.

Option 1 feels most consistent with how the codebase already handles other "we can't actually evaluate this site" failures.
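Option 1 could be sketched as below. The `F_THRESHOLD` value, `Diagnostic` shape, and `applyCaps` function are assumptions mirroring the described `llms-txt-exists` cap pattern, not existing code:

```typescript
// Assumed cap value placing the result in the F range.
const F_THRESHOLD = 59;

interface Diagnostic {
  id: string;
  severity: "warning" | "critical";
}

function applyCaps(score: number, diagnostics: Diagnostic[]): number {
  // A single-page sample means the score reflects almost no site-wide
  // signal, so cap it regardless of what the llms.txt checks earned.
  if (diagnostics.some((d) => d.id === "single-page-sample")) {
    return Math.min(score, F_THRESHOLD);
  }
  return score;
}

// The concrete case above: 81 would be capped to 59 (F).
console.log(applyCaps(81, [{ id: "single-page-sample", severity: "warning" }])); // 59
```

Keeping the cap independent of severity means option 1 works without the cap-via-diagnostic plumbing that option 3 would require.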