Summary
When the scored URL is a subdirectory (e.g. https://www.swift.org/documentation/), afdocs checks for a sitemap at <base-url>/sitemap.xml — in this case https://www.swift.org/documentation/sitemap.xml. If that path returns 404, afdocs falls back to testing only the root URL and emits a single-page-sample diagnostic, even when a valid sitemap exists at https://www.swift.org/sitemap.xml.
Steps to reproduce
npx afdocs check https://www.swift.org/documentation/ --sampling deterministic --max-links 50 --format json --score
Expected: afdocs discovers and samples pages from the sitemap at https://www.swift.org/sitemap.xml.
Actual: discoverySources: ["fallback"], testedPages: 1, single-page-sample diagnostic fires.
Root cause
The discovery sequence:
1. Checks robots.txt at https://www.swift.org/robots.txt (found, but no Sitemap: directive)
2. Tries https://www.swift.org/documentation/sitemap.xml (404)
3. Falls back to testing only the root URL
Step 2 is path-scoped to the base URL. It never tries https://www.swift.org/sitemap.xml — the conventional root-domain location.
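For reference, the robots.txt check in the first step amounts to scanning for Sitemap: lines. The sketch below is an illustration only; `sitemapDirectives` is a hypothetical helper, not a function from afdocs, and it assumes the robots.txt body has already been fetched as a string.

```typescript
// Hypothetical helper (not part of afdocs): extract the values of all
// "Sitemap:" directives from a robots.txt body. The field name is matched
// case-insensitively, and lines starting with "#" never match.
function sitemapDirectives(robotsTxt: string): string[] {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^sitemap\s*:/i.test(line))
    // Everything after the first colon is the sitemap URL.
    .map((line) => line.slice(line.indexOf(":") + 1).trim())
    .filter((value) => value.length > 0);
}

// The robots.txt described in this issue contains no Sitemap: line,
// so discovery has to move on to guessing sitemap locations.
const found = sitemapDirectives("User-agent: *\nDisallow: /builds/\n");
console.log(found.length); // 0
```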
Expected behavior
When the base URL is a subdirectory and the path-relative sitemap returns 404, fall back to checking <scheme>://<host>/sitemap.xml (and <host>/sitemap-index.xml variants) before giving up on sitemap discovery.
Sitemaps are almost never placed under a subdirectory path; they are nearly always at the root. The scoped path check therefore has a low hit rate, while a root-domain fallback would recover a significant fraction of cases like this one.
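A minimal sketch of the proposed candidate ordering, assuming discovery works from a list of URLs to probe in sequence. `sitemapCandidates` and the file-name list are hypothetical and do not reflect afdocs' actual internals; the sketch only builds the list and leaves fetching to the caller.

```typescript
// Hypothetical helper (not part of afdocs): build the ordered list of
// sitemap URLs to probe for a given base URL. Path-scoped candidates come
// first, then the root-domain candidates as a fallback.
function sitemapCandidates(baseUrl: string): string[] {
  const url = new URL(baseUrl);
  // Treat the base path as a directory, with or without a trailing slash.
  const dir = url.pathname.endsWith("/") ? url.pathname : url.pathname + "/";
  // Assumed file names, taken from the variants mentioned in this issue.
  const names = ["sitemap.xml", "sitemap-index.xml"];
  const scoped = names.map((n) => new URL(dir + n, url.origin).href);
  const root = names.map((n) => new URL("/" + n, url.origin).href);
  const all = scoped.concat(root);
  // Drop duplicates (the base URL may already be the root), keeping order.
  return all.filter((u, i) => all.indexOf(u) === i);
}

console.log(sitemapCandidates("https://www.swift.org/documentation/"));
```

With this ordering, the existing path-scoped behavior is preserved for sites that do publish a subdirectory sitemap, and the root-domain location is only tried after the scoped candidates fail.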
Notes
- https://www.swift.org/sitemap.xml returns HTTP 200 with 446 URLs, 47 of which are under /documentation/
- https://www.swift.org/documentation/sitemap.xml returns HTTP 404
- The robots.txt (User-agent: *, Disallow: /builds/) has no Sitemap: line
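Since only some of the root sitemap's URLs fall under the scored path, a root-domain fallback would probably want to filter by path prefix before sampling. The sketch below assumes that behavior; `urlsUnderBase` is a hypothetical helper, not an afdocs function, and it expects every entry to be an absolute URL.

```typescript
// Hypothetical helper (not part of afdocs): keep only the sitemap entries
// that share the base URL's origin and live under its path.
function urlsUnderBase(sitemapUrls: string[], baseUrl: string): string[] {
  const base = new URL(baseUrl);
  // Compare against a directory-style prefix so "/documentation" does not
  // accidentally match "/documentation-old/".
  const prefix = base.pathname.endsWith("/") ? base.pathname : base.pathname + "/";
  return sitemapUrls.filter((raw) => {
    const u = new URL(raw); // throws on malformed input; callers may want to guard
    return u.origin === base.origin && u.pathname.startsWith(prefix);
  });
}
```

Applied to the sitemap described above, a filter like this would be expected to keep the 47 entries under /documentation/ out of the 446 total.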