docs(seo): tier 2 — canonicalise URLs to no-extension form #261

Closed
Komsomol wants to merge 1 commit into docs/seo-meta-only from docs/seo-canonical-urls

Komsomol (Contributor) commented May 5, 2026

Summary

Tier 2 of a 3-part Ahrefs cleanup. Stops emitting .html URLs in our own surfaces (sitemap, layout nav, markdown link targets). Cloudflare's existing 308 from /Page.html → /Page stays in place for inbound legacy links; we just stop generating new ones.

Layered on top of Tier 1 (#260), so Tier 1 must be merged first.

Changes

  • _layouts/default.html: strip .html from 122 internal hrefs (sidebar nav, hero links, footer)
  • sitemap.xml: new Liquid template overrides jekyll-sitemap output to emit clean URLs and skip pages with sitemap: false. Plugin defers when an explicit file exists.
  • robots.txt: add Disallow: /Data-flow alongside the existing /Data-flow.html rule. Necessary because internal links no longer surface the .html form, so the existing rule alone would no longer block crawlers from reaching the page.
  • 109 markdown files: strip .html from raw <a href> embeds and .md from [text](path.md) link targets. Kramdown auto-converts .md to .html in the built output, which we no longer want. Code blocks preserved (used a Python script with proper fence-tracking, not a blind sed).
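
The fence-tracking rewrite described in the last bullet could look roughly like the sketch below. It is an illustration only, not the script used for this PR: the function name `strip_extensions`, the regular expressions, and the rule that only relative targets (no URL scheme, no leading `//`) are rewritten are all assumptions.

```python
import re

# Assumed patterns (not taken from the PR's script): only relative targets are
# rewritten, i.e. no URL scheme and no leading "//". A "#fragment" is preserved.
_INTERNAL = r"(?![a-z][a-z0-9+.-]*:|//)"
MD_LINK = re.compile(r"(\]\()" + _INTERNAL + r"([^)\s#]+?)\.md(#[^)\s]*)?(\))", re.I)
HTML_HREF = re.compile(r'(href=")' + _INTERNAL + r'([^"#]+?)\.html(#[^"]*)?(")', re.I)
# A fence is 3+ backticks or tildes, indented by at most three spaces.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)")


def _drop_ext(m):
    # Reassemble the link without its extension, keeping any fragment.
    return m.group(1) + m.group(2) + (m.group(3) or "") + m.group(4)


def strip_extensions(markdown):
    """Strip .md/.html from internal link targets outside fenced code blocks."""
    out = []
    fence = None  # marker that opened the current code block, or None
    for line in markdown.splitlines(keepends=True):
        m = FENCE.match(line)
        if m and fence is None:
            fence = m.group(1)  # opening fence: start skipping lines
        elif (m and m.group(1)[0] == fence[0]
              and len(m.group(1)) >= len(fence)
              and not m.group(2).strip()):
            fence = None  # matching closing fence: resume rewriting
        elif fence is None:
            line = HTML_HREF.sub(_drop_ext, MD_LINK.sub(_drop_ext, line))
        out.append(line)
    return "".join(out)
```

Tracking the opening marker, rather than toggling on every fence-like line, keeps a longer or differently styled fence inside a code block from ending the block early. Inline code spans and indented code blocks are not handled by this sketch.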

Coverage

Fixes ~1,000 of the remaining ~1,290 Ahrefs issues after Tier 1:

  • 3XX redirect in sitemap: 193 ✓
  • Page has links to redirect: 195 ✓
  • Indexable page not in sitemap: 195 ✓
  • Redirected page has no incoming internal links: many ✓
  • Only one dofollow inlink: 8 ✓
  • 4XX page in sitemap (mcp-worker/README.html): 1 ✓ (skipped by sitemap: false on stubs / not present)

Health score projected: 65 → ~85.

SEO risk

Medium. This is a layout-wide URL canonicalisation. Expect 2–4 weeks of Google ranking volatility while it re-crawls and consolidates signals to the canonical URLs.

Mitigations:

  • Cloudflare's existing 308 from .html → no-extension stays in place — all inbound legacy .html links keep working
  • The canonical tag emitted in Tier 1 already tells Google which URL to consolidate to
  • Live data shows Google was already preferring the no-extension URL (it's the 308 target); sitemap was the contradicting signal we're now fixing

What's NOT touched

  • Cloudflare Page Rules (no access; not needed)
  • permalink: pretty or build output structure (would be a bigger change; not needed)
  • Page front-matter on real pages (only the 5 stub pages got minimal front-matter in Tier 1)
  • _headers, .well-known/, bin/copy-md-siblings.mjs (unrelated v3/MCP work)

Build verification

Verified locally:

  • 0 .html hrefs left anywhere in _site/
  • Sitemap entry count: 190 (excluding 5 stubs); all clean URLs; homepage / included
  • Code blocks preserved: Rate-limiting-for-API.md JSON example and Data-integration-service.md YAML comment both retain their .html URLs
  • 128–129 internal links per page, all clean
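
A check along the lines of the first bullet could be scripted as below. This is a hedged sketch rather than the command actually used: the helper name `find_html_hrefs` and the regular expression are assumptions, it only looks at double-quoted `href` attributes, and it flags external `.html` links as well as internal ones. URLs that appear as text inside code blocks are not `href` attributes, so they are not reported.

```python
import re
from pathlib import Path

# Assumed pattern: a double-quoted href whose path ends in ".html",
# optionally followed by a "?query" or "#fragment".
HREF_HTML = re.compile(r'href="([^"#?]*\.html)(?:[#?][^"]*)?"', re.I)


def find_html_hrefs(site_dir):
    """Return (page, href) pairs for every .html href found under site_dir."""
    root = Path(site_dir)
    hits = []
    for page in sorted(root.rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        for m in HREF_HTML.finditer(text):
            hits.append((page.relative_to(root).as_posix(), m.group(1)))
    return hits
```

Running `find_html_hrefs("_site")` after a local build and expecting an empty list corresponds to the "0 .html hrefs" claim above.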

Test plan

  • Tier 1 deployed and stable for 1 week before merging this
  • Merge → wait for CF Pages deploy
  • curl -s https://developers.fliplet.com/sitemap.xml | grep -c '\.html' — should be 0
  • Spot-check 5 random pages for clean internal links
  • Resubmit sitemap in Google Search Console
  • Watch GSC coverage for ranking changes — expect volatility, that's normal
  • Re-run Ahrefs site audit after 1 week
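
As a stricter alternative to the `grep -c` step, the sitemap can be parsed and each `<loc>` inspected. The sketch below is illustrative only; the helper names `sitemap_locs` and `html_locs` are hypothetical, and it assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace in ElementTree's "{uri}tag" notation.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_text):
    """Return every <loc> URL in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


def html_locs(xml_text):
    """Return the sitemap URLs that still end in .html (expected: none)."""
    return [url for url in sitemap_locs(xml_text) if url.lower().endswith(".html")]
```

To run it against the deployed site, the response body from `urllib.request.urlopen("https://developers.fliplet.com/sitemap.xml").read()` can be passed in directly. `len(sitemap_locs(...))` also gives the entry count for comparison with the local build.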

🤖 Generated with Claude Code

cloudflare-workers-and-pages Bot commented May 5, 2026

Deploying fliplet-cli with Cloudflare Pages

Latest commit: c86d08f
Status: ✅  Deploy successful!
Preview URL: https://b5656018.fliplet-cli.pages.dev
Branch Preview URL: https://docs-seo-canonical-urls.fliplet-cli.pages.dev

Komsomol force-pushed the docs/seo-meta-only branch from 78bd6f5 to 7f7daf0 on May 5, 2026 19:25
Komsomol force-pushed the docs/seo-canonical-urls branch from b72b041 to 13f70f3 on May 5, 2026 19:25

Sitemap, layout nav, and markdown link text all now point at the clean
no-extension URL. Cloudflare's existing 308 from /Page.html → /Page
stays in place for any external/inbound .html links — we just stop
emitting them ourselves.

- _layouts/default.html: strip .html from 122 internal hrefs
- sitemap.xml: new Liquid template overrides jekyll-sitemap output to
  emit clean URLs and skip pages with sitemap:false
- robots.txt: add Disallow: /Data-flow alongside existing /Data-flow.html
  rule (since internal links no longer surface the .html form)
- 109 markdown files: strip .html from <a href> embeds and .md from
  [text](path.md) link targets — kramdown auto-converts .md to .html
  in built output, which we no longer want. Code blocks preserved.

Tier 2 of 3 — URL canonicalisation. Layered on top of Tier 1's SEO meta.
Fixes ~1,000 Ahrefs issues (3XX-redirect-in-sitemap x193, page-has-links-
to-redirect x195, indexable-not-in-sitemap x195, redirected-page-no-
inlinks, only-one-dofollow-inlink, etc.).

Expect 2-4 weeks of Google ranking volatility while it re-crawls and
consolidates signals to canonical URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Komsomol (Contributor, Author) commented May 6, 2026

Folded into #263 — single consolidated PR for the full SEO + standards cleanup. Branch remains in the repo for reference; CF Pages preview URL still resolves.

Komsomol closed this on May 6, 2026