chore: SEO and metadata improvements for cli.internetcomputer.org #507

@marc0olo

Background

Comparison of cli.internetcomputer.org against skills.internetcomputer.org (which recently received SEO improvements, tracked in dfinity/developer-docs#104) revealed several gaps. This issue tracks everything needed to close them.

Versioned deployment: what this means for SEO

The docs site uses a versioned folder structure on the IC asset canister (/0.1/, /0.2/, /main/). Root-level files (index.html, matomo.js, versions.json) are regenerated by CI on every main push via the publish-root-files job, which already reads LATEST_VERSION from versions.json.

This shapes the implementation strategy:

  • Root-level changes (dynamically generated by publish-root-files) — robots.txt, root sitemap, OG image file. These never require rebuilding old version folders.
  • Build-time changes (in astro.config.mjs) — meta tags, JSON-LD, RSS link. These apply to all future version builds automatically. The current 0.2/ folder needs a one-time rebuild (push to docs/v0.2 branch) to pick them up, after which no old-version rebuilds are ever needed again.
  • Old versions (/0.1/, /main/) should be blocked from crawling via robots.txt, so the improvements missing from their HTML do not matter for SEO anyway.

Implement now

1. robots.txt (missing entirely)

No /robots.txt exists at the root. Add dynamic generation to the publish-root-files CI job (.github/workflows/docs.yml) alongside the existing index.html generation:

User-agent: *
Allow: /<latest-version>/
Disallow: /main/
Disallow: /0.1/        # (list all older versions except latest)

Sitemap: https://cli.internetcomputer.org/<latest-version>/sitemap-index.xml

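# LLM crawlers (a crawler obeys only the most specific group that matches it,
# so the Disallow rules are repeated here)
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /<latest-version>/
Disallow: /main/
Disallow: /0.1/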

LATEST_VERSION is already computed in the CI step — reuse it. Disallow lines for old versions should be generated from versions.json. /main/ is always disallowed (development branch, not authoritative).
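
A minimal sketch of how the job could render this file, written as a small Node helper. The function name buildRobotsTxt and the assumption that versions.json yields a plain array of folder names are illustrative only; the real file shape may differ. The sketch repeats the rules for the named LLM crawlers, because a crawler that matches a named group ignores the wildcard group.

```javascript
// Hypothetical helper: renders robots.txt from the version list and the latest version.
// Assumes `versions` is an array of folder names such as ['0.1', '0.2'];
// the real versions.json shape may differ.
function buildRobotsTxt(versions, latest, origin = 'https://cli.internetcomputer.org') {
  // /main/ is always disallowed, plus every released version except the latest.
  const disallowed = ['main', ...versions.filter((v) => v !== latest && v !== 'main')];
  const rules = [
    `Allow: /${latest}/`,
    ...disallowed.map((v) => `Disallow: /${v}/`),
  ];
  // A crawler follows only the most specific group that matches it,
  // so the named LLM crawlers get the same rules as the wildcard group.
  const groups = [['*'], ['GPTBot', 'ClaudeBot']].map((agents) =>
    [...agents.map((a) => `User-agent: ${a}`), ...rules].join('\n')
  );
  // Uses the per-version sitemap URL from the example above;
  // item 10 proposes pointing this at /sitemap.xml instead.
  const sitemap = `Sitemap: ${origin}/${latest}/sitemap-index.xml`;
  return [...groups, sitemap].join('\n\n') + '\n';
}

console.log(buildRobotsTxt(['0.1', '0.2'], '0.2'));
```

Because the Disallow lines are derived from the version list, publishing a new version moves the previous latest into the disallowed set without any manual edit.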

2. <meta name="robots" content="index, follow, max-image-preview:large">

Starlight doesn't add this. max-image-preview:large tells Google it may show large image previews in search results, which is a genuine improvement. index, follow is already the default behavior of crawlers (not of browsers), but stating it makes the intent explicit.

Add globally via Starlight head config in docs-site/astro.config.mjs:

{ tag: 'meta', attrs: { name: 'robots', content: 'index, follow, max-image-preview:large' } },

3. <meta name="author" content="DFINITY Foundation">

Standard HTML meta tag. Add globally via Starlight head config. Semantically correct for a DFINITY-owned docs site.

4. JSON-LD structured data

Starlight injects no structured data. Add WebSite + Organization schemas to the site (via Starlight head config or a custom Head component). Organization with DFINITY Foundation as publisher covers the publisher signal without needing a non-standard <meta name="publisher"> tag.

Example for the home page:

{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "ICP CLI",
  "url": "https://cli.internetcomputer.org",
  "description": "Command-line tool for developing and deploying applications on the Internet Computer Protocol (ICP)",
  "inLanguage": "en-US",
  "publisher": {
    "@type": "Organization",
    "name": "DFINITY Foundation",
    "url": "https://dfinity.org"
  }
}
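
One way this could be wired into the Starlight head config is as a script entry whose content is the serialized schema. This is a sketch only: the variable names websiteSchema and jsonLdHeadEntry are hypothetical, and the entry would still need to be added to the head array in docs-site/astro.config.mjs.

```javascript
// Sketch of a Starlight `head` entry carrying the WebSite schema from the example above.
const websiteSchema = {
  '@context': 'https://schema.org',
  '@type': 'WebSite',
  name: 'ICP CLI',
  url: 'https://cli.internetcomputer.org',
  description:
    'Command-line tool for developing and deploying applications on the Internet Computer Protocol (ICP)',
  inLanguage: 'en-US',
  publisher: {
    '@type': 'Organization',
    name: 'DFINITY Foundation',
    url: 'https://dfinity.org',
  },
};

// Escape "<" so that a value containing "</script>" cannot close the tag early.
// JSON parsers decode \u003c back to "<", so the data is unchanged.
const jsonLdHeadEntry = {
  tag: 'script',
  attrs: { type: 'application/ld+json' },
  content: JSON.stringify(websiteSchema).replace(/</g, '\\u003c'),
};

console.log(jsonLdHeadEntry.content);
```

Building the string with JSON.stringify rather than a hand-written literal keeps the embedded JSON valid when the description or other fields change.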

5. og:image / twitter:image

Starlight sets twitter:card: summary_large_image but without an actual image — this is actively misleading and renders as a blank preview when shared on LinkedIn, Slack, and Twitter/X.

  • Create docs-site/public/og-image.svg (a simple branded SVG — match the style of skills.internetcomputer.org/og-image.svg)
  • Copy og-image.svg to the root folder in publish-root-files (so it's accessible at /og-image.svg, version-independent)
  • Reference with an absolute URL in Starlight head config:
    { tag: 'meta', attrs: { property: 'og:image', content: 'https://cli.internetcomputer.org/og-image.svg' } },
    { tag: 'meta', attrs: { name: 'twitter:image', content: 'https://cli.internetcomputer.org/og-image.svg' } },
  • Note: Twitter/X does not render SVG card images. Before production launch, convert to PNG (statically or via a build-time step using @resvg/resvg-js or similar).

6. og:locale and og:type fixes

Starlight currently outputs og:locale: en (should be en_US) and og:type: article on the home page (should be website for index/landing pages — article is correct for content pages).

These are Starlight defaults. Override the home page (src/content/docs/index.mdx or equivalent) with custom frontmatter or a custom Head component to set og:type: website on the landing page only.

For og:locale, override globally via Starlight head config if Starlight exposes this, or via a custom component.

7. llms-full.txt

A bulk concatenation of all pages, following the llms-full.txt convention that has grown up around the llmstxt.org proposal. It is distinct from the existing llms.txt:

  • llms.txt + individual .md endpoints → interactive AI agents (selective fetching)
  • llms-full.txt → scrapers, RAG pipelines, and fine-tuning datasets (bulk ingestion)

Add to docs-site/plugins/astro-agent-docs.mjs alongside the existing llms.txt generation. Also expose at root via publish-root-files (same pattern as llms.txt).
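
The concatenation step could look roughly like the sketch below. The helper name buildLlmsFull and the page shape ({ title, url, markdown }) are assumptions made for illustration; the data the existing plugin actually holds for each page may be structured differently.

```javascript
// Hypothetical helper: concatenates every page into a single llms-full.txt body.
// Assumes each page is { title, url, markdown }; the plugin's real page shape may differ.
function buildLlmsFull(siteTitle, pages) {
  const sections = pages.map(
    // Each page keeps its title and source URL so a consumer can cite the original.
    (p) => `# ${p.title}\n\nSource: ${p.url}\n\n${p.markdown.trim()}`
  );
  // Pages are separated by a horizontal rule so they stay distinguishable after chunking.
  return [`# ${siteTitle}`, ...sections].join('\n\n---\n\n') + '\n';
}

console.log(
  buildLlmsFull('ICP CLI', [
    { title: 'Install', url: 'https://cli.internetcomputer.org/0.2/install/', markdown: 'Step one.\n' },
  ])
);
```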

8. RSS feed

Useful for developers tracking documentation changes. Add feed.xml generation to plugins/astro-agent-docs.mjs (or a dedicated plugin). Reference from Starlight head config:

{ tag: 'link', attrs: { rel: 'alternate', type: 'application/rss+xml', href: '/feed.xml', title: 'ICP CLI documentation updates' } },

Copy to root in publish-root-files (same pattern as llms.txt).
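
A minimal sketch of the feed generation, assuming the plugin can supply a list of changed pages as { title, path, updated } objects. That shape, the channel description, and the helper names escapeXml and buildFeed are hypothetical.

```javascript
// Escapes the five characters that are significant in XML text and attributes.
function escapeXml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  return String(s).replace(/[&<>"']/g, (c) => map[c]);
}

// Hypothetical helper: renders an RSS 2.0 feed from { title, path, updated } items.
function buildFeed(items, origin = 'https://cli.internetcomputer.org') {
  const entries = items.map((it) =>
    [
      '    <item>',
      `      <title>${escapeXml(it.title)}</title>`,
      `      <link>${escapeXml(origin + it.path)}</link>`,
      `      <guid>${escapeXml(origin + it.path)}</guid>`,
      // toUTCString() yields the RFC 822 style date that RSS expects.
      `      <pubDate>${new Date(it.updated).toUTCString()}</pubDate>`,
      '    </item>',
    ].join('\n')
  );
  return (
    [
      '<?xml version="1.0" encoding="UTF-8"?>',
      '<rss version="2.0">',
      '  <channel>',
      '    <title>ICP CLI documentation updates</title>',
      `    <link>${origin}/</link>`,
      '    <description>Recently changed ICP CLI documentation pages</description>',
      ...entries,
      '  </channel>',
      '</rss>',
    ].join('\n') + '\n'
  );
}

console.log(buildFeed([{ title: 'Install & setup', path: '/0.2/install/', updated: '2024-01-02T03:04:05Z' }]));
```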

9. Sitemap lastmod timestamps

Starlight generates sitemaps without lastmod. Add accurate timestamps using git commit history at build time — not the build date, which would mark every page as changed on every build and actively harm crawl budget.

Explicitly configure @astrojs/sitemap in astro.config.mjs with a serialize callback that runs git log -1 --format=%cI -- <source-file> per page.
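
The callback could be shaped roughly as follows. This is a sketch under stated assumptions: gitLastModified, makeSerialize, and the urlToSourceFile mapping are hypothetical names, and how a page URL maps back to its source file depends on the project layout. The git runner is injectable so the logic can be exercised without a repository.

```javascript
// Hypothetical helper: returns the last commit date (ISO 8601) for a source file,
// or undefined when git reports no history for it. `run` is injectable for testing.
async function gitLastModified(file, run) {
  if (!run) {
    const { execFileSync } = await import('node:child_process');
    run = (args) => execFileSync('git', args, { encoding: 'utf8' });
  }
  const out = run(['log', '-1', '--format=%cI', '--', file]).trim();
  return out || undefined;
}

// Builds a function with the shape of an @astrojs/sitemap `serialize` callback.
// `urlToSourceFile` is a hypothetical mapping from a page URL to its source file.
function makeSerialize(urlToSourceFile, run) {
  return async (item) => {
    const file = urlToSourceFile(item.url);
    const lastmod = file ? await gitLastModified(file, run) : undefined;
    // Leave the entry untouched when no commit date is available,
    // rather than falling back to the build date.
    return lastmod ? { ...item, lastmod } : item;
  };
}

// Example with a stubbed git runner:
makeSerialize(() => 'src/content/docs/index.mdx', () => '2024-05-01T10:00:00+02:00\n')({
  url: 'https://cli.internetcomputer.org/0.2/',
}).then((entry) => console.log(entry));
```

One caveat worth checking in CI: in a shallow clone, git log reports the single fetched commit for every file, so the checkout step would need full history (for example fetch-depth: 0 with actions/checkout) for the timestamps to be meaningful.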

10. Root-level sitemap

Currently there is no /sitemap.xml at the root domain — each version has its own at /<version>/sitemap-index.xml. Add a root-level sitemap.xml to publish-root-files that references only the latest version's sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://cli.internetcomputer.org/<latest-version>/sitemap-index.xml</loc>
  </sitemap>
</sitemapindex>

Reference this from robots.txt (Sitemap: https://cli.internetcomputer.org/sitemap.xml) in place of the per-version sitemap URL shown in the item 1 example.

One-time rebuild of current version (0.2)

After the build-time changes above land on main, push to the docs/v0.2 branch to trigger a rebuild of the 0.2/ folder. This is the only old-version rebuild ever needed — all future versions pick up the improvements automatically.

Skipped (not actual improvements)

  • Font preconnect — fonts are served via @fontsource npm packages (local), no external CDN to preconnect to
  • Standalone <meta name="publisher"> — not a standard HTML meta tag; publisher signal is covered by JSON-LD Organization
  • index, follow on its own (already the crawler default; only worth adding alongside max-image-preview:large, covered in item 2)
  • Making the root / indexable — the root meta name="robots" content="noindex" on the redirect page is intentional and correct; the redirect target (/0.2/) is what should be indexed, not the redirect shell itself
