Citation Intelligence MCP

A free, self-hosted MCP server that tells your agent what LLMs cite - across Perplexity, Google AI Overviews, ChatGPT, Claude, Gemini, and Bing.

What this is

An MCP server for agents and developers who need to know which URLs get cited by AI search engines for any query. Install once, query from any MCP-compatible client (Claude Desktop, Cursor, Claude Code, Continue, Cline, n8n, LangGraph). Self-hosted, no account, no centralized backend. Bring your own API keys; nothing is stored on a remote server.

Who this is for

Install this if you're:

Building an agent that does research and want it to cite sources LLMs already trust
A solo dev or indie hacker checking whether your SaaS is showing up in AI search
A content creator confirming your articles are being cited by ChatGPT, Claude, or Perplexity
An SEO or GEO practitioner who wants programmatic citation data without a $295-$499/mo dashboard
Running an editorial pipeline and want citation-deficit-driven topic selection
Comparing competitor visibility across AI engines for any niche

Do NOT install this if you want:

A polished marketing dashboard with charts and team seats - try Profound, AthenaHQ, or Otterly.AI
A hosted service with SLAs - this is self-hosted by design
Citation tracking for academic papers - try citecheck
350M+ pre-modeled prompts - that's Ahrefs Brand Radar

Why this exists

The AI citation tracking market is dominated by VC-funded dashboards starting at $295/mo. None ships MCP-first. If you're an agent or developer who wants citation data piped directly into your workflow - not into a SaaS login - there hasn't been a tool for you. This is that tool.

Tools

Tool	Purpose
`check_citations`	URLs cited by Perplexity / Claude / ChatGPT / Gemini / Bing / Brave / Google AI Mode for a query
`am_i_cited`	Presence + rank for a domain across a query cluster
`ai_overview`	Google AI Overview presence + cited sources
`cited_for`	Queries the domain has been cited for, from local cache
`predict_citation`	Citation likelihood from public signals - no LLM fired
`track_queries`	Save / load / list named query panels (editorial watchlists)
`run_panel`	Run a panel through `am_i_cited` and snapshot to disk
`citation_trend`	Time-series report of citation rate + per-query gained/lost deltas
`compare_domains`	Side-by-side `predict_citation` across 2-10 URLs
`wikipedia_mentions`	List Wikipedia articles referencing a domain (zero keys)
`audit_sitemap`	Bulk `predict_citation` across every URL in a sitemap, worst-first
`gsc_citation_gap`	Join Google Search Console performance with AI citation status
`compete_for_query`	End-to-end competitive snapshot: your URL vs top cited competitors
`citation_freshness_score`	Recency score (halflife=365d) for the pages an engine cites
`cited_for_diff`	Diff of `cited_for` between two time windows for a domain
`schema_audit`	Deep schema.org validation - required fields per `@type`, malformed JSON-LD
`llms_txt_generator`	Generate an `llms.txt` (https://llmstxt.org) from a sitemap
`answer_box_position`	Bin each citation's first mention in `raw_answer` into early/middle/late thirds
`citation_provenance`	Fan a query across engines, report per-URL cross-engine consensus
`citation_evidence`	Extract the cited snippet from `raw_answer` for each citation (why, not just that)
`crawler_access_audit`	Verify GPTBot / ClaudeBot / PerplexityBot / CCBot / Google-Extended etc. can fetch a URL
`sitemap_citation_map`	Cross-reference sitemap URLs with cached citations (inverse of `audit_sitemap`)
`canonical_competitor_set`	Top cited domains per query, aggregated across engines
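
Two rows in the table describe small computations: `answer_box_position` bins a citation's first mention into thirds of the answer, and `citation_freshness_score` uses a 365-day half-life. The sketch below is one way to read those descriptions, not the server's actual code. The function names are hypothetical, and the exponential-decay formula is an assumption based on the usual meaning of "half-life".

```typescript
// Hypothetical helpers illustrating two table rows; the real tools may differ.

type AnswerPosition = "early" | "middle" | "late" | "not_found";

// Bin the first occurrence of `needle` (e.g. a cited URL or marker) into
// thirds of the answer text, measured by character offset.
function binFirstMention(rawAnswer: string, needle: string): AnswerPosition {
  const index = rawAnswer.indexOf(needle);
  if (index < 0 || rawAnswer.length === 0) return "not_found";
  const fraction = index / rawAnswer.length;
  if (fraction < 1 / 3) return "early";
  if (fraction < 2 / 3) return "middle";
  return "late";
}

// Assumed exponential decay: 1.0 for a page modified today,
// 0.5 after one half-life, 0.25 after two.
function freshnessScore(ageDays: number, halfLifeDays = 365): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}
```

Measuring position by character offset is the simplest choice; a token- or sentence-based split would shift the boundaries slightly.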

Prompts

Server-side prompt templates the client can offer end users (clients discover them through the MCP `prompts/list` request):

audit_citation_readiness(url) - chains predict_citation + schema_audit
competitor_snapshot(query, your_url?) - chains canonical_competitor_set + compete_for_query
ai_crawler_checkup(url) - runs crawler_access_audit and writes a remediation list
citation_gap_analysis(domain, days?) - drives gsc_citation_gap and suggests next moves
sitemap_coverage_review(sitemap_url) - runs sitemap_citation_map and recommends priorities

Resources

Cache views and reference docs the client can read or subscribe to (no tool call required):

citation://cache/summary - entry counts by type/engine, unique queries/URLs, oldest/newest
citation://panels - saved panels + per-panel snapshot counts
citation://docs/llms-txt - llms.txt primer (markdown)
citation://docs/ai-crawlers - AI crawlers cheatsheet (markdown)
citation://domain/{domain}/cited-for - dynamic template: citations for {domain}

Quick start

npx -y @automatelab/citation-intelligence

Requires Node 20 or later.

Claude Desktop

Add to %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "citation-intelligence": {
      "command": "npx",
      "args": ["-y", "@automatelab/citation-intelligence"],
      "env": {
        "PERPLEXITY_API_KEY": "pplx-...",
        "SERPAPI_KEY": "...",
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_API_KEY": "sk-...",
        "GEMINI_API_KEY": "..."
      }
    }
  }
}

Set only the keys you have. Any MCP client that supports stdio transport works - same command / args pattern.
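
If you generate client configs from a script, "set only the keys you have" can be sketched as below. The helper `buildServerEntry` is hypothetical. The key names are the ones this README lists, and the command and args match the config above.

```typescript
// Key names as listed in this README; the helper itself is illustrative only.
const CITATION_KEY_NAMES = [
  "PERPLEXITY_API_KEY",
  "SERPAPI_KEY",
  "BING_API_KEY",
  "ANTHROPIC_API_KEY",
  "OPENAI_API_KEY",
  "GEMINI_API_KEY",
] as const;

// Build the mcpServers entry, copying only keys that are actually set.
function buildServerEntry(available: Record<string, string | undefined>) {
  const env: Record<string, string> = {};
  for (const keyName of CITATION_KEY_NAMES) {
    const value = available[keyName];
    if (value !== undefined && value.trim() !== "") env[keyName] = value;
  }
  return {
    mcpServers: {
      "citation-intelligence": {
        command: "npx",
        args: ["-y", "@automatelab/citation-intelligence"],
        env,
      },
    },
  };
}

// Usage: a config fragment that sets only the Perplexity key.
console.log(JSON.stringify(buildServerEntry({ PERPLEXITY_API_KEY: "pplx-..." }), null, 2));
```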

How it stays free

No central backend. The server runs on your machine. Nothing is uploaded.
Free tier first. SerpAPI gives 100 free Google AI Overview lookups/month. Bing Web Search has a free tier. Perplexity offers free Sonar access on signup.
Bring your own paid keys if you want the premium engines (Claude, ChatGPT, Gemini). Keys pass through to the vendor and never touch any third party.
Local cache at ~/.config/citation-intelligence/cache.json. Repeated queries hit cache, not API. Default TTL: 7 days.
predict_citation runs with zero keys - it scores citation likelihood from public signals (Wikipedia, schema.org, llms.txt, GitHub) without firing any LLM.
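
The cache rule above (repeated queries hit the cache until the TTL expires) can be sketched as a freshness check. This is a minimal illustration, not the server's code: the entry shape and the `storedAtMs` field are assumptions, and only the 7-day default comes from this README.

```typescript
// Hypothetical cache entry shape; the real cache.json layout may differ.
interface CachedEntry {
  storedAtMs: number; // assumed field: write time in epoch milliseconds
  payload: unknown;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An entry is served from cache while it is younger than the TTL.
function isCacheEntryFresh(entry: CachedEntry, nowMs: number, ttlDays = 7): boolean {
  return nowMs - entry.storedAtMs < ttlDays * MS_PER_DAY;
}
```

Passing `nowMs` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.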

Privacy

All API calls go from your machine directly to the vendor (Anthropic, OpenAI, Google, Perplexity, Bing, SerpAPI).
No proxy. No analytics. No telemetry by default.
API keys are read from environment variables on the MCP process - never logged, never persisted.
Cache file lives at ~/.config/citation-intelligence/cache.json. Delete it any time.

Environment variables

Var	Purpose	Free tier?
`PERPLEXITY_API_KEY`	check_citations (Perplexity)	Yes
`SERPAPI_KEY`	ai_overview	100/month free
`BING_API_KEY`	check_citations (Bing)	Yes
`ANTHROPIC_API_KEY`	check_citations (Claude)	Paid only
`OPENAI_API_KEY`	check_citations (ChatGPT)	Paid only
`GEMINI_API_KEY`	check_citations (Gemini)	Yes
`CITATION_CACHE_TTL_DAYS`	Cache TTL for citation_check entries (default 7)	n/a
`CITATION_AI_OVERVIEW_TTL_DAYS`	Cache TTL for ai_overview entries (default 1)	n/a
`CITATION_CONFIG_DIR`	Override config dir (default `~/.config/citation-intelligence`)	n/a
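
The three `CITATION_*` settings and their defaults can be read as in the sketch below. The function names are hypothetical, and falling back to the default on an invalid value is an assumption; the server may instead reject bad input.

```typescript
interface CitationSettings {
  citationTtlDays: number;
  aiOverviewTtlDays: number;
  configDir: string;
}

// Assumed behavior: unset, empty, non-numeric, or non-positive values
// fall back to the documented default.
function parsePositiveNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// `env` would typically be process.env; it is a parameter here so the
// function stays pure and testable.
function resolveCitationSettings(env: Record<string, string | undefined>): CitationSettings {
  return {
    citationTtlDays: parsePositiveNumber(env["CITATION_CACHE_TTL_DAYS"], 7),
    aiOverviewTtlDays: parsePositiveNumber(env["CITATION_AI_OVERVIEW_TTL_DAYS"], 1),
    configDir: env["CITATION_CONFIG_DIR"] ?? "~/.config/citation-intelligence",
  };
}
```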

Example: am I cited?

You: For the queries "best AI citation tracker", "MCP for AI search", "self-hosted GEO tool",
     is automatelab.tech cited?

(agent invokes am_i_cited)

Result:
{
  "domain": "automatelab.tech",
  "engine": "perplexity",
  "results": [
    { "query": "best AI citation tracker",   "cited": true,  "rank": 4 },
    { "query": "MCP for AI search",          "cited": true,  "rank": 1 },
    { "query": "self-hosted GEO tool",       "cited": false, "matching_urls": [] }
  ],
  "summary": {
    "queries_total": 3,
    "queries_cited": 2,
    "citation_rate": 0.67,
    "average_rank": 2.5
  }
}
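
The `summary` block follows from `results` by simple arithmetic. The sketch below recomputes it. The function name is hypothetical, and rounding to two decimals is an assumption inferred from the `0.67` in the sample output.

```typescript
interface CitedQueryResult {
  query: string;
  cited: boolean;
  rank?: number; // present only when the domain was cited
}

function summarizeCitations(results: CitedQueryResult[]) {
  const round2 = (x: number) => Math.round(x * 100) / 100;
  const queriesCited = results.filter((r) => r.cited).length;
  // Average rank is taken over cited queries only.
  const ranks = results
    .filter((r) => r.cited && r.rank !== undefined)
    .map((r) => r.rank as number);
  return {
    queries_total: results.length,
    queries_cited: queriesCited,
    citation_rate: results.length === 0 ? 0 : round2(queriesCited / results.length),
    average_rank:
      ranks.length === 0 ? null : round2(ranks.reduce((a, b) => a + b, 0) / ranks.length),
  };
}
```

For the three results above this gives a citation rate of 2/3, rounded to 0.67, and an average rank of (4 + 1) / 2 = 2.5.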

Example: predict citation likelihood (no key required)

You: How likely is https://example.com/blog/post to be cited by AI?

(agent invokes predict_citation)

Result:
{
  "url": "https://example.com/blog/post",
  "score": 62,
  "grade": "C",
  "signals": {
    "wikipedia_linked": false,
    "github_referenced": false,
    "reddit_referenced": true,
    "llms_txt_present": true,
    "https": true,
    "has_article_schema": true,
    "has_faq_schema": false,
    "has_breadcrumb_schema": true,
    "canonical_clean": true,
    "word_count": 1850,
    "reading_time_minutes": 8,
    "h2_count": 7,
    "h2_question_count": 1,
    "authority_link_count": 2,
    "external_link_count": 6,
    "internal_link_count": 11,
    "last_modified_days_ago": 42,
    "has_open_graph": true
  },
  "fixes": [
    { "signal": "has_faq_schema", "suggestion": "Page already has question-style H2s. Wrap them in FAQPage JSON-LD - high-leverage win.", "estimated_lift": "high" },
    { "signal": "h2_question_count", "suggestion": "Reframe at least 2 H2s as questions users actually ask...", "estimated_lift": "medium" }
  ]
}

The Wikipedia signal is measured (it correlates with citation) but no "go get a Wikipedia article" suggestion is emitted - the advice would be non-actionable. Scoring is split across six buckets - domain authority, structured data, content depth, link graph, freshness, metadata - so a thin page and a deep page on the same domain get meaningfully different scores.
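
The first fix in the sample output suggests wrapping question-style H2s in FAQPage JSON-LD. A minimal sketch of generating that markup follows. The helper name and input shape are hypothetical, while the `FAQPage`, `Question`, and `Answer` types and the `mainEntity` and `acceptedAnswer` properties come from schema.org.

```typescript
interface FaqItem {
  question: string; // the H2 text, phrased as a question
  answer: string;   // the answer text that follows the heading
}

// Serialize question/answer pairs as a schema.org FAQPage document.
function buildFaqPageJsonLd(items: FaqItem[]): string {
  const doc = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
  return JSON.stringify(doc, null, 2);
}
```

The returned string goes inside a `<script type="application/ld+json">` tag on the page.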

Workflow recipes

Concrete patterns that compose the tools above into something useful. Costs assume ChatGPT or Perplexity at ~$0.01-0.03/query.
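
To budget a recipe, multiply the per-query range by the number of queries. The sketch below does that in whole cents to avoid floating-point drift. The function is hypothetical, and it assumes one paid engine call per query with no cache hits.

```typescript
// Per-query cost range in cents, from the ~$0.01-0.03/query figure above.
function estimateRunCostUsd(queryCount: number, lowCentsPerQuery = 1, highCentsPerQuery = 3) {
  const toUsd = (cents: number) => (cents / 100).toFixed(2);
  return {
    low: toUsd(queryCount * lowCentsPerQuery),
    high: toUsd(queryCount * highCentsPerQuery),
  };
}

console.log(estimateRunCostUsd(20)); // { low: '0.20', high: '0.60' }
```

Cached queries cost nothing, so a re-run inside the TTL window should come in below this estimate.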

1. Weekly citation tracker

The single highest-ROI pattern. Pick 20-30 queries from your editorial backlog, snapshot weekly, watch the rate trend.

# One-time setup
track_queries name="editorial-watchlist" domain="example.com" action="save"
              queries=["best widget tutorial", "how to set up X", ...]

# Weekly cron (5 min, ~$0.20-0.60 per run)
run_panel name="editorial-watchlist"

# Anytime
citation_trend panel="editorial-watchlist"

citation_trend returns per-query deltas: which queries flipped from cited: false to cited: true since the first snapshot. That's your real editorial-impact metric.
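
The gained/lost deltas amount to a comparison of two snapshots. A minimal sketch follows, assuming a snapshot can be reduced to a query-to-cited map. The type and function names are hypothetical and the real snapshot format may carry more fields.

```typescript
// Assumed snapshot shape: query text -> whether the domain was cited.
type PanelSnapshot = Record<string, boolean>;

function citationDeltas(first: PanelSnapshot, latest: PanelSnapshot) {
  const gained: string[] = [];
  const lost: string[] = [];
  for (const query of Object.keys(latest)) {
    if (!(query in first)) continue; // compare only queries present in both
    if (!first[query] && latest[query]) gained.push(query); // false -> true
    if (first[query] && !latest[query]) lost.push(query);   // true -> false
  }
  return { gained, lost };
}
```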

2. Pre-publish gate

Before publishing a post, find out who owns the citation slot and whether the slot is worth competing for.

# 1. Is there an AI Overview to compete for?
ai_overview query="<target query>"

# 2. Who is cited today?
check_citations query="<target query>"

# 3. After publish + 14 days: did the post break in?
am_i_cited domain="example.com" queries=["<target query>"]

If check_citations returns 5+ strong incumbents on a low-volume query, pick a different angle. If ai_overview_present: false, the query has no AI surface - reconsider.
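
The two rules above can be written as a small decision function. This is only a sketch: the names are hypothetical, the order in which the rules are checked is an assumption, and what counts as a "strong" incumbent or a "low-volume" query is left to the caller.

```typescript
type GateDecision = "no_ai_surface" | "crowded_pick_another_angle" | "worth_competing";

function prePublishGate(input: {
  aiOverviewPresent: boolean; // from ai_overview
  strongIncumbents: number;   // caller's count of strong cited competitors
  lowVolume: boolean;         // caller's judgment of query volume
}): GateDecision {
  if (!input.aiOverviewPresent) return "no_ai_surface";
  if (input.strongIncumbents >= 5 && input.lowVolume) return "crowded_pick_another_angle";
  return "worth_competing";
}
```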

3. Bulk site audit

Catch site-wide structural issues across every page in one pass. Zero API spend.

audit_sitemap sitemap_url="https://example.com/sitemap.xml" limit=200

Returns worst_first sorted by citation-likelihood score. Surfaces missing schema, conflicting canonicals, missing /llms.txt, broken HTTPS.

4. Competitor signal gap

You're not cited; they are. Why?

# 1. Find the top-cited URLs for your target query
check_citations query="<query>"

# 2. Compare your URL to theirs signal-by-signal
compare_domains urls=[
  "https://example.com/your-post",
  "https://competitor-1.com/their-post",
  "https://competitor-2.com/their-post"
]

diverging_signals lists the signals where you trail the competition. The gap is usually obvious once you see it: they have FAQ schema, GitHub references, and Wikipedia links, and you don't.
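
One plausible reading of a signal gap is "every competitor has it and you don't". The sketch below implements that reading over boolean signals. The function name and the exact rule are assumptions, and the tool's own `diverging_signals` may be computed differently.

```typescript
// Boolean signals keyed by name, as in the predict_citation example.
type SignalMap = Record<string, boolean>;

// Signals that every competitor has and your URL lacks, sorted by name.
function findSignalGaps(yours: SignalMap, competitors: SignalMap[]): string[] {
  const names = new Set<string>();
  for (const competitor of competitors) {
    for (const key of Object.keys(competitor)) names.add(key);
  }
  return Array.from(names)
    .filter((name) => !yours[name] && competitors.every((c) => c[name] === true))
    .sort();
}
```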

5. Google-rank vs AI-citation gap

The closest editorial wins are queries where you already rank in Google's top 10 but are invisible to AI. Requires a GCP service account with webmasters.readonly scope.

gsc_citation_gap
  domain="example.com"
  queries=["...editorial watchlist..."]
  start_date="2026-04-01"
  end_date="2026-05-01"

closest_wins returns queries with position <= 10 and ai_cited: false, sorted by impressions desc. Push citation signals on those specific URLs first.
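
The `closest_wins` rule is a filter followed by a sort. A minimal sketch follows, with a hypothetical row shape whose field names mirror the ones mentioned above.

```typescript
interface GscRow {
  query: string;
  position: number;    // average Google position
  impressions: number;
  ai_cited: boolean;
}

// Top-10 in Google, not cited by AI, highest impressions first.
function closestWins(rows: GscRow[]): GscRow[] {
  return rows
    .filter((row) => row.position <= 10 && !row.ai_cited)
    .sort((a, b) => b.impressions - a.impressions);
}
```

Because `filter` returns a new array, the input rows are left in their original order.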

6. Wikipedia mention monitor

Wikipedia is the top-correlation signal but the advice "get on Wikipedia" is useless. So instead: watch when it happens organically.

wikipedia_mentions domain="example.com" limit=50

Returns Wikipedia article URLs that already link to the domain. Re-run quarterly; the diff is your "we got a Wikipedia citation" alert.
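
The quarterly diff can be as simple as a set difference between two runs. A sketch, assuming each run's result has been reduced to a list of article URLs (the function name is hypothetical):

```typescript
// URLs present in the current run but not in the previous one.
function newWikipediaMentions(previous: string[], current: string[]): string[] {
  const seen = new Set(previous);
  return current.filter((url) => !seen.has(url));
}
```

A non-empty result is the alert.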

Schema.org

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Citation Intelligence MCP",
  "applicationCategory": "DeveloperApplication",
  "operatingSystem": "Cross-platform",
  "description": "Self-hosted MCP server for querying AI citation data from Perplexity, Claude, ChatGPT, Gemini, Bing, and Google AI Overviews.",
  "offers": { "@type": "Offer", "price": "0" },
  "url": "https://github.com/AutomateLab-tech/citation-intelligence"
}

Contributing

Bug reports, feature ideas, and PRs welcome. See CONTRIBUTING.md.

Security

Report a vulnerability via SECURITY.md.

License

MIT - see LICENSE.

Built by automatelab.tech
